# Evaluators Overview

> Understand evaluators - the scoring functions that assess your LLM outputs for quality, safety, and correctness.

<Tip>
  **Let your agent set this up.** [Copy the evaluations prompt](/skills/code-prompts#set-up-evaluations) into your coding agent to get started automatically.
</Tip>

Evaluators are scoring functions that assess the quality of your LLM's outputs. They're the building blocks for [experiments](/evaluations/experiments/overview), [online evaluation](/evaluations/online-evaluation/overview), and [guardrails](/evaluations/guardrails/overview).

## Choose Your Approach

There are three ways to evaluate your LLM outputs with LangWatch:

<CardGroup cols={3}>
  <Card title="Built-in Evaluators" icon="bolt" href="/evaluations/evaluators/built-in-evaluators">
    Use LangWatch's library of evaluators directly in your code.
  </Card>

  <Card title="Saved Evaluators" icon="bookmark" href="/evaluations/evaluators/saved-evaluators">
    Create reusable evaluator configs on the platform.
  </Card>

  <Card title="Custom Scoring" icon="code" href="/evaluations/evaluators/custom-scoring">
    Send scores from your own evaluation logic.
  </Card>
</CardGroup>

### Which should I use?

| Approach                | Slug Format                                         | Best For                                   |
| ----------------------- | --------------------------------------------------- | ------------------------------------------ |
| **Built-in Evaluators** | `provider/evaluator` (e.g., `ragas/faithfulness`)   | Quick setup, standard evaluation methods   |
| **Saved Evaluators**    | `evaluators/{slug}` (e.g., `evaluators/my-checker`) | Team collaboration, UI-based configuration |
| **Custom Scoring**      | N/A - you send the score directly                   | Proprietary logic, domain-specific metrics |

<Accordion title="Decision flowchart">
  ```
  Do you have your own evaluation logic?
  ├─ Yes → Use Custom Scoring
  └─ No → Do you want to configure via UI and reuse?
           ├─ Yes → Use Saved Evaluators
           └─ No → Use Built-in Evaluators
  ```
</Accordion>

## What is an Evaluator?

An evaluator takes inputs (such as the user question, the LLM response, and optionally retrieved context or an expected output) and returns a score for a specific quality dimension, such as faithfulness, safety, or correctness.

```
Input + Output + Context → Evaluator → Score
                                        ↓
                              passed: true/false
                              score: 0.0 - 1.0
                              details: "explanation"
```
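
As a minimal sketch of reading a result in code, assuming the object returned by `langwatch.evaluation.evaluate` exposes the `passed`, `score`, and `details` fields shown in the diagram above (the exact attribute names are an assumption; check the SDK reference for your version):

```python theme={null}
import langwatch

# Run a built-in evaluator and inspect the result.
# Field names (passed / score / details) follow the diagram above and are
# an assumption -- verify them against the SDK reference.
result = langwatch.evaluation.evaluate(
    "ragas/faithfulness",
    name="Faithfulness Check",
    data={
        "input": "What is the refund window?",
        "output": "You can request a refund within 30 days of purchase.",
        "contexts": ["Refunds are accepted within 30 days of purchase."],
    },
)

if not result.passed:
    print(f"Low faithfulness ({result.score}): {result.details}")
```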

## Built-in Evaluator Categories

LangWatch provides a library of ready-to-use evaluators:

| Category            | Examples                                               | Use Case                                |
| ------------------- | ------------------------------------------------------ | --------------------------------------- |
| **RAG Quality**     | Faithfulness, Context Precision, Context Recall        | Evaluate retrieval-augmented generation |
| **Safety**          | PII Detection, Jailbreak Detection, Content Moderation | Detect harmful content                  |
| **Correctness**     | Exact Match, LLM Answer Match, Factual Match           | Check answer accuracy                   |
| **Format**          | Valid JSON, Valid Format, SQL Query Equivalence        | Validate output structure               |
| **Custom Criteria** | LLM-as-Judge (Boolean, Score, Category)                | Custom evaluation prompts               |

[Browse all evaluators →](/evaluations/evaluators/list)

## Quick Examples

### Using a Built-in Evaluator

```python theme={null}
import langwatch

# Use directly by slug
langwatch.evaluation.evaluate(
    "ragas/faithfulness",  # Built-in evaluator
    name="Faithfulness Check",
    data={
        "input": user_input,
        "output": response,
        "contexts": contexts,
    },
)
```

### Using a Saved Evaluator

```python theme={null}
import langwatch

# Use your saved evaluator by its slug
langwatch.evaluation.evaluate(
    "evaluators/my-tone-checker",  # Saved on platform
    name="Tone Check",
    data={
        "input": user_input,
        "output": response,
    },
)
```

### Sending Custom Scores

```python theme={null}
import langwatch

# Run your own logic and send the result
score = my_custom_evaluator(input, output)

langwatch.get_current_span().add_evaluation(
    name="my_custom_metric",
    passed=score > 0.7,
    score=score,
)
```

## Using Evaluators

### In Experiments

Run evaluators on each row of your test dataset for batch evaluation:

```python theme={null}
experiment = langwatch.experiment.init("my-experiment")

for idx, row in experiment.loop(df.iterrows()):
    response = my_llm(row["input"])

    experiment.evaluate(
        "ragas/faithfulness",
        index=idx,
        data={
            "input": row["input"],
            "output": response,
            "contexts": row["contexts"],
        },
    )
```

[Learn more about experiments →](/evaluations/experiments/overview)

### In Online Evaluation (Monitors)

Run evaluators automatically on production traces:

1. Create a monitor in LangWatch
2. Select evaluators to run
3. Configure when to trigger (all traces, sampled, filtered)
4. Scores appear on traces and dashboards

[Learn more about online evaluation →](/evaluations/online-evaluation/overview)

### As Guardrails

Use evaluators to block harmful content in real time:

```python theme={null}
guardrail = langwatch.evaluation.evaluate(
    "azure/jailbreak",
    name="Jailbreak Detection",
    data={"input": user_input},
    as_guardrail=True,
)

if not guardrail.passed:
    return "I can't help with that request."
```

[Learn more about guardrails →](/evaluations/guardrails/overview)

## Evaluator Inputs

Different evaluators require different inputs:

| Input             | Description               | Example Evaluators              |
| ----------------- | ------------------------- | ------------------------------- |
| `input`           | User question/prompt      | Jailbreak Detection, Off-Topic  |
| `output`          | LLM response              | PII Detection, Valid Format     |
| `contexts`        | Retrieved documents       | Faithfulness, Context Precision |
| `expected_output` | Ground truth answer       | Answer Correctness, Exact Match |
| `conversation`    | Full conversation history | Conversation Relevancy          |

Check each evaluator's documentation for required and optional inputs.
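
The keys of the `data` dictionary map directly onto these inputs. A sketch using a hypothetical `provider/answer-correctness` slug (not a real evaluator; browse the evaluators list for real slugs and their required inputs):

```python theme={null}
import langwatch

# "provider/answer-correctness" is a hypothetical placeholder slug.
# user_input, response, retrieved_docs, and ground_truth come from your
# own pipeline; pass only the keys the chosen evaluator requires.
langwatch.evaluation.evaluate(
    "provider/answer-correctness",
    name="Answer Correctness Check",
    data={
        "input": user_input,               # user question/prompt
        "output": response,                # LLM response
        "contexts": retrieved_docs,        # retrieved documents, if required
        "expected_output": ground_truth,   # ground truth answer, if required
    },
)
```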

## The `name` Parameter

<Warning>
  **Important:** Always provide a descriptive `name` when running evaluators. This helps identify evaluation results in Analytics and traces.
</Warning>

```python theme={null}
# Good - descriptive name
langwatch.evaluation.evaluate(
    "langevals/llm_category",
    name="Answer Completeness Check",  # Descriptive!
    data={...},
)

# Bad - no name, hard to track
langwatch.evaluation.evaluate(
    "langevals/llm_category",
    data={...},
)
```

## Next Steps

<CardGroup cols={2}>
  <Card title="Built-in Evaluators" description="Use LangWatch's evaluator library directly." icon="bolt" href="/evaluations/evaluators/built-in-evaluators" />

  <Card title="Saved Evaluators" description="Create and reuse evaluator configurations." icon="bookmark" href="/evaluations/evaluators/saved-evaluators" />

  <Card title="Custom Scoring" description="Send scores from your own evaluation logic." icon="code" href="/evaluations/evaluators/custom-scoring" />

  <Card title="Evaluators List" description="Browse all available evaluators." icon="list" href="/evaluations/evaluators/list" />
</CardGroup>
