LangWatch provides a library of ready-to-use evaluators for common evaluation tasks. You can use these directly in your code without any setup on the platform.
When to use Built-in Evaluators:
- You want to quickly add evaluation without platform configuration
- You’re running experiments or online evaluations programmatically
- You want to use well-tested, standardized evaluation methods

Related:
- Saved Evaluators - Reuse configured evaluators across your project
- Custom Scoring - Send scores from your own evaluation logic
Available Evaluators
LangWatch offers evaluators across several categories:

| Category | Examples | Use Case |
|---|---|---|
| RAG Quality | ragas/faithfulness, ragas/context_precision | Evaluate retrieval-augmented generation |
| Safety | presidio/pii_detection, azure/jailbreak | Detect PII, jailbreaks, harmful content |
| Correctness | langevals/exact_match, langevals/llm_boolean | Check answer accuracy |
| Custom Criteria | langevals/llm_boolean, langevals/llm_score | LLM-as-Judge for custom checks |
Using Built-in Evaluators
In Experiments
Run evaluators on your test dataset during batch evaluation, as in the sketch below.
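The following is a minimal sketch of running a built-in evaluator over a small test set. The batch-evaluation helpers (langwatch.evaluation.init, evaluation.loop, evaluation.run) and the answer_question pipeline are assumptions for illustration; check the SDK reference for the exact names and signatures.

```python
import langwatch

# Placeholder for your own application code (assumption for illustration):
# returns an answer plus the retrieved context passages for a question.
def answer_question(question: str) -> tuple[str, list[str]]:
    contexts = ["LangWatch is an LLM observability and evaluation platform."]
    return "LangWatch helps you monitor and evaluate LLM applications.", contexts

dataset = [
    {"question": "What is LangWatch?"},
    {"question": "What does the faithfulness evaluator measure?"},
]

# Assumed batch-evaluation helpers; see the SDK reference for exact signatures.
evaluation = langwatch.evaluation.init("built-in-evaluators-experiment")

for index, row in evaluation.loop(enumerate(dataset)):
    answer, contexts = answer_question(row["question"])

    # Run a built-in evaluator on this row; results are collected in the experiment.
    evaluation.run(
        "ragas/faithfulness",
        index=index,
        data={
            "input": row["question"],
            "output": answer,
            "contexts": contexts,
        },
    )
```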
In Online Evaluation
Run evaluators on production traces in real time, as in the sketch below.
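This sketch attaches a built-in evaluator to a live trace so the score is recorded alongside it. The trace decorator and the evaluate call on the current trace follow the Python SDK's tracing API, but treat the exact method name and parameters as assumptions and confirm them in the SDK reference; generate_answer is a placeholder.

```python
import langwatch

# Placeholder for your production generation step (assumption for illustration).
def generate_answer(question: str) -> str:
    return "Paris is the capital of France."

@langwatch.trace()
def handle_message(question: str) -> str:
    answer = generate_answer(question)

    # Attach a built-in evaluator to the current trace so its result is
    # stored with the trace (method name is an assumption; see the SDK reference).
    langwatch.get_current_trace().evaluate(
        "presidio/pii_detection",
        input=question,
        output=answer,
    )

    return answer

handle_message("What is the capital of France?")
```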
As Guardrails
Use evaluators to block harmful content before responding, as in the sketch below.
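Here is a sketch of a safety evaluator used as a guardrail that short-circuits the request before the model is called. The as_guardrail flag and the passed field on the result are assumptions about the SDK's guardrail support; verify them against the SDK reference before relying on this pattern.

```python
import langwatch

@langwatch.trace()
def answer_user(question: str) -> str:
    # Run a jailbreak detector before doing any real work.
    # `as_guardrail=True` and `result.passed` are assumptions; check the
    # SDK reference for the exact guardrail API.
    result = langwatch.get_current_trace().evaluate(
        "azure/jailbreak",
        input=question,
        as_guardrail=True,
    )
    if not result.passed:
        return "Sorry, I can't help with that request."

    # Safe to proceed with the normal LLM call (placeholder response here).
    return "Here is your answer..."
```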
Evaluator Inputs
Different evaluators require different inputs. Check the evaluator list for each evaluator's requirements.

| Input | Description | Example Evaluators |
|---|---|---|
| input | User question/prompt | Jailbreak Detection, Off-Topic |
| output | LLM response | PII Detection, Valid Format |
| contexts | Retrieved documents (array) | Faithfulness, Context Precision |
| expected_output | Ground truth answer | Answer Correctness, Exact Match |
| conversation | Conversation history | Conversation Relevancy |
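For example, a faithfulness check needs the retrieved contexts, while an exact-match check needs the ground truth. The field names below match the table; the surrounding data payloads are illustrative assumptions, so check each evaluator's entry in the evaluator list for its exact requirements.

```python
# Faithfulness compares the output against the retrieved documents (`contexts`).
faithfulness_data = {
    "input": "What is LangWatch?",
    "output": "LangWatch is an LLM observability and evaluation platform.",
    "contexts": ["LangWatch provides tracing, evaluation, and monitoring for LLM apps."],
}

# Exact match compares the output against a ground-truth `expected_output`.
exact_match_data = {
    "output": "42",
    "expected_output": "42",
}
```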