The Agent Evaluation Lifecycle
Core Concepts
When to Use What
| Use Case | Solution |
|---|---|
| Test prompt changes before deploying | Experiments |
| Compare different models or configurations | Experiments |
| Run quality checks in CI/CD | Experiments (in CI/CD) |
| Monitor production quality over time | Online Evaluation |
| Block harmful or policy-violating content | Guardrails |
| Get alerts when quality drops | Online Evaluation + Triggers |
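As a sketch of the CI/CD row above, a pipeline step can run an experiment against a fixed dataset and fail the build when quality drops below a threshold. Everything here (`run_model`, the dataset, the exact-match scorer) is a hypothetical stand-in for your own model call and evaluator, not a specific SDK API:

```python
import sys

# Hypothetical evaluation dataset of (input, expected) pairs; in practice
# this would be loaded from the dataset backing your experiment.
DATASET = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

def run_model(prompt: str) -> str:
    # Placeholder for the model/agent call under test.
    return {"What is 2 + 2?": "4", "Capital of France?": "Paris"}[prompt]

def exact_match_score(dataset) -> float:
    """Fraction of examples where the model output matches exactly."""
    hits = sum(run_model(q) == expected for q, expected in dataset)
    return hits / len(dataset)

QUALITY_THRESHOLD = 0.9

def ci_gate(dataset) -> int:
    """Return a process exit code: 0 = pass, 1 = quality regression."""
    score = exact_match_score(dataset)
    print(f"quality score: {score:.2f}")
    return 0 if score >= QUALITY_THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(ci_gate(DATASET))
```

A non-zero exit code is what lets the CI system block the deploy; the scoring function is the part you would swap for a real evaluator.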
Quick Start
1. Run Your First Experiment
Test your LLM on a dataset using Experiments, via the UI or via code:

- Platform
- Python
- TypeScript
Go to Experiments and click “New Experiment” to get started with the UI.
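For the code path, the shape of an experiment is: run your model over each dataset item, score the output with an evaluator, and collect per-item results. The sketch below is a generic illustration of that loop with toy stand-ins (`run_experiment`, the lambda model and evaluator are all hypothetical names, not this platform's SDK):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExperimentResult:
    input: str
    output: str
    score: float

def run_experiment(dataset, model: Callable, evaluator: Callable):
    """Run the model over each dataset item and score it with the evaluator."""
    results = []
    for item in dataset:
        output = model(item["input"])
        score = evaluator(output, item["expected"])
        results.append(ExperimentResult(item["input"], output, score))
    return results

# Toy stand-ins for a real dataset, model call, and evaluator.
dataset = [{"input": "What is 2 + 2?", "expected": "4"}]
model = lambda prompt: "4"
evaluator = lambda out, exp: 1.0 if out == exp else 0.0

results = run_experiment(dataset, model, evaluator)
```

Comparing two models or prompt versions is then just two `run_experiment` calls over the same dataset with different `model` arguments.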
2. Set Up Online Evaluation
Monitor your production traffic with evaluators that run on every trace:

- Go to Monitors
- Create a new monitor with the “When a message arrives” trigger
- Select evaluators (e.g., PII Detection, Faithfulness)
- Enable monitoring
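Conceptually, the monitor configured above attaches an evaluator to every incoming trace. The sketch below illustrates that idea with a minimal regex-based PII check; the hook name, trace shape, and detector are all assumptions for illustration, not the platform's actual evaluator implementation:

```python
import re

# Hypothetical, deliberately simple PII patterns (email and US-style phone).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def pii_detected(text: str) -> bool:
    """Return True if the text appears to contain an email or phone number."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))

alerts = []

def on_message_arrived(trace: dict) -> None:
    """Evaluator hook run on every trace ('When a message arrives')."""
    if pii_detected(trace["output"]):
        alerts.append({"trace_id": trace["id"], "evaluator": "pii_detection"})

# Simulated production traces flowing through the monitor.
on_message_arrived({"id": "t1", "output": "Contact me at jane@example.com"})
on_message_arrived({"id": "t2", "output": "The answer is 42."})
```

In the hosted setup, the platform runs this loop for you; combined with a trigger, the `alerts` list is what would become a notification when quality drops.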