Skip to main content
LangWatch provides comprehensive evaluations tools for your LLM applications. Whether you’re evaluating before deployment or monitoring in production, we have you covered.

The Agent Evaluation Lifecycle

BUILD → TEST → DEPLOY → MONITOR
         ↓              ↓
    Experiments    Online Evaluation
         ↓              ↓
    CI/CD Gate      Guardrails

Core Concepts

When to Use What

Use CaseSolution
Test prompt changes before deployingExperiments
Compare different models or configurationsExperiments
Run quality checks in CI/CDExperiments CI/CD
Monitor production quality over timeOnline Evaluation
Block harmful or policy-violating contentGuardrails
Get alerts when quality dropsOnline Evaluation + Triggers

Quick Start

1. Run Your First Experiment

Test your LLM on a dataset using the Experiments via UI or via code:
Go to Experiments and click “New Experiment” to get started with the UI.

2. Set Up Online Evaluation

Monitor your production traffic with evaluators that run on every trace:
  1. Go to Monitors
  2. Create a new monitor with “When a message arrives” trigger
  3. Select evaluators (e.g., PII Detection, Faithfulness)
  4. Enable monitoring

3. Add Guardrails

Protect your users by blocking harmful content in real-time:
import langwatch

@langwatch.trace()
def my_llm_call(user_input):
    # Check input before processing
    guardrail = langwatch.evaluation.evaluate(
        "azure/jailbreak",
        name="Jailbreak Detection",
        as_guardrail=True,
        data={"input": user_input},
    )

    if not guardrail.passed:
        return "I can't help with that request."

    # Continue with normal processing...

Supporting Resources