Online evaluation lets you continuously score your LLM’s production traffic. Unlike experiments, which test changes before deployment, online evaluation monitors your live application to catch quality issues, detect regressions, and ensure safety.
In the LangWatch platform, online evaluation is implemented through Monitors: automated rules that score incoming traces using the evaluators you configure.

How It Works

User Request → Your LLM → Response → LangWatch Trace → Monitor → Score
                                                                   ↓
                                                         Dashboard & Alerts
  1. Your application sends traces to LangWatch (via SDK integration)
  2. Monitors evaluate incoming traces using your configured evaluators
  3. Scores are recorded and displayed on dashboards
  4. Optionally trigger alerts when scores drop below thresholds
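
To make the flow concrete, here is a purely illustrative sketch of the scoring loop a monitor applies to each incoming trace. Every name in it (EvalResult, record_score, send_alert, monitor_incoming_trace, ALERT_THRESHOLD) is hypothetical; LangWatch runs this logic server-side, so you never write it yourself.

# Illustrative sketch only: LangWatch runs this server-side when a monitor is enabled.
# Every name below is hypothetical and not part of the LangWatch SDK.
from dataclasses import dataclass
from typing import Callable

ALERT_THRESHOLD = 0.7  # hypothetical threshold configured on the monitor

@dataclass
class EvalResult:
    name: str
    score: float

def record_score(trace_id: str, result: EvalResult) -> None:
    # Stand-in for persisting the score so dashboards can aggregate it
    print(f"trace={trace_id} {result.name}={result.score:.2f}")

def send_alert(trace_id: str, result: EvalResult) -> None:
    # Stand-in for the optional alerting step
    print(f"ALERT: {result.name} dropped to {result.score:.2f} on trace {trace_id}")

def monitor_incoming_trace(trace_id: str, trace: dict,
                           evaluators: list[Callable[[dict], EvalResult]]) -> None:
    for evaluator in evaluators:
        result = evaluator(trace)          # step 2: run each configured evaluator
        record_score(trace_id, result)     # step 3: scores feed traces and dashboards
        if result.score < ALERT_THRESHOLD:
            send_alert(trace_id, result)   # step 4: optional alerting on low scores

# Example: a toy "relevance" evaluator scoring a single trace
monitor_incoming_trace(
    "trace-123",
    {"input": "What is LangWatch?", "output": "An LLM observability platform."},
    [lambda t: EvalResult("relevance", 0.92)],
)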

When to Use Online Evaluation

Use case                 Example
Quality monitoring       Track faithfulness, relevance, or custom quality metrics over time
Safety monitoring        Detect PII leakage, jailbreak attempts, or policy violations
Regression detection     Get alerts when quality metrics drop after deployments
Dataset building         Automatically add low-scoring traces to datasets for improvement

Monitors vs Guardrails

Both use evaluators, but they serve different purposes:

Monitors                           Guardrails
Measure quality asynchronously     Block harmful content in real time
Run after the response is sent     Run before/during response generation
Feed dashboards and alerts         Return errors or safe responses to users
For observability                  For enforcement
If you need to block harmful content before it reaches users, see Guardrails.
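
To illustrate the difference in the request path, here is a rough sketch; every function in it is a hypothetical placeholder rather than a LangWatch API.

# Illustrative only: where a guardrail sits versus a monitor.
# All functions here are hypothetical placeholders, not LangWatch APIs.

def generate_response(user_input: str) -> str:
    return f"Echo: {user_input}"  # stand-in for your LLM call

def check_pii_guardrail(text: str) -> bool:
    return "@" in text  # toy check; a real guardrail would run an evaluator

def queue_for_monitoring(user_input: str, response: str) -> None:
    pass  # stand-in for the trace being sent and scored asynchronously

def handle_request(user_input: str) -> str:
    response = generate_response(user_input)

    # Guardrail: synchronous, can block or replace the response before the user sees it
    if check_pii_guardrail(response):
        return "Sorry, I can't help with that."

    # Monitor: scoring happens off the request path after the response is sent,
    # feeding dashboards and alerts rather than changing this response
    queue_for_monitoring(user_input, response)
    return response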

Getting Started

Quick Setup

1. Ensure traces are being sent

First, make sure your application is sending traces to LangWatch:
import langwatch

@langwatch.trace()
def my_llm_app(user_input):
    # Your LLM logic here
    response = generate_response(user_input)  # replace with your model or chain call
    return response
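
For example, if your LLM logic is an OpenAI chat call, the traced function might look like the sketch below. The OpenAI client and model are just one possible setup, and the LangWatch SDK also needs your LangWatch API key configured (typically via the LANGWATCH_API_KEY environment variable; see the SDK setup docs for your integration).

import langwatch
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

@langwatch.trace()
def my_llm_app(user_input: str) -> str:
    # One possible implementation of "your LLM logic": a single chat completion
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_input}],
    )
    return completion.choices[0].message.content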

2. Create a Monitor

  1. Go to Evaluations in LangWatch
  2. Click New Evaluation
  3. Select Real-time evaluation (this creates a Monitor)
  4. Choose “When a message arrives” as the trigger
  5. Select evaluators (e.g., PII Detection, Faithfulness)
  6. Configure any filters (optional)
  7. Enable monitoring

3. View Results

Once enabled, scores will appear on:
  • Traces - Individual trace scores visible in trace details
  • Analytics - Aggregate metrics over time
  • Alerts - Configure triggers for low scores

Adding Scores via Code

You can also add scores programmatically during request processing:
import langwatch

@langwatch.trace()
def my_llm_app(user_input):
    response = generate_response(user_input)
    
    # Add a custom score
    langwatch.get_current_span().add_evaluation(
        name="response_quality",
        passed=True,
        score=0.95,
        details="High quality response"
    )
    
    return response
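
The same call can also record a failing check. Below is a minimal variant of the example above; the length heuristic stands in for whatever real check you run, and generate_response is the same placeholder as before.

import langwatch

def generate_response(user_input):
    return "..."  # placeholder for your LLM call, as in the example above

@langwatch.trace()
def my_llm_app(user_input):
    response = generate_response(user_input)

    # Record a failed evaluation when a simple (purely illustrative) heuristic fires
    if len(response) < 20:
        langwatch.get_current_span().add_evaluation(
            name="response_length_check",
            passed=False,
            score=0.0,
            details="Response shorter than 20 characters",
        )

    return response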

Available Evaluators

Monitors can use any evaluator from the LangWatch library:
  • Quality: Faithfulness, Answer Relevancy, Coherence
  • Safety: PII Detection, Jailbreak Detection, Content Moderation
  • RAG: Context Precision, Context Recall, Groundedness
  • Custom: LLM-as-Judge with your own criteria
See the full Evaluators List.
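
The built-in LLM-as-Judge evaluator is configured in the monitor itself, but if you prefer to judge in code you can combine your own judge call with the add_evaluation pattern shown earlier. The sketch below does that; judge_relevance is a hypothetical helper standing in for however you prompt and parse your judge model, and generate_response is the same placeholder used above.

import langwatch

def judge_relevance(question: str, answer: str) -> float:
    # Hypothetical judge: in practice, prompt an LLM with your own criteria and
    # parse a numeric score from its reply; this toy check just looks for the
    # question's first word in the answer
    return 1.0 if question.split()[0].lower() in answer.lower() else 0.3

@langwatch.trace()
def my_llm_app(user_input):
    response = generate_response(user_input)  # your LLM call, as above

    score = judge_relevance(user_input, response)
    langwatch.get_current_span().add_evaluation(
        name="custom_relevance_judge",
        passed=score >= 0.5,
        score=score,
        details="Scored by a custom LLM-as-judge check",
    )
    return response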

Next Steps