Guardrails are evaluators that run in real time and act on their results: blocking, modifying, or rejecting responses that violate your safety or policy rules. Unlike monitors, which only measure and alert, guardrails actively prevent harmful content from reaching users.

Guardrails vs Monitors

| Guardrails | Monitors |
| --- | --- |
| Block harmful content | Measure quality metrics |
| Run synchronously during request | Run asynchronously after response |
| Return errors or safe responses | Feed dashboards and alerts |
| Add latency to requests | No impact on response time |
| For enforcement | For observability |
Use guardrails when you need to prevent something from happening. Use monitors when you need to observe what’s happening.

Common Guardrail Use Cases

| Use Case | Evaluator | Action |
| --- | --- | --- |
| Block jailbreak attempts | Azure Jailbreak Detection | Reject input |
| Prevent PII exposure | Presidio PII Detection | Block or redact response |
| Enforce content policy | OpenAI Moderation | Return safe response |
| Block competitor mentions | Competitor Blocklist | Modify or reject |
| Ensure valid output format | Valid Format Evaluator | Retry or reject |
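
For instance, the "retry or reject" pattern from the last row could look like the sketch below. This is a minimal example, not the canonical implementation: the evaluator slug "langwatch/valid_format", the max_retries value, and the call_llm helper are illustrative placeholders, so substitute the slug and retry budget that fit your project:

import langwatch

def generate_structured(user_input, max_retries=2):
    # Retry-or-reject: regenerate when the output fails the format guardrail
    for _ in range(max_retries + 1):
        response = call_llm(user_input)  # your LLM call

        format_check = langwatch.evaluation.evaluate(
            "langwatch/valid_format",  # illustrative slug: use your format evaluator
            name="Valid Format",
            as_guardrail=True,
            data={"output": response},
        )
        if format_check.passed:
            return response

    # All attempts produced invalid output, so reject
    return "I couldn't produce a valid response for that request."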

How Guardrails Work

User Input → Guardrail Check → [Pass] → LLM → Response → Guardrail Check → [Pass] → User
                    ↓                                           ↓
               [Fail] → Return Error                     [Fail] → Return Safe Response
Guardrails can run at two points:
  1. Input guardrails - Check user input before calling your LLM
  2. Output guardrails - Check LLM response before sending to user

Getting Started

Quick Example

import langwatch

@langwatch.trace()
def my_chatbot(user_input):
    # Input guardrail - check for jailbreak attempts
    jailbreak_check = langwatch.evaluation.evaluate(
        "azure/jailbreak",
        name="Jailbreak Detection",
        as_guardrail=True,
        data={"input": user_input},
    )
    
    if not jailbreak_check.passed:
        return "I'm sorry, I can't help with that request."
    
    # Generate response
    response = call_llm(user_input)
    
    # Output guardrail - check for PII
    pii_check = langwatch.evaluation.evaluate(
        "presidio/pii_detection",
        name="PII Check",
        as_guardrail=True,
        data={"output": response},
    )
    
    if not pii_check.passed:
        return "I apologize, but I cannot share that information."
    
    return response

Best Practices

1. Layer your guardrails

Use multiple guardrails for defense in depth:
# Layer 1: Block malicious input
jailbreak = langwatch.evaluation.evaluate(
    "azure/jailbreak", as_guardrail=True, data={"input": user_input}
)

# Layer 2: Content moderation
moderation = langwatch.evaluation.evaluate(
    "openai/moderation", as_guardrail=True, data={"input": user_input}
)

# Layer 3: Check output before sending (after the LLM call)
pii = langwatch.evaluation.evaluate(
    "presidio/pii_detection", as_guardrail=True, data={"output": response}
)
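
Layers 1 and 2 can then be combined into a single early return before the LLM call (the refusal wording is just an example):

if not (jailbreak.passed and moderation.passed):
    return "I'm not able to assist with that request."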

2. Provide helpful error messages

Don’t just block; guide users toward acceptable behavior:
if not guardrail.passed:
    if guardrail.details:
        return f"I can't help with that because: {guardrail.details}"
    return "I'm not able to assist with that request. Could you rephrase?"

3. Log guardrail triggers

Track when guardrails fire for monitoring and improvement:
if not guardrail.passed:
    langwatch.get_current_trace().update(
        metadata={"guardrail_triggered": guardrail.name}
    )

4. Consider latency

Guardrails add latency. For time-sensitive applications:
  • Use fast evaluators (regex, blocklists) for input checks
  • Save heavier evaluators (LLM-based) for output checks
  • Run multiple guardrails in parallel when possible (see the sketch after the table below)

| Evaluator | Best For | Latency |
| --- | --- | --- |
| Azure Jailbreak Detection | Blocking prompt injection | Fast |
| Azure Prompt Shield | Blocking prompt attacks | Fast |
| Presidio PII Detection | Blocking PII exposure | Fast |
| OpenAI Moderation | Content policy enforcement | Fast |
| Competitor Blocklist | Blocking competitor mentions | Very Fast |
| Valid Format | Ensuring structured output | Very Fast |
| LLM-as-Judge Boolean | Custom policy checks | Slower |
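
As noted above, independent guardrails can run concurrently so the added latency is roughly that of the slowest check rather than the sum. The sketch below is one way to do it with a thread pool; the two-guardrail combination and the run_input_guardrails helper are illustrative, and if your tracing context does not propagate to worker threads, adapt it to your concurrency setup:

from concurrent.futures import ThreadPoolExecutor

import langwatch

def run_input_guardrails(user_input):
    # Submit both independent input guardrails at once and wait for their results
    with ThreadPoolExecutor(max_workers=2) as pool:
        jailbreak_future = pool.submit(
            langwatch.evaluation.evaluate,
            "azure/jailbreak",
            name="Jailbreak Detection",
            as_guardrail=True,
            data={"input": user_input},
        )
        moderation_future = pool.submit(
            langwatch.evaluation.evaluate,
            "openai/moderation",
            name="Moderation",
            as_guardrail=True,
            data={"input": user_input},
        )
        # Both checks must pass for the input to proceed
        return jailbreak_future.result().passed and moderation_future.result().passed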

Next Steps