Guardrails are evaluators that run in real time and act on their results: blocking, modifying, or rejecting inputs and responses that violate your safety or policy rules. Unlike monitors, which only measure and alert, guardrails actively prevent harmful content from reaching users.
## Guardrails vs Monitors
| Guardrails | Monitors |
|---|---|
| Block harmful content | Measure quality metrics |
| Run synchronously during request | Run asynchronously after response |
| Return errors or safe responses | Feed dashboards and alerts |
| Add latency to requests | No impact on response time |
| For enforcement | For observability |
Use guardrails when you need to prevent something from happening. Use monitors when you need to observe what’s happening.
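To make the distinction concrete, the sketch below runs the same PII evaluator twice inside a traced function: once as a guardrail whose result decides what the user sees, and once purely for observability. It assumes the `evaluate` call shown in the Quick Example below, assumes that omitting `as_guardrail` records the result without blocking, and uses `call_llm` as a placeholder for your own LLM call.

```python
import langwatch


@langwatch.trace()
def answer(user_input: str) -> str:
    response = call_llm(user_input)  # placeholder for your own LLM call

    # Guardrail: runs synchronously and its result decides what the user sees
    pii_guardrail = langwatch.evaluation.evaluate(
        "presidio/pii_detection",
        name="PII Guardrail",
        as_guardrail=True,
        data={"output": response},
    )
    if not pii_guardrail.passed:
        return "I apologize, but I cannot share that information."

    # Monitor-style check: recorded for dashboards and alerts only; the
    # response is returned regardless of the result (assumption: omitting
    # as_guardrail records the evaluation without acting on it)
    langwatch.evaluation.evaluate(
        "presidio/pii_detection",
        name="PII Monitor",
        data={"output": response},
    )
    return response
```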
## Common Guardrail Use Cases
| Use Case | Evaluator | Action |
|---|---|---|
| Block jailbreak attempts | Azure Jailbreak Detection | Reject input |
| Prevent PII exposure | Presidio PII Detection | Block or redact response |
| Enforce content policy | OpenAI Moderation | Return safe response |
| Block competitor mentions | Competitor Blocklist | Modify or reject |
| Ensure valid output format | Valid Format Evaluator | Retry or reject |
## How Guardrails Work
```
User Input → Guardrail Check → [Pass] → LLM → Response → Guardrail Check → [Pass] → User
                    ↓                                           ↓
                 [Fail] → Return Error                       [Fail] → Return Safe Response
```
Guardrails can run at two points:
- Input guardrails - Check user input before calling your LLM
- Output guardrails - Check LLM response before sending to user
## Getting Started
### Quick Example
```python
import langwatch


@langwatch.trace()
def my_chatbot(user_input):
    # Input guardrail - check for jailbreak attempts
    jailbreak_check = langwatch.evaluation.evaluate(
        "azure/jailbreak",
        name="Jailbreak Detection",
        as_guardrail=True,
        data={"input": user_input},
    )
    if not jailbreak_check.passed:
        return "I'm sorry, I can't help with that request."

    # Generate response
    response = call_llm(user_input)

    # Output guardrail - check for PII
    pii_check = langwatch.evaluation.evaluate(
        "presidio/pii_detection",
        name="PII Check",
        as_guardrail=True,
        data={"output": response},
    )
    if not pii_check.passed:
        return "I apologize, but I cannot share that information."

    return response
```
```typescript
import { LangWatch } from "langwatch";

const langwatch = new LangWatch();

async function myChatbot(userInput: string): Promise<string> {
  // Input guardrail - check for jailbreak attempts
  const jailbreakCheck = await langwatch.evaluations.evaluate("azure/jailbreak", {
    name: "Jailbreak Detection",
    asGuardrail: true,
    data: { input: userInput },
  });
  if (!jailbreakCheck.passed) {
    return "I'm sorry, I can't help with that request.";
  }

  // Generate response
  const response = await callLLM(userInput);

  // Output guardrail - check for PII
  const piiCheck = await langwatch.evaluations.evaluate("presidio/pii_detection", {
    name: "PII Check",
    asGuardrail: true,
    data: { output: response },
  });
  if (!piiCheck.passed) {
    return "I apologize, but I cannot share that information.";
  }

  return response;
}
```
## Best Practices
### 1. Layer your guardrails
Use multiple guardrails for defense in depth:
```python
# Layer 1: Block malicious input
jailbreak = langwatch.evaluation.evaluate(
    "azure/jailbreak", as_guardrail=True, data={"input": user_input}
)
# Layer 2: Content moderation
moderation = langwatch.evaluation.evaluate(
    "openai/moderation", as_guardrail=True, data={"input": user_input}
)
# Layer 3: Check output before sending
pii = langwatch.evaluation.evaluate(
    "presidio/pii_detection", as_guardrail=True, data={"output": response}
)
```
### 2. Provide helpful error messages
Don’t just block - guide users toward acceptable behavior:
```python
if not guardrail.passed:
    if guardrail.details:
        return f"I can't help with that because: {guardrail.details}"
    return "I'm not able to assist with that request. Could you rephrase?"
```
### 3. Log guardrail triggers
Track when guardrails fire for monitoring and improvement:
```python
if not guardrail.passed:
    langwatch.get_current_trace().update(
        metadata={"guardrail_triggered": guardrail.name}
    )
```
### 4. Consider latency
Guardrails add latency. For time-sensitive applications:
- Use fast evaluators (regex, blocklists) for input checks
- Save heavier evaluators (LLM-based) for output checks
- Run multiple guardrails in parallel when possible (see the sketch after this list)
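The last point can make a real difference. As a minimal sketch, assuming the synchronous `langwatch.evaluation.evaluate` call from the Quick Example, the two input guardrails below run in worker threads, so the added latency is roughly that of the slowest check rather than the sum of both.

```python
# Minimal sketch: run two input guardrails concurrently in threads.
# Assumes the synchronous evaluate() call shown in the Quick Example.
from concurrent.futures import ThreadPoolExecutor

import langwatch


def input_guardrails_pass(user_input: str) -> bool:
    with ThreadPoolExecutor(max_workers=2) as pool:
        jailbreak_future = pool.submit(
            langwatch.evaluation.evaluate,
            "azure/jailbreak",
            name="Jailbreak Detection",
            as_guardrail=True,
            data={"input": user_input},
        )
        moderation_future = pool.submit(
            langwatch.evaluation.evaluate,
            "openai/moderation",
            name="Content Moderation",
            as_guardrail=True,
            data={"input": user_input},
        )
        # Both checks must pass before the request is sent to the LLM
        return jailbreak_future.result().passed and moderation_future.result().passed
```

If you call this from inside a `@langwatch.trace()` function, verify that your tracing context propagates to the worker threads in your setup.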
## Recommended Evaluators for Guardrails
| Evaluator | Best For | Latency |
|---|---|---|
| Azure Jailbreak Detection | Blocking prompt injection | Fast |
| Azure Prompt Shield | Blocking prompt attacks | Fast |
| Presidio PII Detection | Blocking PII exposure | Fast |
| OpenAI Moderation | Content policy enforcement | Fast |
| Competitor Blocklist | Blocking competitor mentions | Very Fast |
| Valid Format | Ensuring structured output | Very Fast |
| LLM-as-Judge Boolean | Custom policy checks | Slower |
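As a final example, the "retry or reject" action listed for the Valid Format evaluator in the use-case table might look like the sketch below. The `langevals/valid_format` slug, the `generate_json` helper, and the retry count are illustrative assumptions, not exact evaluator settings.

```python
import langwatch


@langwatch.trace()
def structured_answer(user_input: str, max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):
        response = generate_json(user_input)  # placeholder for your own LLM call

        format_check = langwatch.evaluation.evaluate(
            "langevals/valid_format",  # assumed slug; use your evaluator's slug
            name="Valid Format",
            as_guardrail=True,
            data={"output": response},
        )
        if format_check.passed:
            return response

    # Every attempt produced invalid output - reject instead of returning bad data
    return "Sorry, I couldn't produce a valid response. Please try again."
```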
## Next Steps