pre (before dispatch), post (after a completed response), and stream_chunk (on every SSE chunk before emission).
Evaluators are the same ones you already use for online evaluations in LangWatch; there is no new authoring surface. Every evaluator type you have attached to a project (PII detection, prompt injection, toxicity, hallucination, regex guards, custom code evaluators) is selectable as a gateway guardrail.
Direction modes
pre, block/modify the outgoing request
Invoked before the gateway dispatches to the upstream provider. Decisions:
- allow: dispatch as-is.
- block: return 403 guardrail_blocked to the client immediately. No upstream call is made. No spend.
- modify: the evaluator returns a rewritten request payload (e.g. with PII redacted). The gateway dispatches the rewritten payload.
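The three pre decisions can be sketched as a small dispatch helper. This is a minimal illustration, not the gateway's implementation: the Verdict class, the apply_pre_verdict function, and the JSON shape of the error body are all hypothetical; only the 403 status and the guardrail_blocked code come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical evaluator result; the field names are assumptions."""
    decision: str                      # "allow", "block", or "modify"
    rewritten: Optional[dict] = None   # only meaningful for "modify"


def apply_pre_verdict(request: dict, verdict: Verdict) -> dict:
    """Decide what happens to a request after a pre guardrail ran."""
    if verdict.decision == "block":
        # Blocked: answer the client right away, never call upstream.
        # The body shape is an assumption made for this sketch.
        return {"dispatch": False, "status": 403,
                "body": {"error": {"code": "guardrail_blocked"}}}
    if verdict.decision == "modify" and verdict.rewritten is not None:
        # Dispatch the evaluator's rewritten payload instead of the original.
        return {"dispatch": True, "payload": verdict.rewritten}
    # allow: dispatch the request unchanged.
    return {"dispatch": True, "payload": request}
```

A caller would only contact the upstream provider when the returned "dispatch" flag is true, which is what keeps a block free of spend.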
post, flag/block the response
Invoked on the full completed response for non-streaming calls, and on the reassembled stream for streaming calls. Wired on both /v1/chat/completions and /v1/messages.
Decisions:
- allow: response delivered to client.
- block: for non-streaming, the response is replaced with 403 guardrail_blocked before the client sees the assistant text, and a zero-cost blocked_by_guardrail budget debit is recorded so dashboards still see the attempt (cost = $0: the provider call was made and paid for, but the ledger marks it as non-billable to the principal). For streaming, the response has already been delivered, so the decision becomes flag-only (propagated via the evaluator’s own trace span, not a dedicated gateway attribute in v1).
- modify: the assistant text is rewritten in place (first choice content on /v1/chat/completions, first text block on /v1/messages); redaction is transparent from the client’s perspective. Streaming responses: flag-only.
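To make the in-place rewrite concrete, here is a rough sketch of replacing the first assistant text on each endpoint. The function name rewrite_assistant_text is hypothetical, and the response layouts are the publicly documented OpenAI chat-completion and Anthropic messages shapes rather than anything specific to the gateway.

```python
import copy


def rewrite_assistant_text(path: str, response: dict, new_text: str) -> dict:
    """Return a copy of `response` with its first assistant text replaced."""
    out = copy.deepcopy(response)  # leave the caller's object untouched
    if path == "/v1/chat/completions":
        # OpenAI shape: text lives in choices[0].message.content.
        choices = out.get("choices") or []
        if choices:
            message = choices[0].get("message") or {}
            if isinstance(message.get("content"), str):
                message["content"] = new_text
    elif path == "/v1/messages":
        # Anthropic shape: content is a list of blocks; rewrite the first text block.
        for block in out.get("content") or []:
            if block.get("type") == "text":
                block["text"] = new_text
                break
    return out
```

Responses that carry no text (for example a tool_use-only Anthropic response) come back unchanged from this helper, since there is nothing to rewrite.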
Content-block responses skip post-evaluation. When the model returns a tool-call response (tool_calls on OpenAI, content: [{type: "tool_use", ...}] on Anthropic) or an image/file block, there’s no assistant text to evaluate, so the post-guardrail is skipped rather than attempting to reason about structured output. Use pre on the tool-call arguments or a dedicated content-aware guardrail if you need to gate tool calls.
Fail-open vs fail-closed
By default, post-guardrails fail closed: if the evaluator service is unavailable or errors, the response is replaced with 503 guardrail_upstream_unavailable, so the user never sees an ungoverned response.
For VKs where an unavailable guardrail should not block (e.g. best-effort redaction on low-stakes traffic), set guardrails.response_fail_open: true on the VK config. The gateway then passes the response through with a warning log. (No dedicated fail_open span attribute is emitted in v1; the response’s error class plus the guardrail upstream’s own span is how operators trace this path today.)
pre guardrails use the same fail-closed default and the same VK opt-out (guardrails.request_fail_open).
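The fail-open and fail-closed rules reduce to one lookup per direction. The sketch below is illustrative only: on_evaluator_failure is a hypothetical helper, the config is modelled as a plain dict holding the two documented keys, and it assumes pre failures surface the same 503 as post failures, which the text implies but does not spell out.

```python
# Maps each direction to its documented opt-out key under `guardrails`.
FAIL_OPEN_KEY = {"pre": "request_fail_open", "post": "response_fail_open"}


def on_evaluator_failure(direction: str, guardrails_config: dict) -> dict:
    """Decide what to do when the evaluator is unavailable or errors."""
    if guardrails_config.get(FAIL_OPEN_KEY[direction], False):
        # Fail open: let the traffic through and leave a warning in the logs.
        return {"action": "pass_through", "log_level": "WARN"}
    # Fail closed (the default): the client never sees ungoverned content.
    return {"action": "reject", "status": 503,
            "code": "guardrail_upstream_unavailable"}
```

Because each direction reads its own key, opening up responses on a VK does not silently open up requests as well.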
stream_chunk, terminate per-chunk on visible deltas
Invoked on each SSE chunk with visible delta text before the gateway emits it to the client. Chunks without text (role-only frames such as {"delta":{"role":"assistant"}}, tool-call frames, terminal usage frames) skip the guardrail call entirely. In practice this keeps ~95% of stream frames at pass-through cost.
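A filter of this kind might look like the following for OpenAI-style chat-completion chunks. It is a sketch under assumptions: visible_delta_text is a hypothetical name, the input is the data payload of a single SSE event, and Anthropic events would need their own extractor.

```python
import json


def visible_delta_text(sse_data: str) -> str:
    """Return the visible text carried by one chunk, or "" if there is none."""
    if sse_data.strip() == "[DONE]":
        return ""  # stream terminator, nothing to evaluate
    try:
        frame = json.loads(sse_data)
    except json.JSONDecodeError:
        return ""
    if not isinstance(frame, dict):
        return ""
    text = ""
    for choice in frame.get("choices") or []:
        content = (choice.get("delta") or {}).get("content")
        if isinstance(content, str):
            text += content
    return text
```

The guardrail call is then made only when the returned string is non-empty, so role-only, tool-call, and usage frames pass straight through.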
Decisions:
- allow: emit chunk.
- block: terminate the stream with a terminal event: error frame. The channel is closed immediately after; subsequent upstream chunks are discarded. Same wire shape as a provider-failure terminator, distinguishable by error.code (stream_chunk_blocked vs provider_error). See Streaming → Mid-stream error shapes.
- modify: not implemented in v1. Chunk-level content rewriting is provider-shape-specific (OpenAI delta JSON and Anthropic SSE events have different shapes). For v1, “redact on stream” means block and let the client retry without the offending input. A future iteration may add provider-aware chunk rewriting once a real customer asks.
Each chunk evaluation runs under a 50 ms budget. If the evaluator does not answer in time, the chunk passes through and the gateway records gateway_guardrail_verdicts_total{direction="stream_chunk",verdict="fail_open"}. This is an explicit contract decision: blocking the user’s stream on a slow policy service is worse than occasional pass-through, but the metric makes slow/flaky services visible before they become reliability problems.
Use cases: real-time PII termination (credit card numbers, emails, phone numbers) where the correct behaviour on detection is “stop sending and the user retries.” For best-effort redaction that doesn’t interrupt the stream, use post on the reassembled response instead.
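The per-chunk time budget that fails open (50 ms according to this page) can be approximated with a timeout around the evaluator call. The sketch below uses asyncio.wait_for; check_chunk, the evaluate callable, and the counters dict are hypothetical stand-ins for the gateway's evaluator client and its Prometheus counter.

```python
import asyncio

CHUNK_BUDGET_SECONDS = 0.050  # the 50 ms per-chunk budget


async def check_chunk(evaluate, text: str, counters: dict) -> str:
    """Return "allow", "block", or "fail_open" for one visible chunk.

    `evaluate` is an async callable returning "allow" or "block".
    """
    try:
        verdict = await asyncio.wait_for(evaluate(text), timeout=CHUNK_BUDGET_SECONDS)
    except asyncio.TimeoutError:
        # Too slow: emit the chunk anyway, but make the miss visible.
        verdict = "fail_open"
    counters[verdict] = counters.get(verdict, 0) + 1
    return verdict
```

A caller would emit the chunk on "allow" or "fail_open" and terminate the stream on "block"; a rising fail_open count is the signal that the policy service is too slow for the hot path.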
Attaching guardrails to a VK
The VK create/edit drawer has three direction sections (Pre-request, Post-response, Stream chunk), each listing every project evaluator whose executionMode = AS_GUARDRAIL. Check the box next to an evaluator to attach it to that direction; uncheck to detach. Save the drawer; the gateway picks up the change within 30 s via the /changes long-poll.
Each direction section also has a Default-block on evaluator failure toggle. Off (default) = fail closed (matches guardrails.request_fail_open: false, response_fail_open: false). On = fail open; the gateway WARN-logs the evaluator failure and proceeds. The helper copy cites the concrete enforcement shapes so you know what the toggle actually does:
- Pre: blocks return 403 guardrail_blocked with a zero-cost debit.
- Post: non-stream blocks replace the response with 403; stream responses flag only (the bytes are already out).
- Stream chunk: blocks emit a byte-locked terminal SSE error with code=stream_chunk_blocked; the 50 ms per-chunk budget fails OPEN by contract regardless of the toggle (stream-chunk fail-open is a performance invariant, not a policy choice).
To make an evaluator appear in these lists, set its Execution mode = As guardrail. That’s the same evaluator-authoring surface you already use for online evaluations; there’s no separate gateway-evaluator editor.
The legacy surface (REST or CLI) still works for scripting. Both surfaces write the same {id, evaluator} tuple shape into the bundle (contract §4.2), so you can flip between surfaces without resetting configuration.
A VK can have multiple guardrails in each direction; the gateway runs them in parallel and short-circuits on the first block decision.
Running them in parallel
Dispatch. The gateway fans out the guardrail calls in a direction in parallel, bounded by MAX_GUARDRAIL_CONCURRENCY (default 8). All calls start at the same time; the dispatcher waits for the slowest verdict, with the early-exit rule below.
Block decisions. As soon as any guardrail returns block, the request short-circuits: still-in-flight guardrails are cancelled and the gateway returns the block response without waiting. First block wins; later guardrails never run.
Modify decisions. When more than one guardrail in the same direction returns modify, the dispatcher applies them sequentially in config.guardrails[direction] array order (the order you set on the VK). Each modify rewrites the payload and the next modify sees the rewritten version. Modifies do NOT chain across directions: request modifies are applied before dispatch, response modifies are applied after the upstream response returns.
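The dispatch, block, and modify rules above can be sketched with asyncio. This is an illustration under assumptions, not the Go dispatcher: run_direction is a hypothetical name, the payload is simplified to a string, and each guardrail is modelled as an async callable returning a (decision, rewrite) pair where rewrite is a function applied only for modify.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_GUARDRAIL_CONCURRENCY = 8  # default named in the text

Rewrite = Callable[[str], str]
Guardrail = Callable[[str], Awaitable[tuple[str, Optional[Rewrite]]]]


async def run_direction(guardrails: list[Guardrail], payload: str) -> tuple[str, str]:
    """Run one direction's guardrails; return (verdict, payload)."""
    sem = asyncio.Semaphore(MAX_GUARDRAIL_CONCURRENCY)

    async def bounded(guardrail: Guardrail):
        async with sem:  # at most MAX_GUARDRAIL_CONCURRENCY calls in flight
            return await guardrail(payload)

    # One task per guardrail, kept in config order.
    tasks = [asyncio.create_task(bounded(g)) for g in guardrails]

    for finished in asyncio.as_completed(tasks):
        decision, _ = await finished
        if decision == "block":
            # First block wins: cancel whatever is still running.
            for task in tasks:
                task.cancel()
            await asyncio.gather(*tasks, return_exceptions=True)
            return "block", payload

    # No block: apply modifies one after another, in config order,
    # so each rewrite sees the output of the previous one.
    verdict = "allow"
    for task in tasks:
        decision, rewrite = task.result()
        if decision == "modify" and rewrite is not None:
            payload = rewrite(payload)
            verdict = "modify"
    return verdict, payload
```

Since rewrites are applied in list order rather than completion order, reordering the guardrails on the VK can change the final payload even though the calls themselves run concurrently.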
Stream chunks. stream_chunk guardrails currently run in block-only mode in v1: a modify verdict on a stream chunk is accepted by the contract but the dispatcher rewrites are not yet wired (tracked alongside the metric-emission gap below, same hot-path). Use request/response modify for now if you need payload rewrites; stream_chunk is best used for block (e.g. cut a stream the moment a regex match appears on the way out).
Observability
Prometheus counter gateway_guardrail_verdicts_total{direction, verdict} records every verdict the guardrail pipeline returns. Labels:
- direction: request | response | stream_chunk
- verdict: allow | block | modify | fail_open
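As a rough picture of the label space, the snippet below keeps an in-memory tally keyed by the two labels and renders it in the Prometheus text exposition format. It is a stand-in for illustration only: record_verdict and render are hypothetical helpers, and a real service would use a Prometheus client library instead.

```python
from collections import Counter

DIRECTIONS = {"request", "response", "stream_chunk"}
VERDICTS = {"allow", "block", "modify", "fail_open"}


def record_verdict(tally: Counter, direction: str, verdict: str) -> None:
    """Count one verdict, rejecting label values outside the documented sets."""
    if direction not in DIRECTIONS or verdict not in VERDICTS:
        raise ValueError(f"unknown label value: {direction!r}/{verdict!r}")
    tally[(direction, verdict)] += 1


def render(tally: Counter) -> str:
    """Render the tally as Prometheus text-format sample lines."""
    return "\n".join(
        f'gateway_guardrail_verdicts_total{{direction="{d}",verdict="{v}"}} {n}'
        for (d, v), n in sorted(tally.items())
    )
```

Rejecting unknown label values up front keeps the series cardinality bounded at three directions times four verdicts.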
v1 wiring caveat: in the Go data plane, only the stream_chunk direction currently emits this metric from the dispatcher hot path. The request and response directions run the guardrails correctly (block and modify logic works), but the dispatcher doesn’t yet call Metrics.RecordGuardrailVerdict at those sites. Tracked as a v1 follow-up (finding #17); operators who need per-direction allow-rate dashboards today can proxy from gateway_http_requests_total{status="403"}, which increments on every block verdict regardless of direction.
The contract reserves langwatch.guardrail.verdict as a canonical span attribute for the aggregate verdict, but it is not yet emitted from the dispatcher in v1. Per-guardrail decision detail is visible by clicking through to the evaluator’s run in the LangWatch Messages view; the evaluator’s own trace carries the policies triggered and the reasoning.
Permissions
- gatewayGuardrails:attach: attach a guardrail to a VK.
- gatewayGuardrails:detach: remove one.
- Evaluator CRUD uses the existing evaluations:* permissions (unchanged).
Cost
Every guardrail call is a separate LangWatch evaluator run. It’s metered against your LangWatch plan the same as any other online-evaluation run.
Blocking vs modifying: when to pick which
- Block when the request/response is unfixable and the only correct behaviour is to fail fast. Example: prompt-injection detected in a user query, never forward.
- Modify when there’s a safe redacted version the user should still see. Example: PII in input → redact with [REDACTED] markers and forward the rest.
- Stream_chunk modify when real-time is essential. Example: the model is emitting a credit-card number mid-stream; redact before the user sees it. In v1 stream_chunk runs block-only, so use block on the stream until chunk rewriting is wired.
Start with allow + flag (just log without blocking) while tuning a guardrail’s thresholds. Promote to block or modify once the false-positive rate is low.
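For the modify case, a redacting evaluator could be as small as the sketch below. It is a simplified illustration, not a LangWatch evaluator: the two regular expressions are deliberately crude approximations of email addresses and card-like digit runs, and redact_pii is a hypothetical function name.

```python
import re

# Crude patterns for illustration; real PII detection needs far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact_pii(text: str) -> tuple[str, str]:
    """Return (decision, text): "modify" with markers if anything matched."""
    redacted = EMAIL.sub("[REDACTED]", text)
    redacted = CARD.sub("[REDACTED]", redacted)
    decision = "modify" if redacted != text else "allow"
    return decision, redacted
```

Returning allow when nothing matched keeps clean traffic untouched, which also makes the allow + flag tuning phase easy to reason about.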
Further reading
- Policy Rules for simpler regex-based denials (tools, MCPs, URLs, models) that don’t need an evaluator.
- Streaming for the full SSE contract.