Cache-control rules let operators modulate cache behaviour across the gateway fleet with zero client code changes. This cookbook walks through three common rollout scenarios end-to-end: authoring a rule, verifying propagation, inspecting the observability signals, and rolling back. All three scenarios share the same feedback loop:
- Author the rule (UI, CLI, or REST API — pick whichever fits the workflow)
- Wait ≤ 30 seconds for the gateway fleet’s `/changes` long-poll to refresh VK bundles
- Exercise the rule with a real request
- Verify via OTel span attributes + Prometheus counter
- Archive when the rule has served its purpose
Scenario 1 — Force Anthropic cache_control for enterprise tier
You’ve tagged enterprise-tier VKs with `tier=enterprise`. You want every Anthropic request from those VKs to carry `cache_control: {type: "ephemeral"}` on the last system/user block, so large system prompts are billed at the 10% cache-read rate on cache hits.
v1 scope — `force` mode injects `cache_control: {type: "ephemeral"}` on Anthropic-shape bodies (`/v1/messages`) into `system[-1]` and `messages[-1].content[-1]`, preserves any client-set `cache_control` (no double-inject), and is a wire no-op on OpenAI-shape (`/v1/chat/completions` — OpenAI’s caching is automatic, but the rule still attributes + counts). Gemini `force` still surfaces as WARN + passthrough because the `/cachedContents` pre-POST path breaks zero-hop routing — tracked as a v1.1 follow-up.

Author the rule
Pick one path — all three land the same row via the same service layer. UI (Platform → AI Gateway → Cache control → New rule):

- Name: `force-cache-enterprise-anthropic`
- Priority: `300`
- Match → VK tags: `tier=enterprise`
- Match → Model (optional): `claude-*`
- Action: `force`, TTL: `600`
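Via the REST API, a minimal sketch: the `POST /cache-rules` route and JSON field names are assumptions extrapolated from the `PATCH /cache-rules/:id` contract under Gotchas, and `$LANGWATCH_API` / `$LANGWATCH_API_KEY` are placeholders for your base URL and key.

```bash
# Sketch only: the POST route and field names are assumptions inferred
# from the PATCH /cache-rules/:id contract described under Gotchas.
curl -X POST "$LANGWATCH_API/cache-rules" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "force-cache-enterprise-anthropic",
    "priority": 300,
    "matchers": { "vk_tags": { "tier": "enterprise" }, "model": "claude-*" },
    "action": { "mode": "force", "ttl": 600 }
  }'
```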
Wait for propagation
The gateway fleet long-polls `/api/internal/gateway/changes` on a 30-second tick. You don’t need to restart pods. Verify with:
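One option (hypothetical; the exact log lines depend on your deployment) is to tail a gateway pod and watch the next bundle refresh mention the new rule:

```bash
# Hypothetical: assumes a Kubernetes deployment labelled app=langwatch-gateway
# and that bundle refreshes log the rule names they load.
kubectl logs -l app=langwatch-gateway --since=1m -f \
  | grep -E '/changes|force-cache-enterprise-anthropic'
```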
Exercise
Send any request through an enterprise-tagged VK:
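A sketch of such a request; `$GATEWAY_URL` and the VK bearer-auth header shape are assumptions about your deployment:

```bash
# Sketch: gateway base URL and VK auth header shape are assumptions.
curl -s "$GATEWAY_URL/v1/messages" \
  -H "Authorization: Bearer $ENTERPRISE_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 128,
    "system": [{"type": "text", "text": "You are the enterprise support assistant. <large shared prompt>"}],
    "messages": [{"role": "user", "content": "Summarize our refund policy."}]
  }'
```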
Verify

The forwarded request now carries `cache_control: {type: "ephemeral"}` on `system[-1]` (or `messages[-1].content[-1]` if the last user message has array content). Anthropic serves the cached prefix on the next matching request at 10% input-token cost.
The trace span on LangWatch records the `rule_id` and `mode_applied` attributes. On `/v1/chat/completions` (OpenAI), the rule still fires, attributes, and bumps the counter, but the forwarded body is byte-identical; OpenAI’s caching is automatic. Operator dashboard query: `gateway_cache_rule_hits_total` by `(rule_id, mode_applied)` gives per-rule firing rates; pair with provider-reported `cache_read_input_tokens` in the trace span to verify cache hits on the response side.
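That dashboard query, sketched as a raw Prometheus API call (the counter name is from above; the 5-minute rate window and `$PROM` base URL are assumptions):

```bash
# Sketch: per-rule firing rates via the Prometheus HTTP API.
curl -sG "$PROM/api/v1/query" \
  --data-urlencode \
  'query=sum by (rule_id, mode_applied) (rate(gateway_cache_rule_hits_total[5m]))'
```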
Scenario 2 — Disable cache for evaluation traffic
Your eval suite uses dedicated VKs prefixed `lw_vk_eval_*` and tags every request with `X-Langwatch-Suite: evals`. You want these to hit provider caches as rarely as possible: fresh completions keep eval scores honest.
Author
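A sketch of the rule, mirroring the Scenario 1 REST call. The route and field names remain assumptions; `vk_prefix` is one of the matchers named under Gotchas, the mode name `disable` is inferred from the per-request `X-LangWatch-Cache: disable` header, and the name and priority are illustrative:

```bash
# Sketch: disable provider caching for all eval-prefixed VKs.
# Route, field names, and priority value are assumptions.
curl -X POST "$LANGWATCH_API/cache-rules" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "disable-cache-eval-suite",
    "priority": 400,
    "matchers": { "vk_prefix": "lw_vk_eval_" },
    "action": { "mode": "disable" }
  }'
```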
Exercise
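For example, a request through an eval VK (same deployment assumptions as Scenario 1; the suite header is from the prose):

```bash
# Sketch: the X-Langwatch-Suite header is from the docs; the gateway URL
# and VK auth header shape are assumptions.
curl -s "$GATEWAY_URL/v1/messages" \
  -H "Authorization: Bearer $LW_VK_EVAL_KEY" \
  -H "X-Langwatch-Suite: evals" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-haiku-latest",
    "max_tokens": 64,
    "system": "Answer tersely.",
    "messages": [{"role": "user", "content": "2 + 2?"}]
  }'
```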
Verify
The forwarded body has the `cache_control` key stripped (recursive JSON walk, Anthropic shape) or `cachedContent` removed (Gemini shape), so the provider can’t serve a cache hit.
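To picture what the recursive strip does (an illustration, not the gateway’s actual code), the Anthropic-shape walk is equivalent to this `jq` filter:

```bash
# Illustration only: delete every cache_control key anywhere in the body.
jq 'walk(if type == "object" then del(.cache_control) else . end)' request.json
```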
Scenario 3 — Cache-bust after a prompt template change
You’ve rolled out a new system prompt template. Existing cache entries now point at the old wording. You want every enterprise-tier VK to regenerate its cache on next request. Update the rule’s `salt` rather than archiving and recreating it; this preserves the audit trail and the rule id (so dashboards don’t break).
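A sketch of the salt rotation; the `PATCH /cache-rules/:id` route is from Gotchas, while the `salt` field name follows the prose and the value is illustrative:

```bash
# Sketch: rotate the salt in place. PATCH replaces matchers/action
# wholesale when included, so a salt-only body omits those keys.
curl -X PATCH "$LANGWATCH_API/cache-rules/$RULE_ID" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"salt": "prompt-template-v2"}'
```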
The trace span’s `rule_id` + `mode_applied` attributes record the intent so dashboards can correlate the rollout.
Rollback
Archiving a rule stops it matching new requests; history is preserved in the audit log and the Prometheus counter. The archive endpoint returns `200` plus the archived row (not `204`), so scripts can confirm the `archived_at` timestamp before moving on. The gateway fleet picks up the archive on the next `/changes` tick (≤ 30 s) and stops firing the rule.
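A scripted rollback sketch; the archive route shape is an assumption, but the `200` + archived-row response and the `archived_at` field are from the contract above:

```bash
# Sketch: archive the rule; jq -e exits non-zero if archived_at is missing,
# so the script fails fast instead of assuming the archive landed.
curl -sf -X POST "$LANGWATCH_API/cache-rules/$RULE_ID/archive" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  | jq -e '.archived_at'
```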
Gotchas
- At least one matcher is required. Rules with no matchers that would match every request are unsupported in v1 — that’s the per-VK `cache.mode` default’s job. Author an explicit matcher instead.
- `matchers` and `action` are REPLACED on PATCH. If you `PATCH /cache-rules/:id` with a `matchers` object, the stored value is replaced wholesale. To add a new matcher, include the existing ones too (see the sketch after this list).
- `model` matchers fire on both endpoints. Earlier iterations had a caveat that `matchers.model` didn’t fire on `/v1/messages` due to pipeline ordering — that was fixed via a cheap single-field JSON peek (`extractModelField`, ~150 ns, zero-alloc on the no-rules path). Rules like `matchers.model = "claude-haiku-*"` now match on both `/v1/chat/completions` and `/v1/messages`.
- Per-request header always wins. A client that sends `X-LangWatch-Cache: disable` bypasses every rule. This is intentional — it’s the per-request escape hatch for repros and cold-cache benchmarks.
- Rules are organization-scoped. They apply to every VK in the organization, not just the project whose API key you used to author them. Scope a rule narrowly with `matchers.vk_id`/`vk_tags`/`vk_prefix` if you want project- or environment-specific behaviour.
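A read-modify-write sketch for that PATCH gotcha; the `GET /cache-rules/:id` route is an assumption mirroring the PATCH route:

```bash
# Sketch: fetch current matchers, merge the new one locally, then PATCH
# the full object back (PATCH replaces the matchers object wholesale).
current=$(curl -s "$LANGWATCH_API/cache-rules/$RULE_ID" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY")
merged=$(echo "$current" | jq -c '.matchers + {"model": "claude-haiku-*"}')
curl -X PATCH "$LANGWATCH_API/cache-rules/$RULE_ID" \
  -H "Authorization: Bearer $LANGWATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"matchers\": $merged}"
```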
See also
- Cache control — full per-provider semantics + precedence
- CLI reference — `langwatch cache-rules` subcommand
- Management REST API — 5-route wire contract
- RBAC — `gatewayCacheRules:*` permission matrix