Cache-control rules let operators modulate cache behaviour across the gateway fleet with zero client code changes. This cookbook walks through three common rollout scenarios end-to-end, from authoring a rule, through verifying propagation and inspecting the observability signals, to rolling back. All three scenarios share the same feedback loop:
  1. Author the rule (UI, CLI, or REST API — pick whichever fits the workflow)
  2. Wait ≤ 30 seconds for the gateway fleet’s /changes long-poll to refresh VK bundles
  3. Exercise the rule with a real request
  4. Verify via OTel span attributes + Prometheus counter
  5. Archive when the rule has served its purpose

Scenario 1 — Force Anthropic cache_control for enterprise tier

You've tagged enterprise-tier VKs with tier=enterprise. You want every Anthropic request from those VKs to carry cache_control: {type: "ephemeral"} on the last system/user block, so large system prompts cost $1/1M input tokens instead of $3/1M on cache hits.
v1 scope: force mode injects cache_control: {type: "ephemeral"} into system[-1] and messages[-1].content[-1] on Anthropic-shape bodies (/v1/messages), preserves any client-set cache_control (no double-inject), and is a wire no-op on OpenAI-shape bodies (/v1/chat/completions, where OpenAI's caching is automatic; the rule still attributes and counts). Gemini force still surfaces as WARN + passthrough because the /cachedContents pre-POST path breaks zero-hop routing; this is tracked as a v1.1 follow-up.
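The injection is easiest to see as a jq sketch. This is illustrative only, an approximation of the gateway's internal transform rather than its source; request_body.json stands in for the proxied Anthropic-shape body:
# Illustrative approximation of the force-mode transform (not the gateway source).
# Injects cache_control on system[-1] and messages[-1].content[-1], skipping any
# block that already carries a client-set cache_control (no double-inject).
jq '
  def inject: if type == "object" and (has("cache_control") | not)
              then . + {cache_control: {type: "ephemeral"}} else . end;
  (if (.system | type) == "array" then .system[-1] |= inject else . end)
  | (if (.messages[-1].content | type) == "array"
     then .messages[-1].content[-1] |= inject else . end)
' request_body.json
String system prompts and string message content pass through untouched, which mirrors the "wire no-op unless the shape matches" behaviour described above.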

Author the rule

Pick one path; all three land the same row via the same service layer.
UI (Platform → AI Gateway → Cache control → New rule):
  • Name: force-cache-enterprise-anthropic
  • Priority: 300
  • Match → VK tags: tier=enterprise
  • Match → Model (optional): claude-*
  • Action: force, TTL: 600
CLI:
langwatch cache-rules create \
  --name force-cache-enterprise-anthropic \
  --priority 300 \
  --mode force --ttl 600 \
  --match-tag tier=enterprise \
  --match-model "claude-*"
REST:
curl -X POST https://app.langwatch.ai/api/gateway/v1/cache-rules \
  -H "Authorization: Bearer $LANGWATCH_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "force-cache-enterprise-anthropic",
    "priority": 300,
    "matchers": {
      "vk_tags": ["tier=enterprise"],
      "model": "claude-*"
    },
    "action": { "mode": "force", "ttl": 600 }
  }'
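If the POST echoes back the created row (as DELETE does for the archived row in the Rollback section), it would look roughly like the following. The shape is illustrative, assembled from the request fields plus the id and archived_at attributes referenced elsewhere in this cookbook; it is not a schema guarantee:
{
  "id": "V1StGXR8_...",
  "name": "force-cache-enterprise-anthropic",
  "priority": 300,
  "matchers": { "vk_tags": ["tier=enterprise"], "model": "claude-*" },
  "action": { "mode": "force", "ttl": 600 },
  "archived_at": null
}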

Wait for propagation

The gateway fleet long-polls /api/internal/gateway/changes on a 30-second tick. You don’t need to restart pods. Verify with:
# From any langwatch-app pod (control plane side)
curl -sS "$LW_INTERNAL_URL/api/internal/gateway/changes?since=0&limit=5" \
  -H "X-LangWatch-Gateway-Signature: $(gw_sign)" | jq '.events[] | select(.kind | startswith("CACHE_RULE"))'

Exercise

Send any request through an enterprise-tagged VK:
curl https://gateway.langwatch.ai/v1/messages \
  -H "Authorization: Bearer lw_vk_live_ENT_01HZX9..." \
  -H "x-api-key: $LANGWATCH_ANTHROPIC_HEADER" \
  -d '{ "model": "claude-haiku-4-5-20251001", "max_tokens": 128, "messages": [...] }'

Verify

The response headers:
X-LangWatch-Cache-Mode: force
The forwarded request body now has cache_control: {type: "ephemeral"} on system[-1] and, when the last user message has array content, on messages[-1].content[-1]. Anthropic serves the cached prefix on the next matching request at 10% input-token cost. The trace span on LangWatch:
langwatch.cache.rule_id: V1StGXR8_…          # the rule matched
langwatch.cache.mode_applied: FORCE          # rule intent applied to the wire
Prometheus:
gateway_cache_rule_hits_total{rule_id="V1StGXR8_…", mode_applied="FORCE"} 1
On /v1/chat/completions (OpenAI), the rule still fires, attributes, and bumps the counter, but the forwarded body is byte-identical; OpenAI's caching is automatic. Operator dashboard query: gateway_cache_rule_hits_total by (rule_id, mode_applied) gives per-rule firing rates. Pair it with the provider-reported cache_read_input_tokens in the trace span to verify cache hits on the response side.
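As a concrete PromQL starting point (the counter name is from above; the rate window is your choice):
# Per-rule firing rate, split by the mode actually applied to the wire
sum by (rule_id, mode_applied) (rate(gateway_cache_rule_hits_total[5m]))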

Scenario 2 — Disable cache for evaluation traffic

Your eval suite uses dedicated VKs prefixed lw_vk_eval_* and tags every request with X-Langwatch-Suite: evals. You want these to hit provider caches as rarely as possible — fresh completions keep eval scores honest.

Author

langwatch cache-rules create \
  --name disable-cache-evals \
  --priority 200 \
  --mode disable \
  --match-vk-prefix lw_vk_eval_ \
  --match-metadata X-Langwatch-Suite=evals
Matchers are ANDed — this rule only fires when BOTH the VK prefix and the metadata header match. Other eval traffic from non-prefixed VKs stays on the per-VK default.
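A quick negative check makes the AND semantics visible: send through the eval VK without the suite header and confirm the rule does not fire (the header name and VK prefix are from the scenario above):
# Should NOT report disable: the metadata matcher is missing, so the rule can't fire
curl -sS -D - -o /dev/null https://gateway.langwatch.ai/v1/messages \
  -H "Authorization: Bearer lw_vk_eval_01HZ..." \
  -d '{ "model": "claude-haiku-4-5-20251001", "max_tokens": 16, "messages": [...] }' \
  | grep -i X-LangWatch-Cache-Mode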

Exercise

curl https://gateway.langwatch.ai/v1/messages \
  -H "Authorization: Bearer lw_vk_eval_01HZ..." \
  -H "X-Langwatch-Suite: evals" \
  -d '{ "model": "claude-haiku-4-5-20251001", ... "system": [...], "messages": [...] }'

Verify

X-LangWatch-Cache-Mode: disable
Trace attrs:
langwatch.cache.rule_id:      r_evals_disable
langwatch.cache.mode_applied: DISABLE
The forwarded request body has every cache_control key stripped (recursive JSON walk — Anthropic shape) or cachedContent removed (Gemini shape), so the provider can’t serve a cache hit.
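The strip is approximately this jq walk (illustrative, not the gateway source):
# Remove every cache_control key at any depth (Anthropic shape),
# then the top-level cachedContent field (Gemini shape)
jq 'walk(if type == "object" then del(.cache_control) else . end)
    | del(.cachedContent)' request_body.json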

Scenario 3 — Cache-bust after a prompt template change

You’ve rolled out a new system prompt template. Existing cache entries now point at the old wording. You want every enterprise-tier VK to regenerate its cache on next request. Update the rule’s salt rather than archive+recreate — this preserves the audit trail and the rule id (so dashboards don’t break).
langwatch cache-rules update <rule_id> --salt "2026Q2-template-rollout"
Any cache the provider or a downstream layer keys by the rule's salt will miss on the next request and be populated fresh. For providers without explicit TTL / salt support, this is a no-op on the wire, but the rule_id + mode_applied attributes record the intent so dashboards can correlate the rollout.
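If you run your own downstream cache, folding the salt into the key is all the bust requires. A minimal sketch, assuming a key of model + salt + body digest (the key scheme here is an assumption, not the gateway's documented behaviour):
# Any change to the salt changes the key, so every pre-rollout entry misses
body_digest=$(sha256sum request_body.json | cut -d' ' -f1)
cache_key=$(printf '%s:%s:%s' "claude-haiku-4-5-20251001" "2026Q2-template-rollout" "$body_digest" \
  | sha256sum | cut -d' ' -f1)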

Rollback

Archiving a rule stops it from matching new requests. History is preserved in the audit log and the Prometheus counter.
# UI: Cache control → … → Archive
# CLI:
langwatch cache-rules archive <rule_id>
# REST:
curl -X DELETE https://app.langwatch.ai/api/gateway/v1/cache-rules/<rule_id> \
  -H "Authorization: Bearer $LANGWATCH_PAT"
DELETE returns 200 + the archived row (not 204), so scripts can confirm the archived_at timestamp before moving on. The gateway fleet picks up the archive on the next /changes tick (≤30 s) and stops firing the rule.
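That makes the archive easy to confirm in a script (the archived_at field is the one named above):
# Archive and confirm in one step; fail loudly if archived_at is absent
archived_at=$(curl -sS -X DELETE "https://app.langwatch.ai/api/gateway/v1/cache-rules/<rule_id>" \
  -H "Authorization: Bearer $LANGWATCH_PAT" | jq -er '.archived_at') \
  && echo "archived at: $archived_at" \
  || { echo "archive failed" >&2; exit 1; }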

Gotchas

  • At least one matcher is required. A rule with no matchers would match every request, which is unsupported in v1; matching everything is the per-VK cache.mode default's job. Author an explicit matcher instead.
  • matchers and action are REPLACED on PATCH. If you PATCH /cache-rules/:id with a matchers object, the stored value is replaced wholesale. To add a new matcher, include the existing ones too; see the read-modify-write sketch after this list.
  • model matchers fire on both endpoints. Earlier iterations had a caveat that matchers.model didn’t fire on /v1/messages due to pipeline ordering — that was fixed via a cheap single-field JSON peek (extractModelField, ~150 ns, zero-alloc on no-rules path). Rules like matchers.model = "claude-haiku-*" now match on both /v1/chat/completions and /v1/messages.
  • Per-request header always wins. A client that sends X-LangWatch-Cache: disable bypasses every rule. This is intentional — it’s the per-request escape hatch for repros and cold-cache benchmarks.
  • Rules are organization-scoped. They apply to every VK in the organization, not just the project whose API key you used to author them. Scope a rule narrowly with matchers.vk_id / vk_tags / vk_prefix if you want project- or environment-specific behaviour.
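For the PATCH gotcha above, the safe pattern is read-modify-write. A sketch, assuming the single-rule GET endpoint mirrors the POST path (that assumption is mine; the merged model matcher is just an example):
# Fetch current matchers, merge in the new one, PATCH the full object back
current_matchers=$(curl -sS "https://app.langwatch.ai/api/gateway/v1/cache-rules/<rule_id>" \
  -H "Authorization: Bearer $LANGWATCH_PAT" | jq '.matchers')
curl -sS -X PATCH "https://app.langwatch.ai/api/gateway/v1/cache-rules/<rule_id>" \
  -H "Authorization: Bearer $LANGWATCH_PAT" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --argjson m "$current_matchers" '{matchers: ($m + {model: "claude-*"})}')"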

See also