The gateway exposes a /metrics endpoint scrapable by any standard Prometheus setup. This cookbook is a drop-in set of alert rules that cover what you actually need to page on: provider outages, budget bypass, auth cache degradation, streaming bugs, and cost anomalies.
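If nothing scrapes the gateway yet, a minimal scrape_config sketch is below; the job name, namespace, and port are assumptions, so adjust them to your deployment:
scrape_configs:
  - job_name: langwatch-gateway        # assumed job name
    metrics_path: /metrics
    static_configs:
      - targets: ["langwatch-gateway.langwatch.svc.cluster.local:8080"]  # assumed Service DNS + port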
Copy this into alerts/langwatch-gateway.yml and reload Prometheus (SIGHUP, or a POST to /-/reload if the lifecycle API is enabled), or apply it as a PrometheusRule CR if your Prometheus is managed by the Prometheus Operator.
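If you take the Operator path, the same groups: block goes under spec.groups of a PrometheusRule. A skeleton, where the metadata names and the release label are assumptions that must match your Prometheus rule selector:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: langwatch-gateway-alerts   # assumed name
  namespace: langwatch             # assumed namespace
  labels:
    release: prometheus            # assumption: must match your Prometheus ruleSelector
spec:
  groups: []                       # paste the groups from the ruleset below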
The ruleset
groups:
  - name: langwatch-gateway
    interval: 30s
    rules:
      # ─── Reliability ─────────────────────────────────────────────────────
      - alert: GatewayHighErrorRate
        expr: |
          sum(rate(gateway_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(gateway_http_requests_total[5m])) > 0.05
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Gateway 5xx rate > 5% over 5m"
          description: |
            End-user requests to the gateway are returning 5xx at {{ $value | humanizePercentage }}.
            Check /readyz on pods, then upstream provider status.

      - alert: GatewayReadinessFlapping
        expr: |
          changes(kube_pod_status_ready{
            condition="true",
            namespace="langwatch",
            pod=~"langwatch-gateway-.*"
          }[10m]) > 4
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Gateway pod {{ $labels.pod }} readiness flapped > 4 times in 10m"
          description: "Pod is flipping ready/not-ready. Investigate /readyz output."

      # ─── Provider health ─────────────────────────────────────────────────
      - alert: CircuitOpenTooLong
        expr: gateway_circuit_state == 2  # 2 = open
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Circuit open for credential {{ $labels.credential_id }} > 5m"
          description: |
            Gateway has tripped the breaker and is skipping this provider.
            Verify the provider is actually having issues (check the
            X-LangWatch-Provider header on recent failures, plus the
            provider status page).

      - alert: ExcessiveFallback
        expr: |
          sum(rate(gateway_provider_attempts_total{outcome="fallback"}[10m]))
            / sum(rate(gateway_provider_attempts_total[10m])) > 0.1
        for: 10m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "> 10% of requests falling back over 10m"
          description: |
            Primary provider is degraded enough that > 10% of requests are
            walking the fallback chain. Not paging because fallback is
            working as designed, but the primary needs investigation.

      # ─── Budget governance ───────────────────────────────────────────────
      - alert: StreamingUsageMissing
        expr: |
          sum(rate(gateway_streaming_usage_missing_total[10m])) > 0
        for: 10m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Streaming requests without usage reported"
          description: |
            A streaming request completing without token counts means $0 is
            being debited to the budget — budgets are silently bypassed.
            OpenAI requires stream_options.include_usage=true on the client.
            See /ai-gateway/streaming#usage-extraction-critical-for-streaming-budgets.

      - alert: BudgetDebitOutboxBacklog
        expr: gateway_budget_debit_outbox_depth > 1000
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Budget debit outbox depth > 1000 for 5m"
          description: |
            Gateway can't reach the control plane /budget/debit endpoint fast
            enough. Customer budgets are NOT being debited in real time;
            near-limit customers may be over-consuming. Check control-plane
            health plus the gateway_budget_debit_outbox_dropped_total rate.

      # ─── Outbox leading indicators ───────────────────────────────────────
      # Fill-pct (depth/capacity) leads the absolute-depth rule above and
      # catches pods with different capacities consistently.
      - alert: BudgetOutboxFillPctHigh
        expr: |
          max by (pod) (gateway_budget_debit_outbox_depth)
            / max by (pod) (gateway_budget_debit_outbox_capacity) > 0.5
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Outbox fill-pct > 50% on {{ $labels.pod }} for 5m"
          description: |
            Approaching capacity. See production runbook Recipe 6 — normal
            bursts self-heal; a sustained climb is usually flush failure.

      # Flush-failure rate leads the depth climb — it catches a slow or
      # unreachable control plane BEFORE events back up enough to alert
      # on depth.
      - alert: BudgetOutboxFlushFailures
        expr: rate(gateway_budget_debit_outbox_flush_failures_total[5m]) > 0
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Outbox flush failures for 5m on {{ $labels.pod }}"
          description: |
            Control plane is slow or unreachable; debits are re-enqueued and
            safe, but reconciliation latency is growing. Investigate
            control-plane /budget/debit latency before this turns into a
            depth alert.

      # 4xx drops = silent data loss. Any non-zero rate is immediately
      # actionable (signing drift / payload drift / control-plane bug).
      - alert: BudgetOutbox4xxDrops
        expr: increase(gateway_budget_debit_outbox_4xx_drops_total[15m]) > 0
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Outbox debits are being terminally dropped with 4xx"
          description: |
            Silent data loss. Debits are being permanently rejected by the
            control plane. Most common cause: LW_GATEWAY_INTERNAL_SECRET
            drift after rotation. See production runbook Recipe 6.

      - alert: BudgetCheckLiveFailRate
        expr: |
          sum(rate(gateway_budget_check_live_total{outcome="transport_error"}[5m]))
            / sum(rate(gateway_budget_check_live_total[5m])) > 0.2
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Live /budget/check failing open > 20%"
          description: |
            Live reconciliation for near-limit scopes is timing out or
            erroring and falling back to the cached snapshot. Near-limit
            customers may briefly over-consume. Check control-plane
            /budget/check latency.

      # ─── Auth cache ──────────────────────────────────────────────────────
      - alert: AuthCacheHitRateDropped
        expr: |
          sum(rate(gateway_auth_cache_hits_total{tier="l1"}[5m]))
            / ( sum(rate(gateway_auth_cache_hits_total{tier="l1"}[5m]))
              + sum(rate(gateway_auth_cache_misses_total{tier="l1"}[5m])) ) < 0.9
        for: 15m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "L1 auth cache hit rate < 90% for 15m"
          description: |
            A sustained cache-miss rate means every miss costs a /resolve-key
            round-trip to the control plane. Investigate: recent deploy churn,
            /changes feed reliability, or LRU eviction under load.

      # ─── Cost & anomaly ──────────────────────────────────────────────────
      # Note: per-request cost anomaly detection requires a cost metric
      # (e.g. gateway_cost_usd_total) that isn't in the current collector
      # set. It's easier to derive cost anomalies from the GatewayBudgetLedger
      # table in the control-plane warehouse than from Prometheus, since
      # cost attribution happens on the control-plane side after debit.
      # See the /gateway/usage UI for the visual equivalent.

      # ─── Blocked-by-policy noise ─────────────────────────────────────────
      - alert: GuardrailBlockSpike
        expr: |
          sum by (direction, reason) (
            rate(gateway_guardrail_verdicts_total{verdict="block"}[5m])
          ) > 10
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "> 10 req/s blocked by policy reason={{ $labels.reason }}"
          description: |
            Either a legit block storm (a customer's runtime retrying a
            banned tool) or a regression in the policy_rules config.
            Check LangWatch traces filtered on
            attr.langwatch.policy.blocked != "".
Why these and not others
The temptation with Prometheus rules is to alert on everything. Don’t. Each rule above either:
- Represents real impact (5xx rate, circuit open, usage missing) — customers or budgets are actively affected.
- Is a leading indicator of incoming impact (excessive fallback, L1 cache hit-rate drop, outbox filling) — things get worse soon if ignored.
- Is a cost guardrail (streaming usage missing, outbox 4xx drops) — silent over-spend is a real failure mode for AI infra.
Rules we explicitly do NOT have:
- Individual provider 429s — these are normal operating state; fallback + circuit handle them without a human.
- High latency — the gateway’s added latency is bounded; if a provider is slow, alerting on it is alerting on the provider.
- Cache miss rate on upstream — the cache is passthrough; if Anthropic doesn't hit its cache, that's not the gateway's problem.
Route severity: page to PagerDuty, severity: warn to Slack. Example Alertmanager config:
route:
  receiver: slack-gateway
  group_by: [alertname, team]
  routes:
    - matchers: ['severity="page"', 'team="ai-platform"']
      receiver: pagerduty-ai-platform
      continue: true
    - matchers: ['severity="warn"', 'team="ai-platform"']
      receiver: slack-gateway

receivers:
  - name: pagerduty-ai-platform
    pagerduty_configs:
      - routing_key: "<pager-duty-service-key>"
  - name: slack-gateway
    slack_configs:
      - channel: "#ai-gateway-alerts"
        title: "{{ .GroupLabels.alertname }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
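One detail worth noting: continue: true makes Alertmanager keep matching subsequent routes after the PagerDuty route fires, so you can later add another route that also matches pages (for example, a Slack mirror) without re-plumbing. As written, pages go to PagerDuty only and warns to Slack.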
Verifying rules before deploy
promtool check rules alerts/langwatch-gateway.yml
promtool test rules alerts/langwatch-gateway.test.yml
Example unit test:
rule_files:
  - alerts/langwatch-gateway.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      - series: 'gateway_http_requests_total{status="500"}'
        values: '0 1 2 3 4 5 6 7 8 9 10'
      - series: 'gateway_http_requests_total{status="200"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 10m
        alertname: GatewayHighErrorRate
        exp_alerts:
          - exp_labels: { severity: page, team: ai-platform }
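It's worth pairing this with a negative case asserting the alert stays silent under healthy traffic; a sketch to append under tests: in the same file (the series values are illustrative):
  # Negative case: healthy traffic, alert must stay silent.
  - interval: 1m
    input_series:
      - series: 'gateway_http_requests_total{status="500"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'
      - series: 'gateway_http_requests_total{status="200"}'
        values: '0 60 120 180 240 300 360 420 480 540 600'
    alert_rule_test:
      - eval_time: 10m
        alertname: GatewayHighErrorRate
        exp_alerts: []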
Grafana dashboard
A matching Grafana dashboard JSON is at /ai-gateway/self-hosting/helm#monitoring. Panels mirror the alert rules, so you can see at a glance what's about to fire.
See also