A single dashboard with every panel you actually look at during an incident. It pairs with the Prometheus alerts and the Production runbook: the alerts page you, this dashboard tells you what is wrong, and the runbook tells you how to dig.
## What it covers
Nine rows, top to bottom (sample queries for the request-health row follow the table):

| Row | Panels | Why |
|---|---|---|
| Request health | rate, p50/p95/p99, error % | primary SLI row; everything else is causal |
| Provider health | circuit state, upstream latency, fallback count | which provider is degrading and how often we’re saved by fallback |
| Auth cache | L1 hit rate, L2 hit rate, resolve-key RPS | cache degradation = spike in control-plane load |
| Budgets & debits | outbox fill-pct, flush failures, 4xx drops | outbox pressure before silent data loss |
| Guardrails | verdicts by direction, blocks by reason | catches both legit policy enforcement and regressions |
| Streaming | active streams, usage-warning rate | streams that never reported usage = cost blind spot |
| Cache (passthrough) | hits / bypass / forced — Anthropic discount proxy | cache outcome vs expectation |
| Infrastructure | pod replicas, CPU, memory, goroutines | k8s-side correlates for everything above |
| Lifecycle | draining pods, in-flight requests | detect hung drains before SIGKILL |
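
The table above maps onto fairly direct PromQL. A minimal sketch of the request-health row, assuming the gateway exposes a request counter and a duration histogram named along the lines of `gateway_requests_total` and `gateway_request_duration_seconds` (both names are assumptions; check your `/metrics` output for the real ones):

```promql
# Request rate (metric name assumed; substitute what /metrics actually exposes)
sum(rate(gateway_requests_total[$__rate_interval]))

# Error percentage: 5xx responses over all responses
100 *
  sum(rate(gateway_requests_total{status=~"5.."}[$__rate_interval]))
/
  sum(rate(gateway_requests_total[$__rate_interval]))

# p95 latency from the duration histogram
histogram_quantile(0.95,
  sum by (le) (rate(gateway_request_duration_seconds_bucket[$__rate_interval]))
)
```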
## Prerequisites
- Grafana ≥ 10.0 (uses time series panels + `$__rate_interval`).
- A Prometheus data source scraping the gateway’s `/metrics`. See Helm → Monitoring (a hand-rolled scrape sketch follows this list).
- The `kube-state-metrics` exporter if you want the infrastructure row — it’s how we read pod CPU/memory/replica count.
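
If you are not using the Helm chart's monitoring integration, a hand-rolled scrape also works. A minimal sketch, assuming the gateway pods carry an `app.kubernetes.io/name: langwatch-gateway` label (an assumption; the chart may ship a ServiceMonitor instead, see Helm → Monitoring):

```yaml
scrape_configs:
  - job_name: langwatch-gateway
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only gateway pods; the label value is an assumption
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: langwatch-gateway
        action: keep
```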
## Import
Save the JSON below as `langwatch-gateway.json`, then in Grafana: Dashboards → New → Import → Upload JSON. Set the data source to your Prometheus instance on first import; it propagates to every panel.
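
If you provision Grafana declaratively rather than through the import dialog, the same JSON can be loaded from disk. A sketch of a file provider (provider name and path are assumptions; adjust to your deployment):

```yaml
# provisioning/dashboards/langwatch.yaml
apiVersion: 1
providers:
  - name: langwatch
    type: file
    disableDeletion: false
    options:
      # Directory containing langwatch-gateway.json
      path: /var/lib/grafana/dashboards
```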
## Reading the dashboard during an incident
The panels are deliberately ordered so an on-call engineer scans top-down (a couple of the corresponding ad-hoc queries are sketched after the list):

- Request health — is this user-impacting? Latency + error rate together answer it.
- Provider health — if error rate is up, which provider? If the circuit is open, we’re degraded but not down (fallback is handling it).
- Auth cache — if latency is up with upstreams happy, an L1 miss spike points at control-plane pressure.
- Budgets & debits — outbox shelf > 0 for minutes means debits are failing persistently; correlate with control-plane errors.
- Guardrails — block spikes are either legit policy enforcement or a regression; `direction` + `reason` disambiguate.
- Streaming — active streams shows concurrency; fail-open rate is the `stream_chunk` guardrail escape hatch firing (by design).
- Cache (passthrough) — the Anthropic `cache_control` discount is only visible here; a drop to near-zero hits on a prefix-heavy workload is a regression.
- Infrastructure — goroutines climbing monotonically is the leak signal. Replicas + CPU correlate HPA pressure to everything above.
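
Two of the quick checks above translate into ad-hoc queries you can run from Explore. A sketch, assuming the gateway is a Go service exporting the default Go client metrics (`go_goroutines` is standard there) and a provider-labelled error counter whose name is an assumption:

```promql
# Leak signal: a goroutine count that only ever climbs (job label value assumed)
deriv(go_goroutines{job="langwatch-gateway"}[30m]) > 0

# Which provider is degrading? (metric and label names assumed)
sum by (provider) (rate(gateway_provider_errors_total[$__rate_interval]))
```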
## Pairing with alerts and runbook
Use this dashboard as the landing page from each alert’s `runbook_url`. Example Alertmanager wiring:
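A minimal sketch of that wiring, assuming the alert rules carry `runbook_url` and `dashboard_url` annotations (the annotation names are assumptions; match whatever your rules actually set):

```yaml
route:
  receiver: oncall-slack

receivers:
  - name: oncall-slack
    slack_configs:
      - channel: "#gateway-oncall"
        title: "{{ .CommonAnnotations.summary }}"
        text: >-
          Dashboard: {{ .CommonAnnotations.dashboard_url }}
          Runbook: {{ .CommonAnnotations.runbook_url }}
```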
## Skipping panels you don’t have metrics for
If you haven’t deployed Redis, the L2 hit-rate line is flat at zero — harmless. If you haven’t enabled `networkPolicy`, the provider egress row doesn’t need a custom panel since flow is unchanged. The JSON uses `label_values(...)` so missing labels degrade gracefully rather than breaking the dashboard.
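
For reference, `label_values(...)` is the Grafana templating function behind the dashboard’s variables; a provider variable, for example, would be defined with a query along these lines (metric and label names are assumptions):

```
label_values(gateway_provider_requests_total, provider)
```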
## See also
- Prometheus alerts — the rules that tell you when to open this dashboard.
- Production runbook — what to do when the dashboard pinpoints a row.
- Observability — deep-dive on per-tenant OTel routing + span attribute reference.