The gateway exposes a /metrics endpoint scrapable by any standard Prometheus setup. This cookbook is a drop-in set of alert rules that cover what you actually need to page on: provider outages, budget bypass, auth cache degradation, streaming bugs, and cost anomalies.
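If nothing scrapes the gateway yet, a minimal scrape_config sketch is below; the job name, namespace, and port are assumptions, so adjust them to your deployment:
scrape_configs:
  - job_name: langwatch-gateway        # assumed job name
    metrics_path: /metrics
    static_configs:
      - targets: ["langwatch-gateway.langwatch.svc.cluster.local:8080"]  # assumed Service DNS + port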
Copy this into alerts/langwatch-gateway.yml and reload Prometheus (SIGHUP, or a POST to /-/reload if the lifecycle API is enabled), or apply it as a PrometheusRule CR if your Prometheus is managed by the Prometheus Operator.
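If you take the Operator path, the same groups: block goes under spec.groups of a PrometheusRule. A skeleton, where the metadata names and the release label are assumptions that must match your Prometheus rule selector:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: langwatch-gateway-alerts   # assumed name
  namespace: langwatch             # assumed namespace
  labels:
    release: prometheus            # assumption: must match your Prometheus ruleSelector
spec:
  groups: []                       # paste the groups from the ruleset below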
The ruleset
groups:
  - name: langwatch-gateway
    interval: 30s
    rules:
      # ─── Reliability ─────────────────────────────────────────────────────
      - alert: GatewayHighErrorRate
        expr: |
          sum(rate(gateway_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(gateway_http_requests_total[5m])) > 0.05
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Gateway 5xx rate > 5% over 5m"
          description: |
            End-user requests to the gateway are returning 5xx at {{ $value | humanizePercentage }}.
            Check /readyz on pods, then upstream provider status.

      - alert: GatewayReadinessFlapping
        expr: |
          changes(kube_pod_status_ready{
            condition="true",
            namespace="langwatch",
            pod=~"langwatch-gateway-.*"
          }[10m]) > 4
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Gateway pod {{ $labels.pod }} readiness flapped > 4 times in 10m"
          description: "Pod is flipping ready/not-ready. Investigate /readyz output."

      # ─── Provider health ─────────────────────────────────────────────────
      - alert: CircuitOpenTooLong
        expr: gateway_circuit_state == 2  # 2 = open
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Circuit open for credential {{ $labels.credential_id }} > 5m"
          description: |
            Gateway has tripped the breaker and is skipping this provider.
            Verify the provider is actually having issues (check the
            X-LangWatch-Provider header on recent failures, plus the
            provider status page).

      - alert: ExcessiveFallback
        expr: |
          sum(rate(gateway_provider_attempts_total{outcome="fallback"}[10m]))
            / sum(rate(gateway_provider_attempts_total[10m])) > 0.1
        for: 10m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "> 10% of requests falling back over 10m"
          description: |
            Primary provider is degraded enough that > 10% of requests are
            walking the fallback chain. Not paging because fallback is
            working as designed, but the primary needs investigation.

      # ─── Budget governance ───────────────────────────────────────────────
      - alert: StreamingUsageMissing
        expr: |
          sum(rate(gateway_streaming_usage_missing_total[10m])) > 0
        for: 10m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Streaming requests without usage reported"
          description: |
            A streaming request completing without token counts means $0 is
            being debited to the budget — budgets are silently bypassed.
            OpenAI requires stream_options.include_usage=true on the client.
            See /ai-gateway/streaming#usage-extraction-critical-for-streaming-budgets.

      - alert: BudgetDebitOutboxBacklog
        expr: gateway_budget_debit_outbox_depth > 1000
        for: 5m
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Budget debit outbox depth > 1000 for 5m"
          description: |
            Gateway can't reach the control plane /budget/debit endpoint fast
            enough. Customer budgets are NOT being debited in real time;
            near-limit customers may be over-consuming. Check control-plane
            health plus the gateway_budget_debit_outbox_dropped_total rate.

      # ─── Outbox leading indicators ───────────────────────────────────────
      # Fill-pct (depth/capacity) leads the absolute-depth rule above and
      # catches pods with different capacities consistently.
      - alert: BudgetOutboxFillPctHigh
        expr: |
          max by (pod) (gateway_budget_debit_outbox_depth)
            / max by (pod) (gateway_budget_debit_outbox_capacity) > 0.5
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Outbox fill-pct > 50% on {{ $labels.pod }} for 5m"
          description: |
            Approaching capacity. See production runbook Recipe 6 — normal
            bursts self-heal; a sustained climb is usually flush failure.

      # Flush-failure rate leads the depth climb — it catches a slow or
      # unreachable control plane BEFORE events back up enough to alert
      # on depth.
      - alert: BudgetOutboxFlushFailures
        expr: rate(gateway_budget_debit_outbox_flush_failures_total[5m]) > 0
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Outbox flush failures for 5m on {{ $labels.pod }}"
          description: |
            Control plane is slow or unreachable; debits are re-enqueued and
            safe, but reconciliation latency is growing. Investigate
            control-plane /budget/debit latency before this turns into a
            depth alert.

      # 4xx drops = silent data loss. Any non-zero rate is immediately
      # actionable (signing drift / payload drift / control-plane bug).
      - alert: BudgetOutbox4xxDrops
        expr: increase(gateway_budget_debit_outbox_4xx_drops_total[15m]) > 0
        labels: { severity: page, team: ai-platform }
        annotations:
          summary: "Outbox debits are being terminally dropped with 4xx"
          description: |
            Silent data loss. Debits are being permanently rejected by the
            control plane. Most common cause: LW_GATEWAY_INTERNAL_SECRET
            drift after rotation. See production runbook Recipe 6.

      - alert: BudgetCheckLiveFailRate
        expr: |
          sum(rate(gateway_budget_check_live_total{outcome="transport_error"}[5m]))
            / sum(rate(gateway_budget_check_live_total[5m])) > 0.2
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "Live /budget/check failing open > 20%"
          description: |
            Live reconciliation for near-limit scopes is timing out or
            erroring and falling back to the cached snapshot. Near-limit
            customers may briefly over-consume. Check control-plane
            /budget/check latency.

      # ─── Auth cache ──────────────────────────────────────────────────────
      - alert: AuthCacheHitRateDropped
        expr: |
          sum(rate(gateway_auth_cache_hits_total{tier="l1"}[5m]))
            / ( sum(rate(gateway_auth_cache_hits_total{tier="l1"}[5m]))
              + sum(rate(gateway_auth_cache_misses_total{tier="l1"}[5m])) ) < 0.9
        for: 15m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "L1 auth cache hit rate < 90% for 15m"
          description: |
            A sustained cache-miss rate means every miss costs a /resolve-key
            round-trip to the control plane. Investigate: recent deploy churn,
            /changes feed reliability, or LRU eviction under load.

      # ─── Cost & anomaly ──────────────────────────────────────────────────
      # Note: per-request cost anomaly detection requires a cost metric
      # (e.g. gateway_cost_usd_total) that isn't in the current collector
      # set. It's easier to derive cost anomalies from the GatewayBudgetLedger
      # table in the control-plane warehouse than from Prometheus, since
      # cost attribution happens on the control-plane side after debit.
      # See the /gateway/usage UI for the visual equivalent.

      # ─── Blocked-by-policy noise ─────────────────────────────────────────
      - alert: GuardrailBlockSpike
        expr: |
          sum by (direction, reason) (
            rate(gateway_guardrail_verdicts_total{verdict="block"}[5m])
          ) > 10
        for: 5m
        labels: { severity: warn, team: ai-platform }
        annotations:
          summary: "> 10 req/s blocked by policy reason={{ $labels.reason }}"
          description: |
            Either a legit block storm (a customer's runtime retrying a
            banned tool) or a regression in the policy_rules config.
            Check LangWatch traces filtered on
            attr.langwatch.policy.blocked != "".
Why these and not others
The temptation with Prometheus rules is to alert on everything. Don’t. Each rule above either:
- Represents real impact (5xx rate, circuit open, usage missing) — customers or budgets are actively affected.
- Is a leading indicator of incoming impact (excessive fallback, L1 cache hit-rate drop, outbox filling) — things get worse soon if ignored.
- Is a cost guardrail (streaming usage missing, outbox 4xx drops) — silent over-spend is a real failure mode for AI infra.
Rules we explicitly do NOT have:
- Individual provider 429s — these are normal operating state; fallback + circuit handle them without a human.
- High latency — the gateway’s added latency is bounded; if a provider is slow, alerting on it is alerting on the provider.
- Cache miss rate on upstream — the cache is passthrough; if Anthropic doesn't hit its cache, that's not the gateway's problem.
Route severity: page to PagerDuty, severity: warn to Slack. Example Alertmanager config:
route:
  receiver: slack-gateway
  group_by: [alertname, team]
  routes:
    - matchers: ['severity="page"', 'team="ai-platform"']
      receiver: pagerduty-ai-platform
      continue: true
    - matchers: ['severity="warn"', 'team="ai-platform"']
      receiver: slack-gateway

receivers:
  - name: pagerduty-ai-platform
    pagerduty_configs:
      - routing_key: "<pager-duty-service-key>"
  - name: slack-gateway
    slack_configs:
      - channel: "#ai-gateway-alerts"
        title: "{{ .GroupLabels.alertname }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
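One detail worth noting: continue: true makes Alertmanager keep matching subsequent routes after the PagerDuty route fires, so you can later add another route that also matches pages (for example, a Slack mirror) without re-plumbing. As written, pages go to PagerDuty only and warns to Slack.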
Verifying rules before deploy
promtool check rules alerts/langwatch-gateway.yml
promtool test rules alerts/langwatch-gateway.test.yml
Example unit test:
rule_files:
  - alerts/langwatch-gateway.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      - series: 'gateway_http_requests_total{status="500"}'
        values: '0 1 2 3 4 5 6 7 8 9 10'
      - series: 'gateway_http_requests_total{status="200"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 10m
        alertname: GatewayHighErrorRate
        exp_alerts:
          - exp_labels: { severity: page, team: ai-platform }
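It's worth pairing this with a negative case asserting the alert stays silent under healthy traffic; a sketch to append under tests: in the same file (the series values are illustrative):
  # Negative case: healthy traffic, alert must stay silent.
  - interval: 1m
    input_series:
      - series: 'gateway_http_requests_total{status="500"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'
      - series: 'gateway_http_requests_total{status="200"}'
        values: '0 60 120 180 240 300 360 420 480 540 600'
    alert_rule_test:
      - eval_time: 10m
        alertname: GatewayHighErrorRate
        exp_alerts: []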
Grafana dashboard
A matching Grafana dashboard JSON is at /ai-gateway/self-hosting/helm#monitoring. Panels mirror the alert rules, so you can see at a glance what's about to fire.
See also