The gateway exposes three HTTP endpoints for Kubernetes probes, all on the same port the API listens on (5563 by default — referenced as the named container port `http` in the chart). Each is deliberately scoped — `/readyz` flipping to 503 must mean “this replica should not serve customer traffic right now”, and nothing more.
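For reference, the port wiring in the Deployment spec looks roughly like this (a sketch; the port number and name mirror the defaults described above, the actual chart template may differ):

```yaml
# Container spec excerpt (illustrative): the probes on this page target this named port.
ports:
  - name: http
    containerPort: 5563
    protocol: TCP
```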
Endpoint summary
| Endpoint | Used by | Returns 200 when |
|---|---|---|
| `/healthz` | Kubernetes livenessProbe | The Go process is responsive (no registered checks fail) |
| `/readyz` | Kubernetes readinessProbe + LB | The pod is not draining and all readiness checks pass |
| `/startupz` | Kubernetes startupProbe | Bootstrap (auth-cache warm-up) has completed, then mirrors `/readyz` |
| `/metrics` | Prometheus | Always — counter + histogram surface (see Observability) |
In the probe response body, `checks` is omitted when there are no registered checks AND the status is `ok`. Today the gateway registers `MarkStarted` (after the auth-cache bootstrap completes) and `MarkDraining` (on SIGTERM), but no per-dependency liveness or readiness checks — the probes are intentionally lightweight signals about process state and lifecycle, not external-dependency health. Dependency health is reported via per-request error codes and the OTel trace surface, not the probes.
/healthz (liveness)
Returns 200 as long as the Go process can serve HTTP; with no registered liveness checks today, it is purely a process-responsiveness signal. Chart default (charts/gateway/templates/deployment.yaml):
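A sketch of the corresponding livenessProbe stanza; the path and named port come from this page, while the timing values are illustrative rather than the chart's actual defaults:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: http          # the named container port from the chart
  periodSeconds: 10     # illustrative timing; check the chart for the real defaults
  timeoutSeconds: 2     # illustrative
  failureThreshold: 3   # illustrative
```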
/readyz (readiness)
`/readyz` returns 200 while the pod is not draining and all registered readiness checks pass (today there are none). It flips to 503 with `status:"draining"` once `MarkDraining()` runs, which happens on SIGTERM or via an explicit administrative call.
Chart default:
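A sketch of the matching readinessProbe stanza (path per this page; the timing values are illustrative, not necessarily the chart's defaults):

```yaml
readinessProbe:
  httpGet:
    path: /readyz
    port: http          # the named container port from the chart
  periodSeconds: 5      # illustrative timing; check the chart for the real defaults
  failureThreshold: 2   # illustrative
```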
What /readyz does NOT check
- Per-provider live health. If OpenAI is rate-limiting, the gateway still serves and falls back per the VK’s configured chain. Reporting `not_ready` for one upstream would amplify the incident.
- Redis L2. The current gateway has no Redis client; the auth cache is in-process LRU.
- PostgreSQL. The gateway never talks to Postgres directly; the control plane mediates persistence.
What /readyz DOES reflect: `MarkDraining`, and (transitively, via `/startupz` mirroring `/readyz` once started) the auth-cache bootstrap.
/startupz (startup)
The gateway calls `/api/internal/gateway/keys/changes` once on boot to fill the L1 cache before taking traffic. Tenants with 50K+ virtual keys take 2–3 seconds; the failure threshold gives ~60 s of room before the kubelet declares the pod failed and restarts it (chart default `failureThreshold: 30` × `periodSeconds: 2` = 60 s of bootstrap time).
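Expressed as a probe stanza, that budget looks like this (periodSeconds and failureThreshold are the chart defaults quoted above; the port name is assumed to be the chart's named `http` port):

```yaml
startupProbe:
  httpGet:
    path: /startupz
    port: http          # assumed to be the chart's named http port
  periodSeconds: 2      # chart default, per the text above
  failureThreshold: 30  # 30 × 2 s ≈ 60 s of allowed bootstrap time
```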
Once `/startupz` first returns 200, Kubernetes stops calling it; readiness + liveness take over.
Graceful shutdown
SIGTERM triggers the following sequence (the chart’s `shutdown.preDrainWait` and `shutdown.timeout` knobs are not currently wired into the gateway code; the gateway uses `Server.GracefulSeconds` from pkg/config/server.go, default 5 — bump via `SERVER_GRACEFUL_SECONDS` if you need a longer window):
- Immediately: `MarkDraining()` flips `/readyz` to 503 with `status:"draining"`. The Service’s endpoint controller and the LB observe and stop routing new traffic.
- Drain window: existing in-flight requests continue. SSE streams continue until the upstream provider closes them or the request finishes naturally.
- End of `GracefulSeconds`: the HTTP server shuts down; remaining sockets close cleanly.
- Process exits 0.
Set `terminationGracePeriodSeconds` to the drain window plus a few seconds of slack:
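For example, with the default 5 s graceful window, a sketch (the container name is illustrative; raise both values together if you bump `SERVER_GRACEFUL_SECONDS`):

```yaml
spec:
  terminationGracePeriodSeconds: 10        # 5 s GracefulSeconds + slack
  containers:
    - name: gateway                        # container name is illustrative
      env:
        - name: SERVER_GRACEFUL_SECONDS
          value: "5"                       # bump this and the grace period together
```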
If the LB has not observed `/readyz` = 503 before the listener closes, a small fraction of in-flight requests during rolling deploys hits a replica that has already shut down — the LB returns 502. If you observe this on rollout, add a preStop sleep so the pod stays around long enough for the LB to remove it:
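A sketch of that hook (the 3 s figure matches the suggestion in the failure table below; it assumes the image ships a sleep binary and that the extra seconds are counted against `terminationGracePeriodSeconds`):

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sleep", "3"]   # pod keeps serving while the LB removes it from rotation
```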
End-to-end synthetic check
/readyz validates lifecycle state but not the request path. For real-traffic confidence, run a synthetic completion every 30 s:
- Capture the `X-LangWatch-Request-Id` header from the response.
- Look up the request’s trace in the LangWatch UI under Origin = `gateway`.
- If the gateway returned 5xx, pull the gateway pod logs filtered by pod name (`GATEWAY_NODE_ID` is unused; the gateway derives `node_id` from `os.Hostname()`, which inside a Kubernetes pod is the pod name).
- If the gateway returned the upstream provider’s error verbatim, the gateway is healthy — the provider is degraded.
Common failures
| Symptom | Likely cause | First step |
|---|---|---|
| `/healthz` 200, `/v1/chat/completions` returns `500 auth_upstream_unavailable` | Gateway can’t reach control plane (`LW_GATEWAY_BASE_URL` wrong/missing) | Check the env var on the pod; confirm Service / DNS resolves |
| `/startupz` fails 30× in a row, pod CrashLoopBackOff | Control plane unreachable on boot, or auth secret mismatch | Compare `LW_GATEWAY_INTERNAL_SECRET` byte-for-byte between gateway and control-plane pods |
| `/readyz` flips to 503 mid-life | Pod received SIGTERM (rolling deploy, eviction, OOMKill) | `kubectl describe pod` for the termination reason |
| 502 from LB for a small window during rolling deploy | LB hasn’t yet removed the draining replica | Add a `preStop: sleep 3` lifecycle hook (see Graceful shutdown above) |