The gateway exposes three HTTP endpoints for Kubernetes probes, all on the same port the API listens on (5563 by default — referenced as the named container port http in the chart). Each is deliberately scoped — /readyz flipping to 503 must mean “this replica should not serve customer traffic right now”, and nothing more.

Endpoint summary

| Endpoint | Used by | Returns 200 when |
|---|---|---|
| /healthz | Kubernetes livenessProbe | The Go process is responsive (no registered checks fail) |
| /readyz | Kubernetes readinessProbe + LB | The pod is not draining and all readiness checks pass |
| /startupz | Kubernetes startupProbe | Bootstrap (auth-cache warm-up) has completed, then mirrors /readyz |
| /metrics | Prometheus | Always — counter + histogram surface (see Observability) |
The three probe endpoints return the same JSON shape:
{
  "status":   "ok | degraded | starting | draining",
  "version":  "git-<short-sha>",
  "uptime_s": 12.482,
  "checks":   { "<name>": "ok | <failure detail>" }
}
checks is omitted when there are no registered checks AND the status is ok. Today the gateway registers MarkStarted (after the auth-cache bootstrap completes) and MarkDraining (on SIGTERM), but no per-dependency liveness or readiness checks — the probes are intentionally lightweight signals about process state and lifecycle, not external-dependency health. Dependency health is reported via per-request error codes and the OTel trace surface, not the probes.
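
To inspect these fields on a live replica, you can port-forward to the named http port and curl the probe directly (the deployment name below is illustrative; adjust it to your release, and drop jq if you do not have it installed):
kubectl port-forward deploy/langwatch-gateway 5563:5563 &
curl -s http://localhost:5563/readyz | jq .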

/healthz (liveness)

GET /healthz
→ 200 OK
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 84.12 }
Cheap by design: it never does network I/O. If it fails failureThreshold consecutive times (3 by default, below), the kubelet restarts the container. Chart default (charts/gateway/templates/deployment.yaml):
livenessProbe:
  httpGet: { path: /healthz, port: http }   # http = 5563
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3

/readyz (readiness)

GET /readyz
→ 200 OK
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 84.12 }
When the pod has received SIGTERM:
→ 503 Service Unavailable
{ "status": "draining", "version": "git-a1412a4", "uptime_s": 1812.74 }
The load balancer drops a draining replica from rotation within seconds; in-flight requests on that replica continue to completion (graceful shutdown is described below). Do not weaken this probe — a replica flipping to draining must mean “stop sending new traffic here”, and the gateway only flips it on SIGTERM or via an explicit administrative call. Chart default:
readinessProbe:
  httpGet: { path: /readyz, port: http }
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 2
  successThreshold: 1
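
One way to observe the draining behaviour end to end is to trigger a rolling restart and watch readiness flip on the old replicas before they terminate (deployment name and label selector are illustrative; adjust to your install):
kubectl rollout restart deploy/langwatch-gateway
kubectl get pods -l app.kubernetes.io/name=gateway -w   # old pods drop to READY 0/1 once /readyz answers 503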

What /readyz does NOT check

  • Per-provider live health. If OpenAI is rate-limiting, the gateway still serves and falls back per the virtual key’s (VK’s) configured chain. Reporting not_ready for one upstream would amplify the incident.
  • Redis L2. The current gateway has no Redis client; the auth cache is an in-process LRU.
  • PostgreSQL. The gateway never talks to Postgres directly; the control plane mediates persistence.
The blast radius of readiness is specifically “is this pod still meant to serve traffic?”; the only conditions today are MarkDraining and (transitively, via /startupz mirroring /readyz once started) the auth-cache bootstrap.

/startupz (startup)

GET /startupz
→ 503 Service Unavailable          (during boot, before bootstrap completes)
{ "status": "starting", "version": "git-a1412a4", "uptime_s": 0.93 }

GET /startupz
→ 200 OK                           (after bootstrap completes; mirrors /readyz)
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 12.48 }
The bootstrap step is the auth-cache warm-up: the gateway calls the control plane’s /api/internal/gateway/keys/changes once on boot to fill the L1 cache before taking traffic. Warm-up for tenants with 50K+ virtual keys takes 2–3 seconds; the failure threshold gives ~60 s of room before the kubelet declares the pod failed and restarts it (chart default failureThreshold: 30 × periodSeconds: 2 = 60 s of bootstrap budget). If you need more headroom, see the override sketch after the probe definition below.
startupProbe:
  httpGet: { path: /startupz, port: http }
  initialDelaySeconds: 5
  periodSeconds: 2
  failureThreshold: 30
Once /startupz first returns 200, Kubernetes stops calling it; readiness + liveness take over.
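
If bootstrap regularly needs more than ~60 s (for example, very large key sets or a slow control plane), extend the budget by raising failureThreshold. A hypothetical values override, assuming the chart exposes probe settings under a startupProbe key (check the chart’s values.yaml for the actual path):
startupProbe:
  periodSeconds: 2
  failureThreshold: 60   # 60 x 2 s = 120 s of bootstrap budget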

Graceful shutdown

SIGTERM triggers the following sequence. (Note: the chart’s shutdown.preDrainWait and shutdown.timeout knobs are not currently wired into the gateway code; the gateway uses Server.GracefulSeconds from pkg/config/server.go, default 5. Bump it via SERVER_GRACEFUL_SECONDS if you need a longer window.)
  1. Immediately: MarkDraining() flips /readyz to 503 with status:"draining". The Service’s endpoint controller and the LB observe and stop routing new traffic.
  2. Drain window: existing in-flight requests continue. SSE streams continue until the upstream provider closes them or the request finishes naturally.
  3. End of GracefulSeconds: the HTTP server shuts down; remaining sockets close cleanly.
  4. Process exits 0.
Match your terminationGracePeriodSeconds to the drain window plus a few seconds of slack:
terminationGracePeriodSeconds: 30  # chart default
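
The sequence above maps onto a standard Go shutdown pattern. A minimal, illustrative sketch (not the gateway’s actual source; HealthState stands in for whatever type exposes MarkDraining, and gracefulSeconds mirrors Server.GracefulSeconds):
package main

import (
	"context"
	"net/http"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// HealthState is a stand-in for the gateway's real lifecycle state.
type HealthState struct{ draining atomic.Bool }

func (h *HealthState) MarkDraining() { h.draining.Store(true) }

func main() {
	health := &HealthState{}
	gracefulSeconds := 5 // mirrors Server.GracefulSeconds (override via SERVER_GRACEFUL_SECONDS)

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if health.draining.Load() {
			w.WriteHeader(http.StatusServiceUnavailable) // step 1: LB stops routing here
			w.Write([]byte(`{"status":"draining"}`))
			return
		}
		w.Write([]byte(`{"status":"ok"}`))
	})
	srv := &http.Server{Addr: ":5563", Handler: mux}

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()
	go func() {
		<-ctx.Done()          // SIGTERM received
		health.MarkDraining() // step 1: /readyz now answers 503
		// steps 2 and 3: let in-flight requests finish within the graceful window, then close sockets
		shutdownCtx, cancel := context.WithTimeout(context.Background(), time.Duration(gracefulSeconds)*time.Second)
		defer cancel()
		srv.Shutdown(shutdownCtx)
	}()

	srv.ListenAndServe() // returns http.ErrServerClosed once Shutdown completes
	// step 4: process exits 0
}
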
If the LB has no time to notice /readyz returning 503 before the listener closes, a small fraction of requests during rolling deploys lands on a replica that has already shut down and the LB returns 502. If you observe this on rollout, add a preStop sleep so the pod stays around long enough for the LB to remove it:
lifecycle:
  preStop:
    exec: { command: ["sleep", "3"] }

End-to-end synthetic check

/readyz validates lifecycle state but not the request path. For real-traffic confidence, run a synthetic completion every 30 s (-D - prints the response headers so that X-LangWatch-Request-Id can be captured in step 1 below):
curl --fail -sS -D - --max-time 5 \
  -H "Authorization: Bearer $LW_SYNTHETIC_VK" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"ping"}],"max_tokens":1}' \
  https://gateway.your-corp.com/v1/chat/completions
On alert:
  1. Capture the X-LangWatch-Request-Id header from the response.
  2. Look up the request’s trace in the LangWatch UI under Origin = gateway.
  3. If the gateway returned 5xx, pull the gateway pod logs filtered by pod name (GATEWAY_NODE_ID is unused; the gateway derives node_id from os.Hostname(), which inside a Kubernetes pod is the pod name). A log-pull example follows this list.
  4. If the gateway returned the upstream provider’s error verbatim, the gateway is healthy — the provider is degraded.
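
For step 3, a typical log pull (the namespace is illustrative, $GATEWAY_POD is the pod name taken from the trace’s node_id, and $REQUEST_ID is the value captured in step 1):
kubectl logs -n langwatch "$GATEWAY_POD" --since=15m | grep "$REQUEST_ID"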

Common failures

| Symptom | Likely cause | First step |
|---|---|---|
| /healthz 200, but /v1/chat/completions returns 500 auth_upstream_unavailable | Gateway can’t reach the control plane (LW_GATEWAY_BASE_URL wrong/missing) | Check the env var on the pod; confirm Service / DNS resolves |
| /startupz fails 30× in a row, pod in CrashLoopBackOff | Control plane unreachable on boot, or auth secret mismatch | Compare LW_GATEWAY_INTERNAL_SECRET byte-for-byte between gateway and control-plane pods |
| /readyz flips to 503 mid-life | Pod received SIGTERM (rolling deploy, eviction, OOMKill) | kubectl describe pod for the termination reason |
| 502 from LB for a small window during rolling deploy | LB hasn’t yet removed the draining replica | Add a preStop: sleep 3 lifecycle hook (see Graceful shutdown above) |
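
For the first two rows, a quick way to confirm which values the pods actually carry (the deployment name is illustrative; secret-backed variables are listed as references rather than plain values):
kubectl set env deploy/langwatch-gateway --list | grep LW_GATEWAY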