The gateway exposes three HTTP endpoints for Kubernetes probes, all on the same port the API listens on (5563 by default — referenced as the named container port http in the chart). Each is deliberately scoped — /readyz flipping to 503 must mean “this replica should not serve customer traffic right now”, and nothing more.

Endpoint summary

| Endpoint | Used by | Returns 200 when |
|---|---|---|
| /healthz | Kubernetes livenessProbe | The Go process is responsive (no registered checks fail) |
| /readyz | Kubernetes readinessProbe + LB | The pod is not draining and all readiness checks pass |
| /startupz | Kubernetes startupProbe | Bootstrap (auth-cache warm-up) has completed, then mirrors /readyz |
| /metrics | Prometheus | Always — counter + histogram surface (see Observability) |
The three probe endpoints return the same JSON shape:
{
  "status":   "ok | degraded | starting | draining",
  "version":  "git-<short-sha>",
  "uptime_s": 12.482,
  "checks":   { "<name>": "ok | <failure detail>" }
}
checks is omitted when there are no registered checks AND the status is ok. Today the gateway registers MarkStarted (after the auth-cache bootstrap completes) and MarkDraining (on SIGTERM), but no per-dependency liveness or readiness checks — the probes are intentionally lightweight signals about process state and lifecycle, not external-dependency health. Dependency health is reported via per-request error codes and the OTel trace surface, not the probes.
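
To inspect these fields on a live replica, you can port-forward to the named http port and curl the probe directly (the deployment name below is illustrative; adjust it to your release, and drop jq if you do not have it installed):
kubectl port-forward deploy/langwatch-gateway 5563:5563 &
curl -s http://localhost:5563/readyz | jq .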

/healthz (liveness)

GET /healthz
→ 200 OK
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 84.12 }
Cheap by design: it never does network I/O. If it fails failureThreshold consecutive times (3 by default, below), the kubelet restarts the container. Chart default (charts/gateway/templates/deployment.yaml):
livenessProbe:
  httpGet: { path: /healthz, port: http }   # http = 5563
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3

/readyz (readiness)

GET /readyz
→ 200 OK
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 84.12 }
When the pod has received SIGTERM:
→ 503 Service Unavailable
{ "status": "draining", "version": "git-a1412a4", "uptime_s": 1812.74 }
The load balancer drops a draining replica from rotation within seconds; in-flight requests on that replica continue to completion (graceful shutdown is described below). Do not weaken this probe — a replica flipping to draining must mean “stop sending new traffic here”, and the gateway only flips it on SIGTERM or via an explicit administrative call. Chart default:
readinessProbe:
  httpGet: { path: /readyz, port: http }
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 2
  successThreshold: 1
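
One way to observe the draining behaviour end to end is to trigger a rolling restart and watch readiness flip on the old replicas before they terminate (deployment name and label selector are illustrative; adjust to your install):
kubectl rollout restart deploy/langwatch-gateway
kubectl get pods -l app.kubernetes.io/name=gateway -w   # old pods drop to READY 0/1 once /readyz answers 503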

What /readyz does NOT check

  • Per-provider live health. If OpenAI is rate-limiting, the gateway still serves and falls back per the virtual key’s (VK’s) configured chain. Reporting not_ready for one upstream would amplify the incident.
  • Redis L2. The current gateway has no Redis client; the auth cache is an in-process LRU.
  • PostgreSQL. The gateway never talks to Postgres directly; the control plane mediates persistence.
The blast radius of readiness is specifically “is this pod still meant to serve traffic?”; the only conditions today are MarkDraining and (transitively, via /startupz mirroring /readyz once started) the auth-cache bootstrap.

/startupz (startup)

GET /startupz
→ 503 Service Unavailable          (during boot, before bootstrap completes)
{ "status": "starting", "version": "git-a1412a4", "uptime_s": 0.93 }

GET /startupz
→ 200 OK                           (after bootstrap completes; mirrors /readyz)
{ "status": "ok", "version": "git-a1412a4", "uptime_s": 12.48 }
The bootstrap step is the auth-cache warm-up: the gateway calls the control plane’s /api/internal/gateway/keys/changes once on boot to fill the L1 cache before taking traffic. Warm-up for tenants with 50K+ virtual keys takes 2–3 seconds; the failure threshold gives ~60 s of room before the kubelet declares the pod failed and restarts it (chart default failureThreshold: 30 × periodSeconds: 2 = 60 s of bootstrap budget). If you need more headroom, see the override sketch after the probe definition below.
startupProbe:
  httpGet: { path: /startupz, port: http }
  initialDelaySeconds: 5
  periodSeconds: 2
  failureThreshold: 30
Once /startupz first returns 200, Kubernetes stops calling it; readiness + liveness take over.
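
If bootstrap regularly needs more than ~60 s (for example, very large key sets or a slow control plane), extend the budget by raising failureThreshold. A hypothetical values override, assuming the chart exposes probe settings under a startupProbe key (check the chart’s values.yaml for the actual path):
startupProbe:
  periodSeconds: 2
  failureThreshold: 60   # 60 x 2 s = 120 s of bootstrap budget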

Graceful shutdown

SIGTERM triggers the following sequence. (Note: the chart’s shutdown.preDrainWait and shutdown.timeout knobs are not currently wired into the gateway code; the gateway uses Server.GracefulSeconds from pkg/config/server.go, default 5. Bump it via SERVER_GRACEFUL_SECONDS if you need a longer window.)
  1. Immediately: MarkDraining() flips /readyz to 503 with status:"draining". The Service’s endpoint controller and the LB observe and stop routing new traffic.
  2. Drain window: existing in-flight requests continue. SSE streams continue until the upstream provider closes them or the request finishes naturally.
  3. End of GracefulSeconds: the HTTP server shuts down; remaining sockets close cleanly.
  4. Process exits 0.
Match your terminationGracePeriodSeconds to the drain window plus a few seconds of slack:
terminationGracePeriodSeconds: 30  # chart default
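
The sequence above maps onto a standard Go shutdown pattern. A minimal, illustrative sketch (not the gateway’s actual source; HealthState stands in for whatever type exposes MarkDraining, and gracefulSeconds mirrors Server.GracefulSeconds):
package main

import (
	"context"
	"net/http"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// HealthState is a stand-in for the gateway's real lifecycle state.
type HealthState struct{ draining atomic.Bool }

func (h *HealthState) MarkDraining() { h.draining.Store(true) }

func main() {
	health := &HealthState{}
	gracefulSeconds := 5 // mirrors Server.GracefulSeconds (override via SERVER_GRACEFUL_SECONDS)

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if health.draining.Load() {
			w.WriteHeader(http.StatusServiceUnavailable) // step 1: LB stops routing here
			w.Write([]byte(`{"status":"draining"}`))
			return
		}
		w.Write([]byte(`{"status":"ok"}`))
	})
	srv := &http.Server{Addr: ":5563", Handler: mux}

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()
	go func() {
		<-ctx.Done()          // SIGTERM received
		health.MarkDraining() // step 1: /readyz now answers 503
		// steps 2 and 3: let in-flight requests finish within the graceful window, then close sockets
		shutdownCtx, cancel := context.WithTimeout(context.Background(), time.Duration(gracefulSeconds)*time.Second)
		defer cancel()
		srv.Shutdown(shutdownCtx)
	}()

	srv.ListenAndServe() // returns http.ErrServerClosed once Shutdown completes
	// step 4: process exits 0
}
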
If the LB has no time to notice /readyz returning 503 before the listener closes, a small fraction of requests during rolling deploys lands on a replica that has already shut down and the LB returns 502. If you observe this on rollout, add a preStop sleep so the pod stays around long enough for the LB to remove it:
lifecycle:
  preStop:
    exec: { command: ["sleep", "3"] }

End-to-end synthetic check

/readyz validates lifecycle state but not the request path. For real-traffic confidence, run a synthetic completion every 30 s (-D - prints the response headers so that X-LangWatch-Request-Id can be captured in step 1 below):
curl --fail -sS -D - --max-time 5 \
  -H "Authorization: Bearer $LW_SYNTHETIC_VK" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"ping"}],"max_tokens":1}' \
  https://gateway.your-corp.com/v1/chat/completions
On alert:
  1. Capture the X-LangWatch-Request-Id header from the response.
  2. Look up the request’s trace in the LangWatch UI under Origin = gateway.
  3. If the gateway returned 5xx, pull the gateway pod logs filtered by pod name (GATEWAY_NODE_ID is unused; the gateway derives node_id from os.Hostname(), which inside a Kubernetes pod is the pod name). A log-pull example follows this list.
  4. If the gateway returned the upstream provider’s error verbatim, the gateway is healthy — the provider is degraded.
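
For step 3, a typical log pull (the namespace is illustrative, $GATEWAY_POD is the pod name taken from the trace’s node_id, and $REQUEST_ID is the value captured in step 1):
kubectl logs -n langwatch "$GATEWAY_POD" --since=15m | grep "$REQUEST_ID"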

Common failures

| Symptom | Likely cause | First step |
|---|---|---|
| /healthz 200, but /v1/chat/completions returns 500 auth_upstream_unavailable | Gateway can’t reach the control plane (LW_GATEWAY_BASE_URL wrong/missing) | Check the env var on the pod; confirm Service / DNS resolves |
| /startupz fails 30× in a row, pod in CrashLoopBackOff | Control plane unreachable on boot, or auth secret mismatch | Compare LW_GATEWAY_INTERNAL_SECRET byte-for-byte between gateway and control-plane pods |
| /readyz flips to 503 mid-life | Pod received SIGTERM (rolling deploy, eviction, OOMKill) | kubectl describe pod for the termination reason |
| 502 from LB for a small window during rolling deploy | LB hasn’t yet removed the draining replica | Add a preStop: sleep 3 lifecycle hook (see Graceful shutdown above) |
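
For the first two rows, a quick way to confirm which values the pods actually carry (the deployment name is illustrative; secret-backed variables are listed as references rather than plain values):
kubectl set env deploy/langwatch-gateway --list | grep LW_GATEWAY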