The AI Gateway ships as a sub-chart (charts/gateway/) of the LangWatch umbrella chart (charts/langwatch/). Installing the umbrella with gateway.chartManaged: true (the default) lifts the gateway pod alongside langwatch-app in the same release. The gateway runs as a separate Kubernetes Deployment (own pod, own container, own service) and reaches the LangWatch control plane for VK resolution, budget enforcement, and guardrail execution.
The gateway sub-chart’s version and appVersion are bumped in lockstep with the umbrella langwatch chart by release-please (extra-files entries in .github/release-please-config.json). helm install langwatch/langwatch@3.x always pulls a matching gateway sub-chart — operators should never need to pin them separately.
Pre-create the runtime secrets
The gateway pod will not start until two values exist in a Kubernetes Secret that the chart references but does not materialise. This is the most common rollout failure, so do it before helm install:
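A minimal sketch of creating that Secret, assuming the langwatch namespace and the langwatch-gateway-auth Secret name used elsewhere on this page — adjust both to whatever secrets.existingSecretName points at:

```bash
# Sketch only — generate strong random values; the key names match the env vars the chart wires.
kubectl -n langwatch create secret generic langwatch-gateway-auth \
  --from-literal=LW_GATEWAY_INTERNAL_SECRET="$(openssl rand -hex 32)" \
  --from-literal=LW_GATEWAY_JWT_SECRET="$(openssl rand -hex 32)"
```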
Both values must also be present on the langwatch-app Deployment under the same env-var names — the gateway and control plane sign and verify each other’s calls byte-for-byte. The umbrella chart references one Secret name from both Deployments to keep them in sync; if you split secrets across two Kubernetes Secrets, ensure the values match.
The third sensitive value, LW_VIRTUAL_KEY_PEPPER, is control-plane-only: it hashes virtual-key secrets at rest and must never leave the langwatch-app pod. The gateway never reads it.
Failure mode if you skip this step: the gateway pod’s valueFrom.secretKeyRef resolves against a missing Secret and the pod loops in CreateContainerConfigError until the helm install --atomic deadline (default 8 min) trips a rollback.
Image registry and tag
The gateway image is published to Docker Hub (docker.io/langwatch/ai-gateway) by the ai-gateway matrix entry in .github/workflows/publish-docker-app.yml on every release.published event for langwatch@* tags. Each release publishes three tags: <version>, latest, and <short-sha>.
In the chart, leaving image.tag: "" (the default) resolves to .Chart.AppVersion, which is bumped in lockstep with the langwatch app version. Override only when you mirror the image to an internal registry or pin to a specific sha for a hotfix:
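For example (a sketch — the internal registry and tag shown are illustrative, and image.repository is assumed to follow the usual Helm convention; check charts/gateway/values.yaml):

```yaml
gateway:
  image:
    repository: registry.internal.example.com/mirror/langwatch/ai-gateway
    tag: "1.4.2"    # or a <short-sha> tag for a hotfix; leave "" to track .Chart.AppVersion
```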
Enabling the gateway
In your umbrella values.yaml:
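A minimal sketch — the nesting below mirrors the gateway.* value paths referenced throughout this page; the hostnames are placeholders:

```yaml
gateway:
  chartManaged: true                       # default — deploy the gateway with the umbrella release
  controlPlane:
    baseUrl: http://langwatch-app:5560     # where the gateway reaches /api/internal/gateway/*
  secrets:
    existingSecretName: langwatch-gateway-auth
  ingress:
    host: gateway.your-corp.com
  otel:
    endpoint: http://otel-collector:4318   # optional — only wired into the pod when set
```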
The full knob surface lives in charts/gateway/values.yaml (canonical for everything not exposed at the umbrella surface). All defaults are production-safe; you should only need to override controlPlane.baseUrl, secrets.existingSecretName, ingress.host, and the OTLP endpoint.
Apply:
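For example (release and namespace names assumed):

```bash
helm upgrade --install langwatch langwatch/langwatch -n langwatch -f values.yaml
```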
Env vars wired by the chart
Source of truth: charts/gateway/templates/configmap.yaml (commit d14576b1e) and charts/gateway/templates/deployment.yaml. The chart wires only the env vars the Go gateway actually reads — knobs in values.yaml that don’t appear here are forward-compat surface for v1.1 (e.g. cache.bootstrapAllKeys, admin.*, redis.url) and are silently ignored by the running binary. The canonical names are derived by the gateway’s config loader (pkg/config.Hydrate, see services/aigateway/config.go), which walks the Config struct’s env:"…" tags and joins parent/child with _.
| Env var | Source | Purpose |
|---|---|---|
| SERVER_ADDR | configmap (":5563") | Public listen addr. Service targetPort matches |
| LOG_LEVEL | configmap (logging.level) | debug / info / warn / error |
| LW_GATEWAY_BASE_URL | configmap (controlPlane.baseUrl) | Where the gateway reaches /api/internal/gateway/* on the control plane |
| OTEL_OTLP_ENDPOINT | configmap (otel.endpoint, only when set) | OTLP/HTTP ingest URL |
| SERVER_MAX_REQUEST_BODY_BYTES | configmap (security.maxRequestBodyBytes) | Body-size cap, rendered as %d int |
| LW_GATEWAY_INTERNAL_SECRET | Secret secrets.existingSecretName | HMAC for /api/internal/gateway/* calls |
| LW_GATEWAY_JWT_SECRET | Secret secrets.existingSecretName | HS256 for resolve-key JWT |
| LW_GATEWAY_JWT_SECRET_PREVIOUS | Secret (only when secrets.jwtSecretPreviousKey is set) | Retired key during rotation overlap |
Deliberately not wired: GATEWAY_REDIS_URL, GATEWAY_ADMIN_AUTH_TOKEN, and GATEWAY_OTEL_DEFAULT_AUTH_TOKEN — the v1 gateway has no Redis client, admin/pprof listener, or OTLP auth-token field on its config struct, and silently dropping them at the env-var layer would mislead operators. OTLP authentication for self-hosters is delivered via OTEL_OTLP_HEADERS (e.g. Authorization=Bearer …) on the OTel config struct in pkg/config/otel.go. See charts/gateway/templates/deployment.yaml for the rationale comment.
Secret rotation
Rotate LW_GATEWAY_JWT_SECRET without downtime using the dual-key overlap pattern. The chart conditionally injects a second secretKeyRef from the same Secret when jwtSecretPreviousKey is set:
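A values sketch of the overlap state (Secret name as created earlier; the key name matches the flow below):

```yaml
gateway:
  secrets:
    existingSecretName: langwatch-gateway-auth
    jwtSecretPreviousKey: LW_GATEWAY_JWT_SECRET_PREVIOUS   # unset to return to strict mode
```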
When jwtSecretPreviousKey is non-empty, the deployment renders a second env var (LW_GATEWAY_JWT_SECRET_PREVIOUS) and the gateway’s resolver verifies tokens against either value. When empty, no second env var is set and the gateway runs strict.
4-step zero-downtime flow:
1. Flip the control plane’s signing secret to the new value (rotate LW_GATEWAY_JWT_SECRET on the langwatch-app Deployment first — the issuer must accept the new value before the gateway starts seeing it on inbound JWTs).
2. Update the langwatch-gateway-auth Secret: keep LW_GATEWAY_JWT_SECRET pointing at the new value, add LW_GATEWAY_JWT_SECRET_PREVIOUS carrying the old value (see the sketch after this list). helm upgrade with gateway.secrets.jwtSecretPreviousKey: LW_GATEWAY_JWT_SECRET_PREVIOUS.
3. Rolling restart picks up both keys. Tokens signed under the old key verify against LW_GATEWAY_JWT_SECRET_PREVIOUS; new tokens use LW_GATEWAY_JWT_SECRET.
4. After ~15 min (longest pre-rotation bundle TTL), remove LW_GATEWAY_JWT_SECRET_PREVIOUS from the Secret and unset jwtSecretPreviousKey. Rolling restart. Strict mode resumes.
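A sketch of the Secret update in step 2, assuming the langwatch namespace:

```bash
# Keep LW_GATEWAY_JWT_SECRET on the new value; park the retired value under the _PREVIOUS key.
kubectl -n langwatch patch secret langwatch-gateway-auth --type merge -p "{
  \"stringData\": {
    \"LW_GATEWAY_JWT_SECRET\": \"${NEW_JWT_SECRET}\",
    \"LW_GATEWAY_JWT_SECRET_PREVIOUS\": \"${OLD_JWT_SECRET}\"
  }
}"
```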
The gateway emits a jwt_secret_rotation_active WARN log on boot whenever the previous-key env var is set — operators should monitor for runaway rotation windows (accepting a retired key indefinitely is a security-posture regression).
LW_GATEWAY_INTERNAL_SECRET rotation uses the same Secret slot under a previous key — the HMAC verifier accepts both during the overlap window. Rotate both together in one operation when possible.
See Config → Secret rotation for the env-var contract.
Dependencies on the control plane
Two control-plane components must be running for the gateway to function fully:
- langwatch-app on gateway.controlPlane.baseUrl (default http://langwatch-app:5560). The gateway hits /api/internal/gateway/resolve-key, /changes, and /budget/check synchronously. If unreachable, the gateway extends its cached entries via stale-while-error (see LW_GATEWAY_AUTH_CACHE_SOFT_BUMP/HARD_GRACE) but new VKs cannot resolve and rate limits cannot tighten.
- langwatch-workers running the trace-processing pipeline. Budget enforcement depends on this. The gateway emits OTLP spans with cost attrs but does NOT debit budgets directly — the workers’ trace-fold reactor (gatewayBudgetSync.reactor.ts) reads finalised spans, computes cost, and inserts ledger rows into ClickHouse. The gateway_budget_scope_totals_mv materialized view aggregates spend, and /budget/check reads from that view. If workers.enabled: false, gateway traces still flow but spend never accumulates against budgets, so hard-cap scopes will not block at the configured limit. Self-hosters running the umbrella chart get workers by default; standalone charts/gateway installs MUST point at a control plane that has the worker pipeline running.
Networking
The gateway needs to reach:
- The LangWatch control plane (gateway.controlPlane.baseUrl → LW_GATEWAY_BASE_URL) for /api/internal/gateway/* calls.
- Outbound to each provider API (OpenAI, Anthropic, Azure OpenAI, Bedrock, Vertex, Gemini). NetworkPolicy must allow egress (see below).
Inbound, the Ingress exposes the gateway at your configured host (ingress.host, e.g. gateway.your-corp.com) on port 443. The chart hardcodes the path to /v1 so only the OpenAI-compatible / Anthropic / Gemini routes are publicly reachable; /healthz, /readyz, /metrics, and /debug/pprof/* are NEVER exposed via the public Ingress. See DNS and TLS for the full DNS/cert flow.
Scaling
See Scaling for HPA configuration, replica sizing, and Redis L2 cache guidance. Quick summary: the gateway is stateless and horizontally scalable; the chart’s HPA defaults to 2-10 replicas at 70% CPU plus a custom lw_gateway_rps metric (requires Prometheus-adapter). In-flight streaming requests pin to a single replica, so HPA should scale on smoothed signals rather than chasing spikes.
High availability
- Run ≥2 replicas with a PodDisruptionBudget (chart default minAvailable: 1) so rolling updates don’t remove every pod simultaneously.
- The auth-cache stale-while-error grace window (LW_GATEWAY_AUTH_CACHE_SOFT_BUMP/HARD_GRACE) keeps replicas serving with cached VK bundles when the control plane is briefly offline. See Auth-cache stale-while-error for the soft/hard-grace knobs.
- The Redis L2 cache (gateway.redis.url) is forward-compat surface today — the v1 gateway has no Redis client, so this knob is a no-op. v1.1 will wire it for replica-warm-up on HPA scale-out. Don’t rely on it for cold-start coverage in v1; the auth-cache grace window is the only operational backstop.
Monitoring
The gateway binary exposes Prometheus metrics at /metrics (port 5563). For the metric names + labels, see Observability.
The chart does not currently render a ServiceMonitor object. Scrape via your own ServiceMonitor pointed at the gateway Service (port 80, target 5563), or via Pod-level annotations if your Prometheus uses the legacy annotation-based discovery. Chart-managed ServiceMonitor is on the v1.1 roadmap.
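A minimal ServiceMonitor sketch, assuming the Prometheus Operator CRDs, a langwatch release namespace, and that the gateway Service carries an app.kubernetes.io/name: gateway label with its HTTP port named http — verify the real labels and port name with kubectl get svc before applying:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: langwatch-gateway
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [langwatch]
  selector:
    matchLabels:
      app.kubernetes.io/name: gateway   # assumption — match your gateway Service labels
  endpoints:
    - port: http                        # Service port 80 → container port 5563
      path: /metrics
      interval: 30s
```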
Probes
The chart wires three Kubernetes probes against the gateway’s /healthz (liveness), /readyz (readiness), and /startupz (startup) endpoints, all on container port 5563. See Health checks for the canonical reference: probe semantics, cadence tuning, what each endpoint dials, and how to interpret a flapping /readyz during a rollout.
Startup network check
When NetworkPolicy is enabled, a subtly-broken egress rule — typically a missing DNS entry or a provider CIDR typo — passes /livez and /healthz (which never dial outbound) but causes the first customer request to fail. The startup netcheck closes that gap by running a one-shot DNS-resolve + TCP-dial against a configured host list before MarkStarted fires, failing the pod at deploy time rather than on the first request.
If the netcheck fails with dns resolution failed: ..., the broken rule is DNS egress — almost always the kube-system :53 rule in your NetworkPolicy. If it fails with tcp dial failed: ..., DNS worked but the provider :443 egress rule is broken (wrong CIDR, wrong port, or a missing egressToProviders override). Operators can diagnose deploy failures without running tcpdump in the pod.
A helm upgrade on an existing deploy without overriding startup.netcheckHosts renders the same ConfigMap that earlier chart versions did.
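If you do want to pin the host list, a values sketch (the list-of-host:port shape and the hosts shown are assumptions — list the provider endpoints your workloads actually call):

```yaml
gateway:
  startup:
    netcheckHosts:
      - api.openai.com:443
      - api.anthropic.com:443
```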
Admin listener
v1 status: the chart’s gateway.admin.* value keys are forward-compat surface — the v1 gateway binary does not expose an admin/pprof listener and the deployment template does not inject GATEWAY_ADMIN_ADDR / GATEWAY_ADMIN_AUTH_TOKEN (see the rationale comment in charts/gateway/templates/deployment.yaml). The three postures below describe the planned v1.1 contract; setting the values today is a no-op. For live diagnostics in v1, fall back to kubectl exec + the production-runbook recipes.
The net/http/pprof diagnostic surface (v1.1) will ship behind a dedicated listener. Three deployment postures — pick the one that matches your cluster:
Posture 1: disabled
Posture 2: loopback-only (default)
The listener binds 127.0.0.1:6060, reachable only via kubectl port-forward. This is the default posture and appropriate for k8s deployments — kubectl port-forward tunnels through the Kubernetes API server, so every access is authenticated against your cluster’s RBAC and logged in the cluster audit trail.
Supplying existingAuthSecretName in this posture is allowed and recommended as a second layer of defence — the listener still binds loopback but also requires the bearer token.
Posture 3: bearer-gated non-loopback
The listener binds a non-loopback address and requires Authorization: Bearer <token> on every request. Token comparison uses crypto/subtle.ConstantTimeCompare — no timing side-channel on length or prefix. The chart NEVER materialises the token into its own values or ConfigMap; only the reference to the pre-existing Secret flows through.
Pick this for:
- non-k8s deploys (systemd, Nomad, plain Docker) where kubectl port-forward isn’t available;
- k8s deploys behind a corporate VPN or internal LB where operators want direct HTTP access without going through kubectl exec.
Safety net (v1.1)
When the admin listener ships in v1.1, the gateway’s config.validate() will run at startup and enforce:
Non-loopback GATEWAY_ADMIN_ADDR + empty GATEWAY_ADMIN_AUTH_TOKEN + unset GATEWAY_ALLOW_INSECURE ⇒ the gateway will refuse to start with an error naming both env vars.
This is the critical safety net for the v1.1 admin surface: an operator who accidentally binds the admin listener to 0.0.0.0 without a token won’t silently expose unauthenticated pprof to the pod network. The pod will crash, k8s will back off, and the boot logs will name the env variable to fix. GATEWAY_ALLOW_INSECURE=true will exist only for test harnesses — never set it in production. None of this is wired in the v1 binary today; the chart values just describe the planned shape.
Accessing pprof with the bearer token
A request that fails RequireBearer gets a 401 with a clean WWW-Authenticate: Bearer realm=gateway-admin challenge. Credentials monitoring (GitHub secret-scanning, Cloudflare Secret Scanner) will pick up accidental commits of the token — rotate via the standard Secret rotation flow if that happens.
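A sketch of what access will look like once the v1.1 listener ships — the Deployment name, Secret name, and Secret key are assumptions, and none of this works against the v1 binary:

```bash
# port-forward targets the pod's localhost, so even a loopback-bound listener is reachable.
kubectl -n langwatch port-forward deploy/langwatch-gateway 6060:6060 &
ADMIN_TOKEN=$(kubectl -n langwatch get secret gateway-admin-auth -o jsonpath='{.data.token}' | base64 -d)
curl -H "Authorization: Bearer ${ADMIN_TOKEN}" http://127.0.0.1:6060/debug/pprof/heap -o heap.pprof
```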
Observability
The startup log line is enriched with auth_required and loopback_only booleans so operators have an audit breadcrumb.
NetworkPolicy
The gateway chart ships an optional Kubernetes NetworkPolicy that implements deny-by-default ingress + egress on the gateway pod. It’s off by default so dev clusters (which often run without a CNI that supports NetworkPolicy) aren’t broken by helm upgrade. Flip it on in production:
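For example (the egressToProviders override is shown commented out; the CIDR is a documentation-range placeholder, not a real provider range):

```yaml
gateway:
  networkPolicy:
    enabled: true
    # egressToProviders:        # optional tightening for strict compliance postures
    #   - 203.0.113.0/24        # placeholder — pin your providers' published ranges
```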
What the default policy allows
Ingress:
- ingress-nginx namespace → port 5563 (customer traffic via your Ingress).
- monitoring namespace Prometheus pod → port 5563 /metrics.
The admin/pprof listener (when it ships) binds 127.0.0.1:6060 and is unaffected by NetworkPolicy because kubectl port-forward tunnels through the Kubernetes API server, not the pod network.
Egress:
- DNS to kube-system on TCP+UDP :53 (ordering matters — the DNS rule MUST come first or other rules fail to resolve hostnames).
- Control plane (langwatch-app label selector) on port 5560 for /resolve-key, /config, /changes, /budget, /guardrail. Matches the langwatch chart’s langwatch-app Service (app=5560, nlp=5561, langevals=5562, gateway=5563).
- Redis on :6379 — only rendered when gateway.redis.url is set.
- OTLP :4318 — only rendered when gateway.networkPolicy.egressToOTLP is non-empty.
- Provider upstreams on :443 — default any IP EXCEPT RFC1918 (10/8, 172.16/12, 192.168/16). Teams with an explicit compliance posture should override egressToProviders with a concrete CIDR allowlist.
The “any IP except RFC1918” default is a compromise: excluding RFC1918 locks lateral movement out of the pod (no hitting internal services by accident) while still working for OpenAI/Anthropic/Bedrock/Vertex/Gemini without operators having to pin CIDR ranges per provider. If your cluster’s CNI enforces egress strictly, you can tighten this via egressToProviders.
Verifying the policy rendered
With networkPolicy.enabled=true, helm template renders exactly one NetworkPolicy with name langwatch-gateway. With enabled=false (the default), no NetworkPolicy object renders. The gateway CI gate asserts both invariants on every PR.
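To check locally (release name and values file assumed):

```bash
helm template langwatch langwatch/langwatch -f values.yaml \
  --set gateway.networkPolicy.enabled=true | grep -B2 -A4 "kind: NetworkPolicy"
```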
Inside the pod
To confirm the policy is active at runtime, exercise DNS and outbound connectivity from inside the pod (most distributions run kube-dns in kube-system; others use a custom namespace, so adjust the DNS egress rule to match).
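A sketch using an ephemeral debug container, since the gateway image is minimal and may not ship these tools (pod and target-container names are illustrative):

```bash
kubectl -n langwatch debug -it pod/langwatch-gateway-6c9f7d8b4-x2k9p \
  --image=busybox:1.36 --target=gateway -- sh
# Inside the debug shell:
nslookup api.openai.com          # should resolve (DNS egress to kube-system :53)
nc -zv -w3 api.openai.com 443    # should connect (provider egress on :443)
nc -zv -w3 10.0.0.10 443         # RFC1918 destination — should time out under the default policy
```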
Graceful drain
Rolling deploys, HPA scale-downs, and voluntary pod evictions all invoke SIGTERM. The gateway’s four-phase drain makes sure in-flight requests finish and new requests route to surviving replicas:
1. SIGTERM received. ctx.Done() fires.
2. Flip readiness + gauge. /readyz starts returning 503 {"status":"draining"}; gateway_draining{pod=...} goes to 1.
3. Pre-drain wait (shutdown.preDrainWait, default 5 s). Gives the LB endpoint controller (nginx-ingress + kube-proxy) time to remove the pod from service endpoints. EKS-observed propagation is 3-4 s; 5 s has margin. During this window the pod is refusing new work via /readyz but /livez + /startupz stay green so kubelet doesn’t short-circuit the drain.
4. Graceful close. server.Shutdown (Timeout=15s) blocks until every in-flight handler exits, then force-closes. Streaming handlers that honour request context cancellation finish cleanly within the grace.
Invariant
terminationGracePeriodSeconds ≥ preDrainWait + timeout + slack. Violate this and k8s SIGKILLs the pod mid-drain. The chart default (5 s pre-drain + 15 s shutdown + 10 s slack = 30 s) matches the template.
If your LB propagation is slower than EKS (e.g. cloud LB with 10+ second endpoint propagation), bump preDrainWait and terminationGracePeriodSeconds together:
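For example (the exact value paths are assumptions based on the knobs named above — check charts/gateway/values.yaml; keep the invariant intact):

```yaml
gateway:
  shutdown:
    preDrainWait: 15                    # seconds; up from the default 5
  terminationGracePeriodSeconds: 40     # 15 pre-drain + 15 shutdown + 10 slack
```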
Observability
Two gauges, always exported:
- gateway_draining{pod} — 0 normally, 1 during shutdown. A stuck pod (draining but never dying) shows as this being 1 for > grace.
- gateway_in_flight_requests{pod} — monotonic counter minus completions, so concretely the count of currently-executing handlers. During drain this should curve down to 0.
| Pattern | Meaning |
|---|---|
| draining=1 + in_flight monotonically decreasing to 0 | healthy drain — nothing to do |
| draining=1 + in_flight flat for > grace | stuck handler — upstream hang or a breaker without a deadline. Pod is about to SIGKILL |
Security
Edge protection lives under a dedicated security stanza, separate from per-upstream budgets or the debit outbox. The first (and currently only) knob is the request body cap:
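For example (10485760 bytes = 10 MiB, the value implied by the ParseInt note below — always write it as an integer literal):

```yaml
gateway:
  security:
    maxRequestBodyBytes: 10485760   # 10 MiB — never 1.048576e+07
```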
Why a dedicated security stanza rather than budget or upstream? These three concerns fail in different directions and alert differently:
- security — edge, fails fast with 413; operator bumps it when large base64 images are legitimate.
- budget — accounting, fails async via outbox; operator investigates the debit flow (see Recipe 6).
- upstream — per-provider, fails mid-flight with circuit-breaker trips.
The cap renders into the ConfigMap as an integer %d (the chart’s test render catches a YAML scientific-notation gotcha — Go’s ParseInt rejects 1.048576e+07). If you override this value in a custom values.yaml, write integer literals, not floats.
When to tune
- Lower to ~1 MiB if you’re fronting a tightly-scoped internal API with known small payloads and want to reject bot scans at the cheapest possible cost.
- Raise to ~50 MiB when a customer’s workload includes large base64-encoded images in vision messages. You’ll know you need it when legitimate requests start returning 413 payload_too_large; the troubleshooting entry in Troubleshooting tells users exactly which env var to bump.
- Do not disable. A body cap is the single cheapest defence against drive-by memory pressure on a public endpoint.
HTTP server timeouts
Slowloris-style attacks complete TLS + headers, then trickle the body at 1 B/s, holding the handler goroutine indefinitely. Three explicit timeouts on the http.Server close the gap:
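A values sketch — the stanza path and the two read-side key names are assumptions modelled on the writeTimeout/idleTimeout names used below, so check charts/gateway/values.yaml for the real keys:

```yaml
gateway:
  server:
    readHeaderTimeout: 10s   # assumption — bounds the header phase against slow-header attacks
    readTimeout: 60s         # assumption — bounds reading the full request body
    idleTimeout: 120s        # chart default; must exceed nginx keepalive_timeout (typically 75 s)
```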
- writeTimeout is NOT exposed. It would bound the whole response, which terminates long SSE streams (reasoning models, Claude thinking traces) at the cap. Per-chunk streaming deadlines live in the dispatcher, not here. If you’re behind an ingress that enforces its own proxy_read_timeout, match it there instead.
- idleTimeout MUST exceed your nginx-ingress keepalive_timeout (typical 75 s). If the gateway closes first, nginx’s pool has dead sockets and a request fails mid-flight. The chart default 120 s is safe under the typical 75 s nginx value; if you run a custom ingress with a longer keepalive, bump this value proportionally.
Chart and data-plane CI
The upstream repo runs a dedicated gateway-ci workflow against every PR that touches services/gateway/** or charts/gateway/** (paths-filter gated so unrelated PRs don’t trigger it). What it guarantees on the chart you’re installing:
- Go data plane: go mod verify, go vet, go build, go test -count=1 -race ./... across all 14 internal packages. Benchmarks compile under -run=^$ -benchtime=1x so Go API drift fails fast without adding benchmark noise to PR logs.
- Helm chart: helm lint + two helm template renders — one with defaults (must render zero NetworkPolicy objects), one with networkPolicy.enabled=true + redis.url=redis://… (must render exactly one). Both invariants are asserted explicitly; violating either fails the job.
A change whose helm upgrade would have silently dropped NetworkPolicy, broken under -race, or regressed a benchmark beyond 2× baseline cannot land on main without the gate catching it first. Pin your chart version to one of the release tags and helm diff upgrade before rollout.
Upgrade procedure
Gateway and control plane are versioned in lockstep — release-please bumps both charts/gateway/Chart.yaml and charts/langwatch/Chart.yaml to the same version on every langwatch release. A single helm upgrade langwatch langwatch/langwatch rolls both pods together via the umbrella chart’s RollingUpdate strategy.
1. helm diff upgrade langwatch langwatch/langwatch -f values.yaml --version <new> to preview the change set.
2. helm upgrade langwatch langwatch/langwatch -f values.yaml --version <new>. The umbrella’s langwatch-app and gateway Deployments roll concurrently — both use Kubernetes’ standard RollingUpdate strategy and the chart does not enforce ordering. Safety during the overlap window is handled by the dual-key signature contract documented under Secret rotation: the new control-plane image accepts both old and new signatures, so a brief window where gateway pods on the new image talk to control-plane pods still on the old image (or vice versa) does not break authentication. If you want to enforce ordering for an extra-cautious rollout, run helm upgrade ... --set gateway.enabled=false first to roll the app alone, then re-enable the gateway in a second helm upgrade — but for the standard upgrade path the dual-key contract is what makes the concurrent roll safe.
3. Monitor gateway_provider_duration_seconds and gateway_requests_total{status!="2xx"} for ~30 min.
4. Roll back via helm rollback langwatch <revision> if anomalies appear — the gateway’s cache survives restarts and requests resume immediately. Both sub-charts roll back together.