
The AI Gateway ships as a sub-chart (charts/gateway/) of the LangWatch umbrella chart (charts/langwatch/). Installing the umbrella with gateway.chartManaged: true (the default) brings up the gateway pod alongside langwatch-app in the same release. The gateway runs as a separate Kubernetes Deployment (own pod, own container, own Service) and reaches the LangWatch control plane for VK resolution, budget enforcement, and guardrail execution.
The gateway sub-chart’s version and appVersion are bumped in lockstep with the umbrella langwatch chart by release-please (via the package’s extra-files in .github/release-please-config.json). A helm install of langwatch/langwatch at any 3.x version always pulls the matching gateway sub-chart — operators should never need to pin them separately.
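To confirm which gateway sub-chart version a given umbrella release pins, you can inspect the umbrella chart’s metadata (this assumes the sub-chart is declared as a standard Chart.yaml dependency, the usual Helm 3 umbrella layout):
# Dump the umbrella Chart.yaml and look at its declared dependencies
helm show chart langwatch/langwatch | grep -A4 'dependencies:'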

Pre-create the runtime secrets

The gateway pod will not start until two values exist in a Kubernetes Secret that the chart references but does not materialise. This is the most common rollout failure, so do it before helm install:
# Default Secret name when installing the umbrella `langwatch` chart.
# (Gateway sub-chart standalone uses `gateway-runtime-secrets` instead;
# pass --set gateway.secrets.existingSecretName=... to override either.)
kubectl create secret generic langwatch-gateway-auth \
  --namespace langwatch \
  --from-literal=LW_GATEWAY_INTERNAL_SECRET="$(openssl rand -hex 32)" \
  --from-literal=LW_GATEWAY_JWT_SECRET="$(openssl rand -hex 32)"
Both values MUST also be mounted on the langwatch-app Deployment under the same env-var names — the gateway and control plane sign and verify each other’s calls byte-for-byte. The umbrella chart references one Secret name from both Deployments to keep them in sync; if you split the values across two Kubernetes Secrets, ensure they match.
The third sensitive value, LW_VIRTUAL_KEY_PEPPER, is control-plane-only: it hashes virtual-key secrets at rest and must never leave the langwatch-app pod. The gateway never reads it.
Failure mode if you skip this step: the gateway pod’s valueFrom.secretKeyRef resolves against a missing Secret and the pod loops in CreateContainerConfigError until the helm install --atomic deadline (default 8 min) trips a rollback.
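Before installing, it is worth confirming the Secret exists and carries both keys; kubectl describe shows key names and sizes without printing the values:
kubectl describe secret langwatch-gateway-auth -n langwatch
# Data
# ====
# LW_GATEWAY_INTERNAL_SECRET:  64 bytes
# LW_GATEWAY_JWT_SECRET:       64 bytes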

Image registry and tag

The gateway image is published to Docker Hub (docker.io/langwatch/ai-gateway) by the ai-gateway matrix entry in .github/workflows/publish-docker-app.yml on every release.published event for langwatch@* tags. Each release publishes three tags: <version>, latest, and <short-sha>. In the chart, leaving image.tag: "" (the default) resolves to .Chart.AppVersion, which is bumped in lockstep with the langwatch app version. Override only when you mirror the image to an internal registry or pin to a specific sha for a hotfix:
gateway:
  image:
    repository: docker.io/langwatch/ai-gateway   # default
    tag: ""                                       # default → .Chart.AppVersion (lockstep)
    pullPolicy: IfNotPresent
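If you mirror the image, the same override can also be passed on the command line; the registry host here is illustrative:
# Pin a mirrored image for a hotfix (replace host and sha with your own)
helm upgrade --install langwatch langwatch/langwatch -f values.yaml \
  --namespace langwatch \
  --set gateway.image.repository=registry.internal.example.com/langwatch/ai-gateway \
  --set gateway.image.tag=<short-sha>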

Enabling the gateway

In your umbrella values.yaml:
gateway:
  chartManaged: true                              # default; flip to false to disable

  # Where the gateway reaches the control plane. The default targets the
  # langwatch-app Service in the same namespace.
  controlPlane:
    baseUrl: http://langwatch-app:5560

  # Secret you pre-created above.
  secrets:
    existingSecretName: langwatch-gateway-auth    # default for the umbrella chart
    internalSecretKey: LW_GATEWAY_INTERNAL_SECRET # key WITHIN that Secret
    jwtSecretKey: LW_GATEWAY_JWT_SECRET           # key WITHIN that Secret

  # OTLP endpoint the gateway exports spans to. Default targets the
  # control plane's /api/otel ingest, which routes spans into the
  # owning project via the langwatch.project_id span attribute.
  otel:
    endpoint: http://langwatch-app:5560/api/otel
    # Production: provide the X-Auth-Token via a pre-existing Secret.
    authExistingSecretName: ""
    authSecretKey: token

  ingress:
    enabled: true
    className: nginx
    host: gateway.your-corp.com
    path: /v1                                     # only /v1/* needs to be public
    pathType: Prefix
    tls:
      enabled: true
      secretName: gateway-tls

  # Edge protection. Cap request body before auth runs so a misconfigured
  # caller can't OOM the pod. 10 MiB fits ~3 MiB prompts plus base64 images.
  security:
    maxRequestBodyBytes: 10485760                 # 10 MiB
The full set of overridable fields lives in charts/gateway/values.yaml (canonical for everything not exposed at the umbrella surface). All defaults are production-safe; you should only need to override controlPlane.baseUrl, secrets.existingSecretName, ingress.host, and the OTLP endpoint. Apply:
helm upgrade --install langwatch langwatch/langwatch \
  -f values.yaml \
  --namespace langwatch --create-namespace
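Then confirm both Deployments rolled out (names shown are the umbrella chart defaults):
kubectl rollout status deploy/langwatch-app -n langwatch
kubectl rollout status deploy/langwatch-gateway -n langwatch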

Env vars wired by the chart

Source of truth: charts/gateway/templates/configmap.yaml (commit d14576b1e) and charts/gateway/templates/deployment.yaml. The chart wires only the env vars the Go gateway actually reads — knobs in values.yaml that don’t appear here are forward-compat surface for v1.1 (e.g. cache.bootstrapAllKeys, admin.*, redis.url) and are silently ignored by the running binary. The canonical names are derived by pkg/config.Hydrate (under services/aigateway), which walks the Config struct’s env:"…" tags and joins parent and child names with _.
Env var | Source | Purpose
SERVER_ADDR | configmap (":5563") | Public listen addr; Service targetPort matches
LOG_LEVEL | configmap (logging.level) | debug / info / warn / error
LW_GATEWAY_BASE_URL | configmap (controlPlane.baseUrl) | Where the gateway reaches /api/internal/gateway/* on the control plane
OTEL_OTLP_ENDPOINT | configmap (otel.endpoint, only when set) | OTLP/HTTP ingest URL
SERVER_MAX_REQUEST_BODY_BYTES | configmap (security.maxRequestBodyBytes) | Body-size cap, rendered as a %d int
LW_GATEWAY_INTERNAL_SECRET | Secret (secrets.existingSecretName) | HMAC for /api/internal/gateway/* calls
LW_GATEWAY_JWT_SECRET | Secret (secrets.existingSecretName) | HS256 for resolve-key JWTs
LW_GATEWAY_JWT_SECRET_PREVIOUS | Secret (only when secrets.jwtSecretPreviousKey is set) | Retired key during rotation overlap
The deployment template intentionally does not inject GATEWAY_REDIS_URL, GATEWAY_ADMIN_AUTH_TOKEN, or GATEWAY_OTEL_DEFAULT_AUTH_TOKEN — the v1 gateway has no Redis client, admin/pprof listener, or OTLP auth-token field on its config struct, and silently dropping them at the env-var layer would mislead operators. OTLP authentication for self-hosters is delivered via OTEL_OTLP_HEADERS (e.g. Authorization=Bearer …) on the OTel config struct in pkg/config/otel.go. See charts/gateway/templates/deployment.yaml for the rationale comment.
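To see exactly what the chart wired into the running container (useful when checking whether a knob is live or forward-compat surface), dump the pod’s environment; this assumes the image carries the usual coreutils, as the curl-based checks later on this page also do:
kubectl exec -n langwatch deploy/langwatch-gateway -- env \
  | grep -E '^(SERVER_|LOG_LEVEL|LW_GATEWAY_|OTEL_)' | sort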

Secret rotation

Rotate LW_GATEWAY_JWT_SECRET without downtime using the dual-key overlap pattern. The chart conditionally injects a second secretKeyRef from the same Secret when jwtSecretPreviousKey is set:
gateway:
  secrets:
    existingSecretName: langwatch-gateway-auth
    jwtSecretKey: LW_GATEWAY_JWT_SECRET
    # Rotation window only: which key in the same Secret holds the retired
    # value. Empty = strict single-key mode (production steady state).
    jwtSecretPreviousKey: LW_GATEWAY_JWT_SECRET_PREVIOUS
When jwtSecretPreviousKey is non-empty, the deployment renders a second env var (LW_GATEWAY_JWT_SECRET_PREVIOUS) and the gateway’s resolver verifies tokens against either value. When empty, no second env var is set and the gateway runs strict. 4-step zero-downtime flow:
  1. Flip the control plane’s signing secret to the new value (rotate LW_GATEWAY_JWT_SECRET on the langwatch-app Deployment first — the issuer must be signing with the new value before the gateway starts seeing it on inbound JWTs).
  2. Update the langwatch-gateway-auth Secret: keep LW_GATEWAY_JWT_SECRET pointing at the new value, add LW_GATEWAY_JWT_SECRET_PREVIOUS carrying the old value. helm upgrade with gateway.secrets.jwtSecretPreviousKey: LW_GATEWAY_JWT_SECRET_PREVIOUS.
  3. Rolling restart picks up both keys. Tokens signed under the old key verify against LW_GATEWAY_JWT_SECRET_PREVIOUS; new tokens use LW_GATEWAY_JWT_SECRET.
  4. After ~15 min (longest pre-rotation bundle TTL), remove LW_GATEWAY_JWT_SECRET_PREVIOUS from the Secret and unset jwtSecretPreviousKey. Rolling restart. Strict mode resumes.
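A sketch of step 2 using kubectl patch alone. It assumes the default Secret and key names, and GNU base64 for -w0; Secret .data values are base64-encoded, so the current value can be copied as-is:
# Move the current JWT value into the previous slot, then install the new one
OLD=$(kubectl get secret langwatch-gateway-auth -n langwatch \
  -o jsonpath='{.data.LW_GATEWAY_JWT_SECRET}')
NEW=$(openssl rand -hex 32 | tr -d '\n' | base64 -w0)
kubectl patch secret langwatch-gateway-auth -n langwatch --type=merge \
  -p "{\"data\":{\"LW_GATEWAY_JWT_SECRET\":\"$NEW\",\"LW_GATEWAY_JWT_SECRET_PREVIOUS\":\"$OLD\"}}"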
The gateway emits a jwt_secret_rotation_active WARN log on boot whenever the previous-key env var is set — operators should monitor for runaway rotation windows (accepting a retired key indefinitely is a security-posture regression):
kubectl logs -n langwatch deploy/langwatch-gateway | grep jwt_secret_rotation_active
LW_GATEWAY_INTERNAL_SECRET rotation uses the same Secret slot under a previous key — the HMAC verifier accepts both during the overlap window. Rotate both together in one operation when possible. See Config → Secret rotation for the env-var contract.

Dependencies on the control plane

Two control-plane components must be running for the gateway to function fully:
  • langwatch-app on gateway.controlPlane.baseUrl (default http://langwatch-app:5560). The gateway hits /api/internal/gateway/resolve-key, /changes, and /budget/check synchronously. If unreachable, the gateway extends its cached entries via stale-while-error (see LW_GATEWAY_AUTH_CACHE_SOFT_BUMP / HARD_GRACE) but new VKs cannot resolve and rate limits cannot tighten.
  • langwatch-workers running the trace-processing pipeline. Budget enforcement depends on this. The gateway emits OTLP spans with cost attrs but does NOT debit budgets directly — the workers’ trace-fold reactor (gatewayBudgetSync.reactor.ts) reads finalised spans, computes cost, and inserts ledger rows into ClickHouse. The gateway_budget_scope_totals_mv materialized view aggregates spend, and /budget/check reads from that view. If workers.enabled: false, gateway traces still flow but spend never accumulates against budgets, so hard-cap scopes will not block at the configured limit. Self-hosters running the umbrella chart get workers by default; standalone charts/gateway installs MUST point at a control plane that has the worker pipeline running.
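A quick presence check for both dependencies (Deployment names follow the umbrella chart’s defaults):
kubectl get deploy -n langwatch | grep -E '^langwatch-(app|workers)'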

Networking

The gateway needs to reach:
  • The LangWatch control plane (gateway.controlPlane.baseUrl → LW_GATEWAY_BASE_URL) for /api/internal/gateway/* calls.
  • Outbound to each provider API (OpenAI, Anthropic, Azure OpenAI, Bedrock, Vertex, Gemini). NetworkPolicy must allow egress (see below).
Clients reach the gateway via the Ingress (gateway.your-corp.com) on port 443. The chart’s Ingress path defaults to /v1 so only the OpenAI-compatible / Anthropic / Gemini routes are publicly reachable; /healthz, /readyz, /metrics, and /debug/pprof/* are NEVER exposed via the public Ingress. See DNS and TLS for the full DNS/cert flow.
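A smoke test of that split from outside the cluster; the host is the example value above, and /v1/models assumes the OpenAI-compatible route set:
# The OpenAI-compatible surface answers under /v1 ...
curl -sS https://gateway.your-corp.com/v1/models \
  -H "Authorization: Bearer $VIRTUAL_KEY"
# ... while operational endpoints stay private
curl -s -o /dev/null -w '%{http_code}\n' https://gateway.your-corp.com/healthz
# expect your ingress 404/default-backend response, never a gateway 200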

Scaling

See Scaling for HPA configuration, replica sizing, and Redis L2 cache guidance. Quick summary: the gateway is stateless and horizontally scalable; the chart’s HPA defaults to 2-10 replicas at 70% CPU plus a custom lw_gateway_rps metric (requires Prometheus-adapter). In-flight streaming requests pin to a single replica, so HPA should scale on smoothed signals rather than chasing spikes.

High availability

  • Run ≥2 replicas with a PodDisruptionBudget (chart default minAvailable: 1) so rolling updates don’t remove every pod simultaneously.
  • The auth-cache stale-while-error grace window (LW_GATEWAY_AUTH_CACHE_SOFT_BUMP / HARD_GRACE) keeps replicas serving with cached VK bundles when the control plane is briefly offline. See Auth-cache stale-while-error for the soft/hard-grace knobs.
  • The Redis L2 cache (gateway.redis.url) is forward-compat surface today — the v1 gateway has no Redis client, so this knob is a no-op. v1.1 will wire it for replica-warm-up on HPA scale-out. Don’t rely on it for cold-start coverage in v1; the auth-cache grace window is the only operational backstop.

Monitoring

The gateway binary exposes Prometheus metrics at /metrics (port 5563). For the metric names + labels, see Observability.
The chart does not currently render a ServiceMonitor object. Scrape via your own ServiceMonitor pointed at the gateway Service (port 80, target 5563), or via Pod-level annotations if your Prometheus uses the legacy annotation-based discovery. Chart-managed ServiceMonitor is on the v1.1 roadmap.
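Until then, a manual look at the exposition surface:
kubectl port-forward -n langwatch deploy/langwatch-gateway 5563:5563 &
curl -s http://127.0.0.1:5563/metrics | grep '^gateway_' | head
kill %1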

Probes

The chart wires three Kubernetes probes against the gateway’s /healthz (liveness), /readyz (readiness), and /startupz (startup) endpoints, all on container port 5563. See Health checks for the canonical reference: probe semantics, cadence tuning, what each endpoint dials, and how to interpret a flapping /readyz during a rollout.

Startup network check

When NetworkPolicy is enabled, a subtly-broken egress rule — typically a missing DNS entry or a provider CIDR typo — passes /livez and /healthz (which never dial outbound) but causes the first customer request to fail. The startup netcheck closes that gap by running a one-shot DNS-resolve + TCP-dial against a configured host list before MarkStarted fires, failing the pod at deploy time rather than first request.
gateway:
  startup:
    # Leave empty to disable (default, backwards-compatible).
    # Comma-separated host:port list (same wire format as GATEWAY_STARTUP_NETCHECK_HOSTS).
    netcheckHosts:
      - api.openai.com:443
      - api.anthropic.com:443
      - generativelanguage.googleapis.com:443
    netcheckTimeout: 2s
Failure-class distinction (the key operational value): the probe separates DNS resolution failures from TCP dial failures in the error message. If a host fails with dns resolution failed: ..., the broken rule is DNS egress — almost always the kube-system :53 rule in your NetworkPolicy. If it fails with tcp dial failed: ..., DNS worked but the provider :443 egress rule is broken (wrong CIDR, wrong port, or a missing egressToProviders override). Operators can diagnose deploy failures without running tcpdump in the pod.
# Grep the pod logs after a failed rollout
kubectl logs -n langwatch deploy/langwatch-gateway | grep -E 'dns resolution failed|tcp dial failed'
The probe is disabled by default (empty list) so greenfield and air-gapped-via-forward-proxy deploys are unaffected. helm upgrade on an existing deploy without overriding startup.netcheckHosts renders the same ConfigMap that earlier chart versions did.

Admin listener

v1 status: the chart’s gateway.admin.* value keys are forward-compat surface — the v1 gateway binary does not expose an admin/pprof listener and the deployment template does not inject GATEWAY_ADMIN_ADDR / GATEWAY_ADMIN_AUTH_TOKEN (see the rationale comment in charts/gateway/templates/deployment.yaml). The three postures below describe the planned v1.1 contract; setting the values today is a no-op. For live diagnostics in v1, fall back to kubectl exec + the production-runbook recipes.
The gateway’s net/http/pprof diagnostic surface (v1.1) will ship behind a dedicated listener. Three deployment postures, pick the one that matches your cluster:

Posture 1: disabled

gateway:
  admin:
    addr: ""
No pprof. Use this for compliance-regulated deployments that forbid live profiling. Forfeits the production runbook recipes — you’re back to restart-as-debugging for p99 spikes, goroutine leaks, etc.

Posture 2: loopback-only (default)

gateway:
  admin:
    addr: "127.0.0.1:6060"
    # existingAuthSecretName: <unset>   # optional, works as a defence-in-depth layer
Listener binds 127.0.0.1:6060, reachable only via kubectl port-forward. This is the default posture and appropriate for k8s deployments — kubectl port-forward tunnels through the Kubernetes API server, so every access is authenticated against your cluster’s RBAC and logged in the cluster audit trail. Supplying existingAuthSecretName in this posture is allowed and recommended as a second layer of defence — the listener still binds loopback but also requires the bearer token.
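Once the v1.1 listener ships, access under this posture would go through the API server, e.g.:
kubectl port-forward -n langwatch deploy/langwatch-gateway 6060:6060 &
go tool pprof http://127.0.0.1:6060/debug/pprof/heap
kill %1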

Posture 3: bearer-gated non-loopback

gateway:
  admin:
    addr: "0.0.0.0:6060"
    existingAuthSecretName: gateway-admin-token
    authSecretKey: token                # key within the Secret; default "token"
Where:
kubectl create secret generic gateway-admin-token \
  --from-literal=token="$(openssl rand -hex 32)" \
  -n langwatch
Listener binds on all interfaces and enforces Authorization: Bearer <token> on every request. Token comparison uses crypto/subtle.ConstantTimeCompare, so timing leaks nothing about the token’s contents (it does return immediately on a length mismatch, which is harmless for fixed-length random tokens). The chart NEVER materialises the token into its own values or ConfigMap; only the reference to the pre-existing Secret flows through. Pick this for:
  • non-k8s deploys (systemd, Nomad, plain Docker) where kubectl port-forward isn’t available;
  • k8s deploys behind a corporate VPN or internal LB where operators want direct HTTP access without going through kubectl exec.

Safety net (v1.1)

When the admin listener ships in v1.1, the gateway’s config.validate() will run at startup and enforce:
Non-loopback GATEWAY_ADMIN_ADDR + empty GATEWAY_ADMIN_AUTH_TOKEN + unset GATEWAY_ALLOW_INSECURE ⇒ the gateway will refuse to start with an error naming both env vars.
This is the critical safety net for the v1.1 admin surface: an operator who accidentally binds the admin listener to 0.0.0.0 without a token won’t silently expose unauthenticated pprof to the pod network. The pod will crash, k8s will back off, and the boot logs will name the env variable to fix. GATEWAY_ALLOW_INSECURE=true will exist only for test harnesses — never set it in production. None of this is wired in the v1 binary today; the chart values just describe the planned shape.

Accessing pprof with the bearer token

TOKEN=$(kubectl get secret gateway-admin-token -n langwatch -o jsonpath='{.data.token}' | base64 -d)

# go tool pprof has no flag for custom auth headers, so fetch the profile
# with curl and hand pprof the local file
curl -sS -H "Authorization: Bearer $TOKEN" \
  -o heap.pb.gz \
  http://gateway.your-corp.internal:6060/debug/pprof/heap
go tool pprof heap.pb.gz

# Or a direct curl workflow for text-mode endpoints
curl -sS -H "Authorization: Bearer $TOKEN" \
  "http://gateway.your-corp.internal:6060/debug/pprof/goroutine?debug=1" \
  | head -100
Every RequireBearer 401 emits a clean WWW-Authenticate: Bearer realm=gateway-admin challenge. Credential scanners (GitHub secret-scanning, Cloudflare Secret Scanner) may catch accidental commits of the token — rotate via the standard Secret rotation flow if that happens.

Observability

The startup log line is enriched with auth_required and loopback_only booleans so operators have an audit breadcrumb:
{"msg":"admin_listening","addr":"0.0.0.0:6060","auth_required":true,"loopback_only":false}
Grep for this in your log pipeline if you want to assert on it in a compliance check.

NetworkPolicy

The gateway chart ships an optional Kubernetes NetworkPolicy that implements deny-by-default ingress + egress on the gateway pod. It’s off by default so dev clusters (which often run without a CNI that supports NetworkPolicy) aren’t broken by helm upgrade. Flip it on in production:
gateway:
  networkPolicy:
    enabled: true

    # Optional — restrict OTLP egress to a specific namespace/pod.
    # Leave empty to skip the OTLP egress rule entirely (not exported externally).
    egressToOTLP:
      - namespaceSelector: { matchLabels: { name: observability } }
        podSelector: { matchLabels: { app: otel-collector } }

    # Optional — lock provider egress to explicit CIDRs (stricter than default).
    # Default: allow any IP EXCEPT RFC1918 ranges (see below).
    egressToProviders:
      - cidr: 104.18.0.0/16    # cloudflare (fronts most providers)
      - cidr: 52.0.0.0/8       # aws us-east-1 (Anthropic, Bedrock)

What the default policy allows

Ingress:
  • ingress-nginx namespace → port 5563 (customer traffic via your Ingress).
  • monitoring namespace Prometheus pod → port 5563 /metrics.
Everything else is rejected — including traffic from other namespaces, lateral pod-to-pod dials, and direct NodePort hits. The pprof admin listener is loopback-bound on 127.0.0.1:6060 and is unaffected by NetworkPolicy because kubectl port-forward tunnels through the Kubernetes API server, not the pod network.
Egress:
  • DNS to kube-system on TCP+UDP :53 (without this rule nothing else can resolve hostnames — a missing DNS egress rule is the most common way custom policies break).
  • Control plane (langwatch-app label selector) on port 5560 for /resolve-key / /config / /changes / /budget / /guardrail. Matches the langwatch chart’s langwatch-app Service (app=5560, nlp=5561, langevals=5562, gateway=5563).
  • Redis on :6379 — only rendered when gateway.redis.url is set.
  • OTLP :4318 — only rendered when gateway.networkPolicy.egressToOTLP is non-empty.
  • Provider upstreams on :443 — default any IP EXCEPT RFC1918 (10/8, 172.16/12, 192.168/16). Teams with explicit compliance posture should override egressToProviders with a concrete CIDR allowlist.
The “any IP except RFC1918” default is a compromise: blocking the private ranges locks lateral movement out of the pod (no hitting internal services by accident), while allowing every public IP means OpenAI/Anthropic/Bedrock/Vertex/Gemini work without operators having to pin CIDR ranges per provider. If your cluster’s CNI enforces egress strictly, you can tighten this via egressToProviders.

Verifying the policy rendered

helm template langwatch langwatch/langwatch \
  -f values.yaml \
  --set gateway.networkPolicy.enabled=true \
  | grep -A2 'kind: NetworkPolicy'
Should emit exactly one NetworkPolicy with name langwatch-gateway. With enabled=false (the default), no NetworkPolicy object renders. The gateway CI gate asserts both invariants on every PR.

Inside the pod

To confirm the policy is active at runtime:
# Local sanity check: exec hits localhost, which NetworkPolicy does not filter;
# this confirms the gateway itself is healthy (the ingress rule is tested below)
kubectl exec -n langwatch deploy/langwatch-gateway -- curl -sS http://localhost:5563/healthz

# Egress check — should succeed to api.openai.com, fail to a lateral pod
kubectl exec -n langwatch deploy/langwatch-gateway -- \
  curl -sS --max-time 3 https://api.openai.com/v1/models
If egress fails with DNS resolution errors, verify the kube-system selector in the chart matches your cluster’s DNS namespace (some clusters use kube-dns in kube-system, others use a custom namespace).
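To exercise the ingress side of the policy properly, dial the gateway Service from a pod outside the allowed namespaces (the Service name is assumed to match the Deployment; adjust to your release):
kubectl run np-probe --rm -it --restart=Never -n default \
  --image=curlimages/curl --command -- \
  curl -sS --max-time 3 http://langwatch-gateway.langwatch/healthz
# expect a timeout: the default-deny ingress rule drops the connection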

Graceful drain

Rolling deploys, HPA scale-downs, and voluntary pod evictions all invoke SIGTERM. The gateway’s four-phase drain makes sure in-flight requests finish and new requests route to surviving replicas:
  1. SIGTERM received. ctx.Done() fires.
  2. Flip readiness + gauge. /readyz starts returning 503 {"status":"draining"}; gateway_draining{pod=...} goes to 1.
  3. Pre-drain wait (shutdown.preDrainWait, default 5 s). Gives the LB endpoint controller (nginx-ingress + kube-proxy) time to remove the pod from service endpoints. EKS-observed propagation is 3-4 s; 5 s has margin. During this window the pod is refusing new work via /readyz but /livez + /startupz stay green so kubelet doesn’t short-circuit the drain.
  4. Graceful close. server.Shutdown(Timeout=15s) blocks until every in-flight handler exits, then force-closes. Streaming handlers that honour request context cancellation finish cleanly within the grace.
gateway:
  shutdown:
    preDrainWait: 5s
    timeout: 15s
  terminationGracePeriodSeconds: 30   # MUST be ≥ preDrainWait + timeout + slack

Invariant

terminationGracePeriodSeconds ≥ preDrainWait + timeout + slack. Violate this and k8s SIGKILLs the pod mid-drain. The chart default is 5 s + 15 s + 10 s slack = 30 s, which matches the template. If your LB propagation is slower than EKS (e.g. a cloud LB with 10+ second endpoint propagation), bump preDrainWait and terminationGracePeriodSeconds together:
gateway:
  shutdown:
    preDrainWait: 15s
    timeout: 20s
  terminationGracePeriodSeconds: 45   # 15 + 20 + 10 slack

Observability

Two gauges, always exported:
  • gateway_draining{pod} — 0 normally, 1 during shutdown. A stuck pod (draining but never dying) shows as this being 1 for > grace.
  • gateway_in_flight_requests{pod} — increments when a handler starts and decrements when it completes, so concretely the count of currently-executing handlers. During drain this should curve down to 0.
Pairing the two during drain gives two operationally distinct patterns:
Pattern | Meaning
draining=1, in_flight monotonically decreasing to 0 | Healthy drain — nothing to do.
draining=1, in_flight flat for > grace | Stuck handler — upstream hang or a breaker without a deadline. Pod is about to be SIGKILLed.
See Production runbook → Recipe 7 for the diagnostic ladder when handlers don’t exit within the grace.

Security

Edge protection lives under a dedicated security stanza separate from per-upstream budgets or the debit outbox. First (and currently only) knob is the request body cap:
gateway:
  security:
    maxRequestBodyBytes: 10485760   # 10 MiB, default
Why not budget or upstream? These three concerns fail in different directions and alert differently:
  • security — edge, fails fast with 413, operator bumps when large base64 images are legitimate;
  • budget — accounting, fails async via outbox, operator investigates debit flow (see Recipe 6);
  • upstream — per-provider, fails mid-flight with circuit-breaker trips.
The chart’s ConfigMap renders the int as %d (the chart’s test render catches a YAML scientific-notation gotcha — Go’s ParseInt rejects 1.048576e+07). If you override this value in a custom values.yaml, write integer literals, not floats.
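To confirm the cap is live, push an over-limit body and expect a fast 413 (host and route are the examples used earlier on this page):
# 11 MB of zeros exceeds the 10 MiB default cap
head -c 11000000 /dev/zero | \
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST https://gateway.your-corp.com/v1/chat/completions \
    -H "Authorization: Bearer $VIRTUAL_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-
# expect 413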

When to tune

  • Lower to ~1 MiB if you’re fronting a tightly-scoped internal API with known small payloads and want to reject bot scans at the cheapest possible cost.
  • Raise to ~50 MiB when a customer’s workload includes large base64-encoded images in vision messages. You’ll know you need it when legitimate requests start returning 413 payload_too_large; the corresponding entry in Troubleshooting tells users exactly which env var to bump.
  • Do not disable. A body cap is the single cheapest defence against drive-by memory pressure on a public endpoint.

HTTP server timeouts

Slowloris-style attacks complete TLS + headers, then trickle the body at 1 B/s holding the handler goroutine indefinitely. Three explicit timeouts on the http.Server close the gap:
gateway:
  security:
    readHeaderTimeout: 5s    # TLS-handshake-then-stall probes
    readTimeout: 60s         # full body read; stops byte-trickle
    idleTimeout: 120s        # keep-alive cap; MUST exceed nginx keepalive_timeout
Notable omissions:
  • writeTimeout is NOT exposed. It would bound the whole response, which terminates long SSE streams (reasoning models, Claude thinking traces) at the cap. Per-chunk streaming deadlines live in the dispatcher, not here. If you’re behind an ingress that enforces its own proxy_read_timeout, match it there instead.
  • idleTimeout MUST exceed your nginx-ingress keepalive_timeout (typical 75 s). If the gateway closes first, nginx’s pool has dead sockets and a request fails mid-flight. The chart default 120 s is safe under the typical 75 s nginx value; if you run a custom ingress with a longer keepalive, bump this value proportionally.
All three timeouts apply to the public listener and the admin listener identically.
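A rough way to watch readTimeout bite, using curl’s upload throttle (timing and route are illustrative):
# Trickle a ~200 KB body at 1 KB/s; the read deadline fires long before it finishes
head -c 200000 /dev/zero | \
  curl -sS -o /dev/null -w 'http=%{http_code} t=%{time_total}s\n' \
    --limit-rate 1k \
    -X POST https://gateway.your-corp.com/v1/chat/completions \
    --data-binary @-
# expect the connection to be terminated by the server after roughly readTimeout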

Chart and data-plane CI

The upstream repo runs a dedicated gateway-ci workflow against every PR that touches services/gateway/** or charts/gateway/** (paths-filter gated so unrelated PRs don’t trigger it). What it guarantees on the chart you’re installing:
  • Go data plane: go mod verify, go vet, go build, go test -count=1 -race ./... across all 14 internal packages. Benchmarks compile under -run=^$ -benchtime=1x so Go API drift fails fast without adding benchmark noise to PR logs.
  • Helm chart: helm lint + two helm template renders — one with defaults (must render zero NetworkPolicy objects), one with networkPolicy.enabled=true + redis.url=redis://… (must render exactly one). Both invariants are asserted explicitly; violating either fails the job.
Jobs are timeout-capped (15 m / 10 m) and use concurrency cancellation on force-push, so a runaway test can’t burn runner minutes. Consequence for operators: an upgrade path where helm upgrade would have silently dropped NetworkPolicy, broken under -race, or regressed a benchmark beyond 2× baseline cannot land on main without the gate catching it first. Pin your chart version to one of the release tags and helm diff upgrade before rollout.

Upgrade procedure

Gateway and control plane are versioned in lockstep — release-please bumps both charts/gateway/Chart.yaml and charts/langwatch/Chart.yaml to the same version on every langwatch release. A single helm upgrade langwatch langwatch/langwatch rolls both pods together via the umbrella chart’s RollingUpdate strategy.
  1. helm diff upgrade langwatch langwatch/langwatch -f values.yaml --version <new> to preview the change set.
  2. helm upgrade langwatch langwatch/langwatch -f values.yaml --version <new>. The umbrella’s langwatch-app and gateway Deployments roll concurrently — both use Kubernetes’ standard RollingUpdate strategy, and the chart does not enforce ordering. Safety during the overlap window comes from the dual-key signature contract documented under Secret rotation: the new control-plane image accepts both old and new signatures, so a brief window where gateway pods on the new image talk to control-plane pods still on the old image (or vice versa) does not break authentication. If you want strict ordering for an extra-cautious rollout, run helm upgrade ... --set gateway.chartManaged=false first to roll the app alone, then re-enable the gateway in a second helm upgrade. For the standard upgrade path, the dual-key contract is what makes the concurrent roll safe.
  3. Monitor gateway_provider_duration_seconds and gateway_requests_total{status!="2xx"} for ~30 min.
  4. Roll back via helm rollback langwatch <revision> if anomalies appear — the gateway’s cache survives restarts and requests resume immediately. Both sub-charts roll back together.