Kubernetes (Helm)

Deploy LangWatch on any Kubernetes cluster using the official Helm chart. The chart supports everything from single-node development to highly-available production with replicated ClickHouse.

Prerequisites

Kubernetes 1.28+
Helm 3.12+
kubectl configured for your cluster
A StorageClass that supports dynamic provisioning (for persistent volumes)
A domain name (for Ingress with TLS)
Default resource requirements: ~6 CPU and ~18 Gi RAM (requests). See Size Overlays for smaller or larger configurations.

Quick Start

Deploy LangWatch with all dependencies managed by the chart:

# Add the Helm repository
helm repo add langwatch https://langwatch.github.io/langwatch
helm repo update

# Install with auto-generated secrets (development only)
helm install langwatch langwatch/langwatch \
  --namespace langwatch --create-namespace \
  --set autogen.enabled=true \
  --wait --timeout 10m

Verify the installation:

kubectl -n langwatch get pods

Port-forward to access the UI:

kubectl -n langwatch port-forward svc/langwatch-app 5560:5560
# Open http://localhost:5560

autogen.enabled=true generates random secrets on each install. This is fine for testing but not for production, secrets will change on reinstall and invalidate sessions. See Production Deployment below.

Low-Resources Deployment

The default install requests ~6 CPU and ~18 Gi RAM. For smaller clusters or evaluation purposes, use the dev overlay which requests approximately ~2 CPU and ~4 Gi RAM:

curl -sLO https://raw.githubusercontent.com/langwatch/langwatch/main/charts/langwatch/examples/overlays/size-dev.yaml

helm install langwatch langwatch/langwatch \
  --namespace langwatch --create-namespace \
  --set autogen.enabled=true \
  -f size-dev.yaml \
  --wait --timeout 10m

This configures smaller resource limits, single replicas, and disables evaluator preloading to reduce memory usage. Suitable for development, demos, and small teams.

Production Deployment

For production, you should:

Use external managed databases (PostgreSQL, Redis)
Create Kubernetes Secrets manually
Expose via Ingress with TLS
Disable auto-generation

1. Create the app Secret

One Secret holds every chart-required value (app keys + AI gateway shared-auth keys when the gateway sub-chart is enabled). Both langwatch-app and the gateway pod mount it.

kubectl create namespace langwatch

kubectl -n langwatch create secret generic langwatch-app-secrets \
  --from-literal=credentialsEncryptionKey="$(openssl rand -hex 32)" \
  --from-literal=cronApiKey="$(openssl rand -hex 32)" \
  --from-literal=nextAuthSecret="$(openssl rand -hex 32)" \
  --from-literal=virtualKeyPepper="$(openssl rand -hex 32)" \
  --from-literal=LW_GATEWAY_INTERNAL_SECRET="$(openssl rand -hex 32)" \
  --from-literal=LW_GATEWAY_JWT_SECRET="$(openssl rand -hex 32)"

If you run with gateway.chartManaged: false (no AI gateway proxy), skip the two LW_GATEWAY_* lines. For external databases, create additional secrets:

# PostgreSQL (RDS, Cloud SQL, etc.)
kubectl create secret generic langwatch-db \
  --namespace langwatch \
  --from-literal=connectionString="postgresql://user:password@host:5432/langwatch"

# Redis (ElastiCache, Memorystore, etc.)
kubectl create secret generic langwatch-redis \
  --namespace langwatch \
  --from-literal=connectionString="redis://:password@host:6379"

2. Create a Values File

Start from the production example and customize. This configuration requests approximately ~8.5 CPU and ~28 Gi RAM across all pods:

# values-production.yaml

autogen:
  enabled: false

secrets:
  existingSecret: langwatch-app-secrets

app:
  replicaCount: 2
  http:
    baseHost: "https://langwatch.example.com"
    publicUrl: "https://langwatch.example.com"
  resources:
    requests: { cpu: 500m, memory: 4Gi }
    limits: { cpu: 1000m, memory: 4Gi }
  podDisruptionBudget:
    minAvailable: 1

workers:
  enabled: true
  replicaCount: 2
  resources:
    requests: { cpu: 500m, memory: 4Gi }
    limits: { cpu: 1000m, memory: 4Gi }
  podDisruptionBudget:
    minAvailable: 1

# External PostgreSQL
postgresql:
  chartManaged: false
  external:
    connectionString:
      secretKeyRef:
        name: langwatch-db
        key: connectionString

# External Redis
redis:
  chartManaged: false
  external:
    connectionString:
      secretKeyRef:
        name: langwatch-redis
        key: connectionString

# Chart-managed ClickHouse (production sizing)
clickhouse:
  cpu: 4
  memory: "8Gi"
  storage:
    size: 100Gi

# Ingress with TLS
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
  hosts:
    - host: langwatch.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
  tls:
    - secretName: langwatch-tls
      hosts:
        - langwatch.example.com

# Prometheus monitoring
prometheus:
  chartManaged: true
  server:
    retention: 30d
    persistentVolume:
      size: 20Gi

3. Install

helm install langwatch langwatch/langwatch \
  --namespace langwatch \
  -f values-production.yaml \
  --wait --timeout 10m

4. Verify

# Check all pods are running
kubectl -n langwatch get pods

# Check ingress
kubectl -n langwatch get ingress

# Check logs
kubectl -n langwatch logs deploy/langwatch-app --tail=50

High-Availability Deployment

For HA with replicated ClickHouse, multiple app/worker replicas, and PodDisruptionBudgets. This configuration requests approximately ~36 CPU and ~84 Gi RAM across all pods:

# values-ha.yaml (extends production values above)

app:
  replicaCount: 3
  podDisruptionBudget:
    minAvailable: 2

workers:
  replicaCount: 3
  podDisruptionBudget:
    minAvailable: 2

langwatch_nlp:
  replicaCount: 2
  podDisruptionBudget:
    minAvailable: 1

langevals:
  replicaCount: 2
  podDisruptionBudget:
    minAvailable: 1

# 3-node replicated ClickHouse with Keeper
clickhouse:
  replicas: 3
  cpu: 8
  memory: "16Gi"
  storage:
    size: 300Gi
    storageClass: gp3

  # Cold storage and backups
  objectStorage:
    bucket: "langwatch-data"
    region: "us-east-1"
    useEnvironmentCredentials: true
  cold:
    enabled: true
    defaultTtlDays: 49  # Recommend multiples of 7 to align with weekly partition boundaries
  backup:
    enabled: true

postgresql:
  chartManaged: false
  external:
    connectionString:
      secretKeyRef:
        name: langwatch-db
        key: connectionString

redis:
  chartManaged: false
  external:
    connectionString:
      secretKeyRef:
        name: langwatch-redis
        key: connectionString

helm install langwatch langwatch/langwatch \
  --namespace langwatch \
  -f values-ha.yaml \
  --wait --timeout 15m

Replicated ClickHouse requires an odd number of replicas (3, 5, 7) for Keeper consensus. 3 replicas is recommended for most deployments.

Overlay System

The chart ships with composable overlay files in examples/overlays/. Combine them to build your deployment configuration:

Size Overlays

Overlay	Use Case	Approx Resources (requests)
(default, no overlay)	Quick start, small production	~6 CPU, ~18 Gi
`size-dev.yaml`	Local dev, small teams	~2 CPU, ~4 Gi
`size-prod.yaml`	Production, single-node CH	~12 CPU, ~28 Gi
`size-ha.yaml`	HA production, replicated CH	~25 CPU, ~70 Gi

Access Overlays

Overlay	Description
`access-nodeport.yaml`	NodePort on 30560 (Kind, bare-metal)
`access-ingress.yaml`	Nginx Ingress with TLS template

Infrastructure Overlays

Overlay	Description
`postgres-external.yaml`	External PostgreSQL (RDS, Cloud SQL)
`redis-external.yaml`	External Redis (ElastiCache, Memorystore)
`clickhouse-external.yaml`	External ClickHouse instance
`clickhouse-replicated.yaml`	3-node replicated ClickHouse
`cold-storage-s3.yaml`	S3 cold storage + backups
`local-images.yaml`	Local images with `pullPolicy: Never`

Composing Overlays

Overlays are composable, later files override earlier ones:

# Production with external DBs and S3 cold storage
helm install langwatch langwatch/langwatch \
  -f examples/overlays/size-prod.yaml \
  -f examples/overlays/access-ingress.yaml \
  -f examples/overlays/postgres-external.yaml \
  -f examples/overlays/redis-external.yaml \
  -f examples/overlays/cold-storage-s3.yaml \
  --set autogen.enabled=true

ClickHouse Configuration

Standalone vs Replicated

Mode	Replicas	Engine	When to Use
Standalone	1	MergeTree	Development, small production
Replicated	3+ (odd)	ReplicatedMergeTree + Keeper	HA production

Switch to replicated mode:

clickhouse:
  replicas: 3  # Automatically uses ReplicatedMergeTree + Keeper

External ClickHouse

To use an existing ClickHouse instance:

clickhouse:
  chartManaged: false
  external:
    url:
      value: "http://user:password@clickhouse-host:8123/langwatch"
    # For replicated instances:
    clusterName: "my_cluster"

Auto-Tuning

The clickhouse-serverless subchart automatically tunes ClickHouse parameters based on the CPU and memory you allocate:

clickhouse:
  cpu: 4        # Tunes thread pools, merge concurrency
  memory: "8Gi" # Tunes memory limits, cache sizes, per-query limits

You only need to set these two values, the subchart computes optimal settings for query limits, merge threads, insert batching, and S3 download parallelism.

AI Gateway sub-chart (optional)

The umbrella chart bundles the AI Gateway as an opt-in sub-chart that runs alongside the core LangWatch app. Enabling it gives you virtual keys, hierarchical budgets, multi-provider routing via Bifrost, guardrails, and prompt caching, all governed by the same control plane. Minimum viable opt-in:

gateway:
  enabled: true

That ships sane defaults (2 replicas, ClusterIP service, no ingress). For per-environment tuning (replicas, autoscaling, ingress hostname + TLS, image registry mirror, secrets injection) see AI Gateway → Self-hosting → Helm. Three things to know before flipping it on:

One Secret holds everything (or use autogen.enabled=true and let the chart materialise it). Both langwatch-app and the AI gateway pod mount LW_GATEWAY_INTERNAL_SECRET + LW_GATEWAY_JWT_SECRET from the same secrets.existingSecret Secret (default langwatch-app-secrets) that also holds the app keys (credentialsEncryptionKey, cronApiKey, nextAuthSecret, virtualKeyPepper). The Production Deployment section above shows the single kubectl create secret generic command. With autogen.enabled=false and any required key missing, the preflight Job aborts the install with a clear list of what is missing; under autogen.enabled=true the chart materialises the Secret on first install via lookup-or-rand.
Public ingress needs DNS + TLS. The gateway is what your LLM clients hit, so it usually wants its own hostname (e.g. gateway.your-corp.com), separate cert, separate ingress rule. See AI Gateway → Self-hosting → DNS & TLS.
Worker pods must be running. Budget enforcement reads from a ClickHouse rollup that the trace-processing reactor folds into. If you deploy with workers.enabled=false, budgets stop accumulating spend and breach enforcement silently degrades. The default workers.enabled=true is correct for production.

Upgrade

helm repo update
helm upgrade langwatch langwatch/langwatch \
  --namespace langwatch \
  -f values-production.yaml \
  --wait --timeout 10m

Database migrations run automatically on startup. Set SKIP_PRISMA_MIGRATE=true to disable PostgreSQL migrations if needed. See Upgrade Guide for version-specific instructions.

Uninstall

helm uninstall langwatch --namespace langwatch

This does not delete PersistentVolumeClaims. Your data in PostgreSQL, ClickHouse, and Redis PVCs is preserved. Delete them manually if you want a clean removal:

kubectl -n langwatch delete pvc --all

FAQ

Istio, Service Mesh

If you’re using Istio or another service mesh with automatic sidecar injection, the CronJob pods may fail because the sidecar keeps the pod alive after the job completes. Disable sidecar injection for CronJobs:

cronjobs:
  pod:
    annotations:
      sidecar.istio.io/inject: "false"

Custom StorageClass

Set a StorageClass for all persistent volumes:

clickhouse:
  storage:
    storageClass: "gp3"
postgresql:
  primary:
    persistence:
      storageClass: "gp3"
redis:
  master:
    persistence:
      storageClass: "gp3"

Air-Gapped Environments

For clusters without internet access:

Push LangWatch images to your private registry
Update images.app.repository, images.langwatch_nlp.repository, images.langevals.repository
Set imagePullSecrets if your registry requires authentication

Overview

Configuration

Deployment

Infrastructure

Operations

Ops Console

Prerequisites

Quick Start

Low-Resources Deployment

Production Deployment

1. Create the app Secret

2. Create a Values File

3. Install

4. Verify

High-Availability Deployment

Overlay System

Size Overlays

Access Overlays

Infrastructure Overlays

Composing Overlays

ClickHouse Configuration

Standalone vs Replicated

External ClickHouse

Auto-Tuning

AI Gateway sub-chart (optional)

Upgrade

Uninstall

FAQ

Istio, Service Mesh

Custom StorageClass

Air-Gapped Environments

​Prerequisites

​Quick Start

​Low-Resources Deployment

​Production Deployment

​1. Create the app Secret

​2. Create a Values File

​3. Install

​4. Verify

​High-Availability Deployment

​Overlay System

​Size Overlays

​Access Overlays

​Infrastructure Overlays

​Composing Overlays

​ClickHouse Configuration

​Standalone vs Replicated

​External ClickHouse

​Auto-Tuning

​AI Gateway sub-chart (optional)

​Upgrade

​Uninstall

​FAQ

​Istio, Service Mesh

​Custom StorageClass

​Air-Gapped Environments

Prerequisites

Quick Start

Low-Resources Deployment

Production Deployment

1. Create the app Secret

2. Create a Values File

3. Install

4. Verify

High-Availability Deployment

Overlay System

Size Overlays

Access Overlays

Infrastructure Overlays

Composing Overlays

ClickHouse Configuration

Standalone vs Replicated

External ClickHouse

Auto-Tuning

AI Gateway sub-chart (optional)

Upgrade

Uninstall

FAQ

Istio, Service Mesh

Custom StorageClass

Air-Gapped Environments