
Minimum Requirements

Docker Compose (local development)

  • 4 CPU cores, 16 GB RAM, 50 GB disk
  • Suitable for evaluation and small teams (< 5 users)
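A quick way to compare a Linux host against these numbers (a sketch; `nproc`, `/proc/meminfo`, and GNU `df` are Linux-specific, so adapt for macOS or other systems):

```shell
# Check the local host against the 4 CPU / 16 GB RAM / 50 GB disk minimum (Linux)
cpus=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
disk_gb=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
echo "CPUs: ${cpus}  RAM: ${mem_gb} GB  free disk: ${disk_gb} GB"
```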

Kubernetes (production)

  • Minimum 3 nodes with 4 CPU / 16 GB each
  • StorageClass that supports dynamic provisioning
  • See size profiles below for detailed per-component requirements

Component Resource Defaults

These are the default resource requests and limits from the Helm chart (values.yaml):
| Component         | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage |
|-------------------|-------------|-----------|----------------|--------------|---------|
| LangWatch App     | 250m        | 1000m     | 2Gi            | 4Gi          | -       |
| LangWatch Workers | 250m        | 1000m     | 2Gi            | 4Gi          | -       |
| LangWatch NLP     | 1000m       | 2000m     | 2Gi            | 4Gi          | -       |
| LangEvals         | 1000m       | 2000m     | 6Gi            | 8Gi          | -       |
| PostgreSQL        | 250m        | 1000m     | 512Mi          | 1Gi          | 20Gi    |
| ClickHouse        | 2 cores     | 2 cores   | 4Gi            | 4Gi          | 50Gi    |
| Redis             | 250m        | 500m      | 256Mi          | 512Mi        | 10Gi    |
| Prometheus        | 200m        | 500m      | 512Mi          | 2Gi          | 6Gi     |
ClickHouse auto-tunes internal parameters (memory limits, thread pools, merge settings) based on the CPU and memory you allocate. You only need to set clickhouse.cpu and clickhouse.memory.
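In a values override that could look like this (a sketch; only the clickhouse.cpu and clickhouse.memory keys are named above, and the 4-CPU / 8Gi values are illustrative):

```yaml
# values-override.yaml — set only these; ClickHouse auto-tunes the rest
clickhouse:
  cpu: 4
  memory: 8Gi
```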

Size Profiles

The Helm chart ships with composable overlay files in examples/overlays/. Use them with helm install -f:

Development (values-local.yaml)

For local development and small teams.
  • LangWatch App: 1 replica, 250m/1 CPU, 1Gi/3Gi memory
  • LangWatch Workers: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
  • LangWatch NLP: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
  • LangEvals: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
  • ClickHouse: 1 CPU, 1Gi memory, 5Gi storage
  • PostgreSQL: 100m/500m CPU, 256Mi/512Mi memory, 2Gi storage
  • Redis: 50m/250m CPU, 64Mi/256Mi memory, 1Gi storage
  • Total: ~1 CPU, ~4 Gi RAM requests
# Example: helm install with dev sizing
helm install langwatch langwatch/langwatch \
  -f examples/values-local.yaml \
  --set autogen.enabled=true

Production (size-prod.yaml)

For production with single-node ClickHouse.
  • LangWatch App: 2 replicas, 500m/2 CPU, 2Gi/4Gi memory, PDB minAvailable 1
  • LangWatch Workers: 2 replicas, 500m/2 CPU, 2Gi/4Gi memory
  • LangWatch NLP: 1 replica, 1/2 CPU, 2Gi/4Gi memory
  • LangEvals: 1 replica, 1/2 CPU, 4Gi/8Gi memory
  • ClickHouse: 4 CPU, 8Gi memory, 100Gi storage
  • PostgreSQL: 20Gi storage
  • Redis: 5Gi storage
  • Prometheus: 30d retention, 20Gi storage
  • Total: ~12 CPU, ~28 Gi RAM requests
helm install langwatch langwatch/langwatch \
  -f examples/overlays/size-prod.yaml \
  -f examples/overlays/access-ingress.yaml

High Availability (size-ha.yaml)

For production with replicated ClickHouse.
  • LangWatch App: 3 replicas, 1/2 CPU, 4Gi/4Gi memory, PDB minAvailable 2
  • LangWatch Workers: 3 replicas, 1/2 CPU, 4Gi/4Gi memory, PDB minAvailable 2
  • LangWatch NLP: 2 replicas, 1/2 CPU, 2Gi/4Gi memory
  • LangEvals: 2 replicas, 1/2 CPU, 4Gi/8Gi memory
  • ClickHouse: 3 nodes, 4 CPU, 16Gi memory, 300Gi storage each
  • PostgreSQL: 50Gi storage
  • Redis: 10Gi storage
  • Prometheus: 60d retention, 50Gi storage
  • Total: ~25 CPU, ~70 Gi RAM requests (plus 3x ClickHouse)
helm install langwatch langwatch/langwatch \
  -f examples/overlays/size-ha.yaml \
  -f examples/overlays/access-ingress.yaml \
  -f examples/overlays/cold-storage-s3.yaml

Scaling Guidelines

What to scale first

| Bottleneck                               | Component to Scale | How                                        |
|------------------------------------------|--------------------|--------------------------------------------|
| Trace ingestion is slow / queue backlog  | LangWatch Workers  | Increase workers.replicaCount              |
| UI is slow / many concurrent users       | LangWatch App      | Increase app.replicaCount                  |
| ClickHouse queries are slow              | ClickHouse         | Increase clickhouse.cpu and clickhouse.memory |
| Evaluations are slow                     | LangEvals          | Increase langevals.replicaCount            |
| Topic clustering is slow                 | LangWatch NLP      | Increase langwatch_nlp.replicaCount        |
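For example, to relieve an ingestion backlog without re-specifying your values files (a sketch; the release and chart names match the install commands above, and 4 replicas is an illustrative target):

```shell
# Scale trace-ingestion workers in place, keeping all other values
helm upgrade langwatch langwatch/langwatch \
  --reuse-values \
  --set workers.replicaCount=4
```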

Horizontal Pod Autoscaler (HPA)

# Example HPA for workers
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langwatch-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langwatch-workers
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
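To apply and observe it (standard kubectl workflow; the filename is a placeholder for wherever you save the manifest above). Note that resource-based HPAs require metrics-server, or an equivalent Metrics API provider, in the cluster:

```shell
kubectl apply -f workers-hpa.yaml
kubectl get hpa langwatch-workers --watch
```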

Storage Sizing

ClickHouse hot storage

  • ~1 KB per span (compressed, varies with payload size)
  • 100K traces/day with avg 5 spans = ~500 MB/day = ~15 GB/month
  • 1M traces/day with avg 5 spans = ~5 GB/day = ~150 GB/month
  • Plan for 3-6 months of hot data before cold storage kicks in
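The arithmetic above can be sketched as a quick estimate (assumes the ~1 KB/span figure; adjust traces_per_day and spans_per_trace to your workload):

```shell
# Back-of-envelope hot-storage estimate: traces/day x spans/trace x ~1 KB/span
traces_per_day=100000
spans_per_trace=5
kb_per_span=1
mb_per_day=$(( traces_per_day * spans_per_trace * kb_per_span / 1000 ))
gb_per_month=$(( mb_per_day * 30 / 1000 ))
echo "~${mb_per_day} MB/day, ~${gb_per_month} GB/month"
# → ~500 MB/day, ~15 GB/month
```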

ClickHouse cold storage (S3)

  • Enable with clickhouse.cold.enabled: true
  • Default TTL: 49 days (data older than this moves to S3)
  • S3 storage is typically 10-20x cheaper per GB than SSD block storage
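In values form (only the clickhouse.cold.enabled key is named above; bucket and credential settings are deployment-specific, so check the chart's values.yaml):

```yaml
# Data older than the default 49-day TTL moves to S3
clickhouse:
  cold:
    enabled: true
```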

PostgreSQL

  • Grows slowly; stores metadata only (users, projects, configurations)
  • 10-20 GB is sufficient for most deployments
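To check actual growth against that allowance (a sketch; assumes psql access to the LangWatch database — the host, user, and database name are placeholders for your deployment):

```shell
# Report the on-disk size of the connected database
psql -h <postgres-host> -U <user> -d <database> -c \
  "SELECT pg_size_pretty(pg_database_size(current_database()));"
```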

Redis

  • Minimal storage; job queue and cache only
  • 1-5 GB is sufficient
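To see how much of that Redis is actually using (assumes redis-cli can reach the deployed instance; the host is a placeholder):

```shell
# Current Redis memory footprint
redis-cli -h <redis-host> INFO memory | grep used_memory_human
```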

Cloud Instance Recommendations

| Cloud | General Nodes                    | ClickHouse Nodes                | Notes                              |
|-------|----------------------------------|---------------------------------|------------------------------------|
| AWS   | m7g.xlarge (4 vCPU, 16 GB)       | r7g.2xlarge (8 vCPU, 64 GB)     | Graviton (ARM) for cost efficiency |
| GCP   | e2-standard-4 (4 vCPU, 16 GB)    | n2-highmem-8 (8 vCPU, 64 GB)    |                                    |
| Azure | Standard_D4s_v5 (4 vCPU, 16 GB)  | Standard_E8s_v5 (8 vCPU, 64 GB) |                                    |
For ClickHouse, prioritize memory over CPU. ClickHouse benefits from large memory for caching and merge operations.