# Sizing & Scaling

> Resource requirements, size profiles, and scaling recommendations for LangWatch

## Minimum Requirements

### Docker Compose (local development)

* 4 CPU cores, 16 GB RAM, 50 GB disk
* Suitable for evaluation and small teams (\< 5 users)

### Kubernetes (production)

* Minimum 3 nodes with 4 CPU / 16 GB each
* StorageClass that supports dynamic provisioning
* See size profiles below for detailed per-component requirements

## Component Resource Defaults

These are the default resource requests and limits from the Helm chart (`values.yaml`):

| Component         | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage |
| ----------------- | ----------- | --------- | -------------- | ------------ | ------- |
| LangWatch App     | 250m        | 1000m     | 2Gi            | 4Gi          | n/a     |
| LangWatch Workers | 250m        | 1000m     | 2Gi            | 4Gi          | n/a     |
| LangWatch NLP     | 1000m       | 2000m     | 2Gi            | 4Gi          | n/a     |
| LangEvals         | 1000m       | 2000m     | 6Gi            | 8Gi          | n/a     |
| PostgreSQL        | 250m        | 1000m     | 512Mi          | 1Gi          | 20Gi    |
| ClickHouse        | 2 cores     | 2 cores   | 4Gi            | 4Gi          | 50Gi    |
| Redis             | 250m        | 500m      | 256Mi          | 512Mi        | 10Gi    |
| Prometheus        | 200m        | 500m      | 512Mi          | 2Gi          | 6Gi     |

<Note>ClickHouse auto-tunes internal parameters (memory limits, thread pools, merge settings) based on the CPU and memory you allocate. You only need to set `clickhouse.cpu` and `clickhouse.memory`.</Note>
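For example, a values override raising the ClickHouse allocation above the chart defaults (2 cores / 4Gi) could look like this sketch:

```yaml
# values override: give ClickHouse more headroom. The chart derives
# internal memory limits, thread pools, and merge settings from these
# two values, so no further ClickHouse tuning is needed.
clickhouse:
  cpu: 4        # cores
  memory: 8Gi
```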

## Size Profiles

The Helm chart ships with composable overlay files in `examples/overlays/`. Use them with `helm install -f`:

### Development (`values-local.yaml`)

For local development and small teams.

* LangWatch App: 1 replica, 250m/1 CPU, 1Gi/3Gi memory
* LangWatch Workers: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
* LangWatch NLP: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
* LangEvals: 1 replica, 100m/500m CPU, 512Mi/1Gi memory
* ClickHouse: 1 CPU, 1Gi memory, 5Gi storage
* PostgreSQL: 100m/500m CPU, 256Mi/512Mi memory, 2Gi storage
* Redis: 50m/250m CPU, 64Mi/256Mi memory, 1Gi storage
* Total: \~1 CPU, \~4 Gi RAM requests

```bash
# Example: helm install with dev sizing
helm install langwatch langwatch/langwatch \
  -f examples/values-local.yaml \
  --set autogen.enabled=true
```

### Production (`size-prod.yaml`)

For production with single-node ClickHouse.

* LangWatch App: 2 replicas, 500m/2 CPU, 2Gi/4Gi memory, PDB minAvailable 1
* LangWatch Workers: 2 replicas, 500m/2 CPU, 2Gi/4Gi memory
* LangWatch NLP: 1 replica, 1/2 CPU, 2Gi/4Gi memory
* LangEvals: 1 replica, 1/2 CPU, 4Gi/8Gi memory
* ClickHouse: 4 CPU, 8Gi memory, 100Gi storage
* PostgreSQL: 20Gi storage
* Redis: 5Gi storage
* Prometheus: 30d retention, 20Gi storage
* Total: \~12 CPU, \~28 Gi RAM requests

```bash
helm install langwatch langwatch/langwatch \
  -f examples/overlays/size-prod.yaml \
  -f examples/overlays/access-ingress.yaml
```

### High Availability (`size-ha.yaml`)

For production with replicated ClickHouse.

* LangWatch App: 3 replicas, 1/2 CPU, 4Gi/4Gi memory, PDB minAvailable 2
* LangWatch Workers: 3 replicas, 1/2 CPU, 4Gi/4Gi memory, PDB minAvailable 2
* LangWatch NLP: 2 replicas, 1/2 CPU, 2Gi/4Gi memory
* LangEvals: 2 replicas, 1/2 CPU, 4Gi/8Gi memory
* ClickHouse: 3 nodes, 4 CPU, 16Gi memory, 300Gi storage each
* PostgreSQL: 50Gi storage
* Redis: 10Gi storage
* Prometheus: 60d retention, 50Gi storage
* Total: \~25 CPU, \~70 Gi RAM requests (plus 3x ClickHouse)

```bash
helm install langwatch langwatch/langwatch \
  -f examples/overlays/size-ha.yaml \
  -f examples/overlays/access-ingress.yaml \
  -f examples/overlays/cold-storage-s3.yaml
```

## Scaling Guidelines

### What to scale first

| Bottleneck                              | Component to Scale | How                                               |
| --------------------------------------- | ------------------ | ------------------------------------------------- |
| Trace ingestion is slow / queue backlog | LangWatch Workers  | Increase `workers.replicaCount`                   |
| UI is slow / many concurrent users      | LangWatch App      | Increase `app.replicaCount`                       |
| ClickHouse queries are slow             | ClickHouse         | Increase `clickhouse.cpu` and `clickhouse.memory` |
| Evaluations are slow                    | LangEvals          | Increase `langevals.replicaCount`                 |
| Topic clustering is slow                | LangWatch NLP      | Increase `langwatch_nlp.replicaCount`             |

### Horizontal Pod Autoscaler (HPA)

```yaml
# Example HPA for workers
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langwatch-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langwatch-workers
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

## Storage Sizing

### ClickHouse hot storage

* \~1 KB per span (compressed, varies with payload size)
* 100K traces/day with avg 5 spans = \~500 MB/day = \~15 GB/month
* 1M traces/day with avg 5 spans = \~5 GB/day = \~150 GB/month
* Plan for 3-6 months of hot data before cold storage kicks in
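The arithmetic above can be sketched as a quick estimator; the ~1 KB/span figure is the compressed average quoted above, and actual sizes vary with payloads:

```python
# Rough ClickHouse hot-storage estimator. BYTES_PER_SPAN is the
# ~1 KB compressed average from the text; real spans vary with
# payload size, so treat the output as an order-of-magnitude guide.

BYTES_PER_SPAN = 1_000


def hot_storage_gb(traces_per_day: int, spans_per_trace: int = 5,
                   days: int = 30) -> float:
    """Estimated hot-storage growth in GB over a retention window."""
    return traces_per_day * spans_per_trace * BYTES_PER_SPAN * days / 1e9


# 100K traces/day -> ~15 GB/month; 1M traces/day -> ~150 GB/month
print(round(hot_storage_gb(100_000)))    # → 15
print(round(hot_storage_gb(1_000_000)))  # → 150
```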

### ClickHouse cold storage (S3)

* Enable with `clickhouse.cold.enabled: true`
* Default TTL: 49 days; data older than this moves to S3. Use a multiple of 7 so the TTL aligns with ClickHouse's weekly partition boundaries
* S3 storage is typically 10-20x cheaper than SSD
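A sketch of the cold-storage toggle: `clickhouse.cold.enabled` comes from the text above, but the `ttlDays` key name here is an illustrative assumption, not a confirmed chart key:

```yaml
# values override: tier older data to S3.
clickhouse:
  cold:
    enabled: true
    # "ttlDays" is a hypothetical key name for illustration;
    # the chart default TTL is 49 days. Keep it a multiple of 7
    # to align with weekly partition boundaries.
    ttlDays: 49
```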

### PostgreSQL

* Grows slowly; stores metadata only (users, projects, configurations)
* 10-20 GB is sufficient for most deployments

### Redis

* Minimal storage; used only for the job queue and cache
* 1-5 GB is sufficient

## Cloud Instance Recommendations

| Cloud | General Nodes                     | ClickHouse Nodes                  | Notes                              |
| ----- | --------------------------------- | --------------------------------- | ---------------------------------- |
| AWS   | m7g.xlarge (4 vCPU, 16 GB)        | r7g.2xlarge (8 vCPU, 64 GB)       | Graviton (ARM) for cost efficiency |
| GCP   | e2-standard-4 (4 vCPU, 16 GB)     | n2-highmem-8 (8 vCPU, 64 GB)      |                                    |
| Azure | Standard\_D4s\_v5 (4 vCPU, 16 GB) | Standard\_E8s\_v5 (8 vCPU, 64 GB) |                                    |

<Tip>For ClickHouse, prioritize memory over CPU. ClickHouse benefits from large memory for caching and merge operations.</Tip>
