What Changed
- Data store: Trace, span, and evaluation data is now stored in ClickHouse instead of Elasticsearch/OpenSearch
- Architecture: New event-sourcing system for data processing
- Helm charts: New composable overlay structure with `clickhouse-serverless` subchart
- Environment: `ELASTICSEARCH_*` variables replaced by `CLICKHOUSE_URL`
Migration Steps Overview
- Back up your databases
- Deploy ClickHouse alongside your existing Elasticsearch
- Upgrade LangWatch to v3
- Migrate historical data from Elasticsearch to ClickHouse
- Remove Elasticsearch
Prerequisites
- Back up your databases — see Backups
- Check release notes at github.com/langwatch/langwatch/releases
- Test in staging before upgrading production
- Verify your Elasticsearch cluster is healthy (all shards green)
- Ensure you have enough disk space on the ClickHouse host for the migrated data
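The shard-health prerequisite can be checked with Elasticsearch's cluster health API, for example:

```
# Expect "status" : "green" before starting the migration
curl -s "$ELASTICSEARCH_NODE_URL/_cluster/health?pretty"
```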
Step 1: Deploy ClickHouse
Add ClickHouse to your existing infrastructure. Your Elasticsearch instance stays running — both will operate in parallel during migration.
Docker Compose
Add the ClickHouse service to your `compose.yml`:
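A minimal service definition might look like the following sketch (image tag, ports, and credentials are illustrative assumptions — check the LangWatch repository's compose examples for the exact values):

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.8  # version is an assumption
    environment:
      CLICKHOUSE_DB: langwatch
      CLICKHOUSE_USER: langwatch
      CLICKHOUSE_PASSWORD: change-me
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol
    volumes:
      - clickhouse-data:/var/lib/clickhouse

volumes:
  clickhouse-data:
```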
Kubernetes (Helm)
The v3 Helm chart includes the `clickhouse-serverless` subchart automatically. No additional setup is needed — ClickHouse will be deployed when you upgrade the chart in Step 2.
Step 2: Upgrade LangWatch to v3
Docker Compose
Update your environment variables and pull the v3 images. Add `CLICKHOUSE_URL` to your `app` and `workers` environment in `compose.yml`:
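For example, assuming the compose services are named `app` and `workers` and ClickHouse runs as a `clickhouse` service (service names and the credentials-in-URL format are assumptions):

```yaml
services:
  app:
    environment:
      CLICKHOUSE_URL: http://langwatch:change-me@clickhouse:8123/langwatch
  workers:
    environment:
      CLICKHOUSE_URL: http://langwatch:change-me@clickhouse:8123/langwatch
```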
Helm Chart
The Helm chart configures `CLICKHOUSE_URL` automatically from the ClickHouse values — no manual env var needed for either managed or external ClickHouse.
What Happens on Startup
- PostgreSQL: Prisma migrations run automatically, including removing the old Elasticsearch feature flags
- ClickHouse: Schema migrations create all required tables (`event_log`, `stored_spans`, `trace_summaries`, etc.)
- New data starts flowing to ClickHouse immediately
- Historical data in Elasticsearch remains readable until you migrate it
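To confirm the schema migrations ran, you can check for the expected tables in ClickHouse (a query sketch; it assumes you are connected to the LangWatch database):

```sql
SELECT name
FROM system.tables
WHERE database = currentDatabase()
  AND name IN ('event_log', 'stored_spans', 'trace_summaries');
```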
Keep your `ELASTICSEARCH_NODE_URL` configured during this phase. LangWatch v3 can still read from Elasticsearch for data that hasn't been migrated yet.
Step 3: Migrate Historical Data
The `es-migration` tool reads documents from Elasticsearch and writes them to ClickHouse via the event-sourcing system. It runs outside of your LangWatch deployment — no Redis or BullMQ needed.
Setup
Configure
Set the required environment variables:
- Elasticsearch: Use a read-only user or API key. The migration tool only reads from ES — a read-only credential protects your source data from accidental writes.
- ClickHouse: Use a user with write access. The tool needs to insert into `event_log` and projection tables.
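A sketch of the environment, assuming the tool reads the same variable names used elsewhere in this guide (the exact names may differ — check the tool's README):

```
# Source: read-only credentials protect Elasticsearch from accidental writes
export ELASTICSEARCH_NODE_URL=https://es.internal:9200
export ELASTICSEARCH_API_KEY=<read-only-api-key>

# Destination: needs write access to event_log and the projection tables
export CLICKHOUSE_URL=http://langwatch:change-me@clickhouse:8123/langwatch
```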
Disable ClickHouse TTLs Before Migrating
LangWatch uses TTL rules to manage data retention and tiered storage in ClickHouse. By default, data older than 49 days is moved to cold storage (or dropped if cold storage isn't configured). When you migrate historical data from Elasticsearch, much of it will be older than 49 days — so ClickHouse would try to expire or offload it the moment it lands. Before starting the migration, set the TTL to a very high value so all migrated data stays in hot storage:
Docker Compose
Add to your `app` and `workers` environment:
Helm Chart
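In Helm, the same variable can be injected through your values file; a sketch, assuming an `extraEnv`-style key (the values layout is an assumption — check the chart's values schema):

```yaml
# values.yaml — 36500 days is an arbitrary "effectively infinite" value
app:
  extraEnv:
    - name: TIERED_STORAGE_DEFAULT_HOT_DAYS
      value: "36500"
workers:
  extraEnv:
    - name: TIERED_STORAGE_DEFAULT_HOT_DAYS
      value: "36500"
```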
After Migration: Restore TTLs
Once the migration is complete, set `TIERED_STORAGE_DEFAULT_HOT_DAYS` back to your desired retention value and restart LangWatch. ClickHouse handles the offloading gracefully in the background.
Migration Targets
The tool migrates data in separate targets:

| Target | Description |
|---|---|
| `traces-combined` | Traces and their evaluations |
| `simulations` | Simulation run events |
| `batch-evaluations` | Batch experiment evaluation data |
| `dspy-steps` | DSPy optimization steps |
| `all` | Run all primary targets in sequence |
Testing Workflow
Start small and verify before running the full migration:
1. Dry-run a single batch — validate the mapping without writing anything, then review `./dry-run-traces-combined.json` to confirm the data looks correct.
2. Live single batch — process one batch and verify the result in ClickHouse.
Recommended Tuning
Each target has different document sizes and volumes, so tuning per target improves throughput; traces and evaluations are the largest-volume target.
Runtime Controls
- Pause/Resume: Press `p` during migration to pause after the current batch. Press `p` again to resume.
- Graceful shutdown: `Ctrl+C` finishes the current batch then exits. Press again to force quit.
- ClickHouse backpressure: The tool monitors ClickHouse merge load and pauses automatically when it’s too high. It resumes when merges catch up.
Resume After Interruption
The migration saves progress to a cursor file (e.g., `./cursor-traces-combined.json`). If interrupted, it resumes from the last checkpoint on restart.
To start a target from scratch, delete its cursor file.
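The checkpointing behaviour can be sketched as follows (a minimal illustration, not the tool's actual code — the cursor file format and field names are assumptions):

```python
import json
import os

CURSOR_FILE = "./cursor-traces-combined.json"

def load_cursor():
    """Resume from the last checkpoint, or start from the beginning."""
    if os.path.exists(CURSOR_FILE):
        with open(CURSOR_FILE) as f:
            return json.load(f)["last_sort_key"]
    return None  # no cursor file: start from scratch

def save_cursor(last_sort_key):
    """Persist progress after each successfully written batch."""
    with open(CURSOR_FILE, "w") as f:
        json.dump({"last_sort_key": last_sort_key}, f)

def migrate(batches, write_batch):
    """Skip batches at or before the checkpoint; checkpoint after each write."""
    cursor = load_cursor()
    written = []
    for sort_key, batch in batches:
        if cursor is not None and sort_key <= cursor:
            continue  # already migrated before the interruption
        write_batch(batch)
        written.append(sort_key)
        save_cursor(sort_key)
    return written
```

Deleting the cursor file makes `load_cursor` return `None`, which is why removing it restarts the target from scratch.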
Verify the Migration
After migration completes, verify the data in ClickHouse. When counting rows, deduplicate on the `ProjectionId` column of trace summaries: if any ReplacingMergeTree merges are still pending, a plain count can report duplicates.
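For example, a count that accounts for pending merges could deduplicate explicitly (a query sketch; the `trace_summaries` table and `ProjectionId` column names should be verified against your deployment's schema):

```sql
-- FINAL deduplicates rows that ReplacingMergeTree has not merged yet
SELECT count() FROM trace_summaries FINAL;

-- Alternatively, count distinct projection ids directly
SELECT count(DISTINCT ProjectionId) FROM trace_summaries;
```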
Step 4: Remove Elasticsearch
Once you’ve verified the migration:
- Remove the Elasticsearch environment variables from your configuration: `ELASTICSEARCH_NODE_URL`, `ELASTICSEARCH_API_KEY`
- Remove the Elasticsearch service from your `compose.yml` or Helm values
- Restart LangWatch to apply the changes
Docker Compose
Helm
Environment Variable Changes
| Variable | v1.x / v2.x | v3.0 |
|---|---|---|
| `ELASTICSEARCH_NODE_URL` | Required | Remove |
| `ELASTICSEARCH_API_KEY` | Optional | Remove |
| `CLICKHOUSE_URL` | — | Required |
Troubleshooting
Toxic documents
If a single Elasticsearch document is too large and crashes an ES shard during migration, the tool detects this, skips the problematic document, and logs its ID to `./skipped-toxic-docs.log`. These documents may need manual handling.
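The skip-and-log behaviour resembles this pattern (an illustrative sketch, not the tool's code; the log format is an assumption):

```python
SKIPPED_LOG = "./skipped-toxic-docs.log"

def process_batch(docs, migrate_doc):
    """Migrate each document; log and skip any that fail fatally."""
    migrated, skipped = 0, 0
    for doc in docs:
        try:
            migrate_doc(doc)
            migrated += 1
        except Exception as exc:
            # record the document ID so it can be handled manually later
            skipped += 1
            with open(SKIPPED_LOG, "a") as f:
                f.write(f"{doc['_id']}\t{exc}\n")
    return migrated, skipped
```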
Response too large
If an Elasticsearch response exceeds the Node.js string limit (~1 GB), the tool automatically halves the batch size and retries.
Transient Elasticsearch errors
For timeouts or connection issues, the tool uses exponential backoff (up to 5 retries) and reduces batch size if errors persist.
Migration seems slow
- Check ClickHouse merge load in the progress output (`ch_parts` column)
- Increase `CONCURRENCY` and `BATCH_SIZE` if your hardware can handle it
- The tool auto-pauses when ClickHouse is under merge pressure — this is normal
Getting Help
- Check the Troubleshooting guide
- Open an issue at github.com/langwatch/langwatch/issues
- Contact support