Health Checks

Verify your deployment is healthy:
# App health
kubectl -n langwatch exec deploy/langwatch-app -- curl -s http://localhost:5560/api/health

# Worker health
kubectl -n langwatch exec deploy/langwatch-workers -- curl -s http://localhost:2999/healthz

# Pod status
kubectl -n langwatch get pods

# Recent events
kubectl -n langwatch get events --sort-by='.lastTimestamp' | tail -20
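When the pod list is long, a small filter can surface only the pods that need attention. The sketch below is a hypothetical helper (unhealthy_pods is not part of LangWatch or kubectl). It assumes the default column order of kubectl get pods --no-headers: NAME, READY, STATUS.

```shell
# Hypothetical helper: print pods that are not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin and assumes the
# default column order NAME READY STATUS ...
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }            # finished Job/CronJob pods are fine
    {
      split($2, ready, "/")               # READY looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}

# Usage:
#   kubectl -n langwatch get pods --no-headers | unhealthy_pods
```

An empty result means every pod is either Running with all containers ready or Completed.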

Docker Compose Issues

Port 5560 already in use

Another process is using port 5560. Find and stop it:
lsof -i :5560
# Then either stop that process or change the port in compose.yml

Containers keep restarting

Check logs for the failing container:
docker compose logs app --tail=50
docker compose logs postgres --tail=50
Common causes:
  • PostgreSQL not ready before app starts (health checks should handle this)
  • Missing or invalid .env file
  • Insufficient Docker memory (increase to 8+ GB in Docker Desktop settings)
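To rule out the .env cause quickly, a check along these lines can list keys that are missing or empty. This is a sketch: missing_env_keys is a hypothetical helper, and the key names in the usage line are only examples taken from elsewhere in this guide. The set your Compose setup actually requires may differ.

```shell
# Hypothetical helper: print each required key that is absent or has an
# empty value in a dotenv-style file. Commented-out lines do not count.
missing_env_keys() {
  local file="$1"
  shift
  local key
  for key in "$@"; do
    # A key counts as present only if "KEY=" is followed by at least one character.
    grep -Eq "^${key}=.+" "$file" 2>/dev/null || echo "$key"
  done
}

# Usage (example key names, adjust to your setup):
#   missing_env_keys .env DATABASE_URL NEXTAUTH_SECRET
```

If the file itself is missing, every requested key is reported, which also points at the problem.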

Slow startup

First startup is slower because:
  • Docker pulls all images
  • The initial database migrations run against PostgreSQL
  • OpenSearch initializes its cluster
Subsequent starts are faster. If startup remains slow, check how much CPU and memory Docker is allowed to use.
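To measure how long startup takes rather than guessing, you can poll the app until it responds. The helper below is a sketch (wait_for_url is a hypothetical name). The usage line assumes the Compose setup publishes the app on localhost:5560 and serves the same /api/health path used in the Kubernetes health check above, so adjust it if yours differs.

```shell
# Hypothetical helper: poll a URL until it returns a successful HTTP status
# or until roughly <timeout> seconds of waiting have passed.
wait_for_url() {
  local url="$1" timeout="${2:-120}" waited=0
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))   # counts sleep time only, so the total is approximate
  done
  echo "ready after about ${waited}s"
}

# Usage (URL is an assumption, see above):
#   wait_for_url http://localhost:5560/api/health 300
```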

Kubernetes / Helm Issues

Pods stuck in CrashLoopBackOff

# Check pod logs
kubectl -n langwatch logs <pod-name> --previous

# Common causes:
# 1. Database connection failed — check DATABASE_URL secret
# 2. Missing secrets — check autogen.enabled or secrets.existingSecret
# 3. ClickHouse not ready — check clickhouse pod status
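When several pods are crash-looping at once, collecting the previous logs for each of them in one pass saves some typing. A minimal sketch, assuming the default kubectl get pods column layout (crashloop_logs is a hypothetical helper):

```shell
# Hypothetical helper: print the tail of the previous container logs for
# every pod whose STATUS column is CrashLoopBackOff.
crashloop_logs() {
  local ns="${1:-langwatch}"
  kubectl -n "$ns" get pods --no-headers |
    awk '$3 == "CrashLoopBackOff" { print $1 }' |
    while read -r pod; do
      echo "== $pod =="
      kubectl -n "$ns" logs "$pod" --previous --tail=20
    done
}

# Usage:
#   crashloop_logs            # defaults to the langwatch namespace
#   crashloop_logs other-ns
```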

Pods stuck in Pending

# Check events for the pod
kubectl -n langwatch describe pod <pod-name>

# Common causes:
# 1. Insufficient cluster resources (CPU/memory)
# 2. No StorageClass available for PVC provisioning
# 3. Node selector/affinity mismatch
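The scheduler records why it could not place a pod as FailedScheduling events, so filtering events by that reason shows the cause for all Pending pods at once. A small sketch (the function name scheduling_failures is hypothetical; the flags are standard kubectl options):

```shell
# Hypothetical helper: list scheduler failures in the namespace, oldest first.
# The event message typically names the unmet constraint, such as
# insufficient CPU or an unbound PersistentVolumeClaim.
scheduling_failures() {
  local ns="${1:-langwatch}"
  kubectl -n "$ns" get events \
    --field-selector reason=FailedScheduling \
    --sort-by='.lastTimestamp'
}

# Usage:
#   scheduling_failures
```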

PVC stuck in Pending

kubectl -n langwatch get pvc
kubectl -n langwatch describe pvc <pvc-name>
Ensure your cluster has a default StorageClass:
kubectl get storageclass
If not, set one in your values:
clickhouse:
  storage:
    storageClass: "gp3"  # or your available StorageClass
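kubectl marks the default StorageClass with "(default)" after its name, so the check can be scripted. A sketch, assuming that output format (default_storageclass is a hypothetical helper):

```shell
# Hypothetical helper: print the name of each StorageClass marked as default.
# Reads `kubectl get storageclass --no-headers` output on stdin.
# Exits non-zero when no default StorageClass is found.
default_storageclass() {
  awk '$2 == "(default)" { print $1; found = 1 } END { exit !found }'
}

# Usage:
#   kubectl get storageclass --no-headers | default_storageclass \
#     || echo "no default StorageClass, set clickhouse.storage.storageClass"
```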

Ingress not routing traffic

# Check ingress resource
kubectl -n langwatch get ingress
kubectl -n langwatch describe ingress <ingress-name>

# Verify the ingress controller is running
kubectl get pods -n ingress-nginx  # or your ingress namespace
Ensure app.http.baseHost and app.http.publicUrl match the Ingress host.
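The host comparison can be sketched as a script. Both function names below are hypothetical, and the example URL is a placeholder; the jsonpath expression reads the standard spec.rules[].host field of each Ingress.

```shell
# Hypothetical helper: extract the host from a URL such as
# https://langwatch.example.com:443/path (IPv6 literals are not handled).
url_host() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest%%/*}"       # drop any path
  echo "${rest%%:*}"       # drop any port
}

# Hypothetical helper: succeed if some Ingress rule in the namespace uses
# exactly the host of the given public URL.
ingress_has_host() {
  local want
  want="$(url_host "$1")"
  kubectl -n langwatch get ingress \
    -o jsonpath='{.items[*].spec.rules[*].host}' |
    tr ' ' '\n' | grep -Fxq "$want"
}

# Usage (placeholder URL, use your app.http.publicUrl value):
#   ingress_has_host https://langwatch.example.com || echo "host mismatch"
```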

Istio / Service Mesh

CronJob pods may hang after completion because the Istio sidecar keeps the pod alive. Fix: disable sidecar injection for CronJobs:
cronjobs:
  pod:
    annotations:
      sidecar.istio.io/inject: "false"

ClickHouse Issues

ClickHouse OOM kills

Increase ClickHouse memory:
clickhouse:
  memory: "16Gi"  # Up from default 4Gi
The subchart auto-tunes internal memory limits based on this value.
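Before raising the limit, it can help to confirm that the restarts really are OOM kills. Kubernetes records the reason for a container's last termination in the pod status. The sketch below reads it; was_oom_killed is a hypothetical helper, and the pod name in the usage line assumes the usual StatefulSet naming for sts/langwatch-clickhouse.

```shell
# Hypothetical helper: succeed if any container in the pod was last
# terminated with reason OOMKilled.
was_oom_killed() {
  kubectl -n langwatch get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' |
    tr ' ' '\n' | grep -Fxq OOMKilled
}

# Usage (pod name is an assumption):
#   was_oom_killed langwatch-clickhouse-0 && echo "last restart was an OOM kill"
```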

ClickHouse connection errors

# Check ClickHouse pod status
kubectl -n langwatch get pods -l app.kubernetes.io/component=clickhouse

# Test connectivity from app pod
kubectl -n langwatch exec deploy/langwatch-app -- \
  curl -s "http://langwatch-clickhouse:8123/?query=SELECT%201"
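A healthy ClickHouse HTTP interface answers SELECT 1 with the single line 1. The connectivity test above can be wrapped into a pass/fail check, sketched here under the assumption that the default user needs no password (clickhouse_reachable is a hypothetical helper):

```shell
# Hypothetical helper: succeed only if ClickHouse answers SELECT 1 with "1"
# when queried from inside the app pod.
clickhouse_reachable() {
  local reply
  reply="$(kubectl -n langwatch exec deploy/langwatch-app -- \
    curl -s --max-time 5 "http://langwatch-clickhouse:8123/?query=SELECT%201")"
  [ "$reply" = "1" ]
}

# Usage:
#   clickhouse_reachable && echo "ClickHouse OK" || echo "ClickHouse unreachable"
```

Any other reply, including an authentication error message or an empty response after a timeout, counts as a failure.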

Cold storage not working

Verify S3 credentials and bucket access:
# Check ClickHouse logs for S3 errors
kubectl -n langwatch logs sts/langwatch-clickhouse --tail=50 | grep -i s3
Ensure that either the service account has S3 access (for example via IRSA) or static credentials are configured correctly.

PostgreSQL Issues

Migration failures on startup

If Prisma migrations fail, the app pod will crash. Check logs:
kubectl -n langwatch logs deploy/langwatch-app --tail=100 | grep -i prisma
To skip migrations temporarily (for debugging):
app:
  extraEnvs:
    - name: SKIP_PRISMA_MIGRATE
      value: "true"
Only skip migrations for debugging. Running with pending migrations can cause application errors.

Connection refused

Verify the connection string:
# For chart-managed PostgreSQL
kubectl -n langwatch exec deploy/langwatch-postgresql -- \
  pg_isready -U postgres

# For external PostgreSQL, test from the app pod
kubectl -n langwatch exec deploy/langwatch-app -- \
  curl -v telnet://your-rds-host:5432
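When the host and port are buried inside a connection string, pulling them out first avoids testing the wrong endpoint. A sketch using only shell parameter expansion (db_host_port is a hypothetical helper; it assumes a URL-style connection string and does not handle IPv6 literals):

```shell
# Hypothetical helper: print "<host> <port>" from a PostgreSQL URL such as
# postgresql://user:password@host:5432/dbname?schema=public
db_host_port() {
  local rest="${1#*://}"   # strip the scheme
  rest="${rest#*@}"        # strip user:password@ if present
  rest="${rest%%/*}"       # strip /dbname
  rest="${rest%%\?*}"      # strip query parameters when there is no path
  case "$rest" in
    *:*) echo "${rest%%:*} ${rest##*:}" ;;
    *)   echo "$rest 5432" ;;   # 5432 is the PostgreSQL default port
  esac
}

# Usage:
#   db_host_port "$DATABASE_URL"
```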

Authentication Issues

SSO callback URL mismatch

The callback URL configured in your identity provider must exactly match:
https://your-langwatch-domain.com/api/auth/callback/{provider}
Check that app.http.publicUrl matches your actual domain (including https://).
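To avoid typos when registering the URL with the identity provider, you can derive it from the public URL. A small sketch (callback_url is a hypothetical helper, and the provider id and domain in the usage line are placeholders):

```shell
# Hypothetical helper: build the expected callback URL from the public URL
# and the provider id, tolerating one trailing slash on the public URL.
callback_url() {
  local base="${1%/}"
  echo "${base}/api/auth/callback/$2"
}

# Usage (placeholder values):
#   callback_url https://your-langwatch-domain.com google
```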

“Email already exists” during SSO migration

This happens when a user already has an email/password account. Follow the SSO migration steps to link existing accounts.

Sessions expire too quickly

NEXTAUTH_SECRET may have changed between deployments, which can invalidate existing sessions. Ensure it’s stored persistently in a Kubernetes Secret rather than auto-generated.

Debugging Tools

Grafana Dashboards

LangWatch ships with off-the-shelf Grafana dashboards for monitoring the platform — including trace throughput, worker queue depth, ClickHouse performance, and error rates. See Observability & Monitoring for setup.

Skynet (Internal Event Debugger)

LangWatch includes Skynet, an internal event debugging tool that lets you inspect the event sourcing pipeline in real time: view individual events, trace processing steps, and diagnose pipeline issues.

FAQ

How much disk space does ClickHouse need?

Roughly 1 KB per span (compressed). See Sizing & Scaling for detailed estimates.

Can I use an existing PostgreSQL / Redis?

Yes. Use the external database overlays:
helm install langwatch langwatch/langwatch \
  -f examples/overlays/postgres-external.yaml \
  -f examples/overlays/redis-external.yaml
See Kubernetes (Helm) for full instructions.

Can I run without LangEvals or NLP?

Yes. These services are optional. If you don’t need built-in evaluators or NLP features, you can scale them to zero:
langwatch_nlp:
  replicaCount: 0
langevals:
  replicaCount: 0

How do I disable telemetry?

app:
  telemetry:
    usage:
      enabled: false
Or set DISABLE_USAGE_STATS=true.

What ports need to be open?

Only port 443 (HTTPS) for the Ingress/Load Balancer. All other communication is internal to the cluster. See Security for the full port matrix.

Can I run LangWatch in an air-gapped environment?

Yes. Mirror the Docker images to your private registry and configure the Helm chart to pull from there. See Docker Images.

How do I check the LangWatch version?

# Helm chart version and app version
helm list -n langwatch

# Image version running in pods
kubectl -n langwatch get pods -o jsonpath='{.items[*].spec.containers[*].image}'
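The jsonpath query prints every image on a single space-separated line, with repeats for each replica. A short filter makes the result easier to read (unique_images is a hypothetical helper):

```shell
# Hypothetical helper: turn a space-separated image list into a sorted,
# de-duplicated list with one image per line.
unique_images() {
  tr -s ' ' '\n' | sort -u
}

# Usage:
#   kubectl -n langwatch get pods \
#     -o jsonpath='{.items[*].spec.containers[*].image}' | unique_images
```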