> ## Documentation Index
> Fetch the complete documentation index at: https://langwatch.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Architecture & Infrastructure

> How LangWatch components fit together — what you're deploying and how data flows through the system

This page explains what you're deploying when you self-host LangWatch — the components, how they connect, and how data moves through the system. Understanding this will help you size, operate, and debug your deployment.

## System Overview

```mermaid theme={null}
---
config:
  layout: elk
---
architecture-beta
    service llmApp(internet)[Your LLM App]

    group platform(cloud)[Your Infrastructure]

    service app(server)[LangWatch App] in platform
    service postgres(database)[PostgreSQL] in platform
    service redis(database)[Redis BullMQ] in platform

    group eventSourcing(server)[Event Sourcing Pipeline] in platform
    service spanPipe(disk)[Ingestion and Enrichment] in eventSourcing
    service evalPipe(disk)[Evaluations and Experiments] in eventSourcing
    service reactionPipe(disk)[Reactions and Triggers] in eventSourcing

    group dataLayer(database)[Data Layer] in platform
    service clickhouse(database)[ClickHouse] in dataLayer
    service s3(disk)[S3 Cold Storage] in dataLayer

    service nlp(server)[LangWatch NLP] in platform
    service langevals(server)[LangEvals] in platform

    service llmProviders(cloud)[External LLMs]
    service notify(internet)[Slack Email Webhooks]

    llmApp:B --> T:app
    app:B --> T:redis
    app:R --> L:postgres
    redis:B --> T:spanPipe
    spanPipe:R --> L:clickhouse
    evalPipe:B --> T:langevals
    langevals:R --> L:llmProviders
    nlp:R --> L:llmProviders
    reactionPipe:B --> T:notify
    clickhouse:R --> L:s3
```

<CardGroup cols={3}>
  <Card title="API Layer" icon="globe">
    **LangWatch App** (:5560)

    Web UI, REST API, OTel ingestion, authentication, SSE real-time updates. The only externally exposed component.
  </Card>

  <Card title="Processing" icon="microchip">
    **LangWatch Workers**

    Event sourcing pipeline via BullMQ — span ingestion, trace summarisation, cost enrichment, evaluations, PII redaction, and more.
  </Card>

  <Card title="Services" icon="flask">
    **LangWatch NLP** (:5561) and **LangEvals** (:5562)

    NLP workflows, topic clustering, built-in evaluators, guardrails. Call external LLMs for model-based operations.
  </Card>

  <Card title="Control Plane" icon="database">
    **PostgreSQL**

    Users, teams, projects, configurations, prompt versions. Managed via Prisma with auto-migrations.
  </Card>

  <Card title="Data Plane" icon="chart-bar">
    **ClickHouse**

    All traces, spans, evaluations, experiments, analytics. Hot storage on SSD, cold storage on S3. Auto-tuned via the clickhouse-serverless subchart.
  </Card>

  <Card title="Queue & Storage" icon="layer-group">
    **Redis** — BullMQ job queue, caching, sessions.

    **S3** — ClickHouse cold storage, backups, datasets.
  </Card>
</CardGroup>

## Components

### LangWatch App (port 5560)

Next.js server — the single external entry point for all traffic:

* **Web UI** — dashboards, trace explorer, prompt management, experiment views
* **REST API + OTel ingestion** — receives spans from LangWatch SDKs or any OTLP exporter (see the sketch after this list)
* **Authentication** — NextAuth.js (email, Google, GitHub, GitLab, Azure AD, Cognito, Okta)
* **SSE** — pushes real-time updates to connected browser clients
* **Analytics queries** — reads from ClickHouse
* **Control plane** — manages users, teams, projects via PostgreSQL
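
For a concrete picture of the ingestion path: any OTel-instrumented application can point an OTLP HTTP exporter at the App. A minimal TypeScript sketch follows; the endpoint path and `Authorization` header are assumptions, so confirm both against your deployment's project setup page.

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// Sketch: export traces to a self-hosted LangWatch App.
// The URL path and Authorization header are assumptions; check your
// deployment's ingestion settings for the exact values.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "http://langwatch.internal:5560/api/otel/v1/traces",
    headers: { Authorization: `Bearer ${process.env.LANGWATCH_API_KEY}` },
  }),
});

sdk.start();
```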

### LangWatch Workers

Same `langwatch/langwatch` image, started with `pnpm start:workers`. Consumes jobs from a BullMQ queue in Redis and runs the event sourcing pipeline (see below).

* Deployed as a separate Kubernetes Deployment
* Stateless — scale by adding replicas
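
Conceptually, each Worker is a BullMQ consumer. A minimal sketch, assuming a hypothetical queue name (the real pipeline registers a processor per step):

```typescript
import { Worker } from "bullmq";

// "span-ingestion" is a hypothetical queue name for illustration only.
const worker = new Worker(
  "span-ingestion",
  async (job) => {
    // Do the step's work, then write results to ClickHouse.
    console.log(`processing spans for trace ${job.data.traceId}`);
  },
  { connection: { host: "redis.internal", port: 6379 } }
);
```

Because processors hold no state, throughput scales by adding replicas until Redis or ClickHouse becomes the bottleneck.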

### LangWatch NLP (port 5561)

Python service for:

* Optimization Studio workflow execution
* Topic clustering algorithms
* Custom evaluator execution

### LangEvals (port 5562)

Built-in evaluator library (Python):

* LLM-as-a-Judge (boolean, categorical, scored)
* Safety (content safety, jailbreak detection, prompt shield)
* Quality (faithfulness, relevancy, correctness, summarization)
* RAG (context precision, context recall, context relevancy)
* Format (exact match, BLEU, ROUGE, semantic similarity)

Both NLP and LangEvals make outbound calls to external LLM providers for model-based operations.

### PostgreSQL — Control Plane

Stores users, teams, projects, configurations, prompt versions, evaluator definitions. Managed via Prisma ORM with auto-migrations on startup.

### ClickHouse — Data Plane

Stores all high-volume data: traces, spans, evaluations, experiments, analytics, and event sourcing events/projections.

| Mode       | Replicas | Engine                       | Use Case              |
| ---------- | -------- | ---------------------------- | --------------------- |
| Standalone | 1        | MergeTree                    | Dev, small production |
| Replicated | 3+ (odd) | ReplicatedMergeTree + Keeper | HA production         |

The `clickhouse-serverless` subchart auto-tunes internal parameters from two inputs: `cpu` and `memory`.

The `langwatch/clickhouse-serverless` Docker image is a performance-tweaked ClickHouse build optimized for LangWatch's traffic patterns — high-throughput event ingestion with concurrent analytical queries.

**Tiered storage**: hot data on local SSD, cold data on S3 after a configurable TTL (default 49 days). Native `BACKUP`/`RESTORE` to S3.
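
To observe tiering from inside the cluster, you can group active parts by disk. A sketch using `@clickhouse/client`; the host is a placeholder, and disk names depend on your storage configuration:

```typescript
import { createClient } from "@clickhouse/client";

// Placeholder host: point at the cluster-internal ClickHouse HTTP port.
const client = createClient({ url: "http://clickhouse.internal:8123" });

// How much data sits on each disk (hot SSD vs. the S3-backed cold volume)?
const result = await client.query({
  query: `
    SELECT disk_name, formatReadableSize(sum(bytes_on_disk)) AS size
    FROM system.parts
    WHERE active
    GROUP BY disk_name
  `,
  format: "JSONEachRow",
});
console.log(await result.json());
```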

### Redis

* **BullMQ job queue** — connects the App to Workers with guaranteed delivery, retry, and backpressure
* **Caching** — frequently accessed config and lookup data
* **Sessions** — user session storage

### S3 / Object Storage

* ClickHouse cold storage (tiered after TTL)
* ClickHouse backups (full + incremental)
* Dataset storage (optional)

## Event Sourcing Pipeline

LangWatch v3 uses an event-sourcing model for data processing. Understanding this helps with debugging and capacity planning.

When a span arrives from your SDK, it enters a pipeline of independent steps running on the Workers:

```mermaid theme={null}
graph LR
    subgraph Ingestion
        A[Span Ingestion] --> B[Trace Summarisation]
    end

    subgraph Enrichment
        B --> C[LLM Cost Enrichment]
        B --> D[LLM Metric Processing]
        B --> E[Embedding Extraction]
        B --> F[PII Redaction]
    end

    subgraph Evaluation
        C --> G[Evaluation Execution]
        D --> G
        G --> H[Annotation Processing]
        G --> I[Experiment Processing]
    end

    subgraph Reactions
        I --> J[Generate Topics]
        I --> K[Automation / Triggers]
        I --> L[UI Update Broadcasting]
    end

    L --> M[(ClickHouse)]
    K --> M
    J --> M
```

Each step reads from the queue, does its work, and writes results to ClickHouse. Steps are independent — if one is slow (e.g., evaluation waiting on an LLM call), others continue processing.
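
In outline, a single step consumes a job, appends an immutable event, and hands off to the next queue. The sketch below illustrates that shape only; every name in it (queues, table, columns) is hypothetical, not LangWatch's actual schema:

```typescript
import { Queue, Worker } from "bullmq";
import { createClient } from "@clickhouse/client";

// All names here are hypothetical and for illustration only.
const clickhouse = createClient({ url: "http://clickhouse.internal:8123" });
const connection = { host: "redis.internal", port: 6379 };
const nextStep = new Queue("trace-summarisation", { connection });

new Worker(
  "span-ingestion",
  async (job) => {
    // 1. Append an immutable event; events are written once, never updated.
    await clickhouse.insert({
      table: "events",
      values: [
        {
          type: "SpanIngested",
          trace_id: job.data.traceId,
          payload: JSON.stringify(job.data),
        },
      ],
      format: "JSONEachRow",
    });
    // 2. Hand the trace off to the next independent step.
    await nextStep.add("summarise", { traceId: job.data.traceId });
  },
  { connection }
);
```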

### How Data is Organized

The pipeline produces three types of output:

**Events** — immutable records of what happened. Stored in ClickHouse and never modified.

| Event                    | Produced By           | What It Contains                       |
| ------------------------ | --------------------- | -------------------------------------- |
| SpanIngested             | Span Ingestion        | Raw span data from SDK                 |
| TraceSummarised          | Trace Summarisation   | Aggregated trace with input/output     |
| CostEnriched             | LLM Cost Enrichment   | Token costs per model                  |
| MetricsExtracted         | LLM Metric Processing | Latency, token counts, model info      |
| EmbeddingsGenerated      | Embedding Extraction  | Vector embeddings for similarity       |
| PIIRedacted              | PII Redaction         | Redacted fields and detection metadata |
| EvaluationCompleted      | Evaluation Execution  | Evaluator scores and results           |
| ExperimentResultRecorded | Experiment Processing | Run results for A/B tests              |

**Projections** — derived tables that dashboards and APIs read from. Built from events.

| Projection     | Built From                     | Used By                   |
| -------------- | ------------------------------ | ------------------------- |
| Traces         | SpanIngested + TraceSummarised | Trace explorer, search    |
| Spans          | SpanIngested                   | Span detail views         |
| Evaluations    | EvaluationCompleted            | Quality scores, monitors  |
| ExperimentRuns | ExperimentResultRecorded       | Experiment result tables  |
| Analytics      | All events                     | Dashboard aggregations    |
| Topics         | TraceSummarised + clustering   | Conversation topic groups |

**Reactions** — side effects triggered during processing.

| Reaction        | Triggered By        | Effect                             |
| --------------- | ------------------- | ---------------------------------- |
| SSE Update      | Any event           | Real-time UI refresh in browser    |
| Alert / Trigger | EvaluationCompleted | Slack, email, webhook notification |
| Dataset Append  | Automation rules    | Auto-add traces to datasets        |
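
The SSE reaction reaches the browser as a standard server-sent events stream. A minimal subscriber sketch, with a hypothetical stream path:

```typescript
// Hypothetical path; the LangWatch UI wires this subscription up itself.
const stream = new EventSource("/api/sse/projects/my-project");
stream.onmessage = (event) => {
  const update = JSON.parse(event.data);
  console.log("pipeline update", update);
};
```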

<Note>
  If Worker queue depth grows in Redis, it means processing is falling behind ingestion. The fix is to add more Worker replicas — each one is stateless and consumes jobs independently.
</Note>
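
BullMQ exposes queue counts programmatically, so the backlog check is easy to script. A sketch, again with a hypothetical queue name:

```typescript
import { Queue } from "bullmq";

// Hypothetical queue name; inspect your Redis instance for the real ones.
const queue = new Queue("span-ingestion", {
  connection: { host: "redis.internal", port: 6379 },
});

const [waiting, active] = await Promise.all([
  queue.getWaitingCount(),
  queue.getActiveCount(),
]);
// A steadily growing `waiting` count means you need more Worker replicas.
console.log({ waiting, active });
```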

## Data Flow

```mermaid theme={null}
sequenceDiagram
    participant App as Your LLM App
    participant LW as LangWatch App
    participant Redis as Redis (BullMQ)
    participant W as Workers
    participant CH as ClickHouse
    participant LE as LangEvals
    participant Browser as Browser (UI)

    App->>LW: Send spans (OTel / REST)
    LW->>Redis: Enqueue span ingestion job
    Redis->>W: Worker picks up job
    W->>W: Pipeline steps (enrich, evaluate, etc.)
    W->>CH: Write events + projections
    W->>LE: Request evaluation (if configured)
    LE-->>W: Evaluation scores
    W->>CH: Write evaluation results
    W->>LW: SSE broadcast
    LW->>Browser: Real-time update
```

Additionally, Kubernetes CronJobs trigger periodic tasks by calling HTTP endpoints on the App (a sketch follows the list):

* **Topic clustering** — daily at midnight, via the NLP service
* **Alert triggers** — every 3 minutes, evaluates monitor conditions
* **Retention cleanup** — daily at 01:00, removes data past retention period
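
Each task amounts to an authenticated HTTP call against the App. A sketch of what a CronJob pod effectively runs; the endpoint path and header are hypothetical, with the real routes defined by the Helm chart's CronJob templates:

```typescript
// Hypothetical endpoint and header, for illustration only.
const res = await fetch("http://langwatch-app.internal:5560/api/cron/triggers", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.CRON_API_KEY}` },
});
if (!res.ok) throw new Error(`trigger run failed: ${res.status}`);
```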

## Network Topology

Only the App is exposed externally. Everything else is cluster-internal:

| Component  | Service Type             | External |
| ---------- | ------------------------ | -------- |
| App        | Ingress / LoadBalancer   | Yes      |
| Workers    | None (no Service needed) | No       |
| NLP        | ClusterIP                | No       |
| LangEvals  | ClusterIP                | No       |
| PostgreSQL | ClusterIP                | No       |
| ClickHouse | ClusterIP                | No       |
| Redis      | ClusterIP                | No       |

<Note>
  LangEvals and NLP make outbound calls to external LLM providers (OpenAI, Azure, etc.). Ensure these pods have network egress to the relevant endpoints.
</Note>

## Docker Images

| Image                     | Port | Purpose                                          |
| ------------------------- | ---- | ------------------------------------------------ |
| `langwatch/langwatch`     | 5560 | App + Workers (same image, different entrypoint) |
| `langwatch/langwatch_nlp` | 5561 | NLP, workflows, topic clustering                 |
| `langwatch/langevals`     | 5562 | Evaluators, guardrails                           |

## OpenTelemetry Integration

LangWatch is deeply integrated with OpenTelemetry. The platform both **consumes** and **exports** telemetry data:

**Ingestion**: The LangWatch App accepts spans via the OpenTelemetry protocol (OTLP over HTTP). Any OTel-instrumented application can send traces to LangWatch without a vendor-specific SDK.

**Export**: LangWatch exports its own operational metrics, logs, and traces via OpenTelemetry for infrastructure debugging:

* **Metrics** — Prometheus-compatible metrics from the App and Workers (request latency, queue depth, error rates)
* **Logs** — Structured application logs from all components
* **Traces** — Distributed traces of internal request processing

This means you can monitor LangWatch itself using the same observability stack you use for the rest of your infrastructure — Grafana, Datadog, New Relic, or any OTel-compatible backend.

LangWatch ships with off-the-shelf Grafana dashboards for monitoring the platform. See [Observability & Monitoring](/self-hosting/configuration/observability) for setup details.

## Deployment Models

### Self-Managed

Everything on your infrastructure. You deploy the Helm chart and manage all components.

### Cloud Enterprise

LangWatch manages the control plane in a dedicated, single-tenant environment. Exclusive data instances in your preferred region.

### Hybrid (Bring Your Own Storage)

LangWatch manages compute (App, Workers, NLP, LangEvals). You bring your own ClickHouse + S3 in your VPC.

For Cloud Enterprise or Hybrid, [contact the LangWatch team](https://langwatch.ai/schedule-demo).
