Ingestion sources

Pairs with: AI Gateway → Observability. The gateway is itself an ingestion source (its spans land in the same trace store); IngestionSources extend that substrate to platforms whose API key path you don’t own (Workato, Copilot Studio, OpenAI/Anthropic compliance APIs). Also pairs with Audit log: S3 audit log feeds (Workato, Copilot Studio, Anthropic Compliance) become OTel-shaped events through this same pipeline, so a single audit log query at /settings/audit-log covers in-platform mutations and third-party-tool activity uniformly.

otel_generic with thirty_days retention is the Apache 2.0 floor; multiple sources and extended retention require Enterprise. Self-hosted Apache 2.0 deployments can wire one OTel receiver at the default 30-day retention; the additional source types on this page (Workato, S3, Copilot Studio, OpenAI, and Anthropic compliance APIs) and the one_year and seven_years retention classes ship under the Enterprise license. Non-enterprise organizations see an upgrade card on /settings/governance/ingestion-sources. See Open-core licensing for the full split.
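As a rough illustration of the split described above, the gating could be expressed as a single check. This is a minimal sketch, not the actual implementation: the function name, the return shape, and the reason strings are hypothetical, and it assumes that every source type other than otel_generic counts as Enterprise.

```python
from typing import Tuple

# Values taken from this page; everything else in this sketch is hypothetical.
ALL_SOURCE_TYPES = {
    "otel_generic", "claude_cowork", "workato", "s3_custom",
    "copilot_studio", "openai_compliance", "claude_compliance",
}
ALL_RETENTION_CLASSES = {"thirty_days", "one_year", "seven_years"}

# The Apache 2.0 floor: one otel_generic receiver at thirty_days retention.
APACHE_SOURCE_TYPES = {"otel_generic"}
APACHE_RETENTION_CLASSES = {"thirty_days"}
APACHE_MAX_SOURCES = 1


def can_create_source(
    is_enterprise: bool,
    source_type: str,
    retention_class: str,
    existing_source_count: int,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a requested ingestion source."""
    if source_type not in ALL_SOURCE_TYPES:
        return False, "unknown source type"
    if retention_class not in ALL_RETENTION_CLASSES:
        return False, "unknown retention class"
    if is_enterprise:
        return True, "ok"
    if source_type not in APACHE_SOURCE_TYPES:
        return False, "source type requires Enterprise"
    if retention_class not in APACHE_RETENTION_CLASSES:
        return False, "retention class requires Enterprise"
    if existing_source_count >= APACHE_MAX_SOURCES:
        return False, "multiple sources require Enterprise"
    return True, "ok"
```

Under these assumptions, a non-enterprise org asking for a second source, a workato source, or one_year retention would be denied, which is the point at which the upgrade card would apply.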

LangWatch governance ingest accepts events from upstream agent and audit-log platforms and lands them in the same recorded_spans and log_records tables that power the rest of LangWatch, distinguished by the langwatch.origin.kind = "ingestion_source" origin metadata stamped at the receiver edge. These events get the same trace viewer and the same compliance, RBAC, and retention machinery, and they surface in the governance dashboard, the per-source detail page, and the langwatch ingest tail CLI.

Governance ingestion is for third-party AI platforms your org runs. If you want to send your own application’s LLM traces to LangWatch, use /api/otel/v1/traces with your project API key; that is the right home for traces from systems you own. See Choosing the right OTel endpoint for the full comparison.

This section documents each supported sourceType honestly: what’s wired up today, what’s still envelope-only, and what’s a configuration contract awaiting an adapter.

State of each receiver: at a glance

Source type	Delivery	OTLP shape	Storage	State
`otel_generic`	Push (HTTP/OTLP)	Spans	`recorded_spans`	Production-ready
`claude_cowork`	Push (HTTP/OTLP)	Spans	`recorded_spans`	Receiver works, Cowork-specific attribute mapping pending
`workato`	Webhook → OTLP logs	Logs	`log_records`	Receiver works, deeper audit-shape parser pending
`s3_custom`	S3 replay + callback webhook	Logs	`log_records`	Callback-mode works, S3 puller + DSL parser pending
`copilot_studio`	Pull (Purview Audit API)	Logs	`log_records`	Setup-contract-only: Azure AD app config persisted, puller worker pending
`openai_compliance`	Pull (Enterprise Compliance JSONL)	Logs	`log_records`	Setup-contract-only: S3 + role config persisted, puller worker pending
`claude_compliance`	Pull (Anthropic Compliance API)	Logs	`log_records`	Setup-contract-only: workspace API key persisted, puller worker pending

Production-ready = the receiver accepts traffic and lands it in the unified store; the web detail page and CLI tail render the full payload with cost and token deltas; drill-down opens the trace viewer (span-shape) or the log-detail pane (log-shape).
Receiver works = the endpoint accepts and acknowledges traffic, the source flips to active, and the unified store has the event with origin metadata stamped, but per-platform attribute extraction beyond the default OTLP/JSON shape is pending. Drill-down still works, but columns may show generic labels until the platform-specific adapter ships.
Setup-contract-only = the source can be created via the admin composer and persists the per-platform fields the eventual puller worker will use, but no traffic flows yet. Creating a source mints the credential and persists the config, but langwatch ingest health will show zeros until the worker ships.

Why two OTLP shapes (spans vs logs)

Different upstreams emit different event shapes:

Span-shaped sources (otel_generic, claude_cowork) emit native OTLP spans with parent-child relationships, durations, and span kind. They benefit from drill-down in the trace viewer (showing the multi-step agent activity tree).
Flat-event sources (workato, s3_custom, copilot_studio, openai_compliance, claude_compliance) emit one event per row with no span tree. Forcing them into the span shape would require a synthetic traceId, spanId, and duration that carry no information. They land naturally as OTLP log_records and drill into the log detail pane.

One internal pipeline either way: both shapes pass through the same hardened OTLP parser (parseOtlpBody.ts) and the same trace pipeline downstream. Both are queryable from the same governance dashboard, both pull into the OCSF v1.1 read API, both honour the per-origin retention class.
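The shape split can be pictured as a lookup from source type to storage table and drill-down target. The sketch below only restates the table and the two bullets above; the function and constant names are hypothetical and do not come from the LangWatch codebase.

```python
from typing import Tuple

# Source types grouped by the OTLP shape they land as (per this page).
SPAN_SOURCES = frozenset({"otel_generic", "claude_cowork"})
LOG_SOURCES = frozenset({
    "workato", "s3_custom", "copilot_studio",
    "openai_compliance", "claude_compliance",
})


def landing_for(source_type: str) -> Tuple[str, str]:
    """Return (storage table, drill-down target) for a source type."""
    if source_type in SPAN_SOURCES:
        return "recorded_spans", "trace viewer"
    if source_type in LOG_SOURCES:
        return "log_records", "log detail pane"
    raise ValueError(f"unknown source type: {source_type!r}")
```

Keeping the routing keyed on source type alone means flat-event sources never need synthetic span identifiers.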

How a source becomes “active”

Independent of source type, the lifecycle is:

1. The admin opens /settings/governance/ingestion-sources, picks a source type, fills in the per-type config form (including the retention class), and clicks Create.
2. The backend mints a lw_is_<base64url> ingest secret, hashes it, and persists the source with status awaiting_first_event. The secret is shown once in a one-time-reveal modal; it is never returned by the list/get endpoints again.
3. The admin pastes the secret and the per-source endpoint into the upstream platform (OTLP exporter URL, webhook destination, Azure app secret, S3 bucket policy, etc.).
4. The first event arrives at the receiver. The receiver lazily ensures the hidden Governance Project for the org exists (creating it if needed), stamps the langwatch.origin.* and langwatch.governance.retention_class attributes, and hands off to the existing trace pipeline. The source status flips to active and lastEventAt updates.
5. The web detail page polls every 10s and the CLI tail polls every 3s in --follow mode; both see the flip and the new event at the same time.

For source types where the puller worker is not yet wired (the three “setup-contract-only” rows above), step 4 won’t fire; the source stays at awaiting_first_event until the adapter ships.
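The mint-hash-persist and status-flip parts of the lifecycle can be sketched as follows. This is an illustration under stated assumptions: the 32-byte secret length, the SHA-256 hash, and all class and function names are hypothetical; only the lw_is_ prefix, the base64url encoding, and the two status values come from this page.

```python
import base64
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


def mint_ingest_secret() -> str:
    """Mint an lw_is_<base64url> secret (32 random bytes is an assumption)."""
    raw = secrets.token_bytes(32)
    return "lw_is_" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def hash_secret(secret: str) -> str:
    """Hash the secret for storage (SHA-256 is an assumption)."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


@dataclass
class IngestionSource:
    id: str
    source_type: str
    retention_class: str
    secret_hash: str  # only the hash is persisted, never the secret itself
    status: str = "awaiting_first_event"
    last_event_at: Optional[datetime] = None


def create_source(
    source_id: str, source_type: str, retention_class: str
) -> Tuple[IngestionSource, str]:
    """Create a source and return it with the one-time-reveal secret."""
    secret = mint_ingest_secret()
    source = IngestionSource(
        source_id, source_type, retention_class, hash_secret(secret)
    )
    return source, secret


def record_event(source: IngestionSource, now: Optional[datetime] = None) -> None:
    """Flip the source to active and update the last-event timestamp."""
    source.status = "active"
    source.last_event_at = now or datetime.now(timezone.utc)
```

In this sketch a setup-contract-only source simply never reaches record_event, so it stays at awaiting_first_event.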

Auth contract: same shape across every receiver

All push-mode and webhook receivers use Authorization: Bearer lw_is_<secret> (the same shape the gateway uses for its vk-lw- virtual keys, just a different prefix). A mismatch between the :sourceId path parameter and the source ID resolved from the secret returns 401 unauthorized without leaking which one exists. The 24-hour rotation grace window is documented in the IngestionSource lifecycle architecture.
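A receiver-side check matching this contract might look like the sketch below. It is illustrative only: the function name, the in-memory lookup table, and the SHA-256 hashing are assumptions, and the rotation grace window is left out. The property it demonstrates is that an unknown secret and a path mismatch produce the same 401 response.

```python
import hashlib
from typing import Dict, Optional, Tuple

BEARER_PREFIX = "Bearer "
SECRET_PREFIX = "lw_is_"


def authenticate(
    authorization_header: Optional[str],
    path_source_id: str,
    source_id_by_secret_hash: Dict[str, str],
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, resolved source ID or None).

    Every failure path returns the same (401, None), so a caller cannot
    tell whether the secret or the :sourceId was the part that exists.
    """
    if not authorization_header or not authorization_header.startswith(
        BEARER_PREFIX + SECRET_PREFIX
    ):
        return 401, None
    secret = authorization_header[len(BEARER_PREFIX):]
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    resolved = source_id_by_secret_hash.get(digest)
    if resolved is None or resolved != path_source_id:
        return 401, None
    return 200, resolved
```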

Verification: same loop for every active source

Once a source is active, verify with the governance CLI debug helpers:

```shell
# What does the org see?
langwatch ingest list

# How healthy is one source?
langwatch ingest health <sourceId>

# What events landed in real time?
langwatch ingest tail <sourceId> --follow
```

The CLI hits the same backend the web detail page hits, byte-for-byte (--json mode is contract-stable against api.governance.eventsForSource and api.ingestionSources.healthMetrics).

Origin metadata: what every event carries

Every governance-ingested span or log_record is stamped at the receiver edge with attributes in reserved namespaces (rejected if found in user-supplied OTLP):

Attribute	Purpose
`langwatch.origin.kind`	Always `"ingestion_source"` for governance traffic; the discriminator
`langwatch.ingestion_source.id`	Source identity (matches the `lw_is_*` Bearer’s resolved IngestionSource)
`langwatch.ingestion_source.organization_id`	Org tenancy for cross-source roll-ups
`langwatch.ingestion_source.source_type`	Source type (`otel_generic`, `workato`, etc.) for filtering
`langwatch.governance.retention_class`	One of `thirty_days`, `one_year`, `seven_years`; drives ClickHouse TTL

These appear as read-only system metadata in the trace viewer and the log detail pane; users cannot supply or edit them.
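The stamping step can be sketched as a pure function over an event's attributes. The attribute keys and retention classes are the ones in the table above; the function name, the exception type, and the choice to reject the whole event (rather than strip the offending keys) are assumptions made for illustration.

```python
from typing import Dict

# Reserved namespaces, derived from the attribute names in the table above.
RESERVED_PREFIXES = (
    "langwatch.origin.",
    "langwatch.ingestion_source.",
    "langwatch.governance.",
)
RETENTION_CLASSES = {"thirty_days", "one_year", "seven_years"}


class ReservedAttributeError(ValueError):
    """Raised when user-supplied OTLP carries a reserved attribute key."""


def stamp_origin(
    user_attributes: Dict[str, str],
    *,
    source_id: str,
    organization_id: str,
    source_type: str,
    retention_class: str,
) -> Dict[str, str]:
    """Return a copy of the attributes with origin metadata added."""
    for key in user_attributes:
        if key.startswith(RESERVED_PREFIXES):
            raise ReservedAttributeError(f"reserved attribute supplied: {key}")
    if retention_class not in RETENTION_CLASSES:
        raise ValueError(f"unknown retention class: {retention_class!r}")
    stamped = dict(user_attributes)  # leave the caller's dict untouched
    stamped.update({
        "langwatch.origin.kind": "ingestion_source",
        "langwatch.ingestion_source.id": source_id,
        "langwatch.ingestion_source.organization_id": organization_id,
        "langwatch.ingestion_source.source_type": source_type,
        "langwatch.governance.retention_class": retention_class,
    })
    return stamped
```

Because the values come from the resolved source rather than the payload, an upstream platform cannot claim a different origin or retention class for its own events.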

Adapter roadmap

The order in which deeper per-platform adapters land depends on customer pull. The current priority order:

Cowork attribute extraction: Cowork pushes OTLP today; receiver accepts but spans aren’t mapped to Cowork’s tool_use taxonomy yet. Smallest gap to close.
Workato audit-shape parser: Workato has a stable JSON envelope for job-completed events; the default mapper retains the raw envelope, and deeper Actor/Action/Target extraction is the follow-up.
Microsoft Copilot Studio (Purview Audit) puller: needs an Azure AD app + cron worker; the config is persisted but the worker isn’t.
OpenAI and Anthropic Enterprise Compliance pullers: S3 and the Anthropic compliance API respectively; config persisted, pullers pending.
s3_custom DSL parser: a generic-shape parser for homegrown audit logs; the customer-facing DSL design is in flight.

If you need an unwired adapter, file an issue tagged governance/ingestion/adapter-roadmap and we’ll sequence accordingly.

Cross-references

Trace vs governance ingestion: picking the right OTel URL
Compliance architecture: how the unified substrate underwrites SOC 2, ISO 27001, EU AI Act, GDPR, HIPAA-most-uses
Per-origin retention: thirty_days, one_year, seven_years classes
OCSF / SIEM export: pulling governance events into Splunk, Datadog Security, etc.
Governance CLI debug: langwatch ingest list/health/tail commands
