The gateway sits on the hot path between your applications and upstream LLM providers. This page describes what the gateway does, what it does not do, and how each guarantee is enforced.

Threat model

In scope:
  • A compromised customer application attempting to exfiltrate other tenants’ data.
  • A compromised virtual key leaking upstream.
  • An insider at LangWatch attempting to read VK secrets or upstream provider credentials.
  • A network-level attacker between gateway and control plane.
Out of scope:
  • Compromise of the underlying upstream provider (Anthropic, OpenAI, etc.). Your trust in them is independent of LangWatch.
  • Compromise of the customer’s OS / build pipeline supplying the VK to their app.

Secrets at rest

Virtual keys

VK secrets are hashed before persistence and the plaintext is shown exactly once at creation. The hash scheme is peppered HMAC-SHA256 rather than argon2id — chosen deliberately:
  • Each VK has 130 bits of Crockford-ULID entropy. Stretching a 130-bit random value with argon2 costs 50–100 ms per validation for essentially no reduction in attack surface.
  • HMAC with a server-side pepper (LW_VIRTUAL_KEY_PEPPER, rotated via dual-pepper overlap window) makes offline cracking infeasible without the pepper.
  • HMAC is fast, fixed-cost, and deterministic, which enables an O(1) lookup-by-hash. This is critical for the hot-path /resolve-key endpoint.
See Virtual Keys → hashing.
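
The hashing and dual-pepper lookup described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the function names, the in-memory dict standing in for the key table, and the pepper values are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional, Sequence


def hash_vk(secret: str, pepper: bytes) -> str:
    """Peppered HMAC-SHA256 digest; deterministic, so it can serve as a lookup key."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).hexdigest()


def resolve_vk(secret: str, peppers: Sequence[bytes], store: Dict[str, str]) -> Optional[str]:
    """Try the current pepper first, then the previous one during the overlap window."""
    for pepper in peppers:
        vk_id = store.get(hash_vk(secret, pepper))  # O(1) lookup by digest
        if vk_id is not None:
            return vk_id
    return None


# Hypothetical values for illustration only.
OLD_PEPPER = b"old-pepper"
NEW_PEPPER = b"new-pepper"
store = {hash_vk("vk-secret-created-before-rotation", OLD_PEPPER): "vk_1"}
```

During the overlap window a key hashed under the old pepper still resolves, at the cost of one extra HMAC computation per miss on the current pepper.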

Upstream provider credentials

Provider credentials (OpenAI keys, Azure deployments, Bedrock IAM, Vertex service accounts) are stored encrypted at rest in the control plane. The gateway never sees the plaintext — it receives a redacted bundle and the actual upstream call is made from a short-lived subprocess with the decrypted value in memory only, zeroed after the call.
  • Encryption key: per-organization KMS key (AWS KMS in LangWatch Cloud; customer-supplied KMS in BYOK deployments).
  • Rotation: rotate the KMS key, re-encrypt all credentials in place. Zero-downtime — decrypt-then-re-encrypt happens in the background, old-key references resolve for 24 h during migration.

Gateway-to-control-plane auth (HMAC)

Every internal call (/api/internal/gateway/*) is signed with HMAC-SHA256. The signature covers METHOD\nPATH\nTIMESTAMP\nhex(sha256(body)), with a ±300s replay window. Rotation uses a dual-secret overlap (LW_GATEWAY_INTERNAL_SECRET + _PREVIOUS) so a rolling restart doesn’t reject in-flight calls. Signature verification happens before the timestamp check to avoid a secret-length timing oracle.
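
As a rough sketch of the signing scheme above, the following builds the canonical string, signs it, and verifies it against both the current and previous secret before checking the replay window. The Unix-seconds timestamp, the hex-encoded signature, and all function names are assumptions made for illustration; the source does not specify them.

```python
import hashlib
import hmac
import time
from typing import Optional, Sequence

REPLAY_WINDOW_SECONDS = 300


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    """METHOD\\nPATH\\nTIMESTAMP\\nhex(sha256(body)), as described in the text."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method}\n{path}\n{timestamp}\n{body_hash}".encode("utf-8")


def sign(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    message = canonical_string(method, path, timestamp, body)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secrets: Sequence[bytes], method: str, path: str, timestamp: int,
           body: bytes, signature: str, now: Optional[float] = None) -> bool:
    """Check the signature against every active secret, then the replay window."""
    message = canonical_string(method, path, timestamp, body)
    matched = False
    for secret in secrets:  # current secret plus the previous one during rotation
        expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            matched = True
    if not matched:
        return False
    current = time.time() if now is None else now
    return abs(current - timestamp) <= REPLAY_WINDOW_SECONDS
```

Because the timestamp is part of the signed message, an attacker cannot replay an old request with a fresh timestamp without also knowing the secret.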

JWTs on the hot path

Once a VK is resolved, the gateway caches a short-lived JWT (15 min TTL) containing the minimum claims needed to authorize: {vk_id, project_id, team_id, org_id, principal_id, revision, iat, exp, iss, aud}. The full VK config is fetched separately with ETag revalidation so cache entries are invalidated atomically on any edit. The JWT is never persisted to disk: every replica’s L1 cache is in memory, and the optional L2 Redis is also in-memory.
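
A minimal sketch of an in-memory L1 cache with the 15-minute TTL mentioned above. The class name, the choice of the VK hash as cache key, and the injectable clock are hypothetical; the real gateway may structure its cache differently.

```python
import time
from typing import Callable, Dict, Optional, Tuple

JWT_TTL_SECONDS = 15 * 60


class JwtCache:
    """Process-local cache; nothing here is ever written to disk."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, vk_hash: str, token: str) -> None:
        # Store the token together with its absolute expiry time.
        self._entries[vk_hash] = (token, self._clock() + JWT_TTL_SECONDS)

    def get(self, vk_hash: str) -> Optional[str]:
        entry = self._entries.get(vk_hash)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[vk_hash]  # expired: force a fresh resolve
            return None
        return token
```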

Tenant isolation

Traces

Every span carries langwatch.{vk_id, project_id, team_id, organization_id}. The gateway ships all spans to a single OTel endpoint (GATEWAY_OTEL_DEFAULT_ENDPOINT); LangWatch ingest reads langwatch.project_id off each span and files the trace under the owning project. Tenant isolation is enforced at the ingest layer: a span’s project_id must match a project the gateway is authorized to write to (validated via the VK’s provider binding at bundle-resolve time), and ClickHouse storage partitions on TenantId with middleware-enforced predicates on every query. Implication: a bug or data leak in one project’s span payload cannot land in another project’s LangWatch UI. Cross-project queries are impossible from the gateway side.

Upstream providers

Each request’s upstream call uses the VK’s bound provider_credential_id. Credentials are scoped to the VK’s owning project, so a VK in project A cannot use project B’s OpenAI key even if both are in the same organization. This is enforced at two layers:
  1. Control-plane RBAC — POST /api/gateway/v1/virtual-keys validates the caller has gatewayProviders:view on each provider_credential_id referenced.
  2. Gateway data-plane — the bundle returned by /config/:vk_id contains only the providers bound to that VK. The gateway has no way to reach a different project’s provider even if a request is maliciously crafted.
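
The data-plane layer can be pictured as a lookup that only ever consults the bundle resolved for the VK. The sketch below is illustrative: the VkBundle type, its fields, and select_provider are hypothetical names, not the gateway's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class VkBundle:
    """Hypothetical shape of the bundle returned for one VK."""
    vk_id: str
    # provider_credential_id -> redacted credential handle (never the plaintext)
    providers: Dict[str, str] = field(default_factory=dict)


def select_provider(bundle: VkBundle, provider_credential_id: str) -> str:
    """Return the handle for a bound provider, or refuse if it is not in the bundle."""
    try:
        return bundle.providers[provider_credential_id]
    except KeyError:
        raise PermissionError(
            f"{provider_credential_id} is not bound to {bundle.vk_id}"
        ) from None
```

Since the only source of credentials is the bundle itself, a crafted request naming another project's credential id has nothing to resolve against.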

Budgets & debits

Debits carry the VK’s organization_id as a filter predicate on every Postgres write. A misrouted debit cannot land on another org’s ledger — the foreign-key constraint would reject it.

Privileged actions are audited

Every write through the REST API or the UI emits a row in the platform-wide AuditLog (gateway shape):
  • userId — the resolved actor user (session, PAT, or API token mapped to a user).
  • action — dotted-lowercase string code: gateway.virtual_key.created, gateway.virtual_key.rotated, gateway.budget.deleted, gateway.provider_binding.updated, etc. (See Audit log → What’s logged for the full mapping.)
  • targetKind / targetId — resource kind + id the action affected.
  • before / after — JSON diff on update actions.
Audit logs are visible to organization admins under /settings/audit-log with a Source = “Gateway” badge for gateway rows. Retention: 7 years default, extendable on request. Rotations and revocations are considered security-sensitive and generate an organization-wide notification (configurable).

RBAC on gateway resources

Six resources, each with standard CRUD + specialized actions:
| Resource | Read | Create / Attach | Update | Delete / Rotate / Detach |
|---|---|---|---|---|
| virtualKeys | virtualKeys:view | :create | :update | :delete, :rotate |
| gatewayBudgets | :view | :create | :update | :delete |
| gatewayProviders | :view | | :update | |
| gatewayGuardrails | :view | :attach | | :detach |
| gatewayLogs | :view | | | |
| gatewayUsage | :view | | | |
Each resource also supports a :manage permission that acts as a superset of all actions for that resource — useful for custom roles that should get full control over one surface without opening every individual verb. Permissions bind to principals via the existing LangWatch role system — same roles/groups used elsewhere in the platform. A “VK owner” role (for engineers who create their own dev VKs but can’t touch team VKs) is the common pattern; it grants virtualKeys:{view,create,update,rotate,delete} on their personally-owned VKs via the principal_user_id scope. See RBAC for the full resource matrix.
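
The :manage superset rule amounts to a two-way check, sketched here under the assumption that permissions are stored as plain "resource:action" strings. The is_allowed helper is hypothetical, and per-owner scoping such as principal_user_id is left out.

```python
from typing import Set


def is_allowed(granted: Set[str], resource: str, action: str) -> bool:
    """Allow if the exact verb is granted, or if :manage is granted for the resource."""
    return f"{resource}:{action}" in granted or f"{resource}:manage" in granted


# Hypothetical custom role: full control of budgets, read-only on virtual keys.
budget_admin = {"gatewayBudgets:manage", "virtualKeys:view"}
```

With this role, is_allowed(budget_admin, "gatewayBudgets", "delete") is true, while is_allowed(budget_admin, "virtualKeys", "rotate") is false.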

What the gateway can’t see

The gateway sits in the request path but is a passthrough for payload bytes:
  • It does NOT log request or response bodies by default. Body bytes are forwarded into bifrost’s upstream call and returned to the client; neither the gateway process nor its structured logs retain them.
  • OTel spans record metadata (model, token counts, latencies, cost, fallback attempts) but NOT message content. Message content is written into LangWatch’s trace ingestion at the SDK layer, which is a separate pipeline the customer opted into. Some gateway spans mirror the first 200 chars of user messages when customers enable the LangWatch “Message preview” setting — that toggle is off by default and project-scoped.
  • It does NOT decrypt cached X-LangWatch-Cache: force headers. Caching is handled entirely upstream at the provider (Anthropic cache_control); the gateway is transparent to which bytes are actually cached.

Data-at-rest in the LangWatch control plane

Postgres:
  • All VK-adjacent tables live in the primary app database.
  • TLS-encrypted connections between gateway ↔ control plane (no cleartext network traffic).
  • Field-level encryption on provider credential secrets using per-org KMS keys (AWS KMS default; customer-managed KMS available for BYOK).
ClickHouse:
  • Trace/span data is per-organization, partitioned on TenantId. Queries from the UI always include a TenantId predicate enforced at middleware. Cross-tenant reads are prevented by the ClickHouse query layer, not just by convention.
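
One way to picture a middleware-enforced TenantId predicate is a wrapper that every query must pass through. This is only a sketch: the function name, the %(name)s placeholder style, and the subquery approach are assumptions, not a description of LangWatch's query layer.

```python
from typing import Any, Dict, Optional, Tuple


def with_tenant_predicate(
    inner_sql: str,
    tenant_id: str,
    params: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Wrap a query so the TenantId filter is always applied by the middleware."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    merged = dict(params or {})
    if "tenant_id" in merged:
        # The tenant comes from the authenticated session, never from the caller.
        raise ValueError("callers may not supply tenant_id themselves")
    merged["tenant_id"] = tenant_id
    sql = f"SELECT * FROM ({inner_sql}) WHERE TenantId = %(tenant_id)s"
    return sql, merged
```

The tenant value is passed as a bound parameter rather than interpolated, so it cannot be used to alter the predicate.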

Self-hosted deployments

In a self-hosted deployment, the gateway and control plane both run in your infrastructure. Data never leaves your cluster except on outbound calls to LLM providers. You own every secret, every KMS key, every audit log, and every trace.

The gateway Helm chart ships an opt-in, deny-by-default NetworkPolicy that locks down lateral traffic: only ingress-nginx and Prometheus can reach the pod, and egress is limited to DNS, the control plane, Redis (if configured), OTLP (if configured), and provider upstream IPs. See Self-Hosting → Helm → NetworkPolicy.

The operator debug surface (pprof) is loopback-bound by default and unreachable over any Service. Deployments that need direct access can bind to a non-loopback address with a required bearer token; the gateway refuses to start if bound publicly without a token. See Self-Hosting → Helm → Admin listener.

See Self-Hosting → Helm for the deployment topology.
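
The start-up refusal for the debug listener can be sketched as a simple configuration check. The function names and the treatment of non-IP hostnames as non-loopback are assumptions for illustration; the gateway's real flags and validation may differ.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for 'localhost' and loopback IPs; other hostnames count as non-loopback."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def validate_debug_listener(bind_host: str, bearer_token: str) -> None:
    """Refuse a non-loopback bind unless a bearer token is configured."""
    if not is_loopback(bind_host) and not bearer_token:
        raise RuntimeError(
            "debug listener bound to a non-loopback address requires a bearer token"
        )
```

Failing at start-up rather than at request time means a misconfigured pod never serves the debug surface at all.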

Reporting a security issue

Email security@langwatch.ai with reproduction steps. Do not file GitHub issues for security reports. The security team replies within 1 business day; known-exploitable issues trigger a 24-hour SLA for acknowledgement. Public acknowledgements are listed at the security.txt endpoint.
