Concepts

A short tour of every primitive the gateway introduces, how they compose, and where the gateway ends and the rest of LangWatch begins.

Virtual key (VK)

A LangWatch-issued credential of the form lw_vk_{live|test}_<26-char-ULID> (40 chars total). Used in:

Authorization: Bearer lw_vk_live_… (OpenAI-style clients and most CLIs).
x-api-key: lw_vk_live_… (Anthropic-style clients, including Claude Code).
api-key: lw_vk_live_… (Azure-style clients).
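As an illustration of the three header styles above, here is a minimal sketch of how a server-side handler might pull a VK out of a request. The function name, the lookup order, and the assumption that the 26-character suffix uses the Crockford base32 ULID alphabet are illustrative choices, not the gateway's documented implementation.

```python
import re
from typing import Mapping, Optional

# Assumption: the suffix is a ULID in Crockford base32
# (digits and uppercase letters, excluding I, L, O and U).
VK_PATTERN = re.compile(r"^lw_vk_(live|test)_[0-9A-HJKMNP-TV-Z]{26}$")


def extract_virtual_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first well-formed VK found in the supported headers, else None."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    candidates = []

    # OpenAI-style: Authorization: Bearer <key>
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        candidates.append(auth[len("bearer "):].strip())

    # Anthropic-style (x-api-key) and Azure-style (api-key)
    for name in ("x-api-key", "api-key"):
        if name in lowered:
            candidates.append(lowered[name])

    for candidate in candidates:
        if VK_PATTERN.match(candidate):
            return candidate
    return None
```

A request that carries a provider key instead of a VK (for example sk-…) simply yields None in this sketch.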

Every gateway request must carry a VK. The VK resolves at the gateway to:

An owning scope: organization → team → project → principal.
A set of provider credentials — the underlying OpenAI / Anthropic / Bedrock / etc. keys the VK may use. These are reused from LangWatch’s existing Model Provider settings; the gateway does not duplicate credential storage.
A fallback chain — ordered list of providers to try if the primary fails.
Model aliases — a map like {"gpt-4o": "azure/my-deployment"} that lets the VK redirect a client-friendly model name to a specific provider/deployment.
Cache policy, guardrail attachments, blocked patterns, and budgets (see below).

Show-once secret: the full key is displayed exactly once at creation and stored as a peppered HMAC-SHA256 hash (see Security → Virtual keys for the hashing-scheme rationale). Only a short prefix (lw_vk_live_01HZX9) remains visible in the UI. Rotation replaces the secret, and the old secret stays valid for a 24-hour grace period; revocation propagates to gateway caches within 60 seconds. See Virtual Keys for CRUD, rotation, and revocation details.
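The show-once scheme can be sketched with Python's standard hmac module. This is a simplified illustration: using the pepper as the HMAC key, hex-encoding the digest, and the helper names are assumptions, and the actual scheme is described under Security → Virtual keys.

```python
import hashlib
import hmac


def hash_virtual_key(secret: str, pepper: bytes) -> str:
    """Hash the full key; only this digest would be stored (assumption: pepper is the HMAC key)."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_virtual_key(presented: str, stored_hash: str, pepper: bytes) -> bool:
    """Re-hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_virtual_key(presented, pepper), stored_hash)


def display_prefix(secret: str, visible_ulid_chars: int = 6) -> str:
    """Keep 'lw_vk_live_' or 'lw_vk_test_' (11 characters) plus a few ULID characters for the UI."""
    return secret[: 11 + visible_ulid_chars]
```

Because only the digest and the short prefix are kept, a lost key cannot be recovered and has to be rotated.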

Budget

A spending limit attached to a scope. Scopes:

organization (all traffic in the org)
team (all traffic in a team)
project (all traffic in a project)
virtual_key (this specific VK)
principal (one user or service account)

Windows: minute, hour, day, week, month, total. The on_breach setting controls what happens when a limit is reached:

block — 402 budget_exceeded returned once the limit is reached.
warn — request passes but the response carries X-LangWatch-Budget-Warning: <scope>:<pct>.

Hierarchical: a single request is checked against every budget that applies. Any hard-block breach blocks; warnings compose. Spend is debited after the response completes, using the token counts the provider reported, via the trace-fold reactor — replays of the same gateway_request_id collapse so gateway retries never double-bill. See Budgets.
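The hierarchical check can be pictured as a single pass over every budget that applies to the request. The sketch below is illustrative only: the Budget fields, the choice to emit a warning once spend reaches the limit, and the comma-joined header value are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Budget:
    scope: str        # e.g. "organization", "team", "project", "virtual_key", "principal"
    limit_usd: float
    spent_usd: float
    on_breach: str    # "block" or "warn"


def check_budgets(budgets: List[Budget]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) after checking every applicable budget."""
    warnings: List[str] = []
    for budget in budgets:
        if budget.spent_usd < budget.limit_usd:
            continue  # still under the limit
        if budget.on_breach == "block":
            return 402, {}  # any breached hard-block budget stops the request
        pct = round(100 * budget.spent_usd / budget.limit_usd) if budget.limit_usd > 0 else 100
        warnings.append(f"{budget.scope}:{pct}")
    # Assumption: several warnings are joined into one header value.
    headers = {"X-LangWatch-Budget-Warning": ", ".join(warnings)} if warnings else {}
    return 200, headers
```

A request covered by an organization budget at 40% and a team warn budget at 120% passes with a team:120 warning; adding a breached virtual_key block budget turns the same request into a 402.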

Principal

The identity attributed to a request — the user or service account who owns the VK on this call. Matters for three things:

Attribution in traces (langwatch.principal_id).
Principal-scoped budgets (cap spend per person).
RBAC — the principal must have permission to use the VK at all.

Personal-access VKs bind to a user principal; shared VKs bind to a service-account principal.

Model alias

A per-VK name → provider redirect. Two use cases:

Abstraction. Point gpt-4o at azure/my-deployment so code doesn’t need to know about the Azure deployment name.
Migration. Point claude-haiku at anthropic/claude-haiku-4-5-20251001 to pin a version; swap the alias to migrate every caller at once.

If a VK defines an alias for a model name that could also be read as an explicit provider/model reference, the alias always wins.
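Alias resolution amounts to a lookup that runs before any provider/model parsing. The following sketch assumes aliases are a plain dictionary on the VK and that an unaliased name without a provider prefix is rejected; both are illustrative assumptions, and resolve_model is a hypothetical helper.

```python
from typing import Dict, Tuple


def resolve_model(requested: str, aliases: Dict[str, str]) -> Tuple[str, str]:
    """Return (provider, model) for a requested model name."""
    # The alias map is consulted first, so an alias wins even when the
    # requested name already looks like an explicit provider/model pair.
    target = aliases.get(requested, requested)
    if "/" not in target:
        # Assumption: this sketch rejects names that carry no provider.
        raise ValueError(f"cannot determine provider for model {requested!r}")
    provider, model = target.split("/", 1)
    return provider, model
```

Swapping the value behind claude-haiku in the alias map changes the result for every caller without touching client code.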

Fallback chain

Ordered list of provider credentials (not models) to try on failure. Triggered by:

5xx from upstream.
timeout exceeding the VK’s configured fallback.timeout_ms.
429 rate_limit_exceeded from upstream.
network_error (connection reset / DNS / TLS).
circuit_breaker: the gateway’s internal circuit breaker, tripped after N consecutive failures.

Does not trigger fallback:

Upstream 400 / 401 / 403 / 404 — those are client-fault errors; masking them would confuse the caller.
Any LangWatch-internal error (invalid_api_key, permission_denied, etc).

For streaming: if the primary fails before the first chunk, fallback is transparent. Once chunks have started flowing, the stream stays on the original provider — otherwise the client would see a Frankenstein stream.
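The trigger rules above, including the streaming cut-off, can be condensed into one predicate. This is a sketch under assumptions: the function names, the (status, reason) tuple returned by each attempt, and the chunks_sent counter are hypothetical, and LangWatch-internal errors are not modelled because they occur before any upstream call.

```python
from typing import Callable, List, Optional, Tuple

# Non-HTTP failure reasons that trigger fallback, as listed above.
FALLBACK_REASONS = {"timeout", "network_error", "circuit_breaker"}


def should_fallback(upstream_status: Optional[int], reason: Optional[str], chunks_sent: int = 0) -> bool:
    """Decide whether the next provider credential in the chain should be tried."""
    if chunks_sent > 0:
        return False  # the stream has started, so it stays on the original provider
    if reason in FALLBACK_REASONS:
        return True
    if upstream_status is None:
        return False
    # 429 and 5xx trigger fallback; 400 / 401 / 403 / 404 are client faults and do not.
    return upstream_status == 429 or 500 <= upstream_status <= 599


def walk_chain(
    chain: List[str],
    attempt: Callable[[str], Tuple[Optional[int], Optional[str]]],
) -> Optional[Tuple[str, Optional[int]]]:
    """Try each slot in order; return (slot, status) for the attempt whose result is final."""
    result = None
    for slot in chain:
        status, reason = attempt(slot)
        result = (slot, status)
        if not should_fallback(status, reason):
            break
    return result
```

If every slot fails with a fallback-eligible error, the sketch returns the last attempt so the caller still sees a concrete upstream status.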

Provider binding

A Provider binding (GatewayProviderCredential) is the gateway-level layer between a ModelProvider (where the actual API key lives) and a virtual key (where the caller sees the binding). The binding carries gateway-specific settings that don’t belong on the underlying credential:

Slot name — the logical tag a VK references in its fallback chain (primary, fallback-1, eu-region, canary).
Rate limits — per-binding RPM + RPD caps that apply BEFORE upstream provider limits.
Fallback priority — global ordering across all bindings in a project.

One ModelProvider can back many bindings (e.g. the same OpenAI credential under primary + canary slots, each with different rate limits). Bindings live at the project scope; the underlying ModelProvider may live at org / team / project scope (see Provider Bindings → Scope & access).
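To make the one-credential, many-bindings relationship concrete, here is a small data-model sketch. The field names, the mp_openai identifier, and the simple counter comparison are hypothetical; the real GatewayProviderCredential shape may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderBinding:
    slot: str               # logical tag referenced by a VK's fallback chain
    model_provider_id: str  # the ModelProvider that holds the actual API key
    rpm_limit: int          # requests per minute allowed through this binding
    rpd_limit: int          # requests per day allowed through this binding


def within_binding_limits(binding: ProviderBinding, requests_this_minute: int, requests_today: int) -> bool:
    """Per-binding check, applied before the request can reach upstream provider limits."""
    return requests_this_minute < binding.rpm_limit and requests_today < binding.rpd_limit


# The same ModelProvider backs two slots with different rate limits.
primary = ProviderBinding("primary", "mp_openai", rpm_limit=600, rpd_limit=100_000)
canary = ProviderBinding("canary", "mp_openai", rpm_limit=10, rpd_limit=500)
```

With these values, the tenth request in a minute is still accepted on primary but rejected on canary, even though both use the same underlying key.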

Cache rule

A Cache rule (GatewayCacheRule) is an org-scoped operator override that changes cache behaviour without touching client code. Rules are evaluated in priority order (highest-first); the first match wins. Each rule has:

Matchers (ANDed within a rule): vk_id, vk_tags, vk_prefix, principal_id, model, request_metadata.
Action: force / respect / disable, with optional ttl_seconds and action_metadata.

Precedence, from highest to lowest: per-request X-LangWatch-Cache header → matching rule → VK default config.cache.mode. Rules compile into the VK bundle at /changes-refresh time so the hot path stays at the ~700 ns target.
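The precedence order can be sketched as a three-step lookup. For brevity the example treats every matcher as an exact-equality check on a flat request dictionary; that simplification, the CacheRule fields, and the function names are assumptions, and real matchers such as vk_tags or vk_prefix likely have richer semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheRule:
    rule_id: str
    priority: int
    action: str  # "force", "respect", or "disable"
    matchers: Dict[str, str] = field(default_factory=dict)


def rule_matches(rule: CacheRule, request: Dict[str, str]) -> bool:
    """All matchers in a rule must match (they are ANDed)."""
    return all(request.get(key) == value for key, value in rule.matchers.items())


def effective_cache_mode(
    request: Dict[str, str],
    rules: List[CacheRule],
    vk_default: str,
    header_mode: Optional[str] = None,
) -> str:
    """Per-request header, then the highest-priority matching rule, then the VK default."""
    if header_mode is not None:
        return header_mode
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule_matches(rule, request):
            return rule.action  # first match wins
    return vk_default
```

Sorting on every call keeps the sketch short; a production hot path would precompile the ordering once, in line with the bundle-compilation step described above.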

Caching passthrough

By default the gateway forwards Anthropic cache_control blocks, OpenAI cached-prompt markers, and Gemini implicit caches byte-identically. This is load-bearing — prompt caching saves up to 90% on input tokens, and the moment the gateway starts reformatting payloads you lose it. Overrides per-request:

X-LangWatch-Cache: respect (default) — pass through.
X-LangWatch-Cache: force — inject cache_control markers on large stable prefixes even if the client didn’t.
X-LangWatch-Cache: disable — strip every cache marker. Forces cold calls.
X-LangWatch-Cache: ttl=3600 — same as force with a TTL hint.

The per-VK default is set via the VK config cache.mode; a per-request header wins over it. The cache outcome is reported back in the X-LangWatch-Cache: hit|miss|bypass|force response header and recorded in the trace with separate gen_ai.usage.cache_read.input_tokens / gen_ai.usage.cache_creation.input_tokens counters (OTel GenAI semconv).
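A request-side parser for the four header forms might look like the sketch below. Returning None for an absent header (so the caller can fall back to the VK default) and raising ValueError for unrecognised values are assumptions; the gateway's actual handling of malformed values is not documented here.

```python
from typing import Optional, Tuple


def parse_cache_header(value: Optional[str]) -> Optional[Tuple[str, Optional[int]]]:
    """Parse a request X-LangWatch-Cache value into (mode, ttl_seconds)."""
    if value is None or not value.strip():
        return None  # no override: the caller falls back to the VK default
    normalized = value.strip().lower()
    if normalized in ("respect", "force", "disable"):
        return normalized, None
    if normalized.startswith("ttl="):
        seconds = int(normalized[len("ttl="):])  # ValueError on non-numeric input
        if seconds <= 0:
            raise ValueError("ttl must be a positive number of seconds")
        return "force", seconds  # ttl=N behaves like force with a TTL hint
    raise ValueError(f"unsupported X-LangWatch-Cache value: {value!r}")
```

For example, ttl=3600 parses to the force mode with a 3600-second hint, matching the list above.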

Guardrails

LangWatch evaluators invoked inline on gateway traffic. Each VK can attach guardrails in three directions:

pre: runs on the request before dispatch. A block decision fails the request with 403 guardrail_blocked. A modify decision rewrites the payload (e.g. PII redaction) before dispatch.
post: runs on the full response after dispatch. It is flag-only for streaming responses (the decision is recorded, but the response has already been delivered) and blocking for non-streaming responses.
stream_chunk: runs on each SSE chunk before emission. A block decision terminates the stream with an SSE error. A modify decision rewrites the chunk. Each chunk has a budget of ≤50 ms; a guardrail that exceeds it is bypassed with an OTel warning.

Guardrails reuse your existing LangWatch evaluators; nothing new to author for the gateway.
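The block and modify decisions in the pre direction can be sketched as a small pipeline. Modelling each guardrail as a Python callable that returns a (decision, rewritten text) pair is purely an assumption for illustration; real guardrails are LangWatch evaluators, and the two example checks are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

Decision = Tuple[str, Optional[str]]  # ("allow" | "block" | "modify", rewritten text or None)
Guardrail = Callable[[str], Decision]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text: str) -> Decision:
    """Hypothetical PII guardrail: rewrite the payload when an email address is found."""
    redacted = EMAIL.sub("[REDACTED]", text)
    return ("modify", redacted) if redacted != text else ("allow", None)


def deny_private_keys(text: str) -> Decision:
    """Hypothetical blocking guardrail."""
    return ("block", None) if "BEGIN PRIVATE KEY" in text else ("allow", None)


def run_pre_guardrails(text: str, guardrails: List[Guardrail]) -> Tuple[int, Optional[str]]:
    """Apply guardrails in order; block short-circuits with 403, modify rewrites the payload."""
    for guardrail in guardrails:
        decision, rewritten = guardrail(text)
        if decision == "block":
            return 403, None  # surfaced to the caller as guardrail_blocked
        if decision == "modify" and rewritten is not None:
            text = rewritten
    return 200, text
```

Later guardrails see the output of earlier modify decisions, so ordering matters in this sketch.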

Policy rules

Per-VK regex allow/deny lists for:

Tool calls — matched against tools[].function.name (OpenAI) and tools[].name (Anthropic). Denied tools → 403 tool_not_allowed.
MCP servers — matched against declared mcp_servers[].name and .url. Denied servers → 403 tool_not_allowed.
Outbound URLs — heuristically matched against every http(s):// URL in the request body. Denied URLs → 403 url_not_allowed.

Each block emits an AuditLog row (gateway shape) and returns the X-LangWatch-Request-Id response header, which lets you correlate the blocked request with its trace.
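The tool-call rule has to read names from two request shapes. The sketch below shows only a deny list; how allow and deny lists combine, and whether patterns are matched with search or a full match, are not specified here, so re.search is an assumption and the helper names are hypothetical.

```python
import re
from typing import Iterable, List, Optional


def tool_names(body: dict) -> List[str]:
    """Collect tool names from OpenAI-style and Anthropic-style request bodies."""
    names = []
    for tool in body.get("tools", []):
        function = tool.get("function")
        if isinstance(function, dict) and "name" in function:
            names.append(function["name"])  # OpenAI: tools[].function.name
        elif "name" in tool:
            names.append(tool["name"])      # Anthropic: tools[].name
    return names


def first_denied_tool(body: dict, deny_patterns: Iterable[str]) -> Optional[str]:
    """Return the first tool name matching a deny pattern, or None if all are allowed."""
    compiled = [re.compile(pattern) for pattern in deny_patterns]
    for name in tool_names(body):
        if any(pattern.search(name) for pattern in compiled):
            return name  # the gateway would answer 403 tool_not_allowed
    return None
```

A deny pattern such as ^shell_ then rejects a shell_exec tool regardless of which client format declared it.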

Traces (per-tenant observability)

Every request emits a LangWatch trace. The clever part: the gateway is a shared multi-tenant service, but each trace is routed to the owning tenant’s project — tenant A never sees tenant B’s traffic. Attributes always present (source of truth: services/aigateway/adapters/gatewaytracer/attrs.go):

langwatch.virtual_key_id, langwatch.project_id, langwatch.team_id, langwatch.organization_id, langwatch.principal_id.
langwatch.model, langwatch.provider, langwatch.model_source, langwatch.streaming.
langwatch.usage.input_tokens / langwatch.usage.output_tokens + OTel GenAI semconv gen_ai.usage.cache_read.input_tokens / gen_ai.usage.cache_creation.input_tokens.
langwatch.cost_usd, langwatch.duration_ms, langwatch.status, langwatch.budget.decision.

When a bundle-baked cache rule matches: langwatch.cache.rule_id / .rule_priority / .mode_applied.

Previously-documented per-feature attrs (langwatch.fallback.attempt / .reason, langwatch.policy.blocked, langwatch.budget.breached_scope, langwatch.guardrail.post_flag / .verdict) are not emitted in v1. For per-feature signals today, use the Prometheus counters + correlate via X-LangWatch-Request-Id. See Observability → Per-feature attributes for the v1 ↔ v1.1 breakdown.

Request id

Every response carries X-LangWatch-Request-Id: grq_<ULID>. Use it to:

Deep-link into the LangWatch trace from logs.
File a support ticket — the first thing we ask for.
Debug CLI integrations — the Codex / Claude Code logs echo these.

The id also flows into the CH ledger as the idempotency key for debit replays, so a single request is billed exactly once even across gateway restarts.
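Both uses of the request id can be sketched briefly: pulling ids out of log text, and using the id as an idempotency key so that a replayed debit is ignored. The in-memory DebitLedger class is a stand-in for the real ledger, and the ULID character set in the pattern is an assumption.

```python
import re
from typing import List, Set

# Assumption: the ULID part uses the 26-character Crockford base32 form.
REQUEST_ID = re.compile(r"\bgrq_[0-9A-HJKMNP-TV-Z]{26}\b")


def find_request_ids(log_text: str) -> List[str]:
    """Extract gateway request ids from arbitrary log output."""
    return REQUEST_ID.findall(log_text)


class DebitLedger:
    """In-memory stand-in for the ledger: one debit per request id."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.total_usd = 0.0

    def debit(self, request_id: str, cost_usd: float) -> bool:
        """Record spend once; a replay of the same request id is a no-op."""
        if request_id in self._seen:
            return False
        self._seen.add(request_id)
        self.total_usd += cost_usd
        return True
```

In a real deployment the deduplication has to live in durable storage to survive gateway restarts; the set here only illustrates the idempotency-key idea.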

What’s intentionally not here

SDK migration of existing litellm callsites — the LangWatch playground and evaluators continue to use the litellm path. The gateway serves new traffic; existing internal call sites don’t change. This is deliberate scope control for v1.
Semantic caching across unrelated requests — Bifrost ships a semantic-cache plugin; we’re not enabling it by default because it changes latency characteristics in ways customers should opt into.
Per-region provider routing for data residency — tracked as an open question in the contract; not in v1.

Once these concepts are in your head, the rest of the docs (Virtual Keys, Budgets, Providers, CLI Integrations) are detail views on the same primitives.
