A short tour of every primitive the gateway introduces, how they compose, and where the boundaries are with the rest of LangWatch.
Virtual key (VK)
A LangWatch-issued credential of the form `lw_vk_{live|test}_<26-char-ULID>` (40 chars total). Used in:
- `Authorization: Bearer lw_vk_live_…` (OpenAI-style clients and most CLIs).
- `x-api-key: lw_vk_live_…` (Anthropic-style clients, including Claude Code).
- `api-key: lw_vk_live_…` (Azure-style clients).
Each VK carries:
- An owning scope — organisation → team → project → principal.
- A set of provider credentials — the underlying OpenAI / Anthropic / Bedrock / etc. keys the VK may use. These are reused from LangWatch’s existing Model Provider settings; the gateway does not duplicate credential storage.
- A fallback chain — ordered list of providers to try if the primary fails.
- Model aliases — a map like `{"gpt-4o": "azure/my-deployment"}` that lets the VK redirect a client-friendly model name to a specific provider/deployment.
- Cache policy, guardrail attachments, blocked patterns, and budgets (see below).
The non-secret prefix (e.g. `lw_vk_live_01HZX9`) remains visible in the UI. Rotation replaces the secret (the old secret stays valid for a 24 h grace period); revocation propagates to gateway caches within 60 seconds.
See Virtual Keys for CRUD, rotation, and revocation details.
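As a minimal sketch of the three auth-header conventions listed above (the key value is a made-up placeholder, not a real credential):

```python
VK = "lw_vk_live_01HZX9EXAMPLEULID00000000"  # placeholder key for illustration


def auth_headers(style: str, vk: str = VK) -> dict:
    """Build the auth header each client convention sends the gateway."""
    if style == "openai":      # OpenAI-style clients and most CLIs
        return {"Authorization": f"Bearer {vk}"}
    if style == "anthropic":   # Anthropic-style clients, including Claude Code
        return {"x-api-key": vk}
    if style == "azure":       # Azure-style clients
        return {"api-key": vk}
    raise ValueError(f"unknown client style: {style}")
```

The same VK works in all three positions; only the header name changes per client family.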
Budget
A spending limit attached to a scope. Scopes:
- `organization` (all traffic in the org)
- `team` (all traffic in a team)
- `project` (all traffic in a project)
- `virtual_key` (this specific VK)
- `principal` (one user or service account)
Supported periods: `minute`, `hour`, `day`, `week`, `month`, `total`.
`on_breach` options:
- `block` — a 402 `budget_exceeded` is returned once the limit is reached.
- `warn` — the request passes, but the response carries `X-LangWatch-Budget-Warning: <scope>:<pct>`.
Spend records sharing a `gateway_request_id` collapse, so gateway retries never double-bill.
See Budgets.
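The `on_breach` behaviour above can be sketched as a small decision function. Field names (`limit_usd`, `scope`) are illustrative, not the wire format:

```python
def apply_budget(spent_usd: float, budget: dict):
    """Return (status_code, extra_headers) for a request under this budget."""
    pct = spent_usd / budget["limit_usd"] * 100
    if pct < 100:
        return 200, {}                                   # under budget: no-op
    if budget["on_breach"] == "block":
        return 402, {}                                   # budget_exceeded
    # warn: the request still passes, but the warning header is attached
    return 200, {"X-LangWatch-Budget-Warning": f"{budget['scope']}:{pct:.0f}"}
```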
Principal
The identity attributed to a request — the user or service account who owns the VK on this call. Matters for three things:
- Attribution in traces (`langwatch.principal_id`).
- Principal-scoped budgets (cap spend per person).
- RBAC — the principal must have permission to use the VK at all.
Model alias
A per-VK name → provider redirect. Two use cases:
- Abstraction. Point `gpt-4o` at `azure/my-deployment` so code doesn’t need to know about the Azure deployment name.
- Migration. Point `claude-haiku` at `anthropic/claude-haiku-4-5-20251001` to pin a version; swap the alias to migrate every caller at once.
Aliases take precedence over the explicit `provider/model` form if both are defined for the same model name on a VK.
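Alias resolution amounts to a single map lookup before dispatch; a sketch using the two examples above:

```python
ALIASES = {  # per-VK alias map (examples from the text above)
    "gpt-4o": "azure/my-deployment",
    "claude-haiku": "anthropic/claude-haiku-4-5-20251001",
}


def resolve_model(requested: str, aliases: dict = ALIASES) -> str:
    # Alias lookup happens before provider dispatch; unaliased names
    # pass through to the gateway's normal routing unchanged.
    return aliases.get(requested, requested)
```

Swapping the value for `claude-haiku` in the map migrates every caller at once, with no client change.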
Fallback chain
Ordered list of provider credentials (not models) to try on failure. Triggered by:
- `5xx` from upstream.
- `timeout` exceeding the VK’s configured `fallback.timeout_ms`.
- `429 rate_limit_exceeded` from upstream.
- `network_error` (connection reset / DNS / TLS).
- `circuit_breaker` — gateway-internal, tripped after N consecutive failures.
Not triggered by:
- Upstream `400 / 401 / 403 / 404` — those are client-fault errors; masking them would confuse the caller.
- Any LangWatch-internal error (`invalid_api_key`, `permission_denied`, etc.).
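The trigger / non-trigger split above boils down to a retryable-error set; a sketch, with `call(slot)` standing in for the real upstream dispatch:

```python
RETRYABLE = {"5xx", "timeout", "rate_limit_exceeded", "network_error", "circuit_breaker"}


def dispatch_with_fallback(chain, call):
    """Try each provider slot in order; only retryable failures advance the chain.

    `call` is assumed to raise RuntimeError(<kind>) on failure, with <kind>
    one of the trigger names above (an illustrative error shape).
    """
    last_error = None
    for slot in chain:
        try:
            return call(slot)
        except RuntimeError as err:
            if str(err) not in RETRYABLE:
                raise  # client-fault / internal errors surface as-is, never masked
            last_error = err
    raise last_error  # chain exhausted
```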
Provider binding
A Provider binding (`GatewayProviderCredential`) is the gateway-level layer between a `ModelProvider` (where the actual API key lives) and a virtual key (where the caller sees the binding). The binding carries gateway-specific settings that don’t belong on the underlying credential:
- Slot name — the logical tag a VK references in its fallback chain (`primary`, `fallback-1`, `eu-region`, `canary`).
- Rate limits — per-binding RPM + RPD caps that apply BEFORE upstream provider limits.
- Fallback priority — global ordering across all bindings in a project.
Several bindings can reference the same underlying credential (e.g. `primary` + `canary` slots, each with different rate limits). Bindings live at the project scope; the underlying `ModelProvider` may live at org / team / project scope (see Provider Bindings → Scope & access).
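A sketch of two bindings and how the global fallback priority could order them. The field names and the "lower value is tried first" direction are assumptions; the docs only say "global ordering across all bindings in a project":

```python
bindings = [
    {"slot": "canary",  "fallback_priority": 20, "rate_limit": {"rpm": 60,  "rpd": 10_000}},
    {"slot": "primary", "fallback_priority": 10, "rate_limit": {"rpm": 600, "rpd": 100_000}},
]


def dispatch_order(bindings: list) -> list:
    # Assumption: lower fallback_priority value = tried first.
    return [b["slot"] for b in sorted(bindings, key=lambda b: b["fallback_priority"])]
```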
Cache rule
A Cache rule (`GatewayCacheRule`) is an org-scoped operator override that changes cache behaviour without touching client code. Rules are evaluated in priority order (highest first); the first match wins. Each rule has:
- Matchers (ANDed within a rule): `vk_id`, `vk_tags`, `vk_prefix`, `principal_id`, `model`, `request_metadata`.
- Action: `force` / `respect` / `disable`, with optional `ttl_seconds` and `action_metadata`.
Precedence: per-request `X-LangWatch-Cache` header → matching rule → VK default `config.cache.mode`. Rules compile into the VK bundle at /changes-refresh time so the hot path stays at the ~700 ns target.
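The "highest priority first, first match wins, matchers ANDed" evaluation can be sketched in a few lines (rule shape is illustrative):

```python
def pick_cache_rule(rules: list, request: dict):
    """Return the first matching rule, evaluating highest priority first.

    All matchers within a rule must hold (AND); an empty matcher set
    matches every request.
    """
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if all(request.get(field) == want for field, want in rule["match"].items()):
            return rule
    return None
```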
Caching passthrough
By default the gateway forwards Anthropic `cache_control` blocks, OpenAI cached-prompt markers, and Gemini implicit caches byte-identically. This is load-bearing — prompt caching saves up to 90% on input tokens, and the moment the gateway starts reformatting payloads you lose it.
Per-request overrides:
- `X-LangWatch-Cache: respect` (default) — pass through.
- `X-LangWatch-Cache: force` — inject `cache_control` markers on large stable prefixes even if the client didn’t.
- `X-LangWatch-Cache: disable` — strip every cache marker; forces cold calls.
- `X-LangWatch-Cache: ttl=3600` — same as `force` with a TTL hint.
The per-VK default lives in `config.cache.mode`; the per-request header wins.
Cache outcome is reported back in the `X-LangWatch-Cache: hit|miss|bypass|force` response header and recorded in the trace with separate `gen_ai.usage.cache_read.input_tokens` / `gen_ai.usage.cache_creation.input_tokens` counters (OTel GenAI semconv).
Guardrails
LangWatch evaluators invoked inline on gateway traffic. Each VK can attach guardrails in three directions:
- `pre` — run on the request before dispatch. Decision `block` fails with 403 `guardrail_blocked`; decision `modify` rewrites the payload (e.g. PII redaction) before dispatch.
- `post` — run on the full response after dispatch. Flag-only for streaming responses (the decision is recorded, but the response is already delivered); blocking for non-streaming.
- `stream_chunk` — run on each SSE chunk before emission. Decision `block` terminates the stream with an SSE error; decision `modify` rewrites the chunk. Budget: ≤ 50 ms per chunk, or the guardrail is bypassed with an OTel warning.
Policy rules
Per-VK regex allow/deny lists for:
- Tool calls — matched against `tools[].function.name` (OpenAI) and `tools[].name` (Anthropic). Denied tools → 403 `tool_not_allowed`.
- MCP servers — matched against declared `mcp_servers[].name` and `.url`. Denied servers → 403 `tool_not_allowed`.
- Outbound URLs — heuristically matched against every `http(s)://` URL in the request body. Denied URLs → 403 `url_not_allowed`.
Every block is recorded as an `AuditLog` row (gateway shape); the `X-LangWatch-Request-Id` response header enables trace correlation.
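A sketch of the tool-call deny check; the deny patterns are hypothetical examples, not defaults:

```python
import re

DENY_TOOL_PATTERNS = [r"^shell$", r"^file_"]  # hypothetical per-VK deny list


def check_tool_calls(tool_names: list, deny: list = DENY_TOOL_PATTERNS) -> bool:
    """Reject the request if any declared tool matches a deny pattern."""
    for name in tool_names:
        if any(re.search(pattern, name) for pattern in deny):
            raise PermissionError(f"403 tool_not_allowed: {name}")
    return True
```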
Traces (per-tenant observability)
Every request emits a LangWatch trace. The clever part: the gateway is a shared multi-tenant service, but each trace is routed to the owning tenant’s project — tenant A never sees tenant B’s traffic. Attributes always present (source of truth: `services/aigateway/adapters/gatewaytracer/attrs.go`):
- `langwatch.virtual_key_id`, `langwatch.project_id`, `langwatch.team_id`, `langwatch.organization_id`, `langwatch.principal_id`.
- `langwatch.model`, `langwatch.provider`, `langwatch.model_source`, `langwatch.streaming`.
- `langwatch.usage.input_tokens` / `langwatch.usage.output_tokens` + OTel GenAI semconv `gen_ai.usage.cache_read.input_tokens` / `gen_ai.usage.cache_creation.input_tokens`.
- `langwatch.cost_usd`, `langwatch.duration_ms`, `langwatch.status`, `langwatch.budget.decision`.
When a cache rule fires, the trace also carries `langwatch.cache.rule_id` / `.rule_priority` / `.mode_applied`.
Previously documented per-feature attrs (`langwatch.fallback.attempt` / `.reason`, `langwatch.policy.blocked`, `langwatch.budget.breached_scope`, `langwatch.guardrail.post_flag` / `.verdict`) are not emitted in v1. For per-feature signals today, use the Prometheus counters and correlate via `X-LangWatch-Request-Id`. See Observability → Per-feature attributes for the v1 ↔ v1.1 breakdown.
Request id
Every response carries `X-LangWatch-Request-Id: grq_<ULID>`. Use it to:
- Deep-link into the LangWatch trace from logs.
- File a support ticket — the first thing we ask for.
- Debug CLI integrations — the Codex / Claude Code logs echo these.
What’s intentionally not here
- SDK migration of existing litellm callsites — the LangWatch playground and evaluators continue to use the litellm path. The gateway serves new traffic; existing internal call sites don’t change. This is deliberate scope control for v1.
- Semantic caching across unrelated requests — Bifrost ships a semantic-cache plugin; we’re not enabling it by default because it changes latency characteristics in ways customers should opt into.
- Per-region provider routing for data residency — tracked as an open question in the contract; not in v1.