Virtual key (VK)
A LangWatch-issued credential of the form vk-lw-{live|test}_<26-char-ULID> (40 chars total). Used in:
- Authorization: Bearer vk-lw-… (OpenAI-style clients and most CLIs).
- x-api-key: vk-lw-… (Anthropic-style clients, including Claude Code).
- api-key: vk-lw-… (Azure-style clients).
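As a minimal sketch of the three header styles above, the helper below picks the header a client would send. The function name auth_headers, the style labels, and the sample key are illustrative assumptions, not part of the LangWatch API.

```python
# Made-up key for illustration only; it is not a real credential.
FAKE_VK = "vk-lw-test_01HZX9ABCDEFGHJKMNPQRSTVWX"


def auth_headers(virtual_key: str, style: str) -> dict:
    """Return the HTTP header that carries the VK for a given client style."""
    if style == "openai":
        return {"Authorization": f"Bearer {virtual_key}"}
    if style == "anthropic":
        return {"x-api-key": virtual_key}
    if style == "azure":
        return {"api-key": virtual_key}
    raise ValueError(f"unknown client style: {style!r}")


print(auth_headers(FAKE_VK, "anthropic"))
```

The same VK is sent in all three cases; only the header name and the Bearer prefix differ.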
- An owning organisation (organizationId, always present) plus a set of scope rows (VirtualKeyScope[]) at any combination of ORGANIZATION, TEAM, PROJECT. The scope set drives the VK’s eligible provider set and budget cascade; see Virtual key scoping.
- An optional principal (principalUserId), present on personal-access VKs minted via the CLI device-flow, orthogonal to the scope set.
- A set of provider credentials: the underlying OpenAI, Anthropic, Bedrock, etc. keys the VK may use. Resolved at bundle-time by walking the VK’s scopes upward and unioning every visible ModelProvider. These are reused from LangWatch’s existing Model Provider settings; the gateway does not duplicate credential storage.
- A routing policy (ordered list of ModelProvider ids the VK may dispatch through, with fallback semantics). The VK may pin a specific policy, or fall back to the org’s default policy ordered by fallbackPriorityGlobal then createdAt.
- Model aliases: a map like {"gpt-4o": "azure/my-deployment"} that lets the VK redirect a client-friendly model name to a specific provider/deployment.
- Cache policy, guardrail attachments, blocked patterns, and budgets (see below).
Only a short prefix of the key (e.g. vk-lw-01HZX9) remains visible in the UI. Rotation replaces the secret (the old secret stays valid for a 24 h grace period); revocation propagates to gateway caches within 60 seconds.
See Virtual Keys for CRUD, rotation, and revocation details.
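The provider-resolution step described in this section (walking the VK’s scopes upward and unioning every visible ModelProvider) can be sketched roughly as follows. The Scope class, the PARENT and PROVIDERS tables, and all ids are hypothetical stand-ins; the real data model and lookup are not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    kind: str  # "ORGANIZATION", "TEAM", or "PROJECT"
    id: str


# Hypothetical hierarchy: project -> team -> organization.
PARENT = {
    Scope("PROJECT", "proj-1"): Scope("TEAM", "team-1"),
    Scope("TEAM", "team-1"): Scope("ORGANIZATION", "org-1"),
}

# Hypothetical ModelProvider ids, each living at exactly one scope.
PROVIDERS = {
    "mp-openai-org": Scope("ORGANIZATION", "org-1"),
    "mp-anthropic-team": Scope("TEAM", "team-1"),
    "mp-azure-other": Scope("PROJECT", "proj-2"),
}


def eligible_providers(vk_scopes):
    """Union the providers visible from every VK scope and its ancestors."""
    visible = set()
    for scope in vk_scopes:
        node = scope
        while node is not None:
            visible.add(node)
            node = PARENT.get(node)
    return sorted(pid for pid, scope in PROVIDERS.items() if scope in visible)


print(eligible_providers([Scope("PROJECT", "proj-1")]))
```

In this sketch a VK scoped to proj-1 sees the team-level and org-level providers but not the one that lives in the unrelated proj-2.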
Budget
A spending limit attached to a scope. Scopes:
- organization (all traffic in the org)
- team (all traffic in a team)
- project (all traffic in a project)
- virtual_key (this specific VK)
- principal (one user or service account)
Windows: minute, hour, day, week, month, total.
on_breach:
- block: 402 budget_exceeded returned once the limit is reached.
- warn: request passes but the response carries X-LangWatch-Budget-Warning: <scope>:<pct>.
Spend is collapsed (deduplicated) on gateway_request_id so gateway retries never double-bill.
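A rough sketch of the two on_breach behaviours and of de-duplicating spend by gateway_request_id. The names check_budget and SpendLedger are hypothetical, and the assumptions that the warning fires once the limit is reached and that <pct> is a whole-number percentage are mine, not confirmed by this document.

```python
def check_budget(scope, spent_usd, limit_usd, on_breach):
    """Return (http_status, error_code, extra_headers). Assumes limit_usd > 0."""
    if spent_usd < limit_usd:
        return 200, None, {}
    if on_breach == "block":
        return 402, "budget_exceeded", {}
    # on_breach == "warn": let the request through, but flag it in a header.
    pct = round(100 * spent_usd / limit_usd)
    return 200, None, {"X-LangWatch-Budget-Warning": f"{scope}:{pct}"}


class SpendLedger:
    """Keeps one cost entry per gateway_request_id so retries are billed once."""

    def __init__(self):
        self._cost_by_request = {}

    def record(self, gateway_request_id, cost_usd):
        # setdefault ignores a repeated record for the same request id.
        self._cost_by_request.setdefault(gateway_request_id, cost_usd)

    def total(self):
        return sum(self._cost_by_request.values())


ledger = SpendLedger()
ledger.record("grq_a", 0.5)
ledger.record("grq_a", 0.5)  # retry of the same request, not billed again
ledger.record("grq_b", 0.25)
print(ledger.total(), check_budget("team", 12.0, 10.0, "warn"))
```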
See Budgets.
Principal
The identity attributed to a request: the user or service account who owns the VK on this call. Matters for three things:
- Attribution in traces (langwatch.principal_id).
- Principal-scoped budgets (cap spend per person).
- RBAC: the principal must have permission to use the VK at all.
Model alias
A per-VK name → provider redirect. Two use cases:
- Abstraction. Point gpt-4o at azure/my-deployment so code doesn’t need to know about the Azure deployment name.
- Migration. Point claude-haiku at anthropic/claude-haiku-4-5-20251001 to pin a version; swap the alias to migrate every caller at once.
provider/model form if both are defined for the same model name on a VK.
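The two alias use cases can be sketched with a plain dictionary lookup. The function resolve_model is hypothetical, the second migration target is made up, and the sketch deliberately leaves names without an alias untouched rather than guessing how the gateway routes them.

```python
def resolve_model(requested, aliases):
    """Return (provider, model) for an aliased name, else (None, requested)."""
    target = aliases.get(requested)
    if target is None:
        return None, requested
    # Split only on the first "/" so deployment names may contain slashes.
    provider, _, model = target.partition("/")
    return provider, model


aliases = {
    "gpt-4o": "azure/my-deployment",
    "claude-haiku": "anthropic/claude-haiku-4-5-20251001",
}
print(resolve_model("gpt-4o", aliases))

# Migration: changing one alias entry redirects every caller of "claude-haiku".
aliases["claude-haiku"] = "anthropic/some-newer-version"  # made-up target
print(resolve_model("claude-haiku", aliases))
```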
Fallback chain
Ordered list of ModelProvider ids (not models) the VK may dispatch through. Carried on the routing policy bound to the VK (or, when the VK pins no policy, on the org’s default policy). Triggered by:
- 5xx from upstream.
- timeout exceeding the VK’s configured fallback.timeout_ms.
- 429 rate_limit_exceeded from upstream.
- network_error (connection reset, DNS, TLS).
- circuit_breaker: gateway-internal, tripped after N consecutive failures.
Not triggered by:
- Upstream 400 / 401 / 403 / 404: those are client-fault errors; masking them would confuse the caller.
- Any LangWatch-internal error (invalid_api_key, permission_denied, etc).
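The trigger rules in this section can be sketched as a predicate plus an ordered dispatch loop. The function names and the call(provider_id) callback returning (status, error, body) are assumptions made for illustration; the gateway’s real dispatch code is not shown in this document.

```python
# Gateway-side failure kinds that move dispatch to the next provider.
FALLBACK_ERRORS = {"timeout", "network_error", "circuit_breaker"}


def should_fallback(status=None, error=None):
    """True for 5xx, 429 and gateway-side failures; False for other 4xx."""
    if error in FALLBACK_ERRORS:
        return True
    if status is None:
        return False
    return status == 429 or 500 <= status <= 599


def dispatch(provider_ids, call):
    """Try providers in order; stop at the first non-fallback outcome."""
    last = None
    for provider_id in provider_ids:
        status, error, body = call(provider_id)
        last = (provider_id, status, error, body)
        if not should_fallback(status, error):
            return last
    return last  # every provider failed; surface the last attempt


replies = {"mp-a": (503, None, None), "mp-b": (200, None, "ok")}
print(dispatch(["mp-a", "mp-b"], lambda pid: replies[pid]))
```

A 401 from the first provider is returned to the caller as-is, because client-fault errors do not advance the chain.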
ModelProvider (gateway-side)
The same ModelProvider record you configure under Settings → Model Providers is the only credential entity the gateway knows about. There is no separate gateway-binding wrapper; gateway-specific fields (RPM, TPM, RPD, fallback priority, providerConfig overrides) live on the ModelProvider itself, on the Advanced (Gateway) tab of the editor. A ModelProvider lives at ORGANIZATION, TEAM, or PROJECT scope. Multiple deployments of the same provider family (e.g. OpenAI US vs OpenAI EU) are sibling ModelProvider rows with distinct names; VKs and routing policies reference them by id.
See Gateway provider settings for the field list and vk-scope-inheritance.feature for how the eligible-provider set is derived from a VK’s scope rows.
Cache rule
A Cache rule (GatewayCacheRule) is an org-scoped operator override that changes cache behaviour without touching client code. Rules are evaluated in priority order (highest-first); the first match wins. Each rule has:
- Matchers (ANDed within a rule): vk_id, vk_tags, vk_prefix, principal_id, model, request_metadata.
- Action: force, respect, disable, with optional ttl_seconds and action_metadata.
Precedence: X-LangWatch-Cache header → matching rule → VK default config.cache.mode. Rules compile into the VK bundle at /changes-refresh time so the hot path stays at the ~700 ns target.
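A simplified sketch of rule evaluation: highest priority first, first match wins, matchers ANDed, with the per-request header taking precedence and the VK default as the last resort. The dictionary shape of a rule and the equality-only matching are assumptions; matchers such as vk_tags or vk_prefix would need richer comparisons than shown here.

```python
def rule_matches(rule, request):
    """Every matcher on the rule must equal the request's value (AND)."""
    return all(request.get(key) == want for key, want in rule["matchers"].items())


def effective_cache_mode(request, rules, vk_default, header=None):
    """Resolve the cache mode: header, then first matching rule, then VK default."""
    if header is not None:
        return header
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if rule_matches(rule, request):
            return rule["action"]
    return vk_default


rules = [
    {"priority": 10, "matchers": {"model": "gpt-4o"}, "action": "disable"},
    {"priority": 50, "matchers": {"vk_id": "vk_1", "model": "gpt-4o"}, "action": "force"},
]
request = {"vk_id": "vk_1", "model": "gpt-4o"}
print(effective_cache_mode(request, rules, vk_default="respect"))
```

Both rules match the sample request, but the priority-50 rule is evaluated first, so the result is force.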
Caching passthrough
By default the gateway forwards Anthropic cache_control blocks, OpenAI cached-prompt markers, and Gemini implicit caches byte-identically. This is load-bearing: prompt caching saves up to 90% on input tokens, and that saving is lost the moment the gateway starts reformatting payloads.
Overrides per-request:
- X-LangWatch-Cache: respect (default), pass through.
- X-LangWatch-Cache: force, inject cache_control markers on large stable prefixes even if the client didn’t.
- X-LangWatch-Cache: disable, strip every cache marker. Forces cold calls.
- X-LangWatch-Cache: ttl=3600, same as force with a TTL hint.
The VK-level default is config.cache.mode. Per-request header wins.
Cache outcome is reported back in the X-LangWatch-Cache: hit|miss|bypass|force response header and recorded in the trace with separate gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens counters (OTel GenAI semconv).
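The four request-header values listed in this section could be parsed along these lines. The function name parse_cache_header is hypothetical, and rejecting unknown values with ValueError is an assumption; the document does not say how the gateway treats malformed headers.

```python
def parse_cache_header(value):
    """Return (mode, ttl_seconds) for an X-LangWatch-Cache request header."""
    if value is None:
        return "respect", None  # header absent: default pass-through
    text = value.strip().lower()
    if text in ("respect", "force", "disable"):
        return text, None
    if text.startswith("ttl="):
        ttl = int(text[len("ttl="):])  # raises ValueError on non-numeric input
        if ttl <= 0:
            raise ValueError("ttl must be positive")
        return "force", ttl  # ttl=N behaves like force with a TTL hint
    raise ValueError(f"unsupported X-LangWatch-Cache value: {value!r}")


print(parse_cache_header("ttl=3600"))
```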
Guardrails
LangWatch evaluators invoked inline on gateway traffic. Each VK can attach guardrails in three directions:
- pre: run on the request before dispatch. Decision block fails with 403 guardrail_blocked. Decision modify rewrites the payload (e.g. PII redaction) before dispatch.
- post: run on the full response after dispatch. Flag-only for streaming responses (decision recorded but response already delivered); blocking for non-streaming.
- stream_chunk: run on each SSE chunk before emission. Decision block terminates the stream with an SSE error. Decision modify rewrites the chunk. Budget ≤50 ms per chunk or the guardrail is bypassed with an OTel warning.
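To make the pre direction concrete, here is a small sketch that chains guardrails over a request and applies block and modify decisions. The guardrail call signature, the "allow" decision name, the string payload, and the error body shape are all assumptions for illustration, not the LangWatch evaluator interface.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text):
    """Example 'modify' guardrail: mask anything that looks like an email."""
    redacted = EMAIL.sub("[REDACTED]", text)
    return ("modify", redacted) if redacted != text else ("allow", None)


def block_private_keys(text):
    """Example 'block' guardrail: refuse payloads carrying a private key."""
    return ("block", None) if "BEGIN PRIVATE KEY" in text else ("allow", None)


def run_pre_guardrails(text, guardrails):
    """Return (status, payload_or_error) after running guardrails in order."""
    for guardrail in guardrails:
        decision, data = guardrail(text)
        if decision == "block":
            return 403, {"error": "guardrail_blocked"}
        if decision == "modify":
            text = data  # later guardrails and dispatch see the rewritten payload
    return 200, text


print(run_pre_guardrails("mail bob@example.com now", [redact_emails, block_private_keys]))
```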
Policy rules
Per-VK regex allow/deny lists for:
- Tool calls: matched against tools[].function.name (OpenAI) and tools[].name (Anthropic). Denied tools → 403 tool_not_allowed.
- MCP servers: matched against declared mcp_servers[].name and .url. Denied servers → 403 tool_not_allowed.
- Outbound URLs: heuristically matched against every http(s):// URL in the request body. Denied URLs → 403 url_not_allowed.
AuditLog row (gateway shape) + the X-LangWatch-Request-Id response header → trace correlation.
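A sketch of the tool-call check from the list in this section, covering both request shapes. The function names are hypothetical, and two behaviours are assumptions: patterns are applied with re.fullmatch, and a non-empty allow list rejects any tool that matches none of its patterns.

```python
import re


def tool_names(body):
    """Collect tool names from OpenAI-style and Anthropic-style request bodies."""
    names = []
    for tool in body.get("tools", []):
        function = tool.get("function")
        if isinstance(function, dict) and "name" in function:
            names.append(function["name"])  # OpenAI: tools[].function.name
        elif "name" in tool:
            names.append(tool["name"])  # Anthropic: tools[].name
    return names


def check_tools(body, allow, deny):
    """Return (status, error_code); deny patterns are checked before allow."""
    for name in tool_names(body):
        if any(re.fullmatch(pattern, name) for pattern in deny):
            return 403, "tool_not_allowed"
        if allow and not any(re.fullmatch(pattern, name) for pattern in allow):
            return 403, "tool_not_allowed"
    return 200, None


openai_body = {"tools": [{"type": "function", "function": {"name": "delete_file"}}]}
print(check_tools(openai_body, allow=[], deny=[r"delete_.*"]))
```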
Traces (per-tenant observability)
Every request emits a LangWatch trace. The clever part: the gateway is a shared multi-tenant service, but each trace is routed to the owning tenant’s project, so tenant A never sees tenant B’s traffic. Attributes always present (source of truth: services/aigateway/adapters/gatewaytracer/attrs.go):
- langwatch.virtual_key_id, langwatch.project_id, langwatch.team_id, langwatch.organization_id, langwatch.principal_id.
- langwatch.model, langwatch.provider, langwatch.model_source, langwatch.streaming.
- langwatch.usage.input_tokens, langwatch.usage.output_tokens + OTel GenAI semconv gen_ai.usage.cache_read.input_tokens, gen_ai.usage.cache_creation.input_tokens.
- langwatch.cost_usd, langwatch.duration_ms, langwatch.status, langwatch.budget.decision.
langwatch.cache.rule_id, .rule_priority, .mode_applied.
Previously-documented per-feature attrs (langwatch.fallback.attempt, .reason, langwatch.policy.blocked, langwatch.budget.breached_scope, langwatch.guardrail.post_flag, .verdict) are not emitted in v1. For per-feature signals today, use the Prometheus counters and correlate via X-LangWatch-Request-Id. See Observability → Per-feature attributes for the v1 ↔ v1.1 breakdown.
Request id
Every response carries X-LangWatch-Request-Id: grq_<ULID>. Use it to:
- Deep-link into the LangWatch trace from logs.
- File a support ticket: it is the first thing we ask for.
- Debug CLI integrations: the Codex and Claude Code logs echo these ids.
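For the log-debugging use case, request ids can be pulled out of arbitrary log text with a regular expression. This sketch assumes the ULID part is written in uppercase Crockford base32 (26 characters, no I, L, O, or U); the sample log line and id are made up.

```python
import re

# grq_ followed by a 26-character ULID in Crockford base32 (assumed uppercase).
REQUEST_ID = re.compile(r"grq_[0-9A-HJKMNP-TV-Z]{26}")


def find_request_ids(log_text):
    """Return every gateway request id found in the text, in order."""
    return REQUEST_ID.findall(log_text)


sample = "upstream error x-langwatch-request-id=grq_01HZX9ABCDEFGHJKMNPQRSTVWX status=502"
print(find_request_ids(sample))
```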
What’s intentionally not here
- SDK migration of existing litellm callsites: the LangWatch playground and evaluators continue to use the litellm path. The gateway serves new traffic; existing internal call sites don’t change. This is deliberate scope control for v1.
- Semantic caching across unrelated requests: Bifrost ships a semantic-cache plugin; we’re not enabling it by default because it changes latency characteristics in ways customers should opt into.
- Per-region provider routing for data residency: tracked as an open question in the contract; not in v1.