A short tour of every primitive the gateway introduces, how they compose, and where the boundaries are with the rest of LangWatch.

Virtual key (VK)

A LangWatch-issued credential of the form lw_vk_{live|test}_<26-char-ULID> (37 chars total: an 11-char prefix plus the 26-char ULID). Used in:
  • Authorization: Bearer lw_vk_live_… (OpenAI-style clients and most CLIs).
  • x-api-key: lw_vk_live_… (Anthropic-style clients, including Claude Code).
  • api-key: lw_vk_live_… (Azure-style clients).
Every gateway request must carry a VK. The VK resolves at the gateway to:
  • An owning scope: organization → team → project → principal.
  • A set of provider credentials: the underlying provider keys (OpenAI, Anthropic, Bedrock, etc.) the VK may use. These are reused from LangWatch’s existing Model Provider settings; the gateway does not duplicate credential storage.
  • A fallback chain: an ordered list of provider credentials to try if the primary fails.
  • Model aliases: a map like {"gpt-4o": "azure/my-deployment"} that lets the VK redirect a client-friendly model name to a specific provider/deployment.
  • Cache policy, guardrail attachments, blocked patterns, and budgets (see below).
Show-once secret: the full key is displayed exactly once at creation; only a peppered HMAC-SHA256 hash of it is stored (see Security → Virtual keys for the hashing-scheme rationale). Only a short prefix (lw_vk_live_01HZX9) remains visible in the UI. Rotation replaces the secret, and the old secret stays valid for a 24 h grace period. Revocation propagates to gateway caches within 60 seconds. See Virtual Keys for CRUD, rotation, and revocation details.
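As an illustration of the three header conventions above, here is a minimal sketch that checks a key against the stated form and builds the matching auth header. The function names and the client-style labels ("openai", "anthropic", "azure") are hypothetical; the character class is the standard Crockford base32 alphabet used by ULIDs, which is an assumption about how the gateway encodes the ULID part.

```python
import re

# Assumed key form: lw_vk_{live|test}_<26-char ULID>, with the ULID in
# Crockford base32 (digits and upper-case letters, excluding I, L, O, U).
VK_PATTERN = re.compile(r"^lw_vk_(live|test)_[0-9A-HJKMNP-TV-Z]{26}$")

# Header conventions from the list above, keyed by a hypothetical style label.
HEADER_BUILDERS = {
    "openai": lambda vk: {"Authorization": f"Bearer {vk}"},
    "anthropic": lambda vk: {"x-api-key": vk},
    "azure": lambda vk: {"api-key": vk},
}


def vk_environment(vk: str) -> str:
    """Return 'live' or 'test', or raise ValueError for a malformed key."""
    match = VK_PATTERN.match(vk)
    if match is None:
        raise ValueError("not a well-formed virtual key")
    return match.group(1)


def vk_headers(vk: str, style: str = "openai") -> dict[str, str]:
    """Build the auth header for one client style after a format check."""
    vk_environment(vk)  # raises if the key is malformed
    return HEADER_BUILDERS[style](vk)
```

Checking the format client-side only catches copy-paste mistakes early; whether the key is actually valid is decided by the gateway.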

Budget

A spending limit attached to a scope. Scopes:
  • organization (all traffic in the org)
  • team (all traffic in a team)
  • project (all traffic in a project)
  • virtual_key (this specific VK)
  • principal (one user or service account)
Windows: minute, hour, day, week, month, total.
on_breach actions:
  • block — 402 budget_exceeded returned once the limit is reached.
  • warn — request passes but the response carries X-LangWatch-Budget-Warning: <scope>:<pct>.
Hierarchical: a single request is checked against every budget that applies. Any hard-block breach blocks; warnings compose. Spend is debited after the response completes, using the token counts the provider reported, via the trace-fold reactor. Replays of the same gateway_request_id collapse, so gateway retries never double-bill. See Budgets.
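The hierarchical check can be sketched as a single pass over every budget that applies to the request. This is an illustration only: the Budget fields and the check_budgets name are hypothetical, and it assumes a warning fires once a warn budget's limit is reached, with the percentage rounded to an integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    scope: str        # organization | team | project | virtual_key | principal
    limit_usd: float  # assumed to be greater than zero
    spent_usd: float  # spend already debited in the current window
    on_breach: str    # "block" or "warn"


def check_budgets(budgets: list[Budget]) -> tuple[int, list[str]]:
    """Return (http_status, warning values) for one request.

    Any breached block budget yields 402; breached warn budgets each
    contribute one "<scope>:<pct>" value.
    """
    warned: list[str] = []
    blocked = False
    for budget in budgets:
        if budget.spent_usd < budget.limit_usd:
            continue  # limit not reached, nothing to report
        if budget.on_breach == "block":
            blocked = True
        else:
            pct = round(100 * budget.spent_usd / budget.limit_usd)
            warned.append(f"{budget.scope}:{pct}")
    return (402 if blocked else 200), warned
```

Every applicable budget is evaluated even after a block is found, so the caller still sees all warnings that compose on the same request.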

Principal

The identity attributed to a request — the user or service account who owns the VK on this call. Matters for three things:
  • Attribution in traces (langwatch.principal_id).
  • Principal-scoped budgets (cap spend per-person).
  • RBAC — the principal must have permission to use the VK at all.
Personal-access VKs bind to a user principal; shared VKs bind to a service-account principal.

Model alias

A per-VK name → provider redirect. Two use cases:
  • Abstraction. Point gpt-4o at azure/my-deployment so code doesn’t need to know about the Azure deployment name.
  • Migration. Point claude-haiku at anthropic/claude-haiku-4-5-20251001 to pin a version; swap the alias to migrate every caller at once.
Aliases always win over the explicit provider/model form if both are defined for the same model name on a VK.
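A minimal sketch of alias resolution under the rule above: the alias map is consulted first, and only then is the name read as an explicit provider/model pair. The resolve_model name is hypothetical, and raising an error for a bare name with no alias is an assumption made here to keep the example small; the gateway may handle that case differently.

```python
def resolve_model(requested: str, aliases: dict[str, str]) -> tuple[str, str]:
    """Return (provider, model) for a requested model name on one VK."""
    # The alias wins whenever one is defined for the requested name.
    target = aliases.get(requested, requested)
    provider, sep, model = target.partition("/")
    if not sep:
        # Assumption: a bare name without an alias is rejected in this sketch.
        raise ValueError(f"no alias or explicit provider for {requested!r}")
    return provider, model
```

With aliases = {"gpt-4o": "azure/my-deployment"}, a client that asks for gpt-4o is routed to the Azure deployment without any change to its code.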

Fallback chain

Ordered list of provider credentials (not models) to try on failure. Triggered by:
  • 5xx from upstream.
  • timeout exceeding the VK’s configured fallback.timeout_ms.
  • 429 rate_limit_exceeded from upstream.
  • network_error (connection reset / DNS / TLS).
  • circuit_breaker (gateway-internal; trips after N consecutive failures).
Does not trigger fallback:
  • Upstream 400 / 401 / 403 / 404 — those are client-fault errors; masking them would confuse the caller.
  • Any LangWatch-internal error (invalid_api_key, permission_denied, etc).
For streaming: if the primary fails before the first chunk, fallback is transparent. Once chunks have started flowing, the stream stays on the original provider — otherwise the client would see a Frankenstein stream.
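The trigger rules above reduce to a small predicate. The sketch below is illustrative: the should_fallback signature and the idea of passing either an upstream status or an error label are assumptions, not the gateway's internal interface.

```python
from typing import Optional

# Non-HTTP failure labels that trigger fallback, taken from the list above.
FALLBACK_ERRORS = {"timeout", "network_error", "circuit_breaker"}


def should_fallback(status: Optional[int] = None,
                    error: Optional[str] = None,
                    chunks_sent: int = 0) -> bool:
    """Decide whether to try the next entry in the fallback chain."""
    if chunks_sent > 0:
        # Streaming has started: stay on the original provider.
        return False
    if error is not None:
        # LangWatch-internal errors (invalid_api_key, ...) are not in the set.
        return error in FALLBACK_ERRORS
    if status is None:
        return False
    # 429 and 5xx trigger fallback; 400 / 401 / 403 / 404 do not.
    return status == 429 or 500 <= status <= 599
```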

Provider binding

A Provider binding (GatewayProviderCredential) is the gateway-level layer between a ModelProvider (where the actual API key lives) and a virtual key (where the caller sees the binding). The binding carries gateway-specific settings that don’t belong on the underlying credential:
  • Slot name — the logical tag a VK references in its fallback chain (primary, fallback-1, eu-region, canary).
  • Rate limits — per-binding RPM + RPD caps that apply BEFORE upstream provider limits.
  • Fallback priority — global ordering across all bindings in a project.
One ModelProvider can back many bindings (e.g. the same OpenAI credential under primary + canary slots, each with different rate limits). Bindings live at the project scope; the underlying ModelProvider may live at org / team / project scope (see Provider Bindings → Scope & access).

Cache rule

A Cache rule (GatewayCacheRule) is an org-scoped operator override that changes cache behaviour without touching client code. Rules are evaluated in priority order (highest-first); the first match wins. Each rule has:
  • Matchers (ANDed within a rule): vk_id, vk_tags, vk_prefix, principal_id, model, request_metadata.
  • Action: force / respect / disable, with optional ttl_seconds and action_metadata.
Precedence, from highest to lowest: per-request X-LangWatch-Cache header → matching rule → VK default config.cache.mode. Rules compile into the VK bundle at /changes-refresh time so the hot path stays at the ~700 ns target.
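The precedence order can be sketched as a three-step lookup. This is a simplified illustration: CacheRule and resolve_cache_mode are hypothetical names, and every matcher is treated as an exact string comparison, which would not cover matchers such as vk_tags or vk_prefix.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheRule:
    rule_id: str
    priority: int
    action: str  # "force" | "respect" | "disable"
    # Matchers are ANDed; exact-match only in this sketch.
    matchers: dict[str, str] = field(default_factory=dict)


def resolve_cache_mode(header: Optional[str], rules: list[CacheRule],
                       request: dict[str, str], vk_default: str) -> str:
    """Apply: per-request header, then first matching rule, then VK default."""
    if header is not None:
        return header
    # Highest priority first; the first rule whose matchers all hold wins.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if all(request.get(key) == value for key, value in rule.matchers.items()):
            return rule.action
    return vk_default
```

Note that a rule with no matchers matches every request in this sketch, so it acts as a catch-all at its priority.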

Caching passthrough

By default the gateway forwards Anthropic cache_control blocks, OpenAI cached-prompt markers, and Gemini implicit caches byte-identically. This is load-bearing — prompt caching saves up to 90% on input tokens, and the moment the gateway starts reformatting payloads you lose it. Overrides per-request:
  • X-LangWatch-Cache: respect (default) — pass through.
  • X-LangWatch-Cache: force — inject cache_control markers on large stable prefixes even if the client didn’t.
  • X-LangWatch-Cache: disable — strip every cache marker. Forces cold calls.
  • X-LangWatch-Cache: ttl=3600 — same as force with a TTL hint.
The per-VK default is set via the VK config cache.mode; the per-request header wins. The cache outcome is reported back in the X-LangWatch-Cache: hit|miss|bypass|force response header and recorded in the trace with separate gen_ai.usage.cache_read.input_tokens / gen_ai.usage.cache_creation.input_tokens counters (OTel GenAI semconv).
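To make the four request header values concrete, here is a hedged sketch of a parser that maps a header value to a mode and an optional TTL. The parse_cache_header name is hypothetical, and rejecting unknown or non-positive values with ValueError is an assumption; the gateway's actual handling of invalid values is not described here.

```python
from typing import Optional


def parse_cache_header(value: Optional[str],
                       vk_default: str = "respect") -> tuple[str, Optional[int]]:
    """Return (mode, ttl_seconds) for a request's X-LangWatch-Cache value."""
    if value is None:
        # No header on the request: fall back to the per-VK default mode.
        return vk_default, None
    value = value.strip().lower()
    if value in {"respect", "force", "disable"}:
        return value, None
    if value.startswith("ttl="):
        ttl = int(value[len("ttl="):])  # ValueError if not an integer
        if ttl <= 0:
            raise ValueError("ttl must be positive")
        # ttl=<n> behaves like force with a TTL hint.
        return "force", ttl
    raise ValueError(f"unsupported X-LangWatch-Cache value: {value!r}")
```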

Guardrails

LangWatch evaluators invoked inline on gateway traffic. Each VK can attach guardrails in three directions:
  • pre: runs on the request before dispatch. Decision block fails the request with 403 guardrail_blocked. Decision modify rewrites the payload (e.g. PII redaction) before dispatch.
  • post: runs on the full response after dispatch. Flag-only for streaming responses (the decision is recorded but the response has already been delivered); blocking for non-streaming.
  • stream_chunk: runs on each SSE chunk before emission. Decision block terminates the stream with an SSE error. Decision modify rewrites the chunk. Each chunk has a latency budget of ≤50 ms; a guardrail that exceeds it is bypassed with an OTel warning.
Guardrails reuse your existing LangWatch evaluators; nothing new to author for the gateway.
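As a rough illustration of the pre direction, the sketch below runs a list of guardrails over a request payload and applies block and modify decisions. Everything here is hypothetical: the Guardrail callable shape, the "allow" decision name, the error body, and the two toy guardrails stand in for real LangWatch evaluators.

```python
import re
from typing import Callable

# A guardrail returns (decision, payload); decision is "allow", "block" or "modify".
Guardrail = Callable[[dict], tuple[str, dict]]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(payload: dict) -> tuple[str, dict]:
    """Toy PII guardrail: replace e-mail addresses in the prompt."""
    redacted = EMAIL.sub("[redacted]", payload["prompt"])
    if redacted == payload["prompt"]:
        return "allow", payload
    return "modify", {**payload, "prompt": redacted}


def block_private_keys(payload: dict) -> tuple[str, dict]:
    """Toy guardrail: block prompts that contain a PEM private-key marker."""
    if "BEGIN PRIVATE KEY" in payload["prompt"]:
        return "block", payload
    return "allow", payload


def run_pre_guardrails(payload: dict,
                       guardrails: list[Guardrail]) -> tuple[int, dict]:
    """Return (status, payload to dispatch or error body)."""
    for guardrail in guardrails:
        decision, new_payload = guardrail(payload)
        if decision == "block":
            return 403, {"error": "guardrail_blocked"}
        if decision == "modify":
            payload = new_payload  # later guardrails see the rewritten payload
    return 200, payload
```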

Policy rules

Per-VK regex allow/deny lists for:
  • Tool calls — matched against tools[].function.name (OpenAI) and tools[].name (Anthropic). Denied tools → 403 tool_not_allowed.
  • MCP servers — matched against declared mcp_servers[].name and .url. Denied servers → 403 tool_not_allowed.
  • Outbound URLs — heuristically matched against every http(s):// URL in the request body. Denied URLs → 403 url_not_allowed.
Each block emits an AuditLog row (gateway shape) and returns the X-LangWatch-Request-Id response header, which correlates the block with its trace.
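The tool-call check can be sketched as follows. The field paths come from the list above; the function names are hypothetical, and two behaviours are assumptions made for the example: patterns must match the whole tool name, and a non-empty allow list rejects any tool it does not match.

```python
import re
from typing import Optional


def tool_names(body: dict) -> list[str]:
    """Collect tool names from an OpenAI- or Anthropic-shaped request body."""
    names = []
    for tool in body.get("tools", []):
        if "function" in tool:    # OpenAI: tools[].function.name
            names.append(tool["function"]["name"])
        elif "name" in tool:      # Anthropic: tools[].name
            names.append(tool["name"])
    return names


def check_tools(body: dict, allow: list[str],
                deny: list[str]) -> tuple[int, Optional[str]]:
    """Return (status, error_code) after applying the deny and allow lists."""
    for name in tool_names(body):
        if any(re.fullmatch(pattern, name) for pattern in deny):
            return 403, "tool_not_allowed"
        if allow and not any(re.fullmatch(pattern, name) for pattern in allow):
            return 403, "tool_not_allowed"
    return 200, None
```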

Traces (per-tenant observability)

Every request emits a LangWatch trace. The clever part: the gateway is a shared multi-tenant service, but each trace is routed to the owning tenant’s project — tenant A never sees tenant B’s traffic. Attributes always present (source of truth: services/aigateway/adapters/gatewaytracer/attrs.go):
  • langwatch.virtual_key_id, langwatch.project_id, langwatch.team_id, langwatch.organization_id, langwatch.principal_id.
  • langwatch.model, langwatch.provider, langwatch.model_source, langwatch.streaming.
  • langwatch.usage.input_tokens / langwatch.usage.output_tokens + OTel GenAI semconv gen_ai.usage.cache_read.input_tokens / gen_ai.usage.cache_creation.input_tokens.
  • langwatch.cost_usd, langwatch.duration_ms, langwatch.status, langwatch.budget.decision.
When a bundle-baked cache rule matches: langwatch.cache.rule_id / .rule_priority / .mode_applied.
Previously-documented per-feature attrs (langwatch.fallback.attempt / .reason, langwatch.policy.blocked, langwatch.budget.breached_scope, langwatch.guardrail.post_flag / .verdict) are not emitted in v1. For per-feature signals today, use the Prometheus counters + correlate via X-LangWatch-Request-Id. See Observability → Per-feature attributes for the v1 ↔ v1.1 breakdown.

Request id

Every response carries X-LangWatch-Request-Id: grq_<ULID>. Use it to:
  • Deep-link into the LangWatch trace from logs.
  • File a support ticket — the first thing we ask for.
  • Debug CLI integrations — the Codex / Claude Code logs echo these.
The id also flows into the CH ledger as the idempotency key for debit replays, so a single request is billed exactly once even across gateway restarts.
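The exactly-once billing property amounts to using the request id as an idempotency key. The sketch below shows the idea with an in-memory dictionary standing in for the ledger; the Ledger class, its methods, and the assumption that the ULID part uses Crockford base32 are all illustrative and not the gateway's storage layer.

```python
import re

# Assumed id form: grq_<26-char ULID> in Crockford base32.
REQUEST_ID = re.compile(r"^grq_[0-9A-HJKMNP-TV-Z]{26}$")


class Ledger:
    """In-memory stand-in for the spend ledger (hypothetical)."""

    def __init__(self) -> None:
        self._debits: dict[str, float] = {}

    def debit(self, request_id: str, cost_usd: float) -> bool:
        """Record a debit once per request id; return False on a replay."""
        if not REQUEST_ID.match(request_id):
            raise ValueError("malformed request id")
        if request_id in self._debits:
            return False  # replay of an already-billed request
        self._debits[request_id] = cost_usd
        return True

    def total(self) -> float:
        """Sum of all recorded debits."""
        return sum(self._debits.values())
```

A real ledger would need the same check-and-insert to be atomic and durable so that it survives restarts, which this sketch does not attempt.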

What’s intentionally not here

  • SDK migration of existing litellm callsites — the LangWatch playground and evaluators continue to use the litellm path. The gateway serves new traffic; existing internal call sites don’t change. This is deliberate scope control for v1.
  • Semantic caching across unrelated requests — Bifrost ships a semantic-cache plugin; we’re not enabling it by default because it changes latency characteristics in ways customers should opt into.
  • Per-region provider routing for data residency — tracked as an open question in the contract; not in v1.
Once these concepts are in your head the rest of the docs — Virtual Keys, Budgets, Providers, CLI Integrations — are detail views on the same primitives.