OPENAI_BASE_URL at the LangWatch AI Gateway and its OPENAI_API_KEY at a LangWatch virtual key.
Request
LangWatch-specific headers
| Header | Purpose |
|---|---|
X-LangWatch-Cache: respect|force|disable|ttl=<s> | Override the VK’s cache mode for this request. See Caching Passthrough. |
X-LangWatch-Trace-Metadata: {...} | Attach arbitrary key/value metadata to the trace (e.g. deployment id, experiment tag). |
Model resolution
Themodel field can be:
- A VK-defined alias (e.g.
gpt-4o,claude) → routed via the VK’smodel_aliasesmap. - An explicit
<provider>/<model>form (e.g.openai/gpt-5-mini,azure/my-deployment), bypasses aliases.
providers list, returns 403 model_not_allowed.
Response (non-streaming)
OpenAI-shaped. Additional LangWatch headers:usage.prompt_tokens_details.cached_tokens is populated when cache hits occur (used by the internal debit logic).
Response (streaming)
Set"stream": true in the body. The gateway proxies upstream SSE events byte-for-byte after the first chunk, preserving tool-call delta ordering, this is important for coding CLIs like Codex that parse tool-call streams incrementally.
X-LangWatch-Request-Id is emitted on the first frame. Mid-stream failures close the connection with a terminal event: error frame (see Errors).
Error handling
All errors use the OpenAI-compatible envelope documented at Errors:401 invalid_api_key, missing/malformed/unknown VK.402 budget_exceeded, hard-cap budget breach.403 guardrail_blocked | tool_not_allowed | model_not_allowed | permission_denied | virtual_key_revoked, policy.429 rate_limit_exceeded, VK, project, org rate limit.502 provider_error,504 upstream_timeout, upstream provider exhausted (possibly after fallback).
X-LangWatch-Request-Id, use it for support tickets and to jump straight to the trace in the LangWatch UI.
Example (Python)
Example (curl)
Rate limits
Per-VK rate limits are configurable (rate_limits.rpm|tpm|rpd) and enforced at the gateway before the request leaves for the provider. Breaches return 429 rate_limit_exceeded with a Retry-After header in seconds.
Upstream provider rate limits are opaque to the VK owner but can trigger fallback (if rate_limit is in the VK’s fallback.on). A Retry-After from upstream is surfaced to the client when no fallback remains.
Observability
Every request emits a LangWatch trace with attributes:langwatch.vk_id,langwatch.project_id,langwatch.team_id,langwatch.org_id,langwatch.principal_id.langwatch.model_requested(what the client sent) vslangwatch.model_resolved(provider + model after alias).gen_ai.usage.cache_read.input_tokens,gen_ai.usage.cache_creation.input_tokens(OTel GenAI semconv; cache economics).langwatch.cost_usd(computed from tokens × price).langwatch.fallback.attemptspans (one per attempt;attempt=0is primary).
attr.langwatch.model_resolved to see how many requests actually hit Azure vs OpenAI, for example.