OpenAI-compatible chat completions endpoint. Any client that speaks the OpenAI Chat Completions API — official SDKs, Codex CLI, opencode, Cursor, Aider, a thousand internal scripts — works with zero code change by pointing its `OPENAI_BASE_URL` at the LangWatch AI Gateway and its `OPENAI_API_KEY` at a LangWatch virtual key.
Request
LangWatch-specific headers
| Header | Purpose |
|---|---|
| `X-LangWatch-Cache: respect\|force\|disable\|ttl=<s>` | Override the VK’s cache mode for this request. See Caching Passthrough. |
| `X-LangWatch-Trace-Metadata: {...}` | Attach arbitrary key/value metadata to the trace (e.g. deployment id, experiment tag). |
Model resolution
The `model` field can be:
- A VK-defined alias (e.g. `gpt-4o`, `claude`) → routed via the VK’s `model_aliases` map.
- An explicit `<provider>/<model>` form (e.g. `openai/gpt-5-mini`, `azure/my-deployment`) — bypasses aliases.

If the resolved provider is not in the VK’s providers list, the gateway returns `403 model_not_allowed`.
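As an illustrative sketch only (not the gateway's actual implementation), the resolution rules above amount to an alias lookup followed by a provider allow-list check:

```python
def resolve_model(model: str, model_aliases: dict[str, str], providers: list[str]) -> str:
    """Return an explicit "<provider>/<model>" string, or raise if the
    resolved provider is not allowed. Parameter names are assumptions."""
    target = model_aliases.get(model, model)  # explicit form bypasses aliases
    if "/" not in target:
        raise ValueError("model must resolve to <provider>/<model>")
    provider = target.split("/", 1)[0]
    if provider not in providers:
        raise PermissionError("403 model_not_allowed")  # mirrors the gateway error
    return target

print(resolve_model("gpt-4o", {"gpt-4o": "azure/my-deployment"}, ["azure"]))
# → azure/my-deployment
```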
Response (non-streaming)
OpenAI-shaped, with additional LangWatch response headers. `usage.prompt_tokens_details.cached_tokens` is populated when cache hits occur (used by the internal debit logic).
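For example, picking the cached-token count out of a response body (field names as documented above; the token numbers are made up):

```python
# Hypothetical OpenAI-shaped response body with a cache hit.
resp = {
    "usage": {
        "prompt_tokens": 1200,
        "prompt_tokens_details": {"cached_tokens": 1024},
    }
}

# cached_tokens may be absent when there was no cache hit, hence the defaults.
cached = resp["usage"].get("prompt_tokens_details", {}).get("cached_tokens", 0)
uncached = resp["usage"]["prompt_tokens"] - cached
print(cached, uncached)  # → 1024 176
```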
Response (streaming)
Set"stream": true in the body. The gateway proxies upstream SSE events byte-for-byte after the first chunk, preserving tool-call delta ordering — this is important for coding CLIs like Codex that parse tool-call streams incrementally.
X-LangWatch-Request-Id is emitted on the first frame. Mid-stream failures close the connection with a terminal event: error frame (see Errors).
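A minimal sketch of consuming such a stream, assuming standard SSE framing (`event:` / `data:` lines, frames separated by blank lines); this is illustrative, not the gateway's parser:

```python
def parse_sse(raw: str) -> list[tuple[str, str]]:
    """Split a raw SSE stream into (event, data) pairs so a terminal
    "error" frame is easy to spot."""
    frames = []
    for block in raw.strip().split("\n\n"):
        event, data = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        frames.append((event, "\n".join(data)))
    return frames

raw = 'data: {"choices":[{"delta":{"content":"hi"}}]}\n\nevent: error\ndata: {"error":{"code":"provider_error"}}\n\n'
print(parse_sse(raw)[1][0])  # → error
```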
Error handling
All errors use the OpenAI-compatible envelope documented at Errors:
- `401 invalid_api_key` — missing/malformed/unknown VK.
- `402 budget_exceeded` — hard-cap budget breach.
- `403 guardrail_blocked | tool_not_allowed | model_not_allowed | permission_denied | virtual_key_revoked` — policy.
- `429 rate_limit_exceeded` — VK / project / org rate limit.
- `502 provider_error` / `504 upstream_timeout` — upstream provider exhausted (possibly after fallback).
Every response carries `X-LangWatch-Request-Id` — use it for support tickets and to jump straight to the trace in the LangWatch UI.
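A client-side sketch of acting on this envelope — the status-to-action mapping (retry 429/502/504, fail otherwise) is a reasonable default, not something the gateway mandates:

```python
def handle_error(status: int, body: dict, headers: dict) -> str:
    """Classify a gateway error response, keeping the request id for support."""
    code = body.get("error", {}).get("code", "unknown")
    request_id = headers.get("X-LangWatch-Request-Id", "n/a")
    if status in (429, 502, 504):
        return f"retry ({code}, request {request_id})"
    return f"fail ({code}, request {request_id})"

print(handle_error(429, {"error": {"code": "rate_limit_exceeded"}}, {}))
# → retry (rate_limit_exceeded, request n/a)
```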
Example (Python)
Example (curl)
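The same request with curl; the host and key are placeholders, and the cache/metadata headers are optional:

```shell
# Placeholders: swap in your gateway host and virtual key.
curl "https://YOUR-GATEWAY-HOST/v1/chat/completions" \
  -H "Authorization: Bearer lw-vk-PLACEHOLDER" \
  -H "Content-Type: application/json" \
  -H "X-LangWatch-Cache: ttl=600" \
  -H 'X-LangWatch-Trace-Metadata: {"experiment":"onboarding-v2"}' \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```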
Rate limits
Per-VK rate limits are configurable (`rate_limits.rpm|tpm|rpd`) and enforced at the gateway before the request leaves for the provider. Breaches return `429 rate_limit_exceeded` with a `Retry-After` header in seconds.
Upstream provider rate limits are opaque to the VK owner but can trigger fallback (if `rate_limit` is in the VK’s `fallback.on`). A `Retry-After` from upstream is surfaced to the client when no fallback remains.
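A client-side sketch of honoring `Retry-After` on `429` — the `send` callable is a stand-in for whatever HTTP client you use, returning `(status, headers, body)`:

```python
import time

def send_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Retry on 429, waiting Retry-After seconds (per the docs) between tries."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(float(headers.get("Retry-After", "1")))
```

Injecting `sleep` keeps the function testable without real waiting.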
Observability
Every request emits a LangWatch trace with attributes:
- `langwatch.vk_id`, `langwatch.project_id`, `langwatch.team_id`, `langwatch.org_id`, `langwatch.principal_id`.
- `langwatch.model_requested` (what the client sent) vs `langwatch.model_resolved` (provider + model after alias).
- `gen_ai.usage.cache_read.input_tokens` / `gen_ai.usage.cache_creation.input_tokens` (OTel GenAI semconv; cache economics).
- `langwatch.cost_usd` (computed from tokens × price).
- `langwatch.fallback.attempt` spans (one per attempt; `attempt=0` is primary).
Filter traces by `attr.langwatch.model_resolved` to see how many requests actually hit Azure vs OpenAI, for example.
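For instance, given exported trace attribute dicts shaped like the list above (the values here are hypothetical), counting requests per resolved provider is a one-liner:

```python
from collections import Counter

# Hypothetical exported trace attributes.
traces = [
    {"langwatch.model_resolved": "azure/my-deployment"},
    {"langwatch.model_resolved": "openai/gpt-5-mini"},
    {"langwatch.model_resolved": "azure/my-deployment"},
]

# The provider is the segment before the "/" in <provider>/<model>.
by_provider = Counter(t["langwatch.model_resolved"].split("/", 1)[0] for t in traces)
print(by_provider)  # → Counter({'azure': 2, 'openai': 1})
```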