OpenAI is the most straightforward provider to bind to the gateway. Any valid OpenAI API key in your LangWatch Model Providers table can be referenced by a virtual key and consumed via
`/v1/chat/completions`, `/v1/embeddings`, `/v1/responses`, `/v1/images/generations`, `/v1/audio/transcriptions`, and `/v1/audio/speech`.
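Once a virtual key is bound (see the sections below), calling any of these routes only requires pointing an OpenAI-compatible client at the gateway. A minimal sketch, assuming the gateway serves its routes under a single base URL; the URL and the `lw-vk-...` key format here are placeholders, not documented values:

```python
from openai import OpenAI

# Sketch only: the gateway base URL and the virtual key format below
# are placeholders, not documented LangWatch values.
client = OpenAI(
    base_url="https://your-langwatch-gateway.example.com/v1",
    api_key="lw-vk-...",  # the virtual key stands in for a raw OpenAI key
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through the gateway"}],
)
print(resp.choices[0].message.content)
```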
## Configure the provider credential

Under Settings → Model Providers in the LangWatch app:

- Click Add provider → OpenAI.
- Paste your OpenAI API key (starts `sk-...`).
- Optionally set a custom organisation ID via the Organization field.
- Save.
Saving creates a ModelProvider row that any VK in the same project can bind to. The same credential also powers the existing litellm / playground / evaluators path — no duplication.
## Bind the credential to a VK

When creating or editing a VK, select the OpenAI credential from the Primary provider dropdown. Optionally add it again (or a second OpenAI key) as a fallback. Per-VK overrides available on the binding (sketched after this list):

- Rate limits — per-VK `rpm`, `tpm`, and `rpd` enforced at the gateway before the upstream call.
- Extra headers — appended to every request (e.g. `OpenAI-Beta: assistants=v2`).
- Rotation policy — when a credential has multiple API keys (comma-separated), the gateway can rotate them round-robin or on rate-limit.
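To make the overrides concrete, here is a hypothetical binding expressed as a Python dict. Every field name (`rate_limits`, `extra_headers`, `rotation`) is illustrative only, not LangWatch's actual schema:

```python
# Hypothetical sketch of a VK binding; every field name here is
# illustrative, not LangWatch's actual schema.
vk_binding = {
    "primary_provider": "openai-prod",        # credential from Model Providers
    "fallback_provider": "openai-backup",     # optional second OpenAI key
    "rate_limits": {"rpm": 500, "tpm": 200_000, "rpd": 10_000},
    "extra_headers": {"OpenAI-Beta": "assistants=v2"},
    "rotation": "round_robin",                # or rotate on rate-limit
}
```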
## Supported endpoints

| Gateway route | Upstream | Notes |
|---|---|---|
| `POST /v1/chat/completions` | `POST /v1/chat/completions` | Streaming and non-streaming both supported. |
| `POST /v1/responses` | `POST /v1/responses` | Reasoning models (o3, o4-mini) and tool use. |
| `POST /v1/embeddings` | `POST /v1/embeddings` | `text-embedding-3-small`, `text-embedding-3-large`, ada. |
| `POST /v1/images/generations` | `POST /v1/images/generations` | DALL-E 3. |
| `POST /v1/audio/transcriptions` | `POST /v1/audio/transcriptions` | Whisper. |
| `POST /v1/audio/speech` | `POST /v1/audio/speech` | TTS. |
| `POST /v1/moderations` | `POST /v1/moderations` | Content moderation. |
| `GET /v1/models` | `GET /v1/models` | Filtered by the VK’s `models_allowed`. |
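Streaming behaves the same through the gateway as against OpenAI directly. A sketch, reusing the hypothetical `client` from above:

```python
# Streaming through the gateway, reusing the hypothetical `client` above.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a final usage frame) carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```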
## Caching

OpenAI automatically caches prompt prefixes ≥ 1024 tokens in most GPT-4 / GPT-5 / o-series models. There’s no `cache_control` block to preserve — OpenAI just handles it. The gateway forwards requests untouched in `mode=respect`, so cache hits are observed naturally.

The `usage.prompt_tokens_details.cached_tokens` field in the response is populated on cache hits and mirrored into the trace as `gen_ai.usage.cache_read.input_tokens` (OTel GenAI semconv). OpenAI has no write-to-cache dimension, so `gen_ai.usage.cache_creation.input_tokens` is unset.
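Client-side, a cache hit is visible directly in the usage block. A sketch, reusing the hypothetical `client` from above and assuming a stable prefix of at least 1024 tokens:

```python
# A stable prefix of >= 1024 tokens makes the request eligible for
# OpenAI's automatic prompt caching; only the suffix varies per call.
stable_prefix = "<your long, unchanging system/context text>"
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": stable_prefix},
        {"role": "user", "content": "What changed since yesterday?"},
    ],
)
details = resp.usage.prompt_tokens_details
# Non-zero on a cache hit; the gateway mirrors this value into the
# trace as gen_ai.usage.cache_read.input_tokens.
print("cached prompt tokens:", details.cached_tokens if details else 0)
```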
## Reasoning tokens (o-series)

Reasoning models return a `reasoning_tokens` count in `usage.completion_tokens_details`. The gateway forwards this verbatim; it’s also recorded in the LangWatch trace for cost analysis and included in the budget debit.
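A sketch of reading the count from a Chat Completions response against an o-series model, reusing the hypothetical `client` from above:

```python
resp = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
details = resp.usage.completion_tokens_details
# Forwarded verbatim by the gateway and recorded in the trace.
print("reasoning tokens:", details.reasoning_tokens if details else 0)
```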
## Known quirks

- Responses API vs Chat Completions API — the two have slightly different payload shapes. The gateway proxies whichever endpoint the client hits; it does not translate between them. Codex users should see Codex CLI for guidance on `wire_api` config.
- Organization header on egress — if the VK doesn’t set a custom Organization, the upstream OpenAI request uses the ModelProvider’s default.
- Rate-limit responses (429) — OpenAI’s 429 includes a `Retry-After` header the gateway surfaces to the client when there is no fallback, or uses as a signal to trigger fallback if `fallback.on` includes `rate_limit` (see the sketch after this list).
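When no fallback is configured, a client can honor the surfaced header itself. A minimal sketch over raw HTTP; the gateway URL and virtual key are placeholders:

```python
import time

import httpx

# Minimal client-side retry honoring a surfaced Retry-After header.
# The URL and virtual key below are placeholders.
def post_with_retry(payload: dict, max_attempts: int = 3) -> httpx.Response:
    url = "https://your-langwatch-gateway.example.com/v1/chat/completions"
    headers = {"Authorization": "Bearer lw-vk-..."}
    for _ in range(max_attempts):
        resp = httpx.post(url, json=payload, headers=headers, timeout=60)
        if resp.status_code != 429:
            return resp
        # Sleep for the server-suggested interval, defaulting to 1 second.
        time.sleep(float(resp.headers.get("Retry-After", "1")))
    return resp
```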