OpenAI is the most straightforward provider to bind to the gateway. Any valid OpenAI API key in your LangWatch Model Providers table can be referenced by a virtual key and consumed via /v1/chat/completions, /v1/embeddings, /v1/responses, /v1/images/generations, /v1/audio/transcriptions, and /v1/audio/speech.
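For example, a client can talk to the gateway with the official OpenAI SDK, passing the virtual key as the API key. A minimal sketch, assuming an OpenAI-compatible gateway base URL (the URL and key prefix below are illustrative placeholders, not documented values):

```python
from openai import OpenAI

# Hypothetical base URL and key format; substitute the gateway URL and
# virtual key from your own LangWatch project.
client = OpenAI(
    base_url="https://your-langwatch-host/api/gateway/v1",  # assumed route
    api_key="vk-...",  # the virtual key, never the raw OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through the gateway"}],
)
print(response.choices[0].message.content)
```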

Configure the provider credential

Under Settings → Model Providers in the LangWatch app:
  1. Click Add provider → OpenAI.
  2. Paste your OpenAI API key (starts with sk-...).
  3. Optionally set a custom organization ID via the Organization field.
  4. Save.
This creates a ModelProvider row that any virtual key (VK) in the same project can bind to. The same credential also powers the existing litellm / playground / evaluators path, so nothing needs to be duplicated.

Bind the credential to a VK

When creating or editing a VK, select the OpenAI credential from the Primary provider dropdown. Optionally add it again (or a second OpenAI key) as a fallback. The following per-VK overrides are available on the binding (sketched after the list):
  • Rate limits — per-VK requests per minute (rpm), tokens per minute (tpm), and requests per day (rpd), enforced at the gateway before the upstream call.
  • Extra headers — appended to every request (e.g. OpenAI-Beta: assistants=v2).
  • Rotation policy — when a credential has multiple API keys (comma-separated), the gateway can rotate them round-robin or on rate-limit.
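As a rough sketch, a binding and its overrides could be pictured as a structure like this; every field name here is a hypothetical illustration, not the actual LangWatch schema:

```python
# Illustrative shape of a VK -> credential binding; field names are
# hypothetical, not the real LangWatch schema.
binding = {
    "primary_provider": "openai-prod",     # credential tried first
    "fallback_provider": "openai-backup",  # optional second credential
    "rate_limits": {"rpm": 60, "tpm": 100_000, "rpd": 10_000},
    "extra_headers": {"OpenAI-Beta": "assistants=v2"},  # appended to every request
    "rotation": "round_robin",             # or rotate on rate limit
}
```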

Supported endpoints

| Gateway route | Upstream | Notes |
|---|---|---|
| POST /v1/chat/completions | POST /v1/chat/completions | Streaming and non-streaming both supported. |
| POST /v1/responses | POST /v1/responses | Reasoning models (o3, o4-mini) and tool use. |
| POST /v1/embeddings | POST /v1/embeddings | text-embedding-3-small, text-embedding-3-large, ada. |
| POST /v1/images/generations | POST /v1/images/generations | DALL-E 3. |
| POST /v1/audio/transcriptions | POST /v1/audio/transcriptions | Whisper. |
| POST /v1/audio/speech | POST /v1/audio/speech | TTS. |
| POST /v1/moderations | POST /v1/moderations | Content moderation. |
| GET /v1/models | GET /v1/models | Filtered by the VK's models_allowed. |
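Because GET /v1/models is filtered per key, listing models through the gateway is a quick way to check what a given VK can reach. A sketch reusing the gateway-bound client from the earlier example:

```python
# Lists only the models permitted by this VK's models_allowed.
for model in client.models.list():
    print(model.id)
```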

Caching

OpenAI automatically caches prompt prefixes ≥ 1024 tokens in most GPT-4 / GPT-5 / o-series models. There’s no cache_control block to preserve — OpenAI just handles it. The gateway forwards requests untouched in mode=respect, so cache hits are observed naturally. The usage.prompt_tokens_details.cached_tokens field in the response is populated on cache hits and mirrored into the trace as gen_ai.usage.cache_read.input_tokens (OTel GenAI semconv). OpenAI has no write-to-cache dimension, so gen_ai.usage.cache_creation.input_tokens is unset.
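Cache hits are visible client-side in the response usage. A sketch reusing the gateway-bound client from above (the usage fields are OpenAI's documented ones; the placeholder prompt is illustrative):

```python
# Repeated calls sharing a long prefix (>= 1024 tokens) should start
# reporting cached prompt tokens after the first request.
long_prompt = "...a shared system/context prefix of at least 1024 tokens..."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": long_prompt}],
)
usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = (details.cached_tokens or 0) if details else 0
print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
```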

Reasoning tokens (o-series)

Reasoning models return a reasoning_tokens count in usage.completion_tokens_details. The gateway forwards this verbatim; it is also recorded in the LangWatch trace for cost analysis and included in the budget debit.
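For example, after a call to an o-series model, the count can be read straight off the usage block (a sketch, reusing the gateway-bound client from earlier):

```python
response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Plan a three-step data migration."}],
)
details = response.usage.completion_tokens_details
if details and details.reasoning_tokens:
    print(f"reasoning tokens billed: {details.reasoning_tokens}")
```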

Known quirks

  • Responses API vs Chat Completions API — the two have slightly different payload shapes. The gateway proxies whichever endpoint the client hits; it does not translate between them. Codex users should see Codex CLI for guidance on wire_api config.
  • Organization header on egress — if the VK doesn’t set a custom Organization, the upstream OpenAI request uses the ModelProvider’s default.
  • Rate-limit responses (429) — OpenAI's 429 includes a Retry-After header, which the gateway surfaces to the client when no fallback is configured, or uses as a signal to trigger fallback when fallback.on includes rate_limit. A client-side sketch follows.
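When no fallback is configured, the client can honor the surfaced Retry-After itself. A minimal sketch using the OpenAI Python SDK's error type; the single-retry policy is an illustrative choice, not gateway behavior:

```python
import time

import openai

def ask(client: openai.OpenAI, prompt: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )

try:
    response = ask(client, "Hello")
except openai.RateLimitError as err:
    # The gateway surfaces OpenAI's Retry-After when it has no fallback;
    # default to a one-second pause if the header is missing.
    retry_after = float(err.response.headers.get("retry-after", 1.0))
    time.sleep(retry_after)
    response = ask(client, "Hello")
```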