The Codex CLI speaks the OpenAI Chat Completions and Responses APIs. Both are available from the LangWatch AI Gateway, and Codex will work with either: set the two env vars and go.
Setup
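A minimal sketch of the two-variable setup from the intro, assuming your Codex version honors OPENAI_BASE_URL for the built-in openai provider (the gateway URL and key below are placeholders):

```bash
# Point Codex's OpenAI provider at the LangWatch AI Gateway (placeholder URL)
export OPENAI_BASE_URL="https://your-gateway.example.com/v1"
# Authenticate with a LangWatch virtual key (VK) instead of a raw provider key
export OPENAI_API_KEY="<your-virtual-key>"

# Smoke test
codex exec --model gpt-4o-mini "ping"
```

If your Codex build ignores OPENAI_BASE_URL, define a custom provider in config.toml instead (see the wire_api section below).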
Using the Responses API
Codex auto-detects /v1/responses when available. The gateway exposes it at POST /v1/responses with an OpenAI-equivalent shape. No config change is needed.
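As a shape check, a minimal request against the gateway's Responses endpoint (the URL is a placeholder; the model + input body is the standard OpenAI Responses shape):

```bash
curl -s https://your-gateway.example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "input": "ping"}'
```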
Codex 0.122+ requires wire_api = "responses"
As of Codex 0.122, wire_api = "chat" is no longer supported (Error loading config.toml: 'wire_api = "chat"' is no longer supported). All custom providers must use wire_api = "responses". The gateway exposes POST /v1/responses to match.
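A sketch of a gateway-backed provider in ~/.codex/config.toml, using Codex's model_providers mechanism for custom endpoints (the provider id, URL, and env var name are placeholders):

```toml
# ~/.codex/config.toml (sketch; id, URL, and env var are placeholders)
model = "gpt-4o-mini"
model_provider = "langwatch-gateway"

[model_providers.langwatch-gateway]
name = "LangWatch AI Gateway"
base_url = "https://your-gateway.example.com/v1"
env_key = "LANGWATCH_VK"   # env var holding your virtual key
wire_api = "responses"     # required as of Codex 0.122; "chat" now errors
```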
Pin to a Bifrost-registered model name
Codex's default gpt-5.4 (and aliases like gpt-5-mini without a date suffix) currently fail at the gateway with "provider is required", because Bifrost's model registry only matches concrete names. gpt-4o-mini works today; gpt-5 family pins like gpt-5-2025-08-07 work as Bifrost adds them.
Cross-provider routing (Anthropic via /v1/messages, Gemini via /v1beta)
Codex's Responses-API wire shape is OpenAI-only by spec: it doesn't carry Anthropic's cache_control blocks or Gemini's cached_content field cleanly. To use Anthropic models from Codex you'd point Codex at the gateway with an OpenAI-shape body and let the gateway translate (chat-completions → Anthropic), at the cost of losing native cache_control. The recommended pattern is to use the right CLI per provider:
- OpenAI / Azure OpenAI → Codex via /v1/responses (this page)
- Anthropic / Bedrock Claude / Vertex Claude → Claude Code via /v1/messages (see Claude Code)
- Google Gemini / Vertex Gemini → gemini-cli via /v1beta/models/…:generateContent (see Gemini CLI)
Self-hosted gateway
Model routing through aliases
Codex's --model flag maps directly to the model field the gateway sees. Use the VK's model_aliases to decouple the CLI from provider-specific names.
Example — engineers type gpt-4o, the VK routes to Azure:
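A schematic alias map on the VK (illustrative; the model_aliases field name comes from this page, but check the LangWatch VK reference for the exact schema):

```yaml
# Schematic VK config (illustrative, not the verbatim schema)
model_aliases:
  gpt-4o: azure/gpt-4o            # engineers type gpt-4o, the VK pins Azure
  gpt-5-mini: openai/gpt-5-mini   # stays on OpenAI
# Moving the team off Azure is the one-line flip:
#   gpt-4o: openai/gpt-4o
```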
codex exec --model gpt-4o "ping" lands on Azure OpenAI using the VK’s pinned Azure credential, while --model gpt-5-mini stays on OpenAI. Switching the entire engineering team from Azure to OpenAI is a one-line edit on the VK, no rollout.
Governance recipes
Only allow safe models
VK models_allowed: ["gpt-5-mini", "gpt-4o", "o3"]
Any Codex call with an off-list model returns 403 model_not_allowed. The VK owner controls the list; engineers can’t escape the allowlist by passing a different --model flag.
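In practice (the off-list model name below is just an example of something not on this VK's list):

```bash
# On the allowlist: routed normally
codex exec --model gpt-4o "ping"

# Off the allowlist: the gateway rejects it with 403 model_not_allowed
codex exec --model gpt-4.1 "ping"
```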
Disable cache to test cold-run cost
Per-session: inject a cache-bypass header on each request, as sketched below. OpenAI CLI / SDK support for custom headers varies, so treat the header injection as pseudo-code for illustration. Practical override: add cache.mode: disable to a dedicated "cold-benchmark" VK.
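A reconstruction of that per-session sketch (the request header name is hypothetical; this page only documents X-LangWatch-Cache as a response header):

```bash
# Pseudo-code: per-request cache bypass via a custom header (header name hypothetical)
curl -s https://your-gateway.example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-langwatch-cache: disable" \
  -d '{"model": "gpt-4o-mini", "input": "cold-run probe"}'
```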
Team-scoped daily budget
- Scope: team, target: engineering team.
- Window: day, limit: $100.
- on_breach: warn at 80%, block at 100% (schematic sketch below).
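As a schematic config (illustrative; the scope/window/on_breach names follow this page, not a verified schema):

```yaml
# Schematic team budget (illustrative, not the verbatim schema)
scope: team
target: engineering
window: day
limit: 100        # USD per day
on_breach:
  warn: 0.8       # warn at 80% of the limit
  block: 1.0      # hard-block at 100%
```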
Troubleshooting
- 401 invalid_api_key — wrong VK or VK revoked. Check the first 12 chars match a live VK in the UI.
- 403 model_not_allowed — the VK doesn't list that model. Extend models_allowed or use an alias.
- 402 budget_exceeded — a scope you belong to has hit its hard cap. Check the error message field for which scope.
- Intermittent latency spikes — open the LangWatch trace; check X-LangWatch-Fallback-Count > 0 (primary provider is flaky) or X-LangWatch-Cache: miss on calls you expected to hit.
Known good setup
The combination that's been most tested with Codex on v1 (sketched as one config below):
- VK cache.mode: respect.
- VK fallback.on: [5xx, timeout, 429], chain: openai-primary → anthropic-fallback.
- VK models_allowed: ["gpt-5-mini", "gpt-5", "gpt-4o", "claude-haiku-4-5-20251001"].
- Personal-access VK bound to the engineer.
- Monthly principal budget with warn at 80%, block at 100%.
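Pulled together as one schematic VK (illustrative; same caveat about the exact schema):

```yaml
# Schematic "known good" VK (illustrative, not the verbatim schema)
cache:
  mode: respect
fallback:
  on: [5xx, timeout, 429]
  chain: [openai-primary, anthropic-fallback]
models_allowed: ["gpt-5-mini", "gpt-5", "gpt-4o", "claude-haiku-4-5-20251001"]
# Bind one VK per engineer, plus a monthly principal budget: warn 80%, block 100%
```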