The Codex CLI speaks the OpenAI Chat Completions and Responses APIs. Both are available from the LangWatch AI Gateway, and Codex will work with either: set the two env vars and go.
Setup
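A minimal sketch of the two-variable setup from the intro, assuming your Codex version honors OPENAI_BASE_URL for the built-in openai provider (the gateway URL and key below are placeholders):

```bash
# Point Codex's OpenAI provider at the LangWatch AI Gateway (placeholder URL)
export OPENAI_BASE_URL="https://your-gateway.example.com/v1"
# Authenticate with a LangWatch virtual key (VK) instead of a raw provider key
export OPENAI_API_KEY="<your-virtual-key>"

# Smoke test
codex exec --model gpt-4o-mini "ping"
```

If your Codex build ignores OPENAI_BASE_URL, define a custom provider in config.toml instead (see the wire_api section below).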
Using the Responses API
Codex auto-detects /v1/responses when available. The gateway exposes it at POST /v1/responses with an OpenAI-equivalent shape. No config change is needed.
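As a shape check, a minimal request against the gateway's Responses endpoint (the URL is a placeholder; the model + input body is the standard OpenAI Responses shape):

```bash
curl -s https://your-gateway.example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "input": "ping"}'
```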
Codex 0.122+ requires wire_api = "responses"
As of Codex 0.122, wire_api = "chat" is no longer supported (Error loading config.toml: 'wire_api = "chat"' is no longer supported). All custom providers must use wire_api = "responses". The gateway exposes POST /v1/responses to match.
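A sketch of a gateway-backed provider in ~/.codex/config.toml, using Codex's model_providers mechanism for custom endpoints (the provider id, URL, and env var name are placeholders):

```toml
# ~/.codex/config.toml (sketch; id, URL, and env var are placeholders)
model = "gpt-4o-mini"
model_provider = "langwatch-gateway"

[model_providers.langwatch-gateway]
name = "LangWatch AI Gateway"
base_url = "https://your-gateway.example.com/v1"
env_key = "LANGWATCH_VK"   # env var holding your virtual key
wire_api = "responses"     # required as of Codex 0.122; "chat" now errors
```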
Pin to a Bifrost-registered model name
Codex's default gpt-5.4 (and aliases like gpt-5-mini without a date suffix) currently fail at the gateway with "provider is required", because Bifrost's model registry only matches concrete names. gpt-4o-mini works today; gpt-5 family pins like gpt-5-2025-08-07 work as Bifrost adds them.
Cross-provider routing (Anthropic via /v1/messages, Gemini via /v1beta)
Codex's Responses-API wire shape is OpenAI-only by spec: it doesn't carry Anthropic's cache_control blocks or Gemini's cached_content field cleanly. To use Anthropic models from Codex you'd point Codex at the gateway with an OpenAI-shape body and let the gateway translate (chat-completions → Anthropic), at the cost of losing native cache_control. The recommended pattern is to use the right CLI per provider:
- OpenAI / Azure OpenAI → Codex via /v1/responses (this page)
- Anthropic / Bedrock Claude / Vertex Claude → Claude Code via /v1/messages (see Claude Code)
- Google Gemini / Vertex Gemini → gemini-cli via /v1beta/models/…:generateContent (see Gemini CLI)
Self-hosted gateway
Model routing through aliases
Codex's --model flag maps directly to the model field the gateway sees. Use the VK's model_aliases to decouple the CLI from provider-specific names.
Example — engineers type gpt-4o, the VK routes to Azure:
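A schematic alias map on the VK (illustrative; the model_aliases field name comes from this page, but check the LangWatch VK reference for the exact schema):

```yaml
# Schematic VK config (illustrative, not the verbatim schema)
model_aliases:
  gpt-4o: azure/gpt-4o            # engineers type gpt-4o, the VK pins Azure
  gpt-5-mini: openai/gpt-5-mini   # stays on OpenAI
# Moving the team off Azure is the one-line flip:
#   gpt-4o: openai/gpt-4o
```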
codex exec --model gpt-4o "ping" lands on Azure OpenAI using the VK’s pinned Azure credential, while --model gpt-5-mini stays on OpenAI. Switching the entire engineering team from Azure to OpenAI is a one-line edit on the VK, no rollout.
Governance recipes
Only allow safe models
VK models_allowed: ["gpt-5-mini", "gpt-4o", "o3"]
Any Codex call with an off-list model returns 403 model_not_allowed. The VK owner controls the list; engineers can’t escape the allowlist by passing a different --model flag.
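In practice (the off-list model name below is just an example of something not on this VK's list):

```bash
# On the allowlist: routed normally
codex exec --model gpt-4o "ping"

# Off the allowlist: the gateway rejects it with 403 model_not_allowed
codex exec --model gpt-4.1 "ping"
```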
Disable cache to test cold-run cost
Per-session: inject a cache-bypass header on each request, as sketched below. OpenAI CLI / SDK support for custom headers varies, so treat the header injection as pseudo-code for illustration. Practical override: add cache.mode: disable to a dedicated "cold-benchmark" VK.
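A reconstruction of that per-session sketch (the request header name is hypothetical; this page only documents X-LangWatch-Cache as a response header):

```bash
# Pseudo-code: per-request cache bypass via a custom header (header name hypothetical)
curl -s https://your-gateway.example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-langwatch-cache: disable" \
  -d '{"model": "gpt-4o-mini", "input": "cold-run probe"}'
```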
Team-scoped daily budget
- Scope: team, target: engineering team.
- Window: day, limit: $100.
- on_breach: warn at 80%, block at 100% (schematic sketch below).
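As a schematic config (illustrative; the scope/window/on_breach names follow this page, not a verified schema):

```yaml
# Schematic team budget (illustrative, not the verbatim schema)
scope: team
target: engineering
window: day
limit: 100        # USD per day
on_breach:
  warn: 0.8       # warn at 80% of the limit
  block: 1.0      # hard-block at 100%
```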
Troubleshooting
- 401 invalid_api_key — wrong VK or VK revoked. Check the first 12 chars match a live VK in the UI.
- 403 model_not_allowed — the VK doesn't list that model. Extend models_allowed or use an alias.
- 402 budget_exceeded — a scope you belong to has hit its hard cap. Check the error message field for which scope.
- Intermittent latency spikes — open the LangWatch trace; check X-LangWatch-Fallback-Count > 0 (primary provider is flaky) or X-LangWatch-Cache: miss on calls you expected to hit.
Known good setup
The combination that's been most tested with Codex on v1 (sketched as one config below):
- VK cache.mode: respect.
- VK fallback.on: [5xx, timeout, 429], chain: openai-primary → anthropic-fallback.
- VK models_allowed: ["gpt-5-mini", "gpt-5", "gpt-4o", "claude-haiku-4-5-20251001"].
- Personal-access VK bound to the engineer.
- Monthly principal budget with warn at 80%, block at 100%.
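Pulled together as one schematic VK (illustrative; same caveat about the exact schema):

```yaml
# Schematic "known good" VK (illustrative, not the verbatim schema)
cache:
  mode: respect
fallback:
  on: [5xx, timeout, 429]
  chain: [openai-primary, anthropic-fallback]
models_allowed: ["gpt-5-mini", "gpt-5", "gpt-4o", "claude-haiku-4-5-20251001"]
# Bind one VK per engineer, plus a monthly principal budget: warn 80%, block 100%
```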