The fastest-growing surface for enterprise LLM usage in 2026 is coding CLIs: every engineer has at least one running locally, and the gap between personal and corporate use has become a governance crisis. The AI Gateway solves it: every engineer keeps their preferred CLI while the org controls cost, guardrails, and visibility. Setup is always the same two steps:
- Mint a LangWatch virtual key (see Quickstart).
- Point the CLI’s base URL at the gateway and use the VK as its API key.
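As a minimal sketch, for an OpenAI-compatible CLI such as Aider (the gateway URL and key value below are placeholders for your own deployment):

```bash
# Step 1: mint a VK in LangWatch (see Quickstart) and export it.
export OPENAI_API_KEY="lw_vk_..."                                  # the virtual key
# Step 2: point the CLI's base URL at the gateway.
export OPENAI_API_BASE="https://gateway.internal.example.com/v1"   # assumption: your gateway URL
aider   # now routes every completion through the gateway
```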
At a glance
| CLI | Endpoint used | Env vars | Notes |
|---|---|---|---|
| Claude Code | /v1/messages | ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN | Tool-call deltas preserved byte-for-byte. Pin to a dated model name (e.g. claude-haiku-4-5-20251001). |
| Codex CLI | /v1/responses | ~/.codex/config.toml model_providers, wire_api = "responses" | Codex 0.122+ requires wire_api = "responses" (chat dropped). Pin model to a Bifrost-registered name. |
| Gemini CLI | /v1beta/models/{model}:generateContent | GOOGLE_GEMINI_BASE_URL, GEMINI_API_KEY | Native Gemini API passthrough. Works with all Gemini models supported by your VK’s Vertex/Gemini credential. |
| opencode | /v1/chat/completions (or /v1/messages) | Per-provider config in opencode.json | Pin opencode 1.13.x — 1.14.x has a known regression with custom providers. |
| Cursor | /v1/chat/completions | Custom “OpenAI API base URL” in settings | Agent mode benefits most from budgets. |
| Aider | /v1/chat/completions | OPENAI_API_BASE, OPENAI_API_KEY | Confirmed working with model aliases. |
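To make the first two rows concrete, a sketch of the wiring. The gateway URL is a placeholder, and the Codex `model_providers` keys reflect the 0.122+ config shape as an assumption rather than a pinned transcript:

```bash
export GATEWAY_URL="https://gateway.internal.example.com"  # assumption: your deployment
export LW_VK="lw_vk_..."                                   # the engineer's virtual key

# Claude Code honors these two variables.
export ANTHROPIC_BASE_URL="$GATEWAY_URL"
export ANTHROPIC_AUTH_TOKEN="$LW_VK"

# Codex CLI: register the gateway as a provider and force the Responses API.
cat >> ~/.codex/config.toml <<EOF
[model_providers.langwatch]
name = "LangWatch Gateway"
base_url = "${GATEWAY_URL}/v1"
env_key = "LW_VK"        # Codex reads the key from this env var
wire_api = "responses"   # required on Codex 0.122+ (chat dropped)
EOF
```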
Why this matters for the enterprise
Before the gateway, governance of coding CLIs was a choice between two bad options:
- Ban them: kills productivity and drives shadow usage.
- Allow them with personal provider keys: no visibility, no cost control, leaked credentials in dotfiles and CI logs.
With the gateway in the path, you get:
- Cost. Each engineer’s CLI spend debits the org → team → project → principal budgets you’ve set.
- Visibility. Every CLI call shows up in LangWatch traces, scoped to the project the VK belongs to.
- Policy. `policy_rules` can deny shell-exec tools or untrusted MCPs at the gateway level, even if the CLI would otherwise enable them.
- Portability. An engineer on Claude Code and a co-worker on Codex hit the same gateway with different VKs but the same budget; you don’t need to pick a winner.
- Revocation. Rotate or revoke a VK and the CLI stops working globally within 60 seconds. No more “which laptops still have the old key?”
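A quick revocation check, sketched with curl (URL and key are placeholders; the 401 envelope shape is shown in the smoke output below):

```bash
# Before revocation this returns 200; within ~60s of revoking the VK
# it should flip to 401 on every gateway replica.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $LW_VK" \
  "$GATEWAY_URL/v1/models"
```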
Recommended VK layout for coding CLIs
A workable pattern used by several early customers:
- One personal-access VK per engineer (`{engineer}-cli`) bound to the engineer’s principal.
- Attach a principal-scoped monthly budget (e.g. $200/month for engineers, $1000/month for staff+), with `on_breach: block`.
- `policy_rules.tools.deny`: `^shell\\.exec$`, `^filesystem\\.write$` (or your org’s list).
- Fallback chain: Anthropic → OpenAI → Azure OpenAI, so the CLI auto-switches on a provider outage.
- `cache.mode: respect` so Anthropic prompt caching keeps saving 90%.
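Expressed as data, the per-engineer key could look like the sketch below. The field names are hypothetical, mirroring the list above rather than LangWatch’s actual VK schema:

```bash
# Hypothetical VK spec; field names are illustrative, not the
# control plane's documented schema.
cat > alice-cli-vk.json <<'EOF'
{
  "name": "alice-cli",
  "principal": "alice@example.com",
  "budget": { "monthly_usd": 200, "on_breach": "block" },
  "policy_rules": {
    "tools": { "deny": ["^shell\\.exec$", "^filesystem\\.write$"] }
  },
  "fallbacks": ["anthropic", "openai", "azure-openai"],
  "cache": { "mode": "respect" }
}
EOF
```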
Real-time feedback
Each CLI’s trace lands in LangWatch live. You can:
- Pin a filter “where `langwatch.vk.tags` contains `cli`” on the project dashboard.
- Page on-call if any engineer crosses 80% of their monthly personal cap.
- Run an offline eval comparing Claude Code vs Codex quality on tickets of a given type.
Verified smoke output
Lane A ran the gateway locally against `pnpm dev` on 2026-04-19 to confirm the response shape CLIs will see. The transcripts are pinned here so integrators can diff their actual output against known-good responses.
Start the gateway pointed at a running LangWatch control plane on :5560:
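The exact launch command depends on your build; a sketch, assuming the control plane is the repo’s `pnpm dev` listening on :5560, and that the gateway binary and env var are named as below (both names are assumptions, not the documented launch interface):

```bash
# Control plane (from the LangWatch repo) on :5560.
pnpm dev &
# Gateway pointed at it; binary name and env var are illustrative.
export LANGWATCH_CONTROL_PLANE_URL="http://localhost:5560"
./gateway &
```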
/healthz — always 200 once the process is up
X-Langwatch-Gateway-Version is set from the binary’s main.Version build-arg — production deploys carry the commit SHA so operators can answer “which pod served this” straight from the response.
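A sketch of the probe; the gateway port and header value are illustrative:

```bash
curl -si "http://localhost:8080/healthz" | head -n 3
# HTTP/1.1 200 OK
# X-Langwatch-Gateway-Version: <commit SHA on production builds>
```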
/v1/models and /v1/chat/completions with no auth — 401 OpenAI-compat envelope
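Sketch of the unauthenticated probe; the port and exact message text are illustrative:

```bash
curl -s "http://localhost:8080/v1/models"
# HTTP 401 with an OpenAI-compatible envelope, roughly:
# {"error":{"message":"missing or invalid API key","type":"invalid_request_error"}}
```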
Auth is accepted both as Authorization: Bearer lw_vk_... (OpenAI SDK, Claude Code, Cursor, Aider) and as x-api-key (Anthropic SDK); either header works against either endpoint.
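Both header styles, sketched (key and port are placeholders):

```bash
# OpenAI-style bearer token (OpenAI SDK, Claude Code, Cursor, Aider):
curl -s -H "Authorization: Bearer lw_vk_..." "http://localhost:8080/v1/models"
# Anthropic-style header (Anthropic SDK); equally valid on either endpoint:
curl -s -H "x-api-key: lw_vk_..." "http://localhost:8080/v1/models"
```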
Traceparent, X-Langwatch-Span-Id, and X-Langwatch-Trace-Id are present even on unauthenticated 401s, so probe abuse and misconfigured CLIs are observable without inspecting access logs.
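A sketch showing the trace headers survive an unauthenticated request (IDs and port illustrative):

```bash
curl -si "http://localhost:8080/v1/models" | grep -iE "traceparent|x-langwatch"
# traceparent: 00-<trace-id>-<span-id>-01
# X-Langwatch-Trace-Id: <trace-id>
# X-Langwatch-Span-Id: <span-id>
```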
/startupz and /readyz behavior at cold boot
/startupz and /readyz go to 200 as soon as the gateway has finished its startup initialisers and (if configured) the network-check probe has succeeded. They do NOT block on the auth cache observing a VK — a cold pod with no traffic and a fresh control-plane install with zero VKs will still go ready, and auth resolution happens on demand at request time.
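A cold-boot probe sequence, sketched (port is a placeholder):

```bash
# Immediately after boot, before any VK has ever been observed:
curl -s -o /dev/null -w "/startupz %{http_code}\n" "http://localhost:8080/startupz"
curl -s -o /dev/null -w "/readyz %{http_code}\n"   "http://localhost:8080/readyz"
# Expected: both 200 once startup initialisers (and the optional
# network-check probe) have completed; auth resolves per request.
```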