
Documentation Index

Fetch the complete documentation index at: https://langwatch.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

The fastest-growing surface for enterprise LLM usage in 2026 is coding CLIs: every engineer has at least one running locally, and the gap between personal and corporate use has become a governance crisis. The AI Gateway solves it: every engineer keeps their preferred CLI while the org keeps control of cost, guardrails, and visibility. Setup is always the same two steps:
  1. Mint a LangWatch virtual key (see Quickstart).
  2. Set the CLI’s base URL and API key to the gateway + VK.
The gateway exposes both OpenAI-compatible and Anthropic-compatible endpoints on the same port, so any CLI that speaks either dialect works unchanged.
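Step 2 sketched for both dialects. The URL and key below are placeholders, not real values: substitute your gateway host and the VK minted in step 1.

```shell
# Placeholder gateway URL and virtual key -- substitute your own.
export GATEWAY_URL="https://gateway.example.com"
export LW_VK="lw_vk_example"

# OpenAI-dialect CLIs (Aider, Cursor, opencode):
export OPENAI_API_BASE="$GATEWAY_URL/v1"
export OPENAI_API_KEY="$LW_VK"

# Anthropic-dialect CLIs (Claude Code):
export ANTHROPIC_BASE_URL="$GATEWAY_URL"
export ANTHROPIC_AUTH_TOKEN="$LW_VK"

echo "$OPENAI_API_BASE"
```

Both sets of variables point at the same gateway port; the only difference is which dialect the CLI speaks.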

At a glance

| CLI | Endpoint used | Env vars / config | Notes |
| --- | --- | --- | --- |
| Claude Code | /v1/messages | ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN | Tool-call deltas preserved byte-for-byte. Pin to dated model name (e.g. claude-haiku-4-5-20251001). |
| Codex CLI | /v1/responses | ~/.codex/config.toml model_providers, wire_api = "responses" | Codex 0.122+ requires wire_api = "responses" (chat dropped). Pin model to a Bifrost-registered name. |
| Gemini CLI | /v1beta/models/{model}:generateContent | GOOGLE_GEMINI_BASE_URL, GEMINI_API_KEY | Native Gemini API passthrough. Works with all Gemini models supported by your VK’s Vertex/Gemini credential. |
| opencode | /v1/chat/completions (or /v1/messages) | Per-provider config in opencode.json | Pin opencode 1.13.x; 1.14.x has a known regression with custom providers. |
| Cursor | /v1/chat/completions | Custom “OpenAI API base URL” in settings | Agent mode benefits most from budgets. |
| Aider | /v1/chat/completions | OPENAI_API_BASE, OPENAI_API_KEY | Confirmed working with model aliases. |
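For Codex specifically, the wire_api requirement translates to a provider entry in ~/.codex/config.toml. A minimal sketch, assuming a hypothetical provider id of langwatch and a placeholder gateway URL; only model_providers, base_url, and wire_api are taken from the table above, the rest is illustrative:

```shell
# Writes the sketch to a temp file for illustration; in practice you would
# merge this fragment into ~/.codex/config.toml by hand.
cat > /tmp/codex-provider.toml <<'EOF'
model_provider = "langwatch"                  # hypothetical provider id

[model_providers.langwatch]
name     = "LangWatch Gateway"
base_url = "https://gateway.example.com/v1"   # placeholder -- your gateway URL
env_key  = "LANGWATCH_VK"                     # env var holding your VK
wire_api = "responses"                        # required on Codex 0.122+
EOF
grep -c 'wire_api = "responses"' /tmp/codex-provider.toml
```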

Why this matters for the enterprise

Before the gateway, governance of coding CLIs was a choice between:
  • Ban them — kills productivity, drives shadow usage.
  • Allow them with personal provider keys — no visibility, no cost control, leaked credentials in dotfiles and CI logs.
The gateway adds a third option: allow every CLI, centrally governed.
  • Cost. Each engineer’s CLI spend debits the org → team → project → principal budgets you’ve set.
  • Visibility. Every CLI call shows up in LangWatch traces, scoped to the project the VK belongs to.
  • Policy. policy_rules can deny shell-exec tools or untrusted MCPs at the gateway level, even if the CLI would otherwise enable them.
  • Portability. An engineer on Claude Code and a co-worker on Codex hit the same gateway with different VKs but the same budget — you don’t need to pick a winner.
  • Revocation. Rotate or revoke a VK and the CLI stops working globally within 60 seconds. No more “which laptops still have the old key?”
A workable pattern used by several early customers:
  • One personal-access VK per engineer ({engineer}-cli) bound to the engineer’s principal.
  • Attach a principal-scoped monthly budget (e.g. $200/month for engineers, $1000/month for staff+) with on_breach: block.
  • policy_rules.tools.deny: ^shell\\.exec$, ^filesystem\\.write$ (or your org’s list).
  • Fallback chain: Anthropic → OpenAI → Azure OpenAI. CLI autoswitches on outage.
  • cache.mode: respect so Anthropic prompt caching keeps discounting cached input tokens by up to 90%.
Then every engineer gets a one-time setup (env vars in their shell rc) and never touches provider keys again.
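The bullets above can be sketched as a single VK definition. This is an illustrative shape only: the keys policy_rules, on_breach, and cache.mode appear on this page, but the surrounding schema is an assumption; check the Quickstart for the real format.

```shell
# Illustrative only -- writes the sketch to a temp file so the shape is readable.
cat > /tmp/alice-cli-vk.yaml <<'EOF'
name: alice-cli                # one personal-access VK per engineer
principal: alice
budget:
  monthly_usd: 200             # $1000 for staff+
  on_breach: block
policy_rules:
  tools:
    deny:
      - ^shell\.exec$
      - ^filesystem\.write$
fallback: [anthropic, openai, azure-openai]   # CLI autoswitches on outage
cache:
  mode: respect                # keep Anthropic prompt caching effective
EOF
grep -c 'on_breach: block' /tmp/alice-cli-vk.yaml
```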

Real-time feedback

Each CLI’s trace lands in LangWatch live. You can:
  • Pin a filter “where langwatch.vk.tags contains cli” on the project dashboard.
  • Page on-call if any engineer crosses 80% of their monthly personal cap.
  • Run an offline eval comparing Claude Code vs Codex quality on tickets of a given type.
See the per-CLI pages for exact setup commands.

Verified smoke output

Lane A ran the gateway locally against pnpm dev on 2026-04-19 to confirm the response shape CLIs will see. The transcripts are pinned here so integrators can diff their actual output against known-good responses. Start the gateway pointed at a running LangWatch control plane on :5560:
cd services/gateway
GATEWAY_CONTROL_PLANE_URL=http://localhost:5560 \
  GATEWAY_ALLOW_INSECURE=1 \
  go run ./cmd/gateway

/healthz — always 200 once the process is up

HTTP/1.1 200 OK
Content-Type: application/json
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_f5e4fd9f33a8861af4ab328aa00c45

{"status":"ok","version":"dev","uptime_s":0.988610959}
Kubernetes liveness probe target. X-Langwatch-Gateway-Version is set from the binary’s main.Version build-arg — production deploys carry the commit SHA so operators can answer “which pod served this” straight from the response.

/v1/models and /v1/chat/completions with no auth — 401 OpenAI-compat envelope

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Traceparent: 00-e8aeda507fb93fafd3d2c20bbef283d6-53e32e8f6030ef05-01
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_8f1250b94a959c1b2502792c757264
X-Langwatch-Span-Id: 53e32e8f6030ef05
X-Langwatch-Trace-Id: e8aeda507fb93fafd3d2c20bbef283d6

{"error":{"type":"invalid_api_key","code":"missing_api_key","message":"missing API key; supply Authorization: Bearer lw_vk_... or x-api-key"}}
The error message names both accepted auth headers — Authorization: Bearer lw_vk_... (OpenAI SDK, Claude Code, Cursor, Aider) AND x-api-key (Anthropic SDK). Either works against either endpoint. Traceparent + X-Langwatch-Span-Id + X-Langwatch-Trace-Id are present even on unauth 401s — observability of probe-abuse / misconfigured CLIs is available without inspecting access logs.
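A sketch of that acceptance rule (the real implementation is Go inside the gateway; this shell function is only illustrative): either header form yields the same VK, anything else maps to the missing_api_key error above.

```shell
# Illustrative re-implementation of the header rule, not the gateway's code.
extract_vk() {
  case "$1" in
    "Authorization: Bearer "*) printf '%s\n' "${1#Authorization: Bearer }" ;;
    "x-api-key: "*)            printf '%s\n' "${1#x-api-key: }" ;;
    *)                         echo "missing_api_key" ;;
  esac
}
extract_vk "Authorization: Bearer lw_vk_abc"   # -> lw_vk_abc
extract_vk "x-api-key: lw_vk_abc"              # -> lw_vk_abc
extract_vk "Content-Type: application/json"    # -> missing_api_key
```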

/startupz and /readyz behavior at cold boot

# /startupz — 503 only during the brief window before MarkStarted fires
# (initialisers + optional GATEWAY_STARTUP_NETCHECK_HOSTS probe), then 200
HTTP/1.1 503 Service Unavailable
{"status":"degraded","checks":{"control_plane_reachable":"ok"}}
/startupz and /readyz go to 200 as soon as the gateway has finished its startup initialisers and (if configured) the network-check probe has succeeded. They do NOT block on the auth cache observing a VK — a cold pod with no traffic and a fresh control-plane install with zero VKs will still go ready, and auth resolution happens on demand at request time.
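Under Kubernetes the three endpoints map naturally onto the three probe types. A sketch, assuming the gateway listens on port 8080; the port and timings are assumptions, not shipped defaults:

```shell
# Written to a temp file for readability; merge into your Deployment spec.
cat > /tmp/gateway-probes.yaml <<'EOF'
startupProbe:                 # tolerates the brief pre-MarkStarted 503 window
  httpGet: { path: /startupz, port: 8080 }
  periodSeconds: 2
  failureThreshold: 30
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  periodSeconds: 5
livenessProbe:                # /healthz is always 200 once the process is up
  httpGet: { path: /healthz, port: 8080 }
  periodSeconds: 10
EOF
grep -c 'httpGet' /tmp/gateway-probes.yaml
```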

Graceful drain on SIGTERM

INFO msg="gateway_draining" pre_drain_wait="5s"
INFO msg="gateway_shutting_down" timeout="15s"
INFO msg="gateway_stopped"
Matches the 4-phase drain documented in self-hosting/helm § Graceful drain. If your CLI sees one of these response shapes, the gateway is healthy. End-to-end completion through a real VK + provider requires the per-CLI config on each integration page.
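The 5 s pre-drain wait plus the 15 s shutdown timeout in the log above imply a pod termination grace period comfortably above 20 s. A hedged sketch of the pod-level setting; the value is an inference from this transcript, not a documented default:

```shell
# Illustrative fragment for the Deployment's pod spec.
cat > /tmp/gateway-termination.yaml <<'EOF'
spec:
  terminationGracePeriodSeconds: 30   # > 5s pre-drain + 15s shutdown timeout
EOF
grep -c 'terminationGracePeriodSeconds' /tmp/gateway-termination.yaml
```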