The fastest-growing surface for enterprise LLM usage in 2026 is coding CLIs: every engineer has at least one running locally, and the gap between personal and corporate use has become a governance crisis. The AI Gateway closes that gap: every engineer keeps their preferred CLI, while the org controls cost, guardrails, and visibility. Setup is always the same two steps:
  1. Mint a LangWatch virtual key (see Quickstart).
  2. Set the CLI’s base URL and API key to the gateway + VK.
The gateway exposes both OpenAI-compatible and Anthropic-compatible endpoints on the same port, so any CLI that speaks either dialect works unchanged.
Debugging a CLI sign-in or fold pipeline? See AI Governance → CLI Debug: langwatch governance status, langwatch ingest tail, the langwatch login error catalog (incl. 409 no_default_routing_policy), and OCSF probe.

At a glance

The langwatch CLI ships wrappers for the 4 most common coding CLIs. They auto-inject the right env vars from your governance config so you don’t have to. Native env-var setup still works everywhere if you prefer.
| CLI | Wrapper | Endpoint | Native env vars | Notes |
| --- | --- | --- | --- | --- |
| Claude Code | langwatch claude | /v1/messages | ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN | Tool-call deltas preserved byte-for-byte. Pin to a dated model name (e.g. claude-haiku-4-5-20251001). |
| Codex CLI | langwatch codex | /v1/responses | ~/.codex/config.toml model_providers, wire_api = "responses" | Codex 0.122+ requires wire_api = "responses" (chat dropped). Pin model to a Bifrost-registered name. |
| Gemini CLI | langwatch gemini | /v1beta/models/{model}:generateContent | GOOGLE_GEMINI_BASE_URL, GEMINI_API_KEY | Native Gemini API passthrough. Works with all Gemini models supported by your VK’s Vertex/Gemini credential. |
| Cursor | langwatch cursor * | /v1/chat/completions | Custom “OpenAI API base URL” in settings | * Cursor is a GUI app whose AI panel reads from settings, not env vars, so the wrapper is mostly cosmetic for terminal launches. The canonical setup is the in-app paste. Agent mode benefits most from budgets. |
| opencode | none | /v1/chat/completions (or /v1/messages) | Per-provider config in opencode.json | No wrapper today; set the per-provider config explicitly. Pin opencode 1.13.x; 1.14.x has a known regression with custom providers. |
| Aider | none | /v1/chat/completions | OPENAI_API_BASE, OPENAI_API_KEY | No wrapper today; use the manual env-var path. Confirmed working with model aliases. |
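As an illustration of the native env-var path, the sketch below builds the variables listed in the table for the three env-var-driven CLIs and renders them as shell exports. The gateway URL and virtual key are hypothetical placeholders, the function names are made up for this example, and whether a given CLI expects a `/v1` suffix in its base URL is an assumption to confirm on the per-CLI page.

```python
import shlex
from typing import Dict

# Hypothetical values for illustration; substitute your own gateway URL and VK.
GATEWAY_URL = "https://gateway.example.com"
VIRTUAL_KEY = "vk-lw-example"


def cli_env(cli: str, gateway_url: str, virtual_key: str) -> Dict[str, str]:
    """Return the native env vars from the table for an env-var-driven CLI."""
    base = gateway_url.rstrip("/")
    if cli == "claude-code":
        return {"ANTHROPIC_BASE_URL": base, "ANTHROPIC_AUTH_TOKEN": virtual_key}
    if cli == "gemini-cli":
        return {"GOOGLE_GEMINI_BASE_URL": base, "GEMINI_API_KEY": virtual_key}
    if cli == "aider":
        # Assumption: Aider wants the /v1 prefix included in OPENAI_API_BASE.
        return {"OPENAI_API_BASE": f"{base}/v1", "OPENAI_API_KEY": virtual_key}
    # Codex, Cursor, and opencode are configured via files or in-app settings.
    raise ValueError(f"{cli!r} is not configured through env vars")


def as_exports(env: Dict[str, str]) -> str:
    """Render env vars as shell `export` lines, quoted safely and sorted by name."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in sorted(env.items()))
```

Calling `print(as_exports(cli_env("claude-code", GATEWAY_URL, VIRTUAL_KEY)))` yields lines that can be pasted into a shell rc file.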

Why this matters for the enterprise

Before the gateway, governance of coding CLIs was a choice between:
  • Ban them: kills productivity, drives shadow usage.
  • Allow them with personal provider keys: no visibility, no cost control, leaked credentials in dotfiles and CI logs.
The gateway adds a third option: allow every CLI, centrally governed.
  • Cost. Each engineer’s CLI spend debits the org → team → project → principal budgets you’ve set.
  • Visibility. Every CLI call shows up in LangWatch traces, scoped to the project the VK belongs to.
  • Policy. policy_rules can deny shell-exec tools or untrusted MCPs at the gateway level, even if the CLI would otherwise enable them.
  • Portability. An engineer on Claude Code and a co-worker on Codex hit the same gateway with different VKs but the same budget, so you don’t need to pick a winner.
  • Revocation. Rotate or revoke a VK and the CLI stops working globally within 60 seconds. No more “which laptops still have the old key?”
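The Cost bullet above describes spend debiting a chain of budgets from org down to principal. The sketch below is a toy model of that idea, not the gateway's accounting: the `Budget` class and `try_debit` function are hypothetical, and the all-or-nothing behaviour is an assumption modelled on blocking when any level would be breached.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    scope: str            # "org", "team", "project", or "principal"
    limit_usd: float
    spent_usd: float = 0.0


def try_debit(chain: List[Budget], cost_usd: float) -> bool:
    """Debit every budget in the chain, or none of them if any would be exceeded."""
    for budget in chain:
        if budget.spent_usd + cost_usd > budget.limit_usd:
            return False  # assumed blocking behaviour: the request is refused
    for budget in chain:
        budget.spent_usd += cost_usd
    return True
```

With a $200 principal budget at the end of the chain, a $150 request succeeds and a following $60 request is refused without changing any balance.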
A workable pattern used by several early customers:
  • One personal-access VK per engineer ({engineer}-cli) bound to the engineer’s principal.
  • Attach a principal-scoped monthly budget (e.g. $200/month for engineers, $1000/month for staff+). on_breach: block.
  • policy_rules.tools.deny: ^shell\\.exec$, ^filesystem\\.write$ (or your org’s list).
  • Fallback chain: Anthropic → OpenAI → Azure OpenAI. CLI autoswitches on outage.
  • cache.mode: respect so Anthropic prompt caching keeps saving 90%.
Then every engineer gets a one-time setup (env vars in their shell rc) and never touches provider keys again.
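To see what the policy_rules.tools.deny entries in the pattern above would catch, the following sketch applies the two regexes to a list of tool names. It assumes the doubled backslash in the config is string escaping for a literal dot and that patterns are matched against the full tool name; the gateway's actual matching semantics are not stated here, and `denied_tools` is a name invented for this example.

```python
import re
from typing import Iterable, List, Sequence

# The two patterns from the example policy, written as Python raw strings.
DENY_PATTERNS: Sequence[str] = (r"^shell\.exec$", r"^filesystem\.write$")


def denied_tools(tool_names: Iterable[str],
                 patterns: Sequence[str] = DENY_PATTERNS) -> List[str]:
    """Return the tool names that match at least one deny pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in tool_names
            if any(rx.search(name) for rx in compiled)]
```

Because the patterns are anchored and the dot is escaped, `shell.exec` is denied while look-alikes such as `shell_exec` or `filesystem.read` pass through, so an org list should enumerate each variant it wants blocked.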

Real-time feedback

Each CLI’s trace lands in LangWatch live. You can:
  • Pin a filter “where langwatch.vk.tags contains cli” on the project dashboard.
  • Page on-call if any engineer crosses 80% of their monthly personal cap.
  • Run an offline eval comparing Claude Code vs Codex quality on tickets of a given type.
See the per-CLI pages for exact setup commands.

Verified smoke output

Lane A ran the gateway locally against pnpm dev on 2026-04-19 to confirm the response shape CLIs will see. The transcripts are pinned here so integrators can diff their actual output against known-good responses. Start the gateway pointed at a running LangWatch control plane on :5560:
cd services/gateway
GATEWAY_CONTROL_PLANE_URL=http://localhost:5560 \
  GATEWAY_ALLOW_INSECURE=1 \
  go run ./cmd/gateway

/healthz: always 200 once the process is up

HTTP/1.1 200 OK
Content-Type: application/json
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_f5e4fd9f33a8861af4ab328aa00c45

{"status":"ok","version":"dev","uptime_s":0.988610959}
This is the Kubernetes liveness probe target. X-Langwatch-Gateway-Version is set from the binary’s main.Version build-arg; production deploys carry the commit SHA so operators can answer “which pod served this” straight from the response.
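A script that diffs its own output against the /healthz transcript might check the body along these lines. This is a minimal sketch: `check_healthz` is a hypothetical helper, and the only fields it relies on are the three shown in the sample response.

```python
import json

# Body copied from the transcript above.
SAMPLE_BODY = '{"status":"ok","version":"dev","uptime_s":0.988610959}'


def check_healthz(status_code: int, body: str) -> dict:
    """Validate a /healthz response and return its version and uptime."""
    if status_code != 200:
        raise RuntimeError(f"unexpected status {status_code}")
    payload = json.loads(body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"unhealthy status field: {payload.get('status')!r}")
    return {"version": payload["version"], "uptime_s": float(payload["uptime_s"])}
```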

/v1/models and /v1/chat/completions with no auth: 401 OpenAI-compat envelope

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Traceparent: 00-e8aeda507fb93fafd3d2c20bbef283d6-53e32e8f6030ef05-01
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_8f1250b94a959c1b2502792c757264
X-Langwatch-Span-Id: 53e32e8f6030ef05
X-Langwatch-Trace-Id: e8aeda507fb93fafd3d2c20bbef283d6

{"error":{"type":"invalid_api_key","code":"missing_api_key","message":"missing API key; supply Authorization: Bearer vk-lw-... or x-api-key"}}
The error message names both accepted auth headers: Authorization: Bearer vk-lw-... (OpenAI SDK, Claude Code, Cursor, Aider) and x-api-key (Anthropic SDK). Either works against either endpoint. Traceparent, X-Langwatch-Span-Id, and X-Langwatch-Trace-Id are present even on unauthenticated 401s, so probe abuse and misconfigured CLIs can be observed without inspecting access logs.
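The headers and envelope in this transcript can be checked programmatically. The sketch below builds either auth header, recognises the missing-key envelope, and splits the Traceparent value using the W3C Trace Context layout (version, trace ID, span ID, flags) so it can be compared with the X-Langwatch-Trace-Id and X-Langwatch-Span-Id headers. All function names are hypothetical helpers for this example.

```python
import json
from typing import Dict, NamedTuple


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    flags: str


def parse_traceparent(value: str) -> TraceContext:
    """Split a W3C traceparent header into its four dash-separated fields."""
    parts = value.strip().split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        raise ValueError(f"malformed traceparent: {value!r}")
    return TraceContext(*parts)


def auth_headers(virtual_key: str, style: str = "bearer") -> Dict[str, str]:
    """Build one of the two auth headers named in the gateway's 401 message."""
    if style == "bearer":
        return {"Authorization": f"Bearer {virtual_key}"}
    if style == "x-api-key":
        return {"x-api-key": virtual_key}
    raise ValueError(f"unknown auth style: {style!r}")


def is_missing_key_error(status_code: int, body: str) -> bool:
    """True when the response is the 401 envelope shown in the transcript."""
    if status_code != 401:
        return False
    error = json.loads(body).get("error", {})
    return error.get("code") == "missing_api_key"
```

A CLI wrapper could call `is_missing_key_error` before retrying to distinguish an unset key from other 401 causes, and log `parse_traceparent(...).trace_id` so the failed call can be looked up in LangWatch.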

/startupz and /readyz behavior at cold boot

# /startupz: 503 only during the brief window before MarkStarted fires
# (initialisers + optional GATEWAY_STARTUP_NETCHECK_HOSTS probe), then 200
HTTP/1.1 503 Service Unavailable
{"status":"degraded","checks":{"control_plane_reachable":"ok"}}
/startupz and /readyz go to 200 as soon as the gateway has finished its startup initialisers and (if configured) the network-check probe has succeeded. They do NOT block on the auth cache observing a VK: a cold pod with no traffic and a fresh control-plane install with zero VKs will still go ready, and auth resolution happens on demand at request time.
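For local smoke tests, a script can wait out the brief 503 window before sending traffic. The sketch below polls a probe until it returns 200. The helper names, attempt count, and delay are assumptions for illustration, and the gateway's listen address is not given in this page, so the URL must be supplied by the caller.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> Callable[[], int]:
    """Return a callable that fetches `url` and yields the HTTP status (0 on no response)."""
    def probe() -> int:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # e.g. 503 while the gateway is still starting
        except (urllib.error.URLError, OSError):
            return 0         # connection refused or timed out
    return probe


def wait_until_ready(probe: Callable[[], int], attempts: int = 30,
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A typical call would be `wait_until_ready(http_probe(gateway_base + "/readyz"))`, where `gateway_base` is whatever address your gateway listens on.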

Graceful drain on SIGTERM

INFO msg="gateway_draining" pre_drain_wait="5s"
INFO msg="gateway_shutting_down" timeout="15s"
INFO msg="gateway_stopped"
Matches the 4-phase drain documented in self-hosting/helm § Graceful drain.
If your CLI sees one of these response shapes, the gateway is healthy. End-to-end completion through a real VK + provider requires the per-CLI config on each integration page.