
Documentation Index

Fetch the complete documentation index at: https://langwatch.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

The fastest-growing surface for enterprise LLM usage in 2026 is coding CLIs: every engineer has at least one running locally, and the gap between personal and corporate use has become a governance crisis. The AI Gateway solves it: every engineer keeps their preferred CLI while the org keeps control of cost, guardrails, and visibility. Setup is always the same two steps:
  1. Mint a LangWatch virtual key (see Quickstart).
  2. Set the CLI’s base URL and API key to the gateway + VK.
The gateway exposes both OpenAI-compatible and Anthropic-compatible endpoints on the same port, so any CLI that speaks either dialect works unchanged.
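Step 2 sketched for both dialects. The URL and key below are placeholders, not real values: substitute your gateway host and the VK minted in step 1.

```shell
# Placeholder gateway URL and virtual key -- substitute your own.
export GATEWAY_URL="https://gateway.example.com"
export LW_VK="lw_vk_example"

# OpenAI-dialect CLIs (Aider, Cursor, opencode):
export OPENAI_API_BASE="$GATEWAY_URL/v1"
export OPENAI_API_KEY="$LW_VK"

# Anthropic-dialect CLIs (Claude Code):
export ANTHROPIC_BASE_URL="$GATEWAY_URL"
export ANTHROPIC_AUTH_TOKEN="$LW_VK"

echo "$OPENAI_API_BASE"
```

Both sets of variables point at the same gateway port; the only difference is which dialect the CLI speaks.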

At a glance

| CLI | Endpoint used | Env vars / config | Notes |
| --- | --- | --- | --- |
| Claude Code | /v1/messages | ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN | Tool-call deltas preserved byte-for-byte. Pin to dated model name (e.g. claude-haiku-4-5-20251001). |
| Codex CLI | /v1/responses | ~/.codex/config.toml model_providers, wire_api = "responses" | Codex 0.122+ requires wire_api = "responses" (chat dropped). Pin model to a Bifrost-registered name. |
| Gemini CLI | /v1beta/models/{model}:generateContent | GOOGLE_GEMINI_BASE_URL, GEMINI_API_KEY | Native Gemini API passthrough. Works with all Gemini models supported by your VK’s Vertex/Gemini credential. |
| opencode | /v1/chat/completions (or /v1/messages) | Per-provider config in opencode.json | Pin opencode 1.13.x; 1.14.x has a known regression with custom providers. |
| Cursor | /v1/chat/completions | Custom “OpenAI API base URL” in settings | Agent mode benefits most from budgets. |
| Aider | /v1/chat/completions | OPENAI_API_BASE, OPENAI_API_KEY | Confirmed working with model aliases. |
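For Codex specifically, the wire_api requirement translates to a provider entry in ~/.codex/config.toml. A minimal sketch, assuming a hypothetical provider id of langwatch and a placeholder gateway URL; only model_providers, base_url, and wire_api are taken from the table above, the rest is illustrative:

```shell
# Writes the sketch to a temp file for illustration; in practice you would
# merge this fragment into ~/.codex/config.toml by hand.
cat > /tmp/codex-provider.toml <<'EOF'
model_provider = "langwatch"                  # hypothetical provider id

[model_providers.langwatch]
name     = "LangWatch Gateway"
base_url = "https://gateway.example.com/v1"   # placeholder -- your gateway URL
env_key  = "LANGWATCH_VK"                     # env var holding your VK
wire_api = "responses"                        # required on Codex 0.122+
EOF
grep -c 'wire_api = "responses"' /tmp/codex-provider.toml
```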

Why this matters for the enterprise

Before the gateway, governance of coding CLIs was a choice between:
  • Ban them — kills productivity, drives shadow usage.
  • Allow them with personal provider keys — no visibility, no cost control, leaked credentials in dotfiles and CI logs.
The gateway adds a third option: allow every CLI, centrally governed.
  • Cost. Each engineer’s CLI spend debits the org → team → project → principal budgets you’ve set.
  • Visibility. Every CLI call shows up in LangWatch traces, scoped to the project the VK belongs to.
  • Policy. policy_rules can deny shell-exec tools or untrusted MCPs at the gateway level, even if the CLI would otherwise enable them.
  • Portability. An engineer on Claude Code and a co-worker on Codex hit the same gateway with different VKs but the same budget — you don’t need to pick a winner.
  • Revocation. Rotate or revoke a VK and the CLI stops working globally within 60 seconds. No more “which laptops still have the old key?”
A workable pattern used by several early customers:
  • One personal-access VK per engineer ({engineer}-cli) bound to the engineer’s principal.
  • Attach a principal-scoped monthly budget (e.g. $200/month for engineers, $1000/month for staff+) with on_breach: block.
  • policy_rules.tools.deny: ^shell\\.exec$, ^filesystem\\.write$ (or your org’s list).
  • Fallback chain: Anthropic → OpenAI → Azure OpenAI. CLI autoswitches on outage.
  • cache.mode: respect so Anthropic prompt caching keeps discounting cached input tokens by up to 90%.
Then every engineer gets a one-time setup (env vars in their shell rc) and never touches provider keys again.
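The bullets above can be sketched as a single VK definition. This is an illustrative shape only: the keys policy_rules, on_breach, and cache.mode appear on this page, but the surrounding schema is an assumption; check the Quickstart for the real format.

```shell
# Illustrative only -- writes the sketch to a temp file so the shape is readable.
cat > /tmp/alice-cli-vk.yaml <<'EOF'
name: alice-cli                # one personal-access VK per engineer
principal: alice
budget:
  monthly_usd: 200             # $1000 for staff+
  on_breach: block
policy_rules:
  tools:
    deny:
      - ^shell\.exec$
      - ^filesystem\.write$
fallback: [anthropic, openai, azure-openai]   # CLI autoswitches on outage
cache:
  mode: respect                # keep Anthropic prompt caching effective
EOF
grep -c 'on_breach: block' /tmp/alice-cli-vk.yaml
```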

Real-time feedback

Each CLI’s trace lands in LangWatch live. You can:
  • Pin a filter “where langwatch.vk.tags contains cli” on the project dashboard.
  • Page on-call if any engineer crosses 80% of their monthly personal cap.
  • Run an offline eval comparing Claude Code vs Codex quality on tickets of a given type.
See the per-CLI pages for exact setup commands.

Verified smoke output

Lane A ran the gateway locally against pnpm dev on 2026-04-19 to confirm the response shape CLIs will see. The transcripts are pinned here so integrators can diff their actual output against known-good responses. Start the gateway pointed at a running LangWatch control plane on :5560:
cd services/gateway
GATEWAY_CONTROL_PLANE_URL=http://localhost:5560 \
  GATEWAY_ALLOW_INSECURE=1 \
  go run ./cmd/gateway

/healthz — always 200 once the process is up

HTTP/1.1 200 OK
Content-Type: application/json
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_f5e4fd9f33a8861af4ab328aa00c45

{"status":"ok","version":"dev","uptime_s":0.988610959}
Kubernetes liveness probe target. X-Langwatch-Gateway-Version is set from the binary’s main.Version build-arg — production deploys carry the commit SHA so operators can answer “which pod served this” straight from the response.

/v1/models and /v1/chat/completions with no auth — 401 OpenAI-compat envelope

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Traceparent: 00-e8aeda507fb93fafd3d2c20bbef283d6-53e32e8f6030ef05-01
X-Langwatch-Gateway-Version: dev
X-Langwatch-Request-Id: req_8f1250b94a959c1b2502792c757264
X-Langwatch-Span-Id: 53e32e8f6030ef05
X-Langwatch-Trace-Id: e8aeda507fb93fafd3d2c20bbef283d6

{"error":{"type":"invalid_api_key","code":"missing_api_key","message":"missing API key; supply Authorization: Bearer lw_vk_... or x-api-key"}}
The error message names both accepted auth headers — Authorization: Bearer lw_vk_... (OpenAI SDK, Claude Code, Cursor, Aider) AND x-api-key (Anthropic SDK). Either works against either endpoint. Traceparent + X-Langwatch-Span-Id + X-Langwatch-Trace-Id are present even on unauth 401s — observability of probe-abuse / misconfigured CLIs is available without inspecting access logs.
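A sketch of that acceptance rule (the real implementation is Go inside the gateway; this shell function is only illustrative): either header form yields the same VK, anything else maps to the missing_api_key error above.

```shell
# Illustrative re-implementation of the header rule, not the gateway's code.
extract_vk() {
  case "$1" in
    "Authorization: Bearer "*) printf '%s\n' "${1#Authorization: Bearer }" ;;
    "x-api-key: "*)            printf '%s\n' "${1#x-api-key: }" ;;
    *)                         echo "missing_api_key" ;;
  esac
}
extract_vk "Authorization: Bearer lw_vk_abc"   # -> lw_vk_abc
extract_vk "x-api-key: lw_vk_abc"              # -> lw_vk_abc
extract_vk "Content-Type: application/json"    # -> missing_api_key
```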

/startupz and /readyz behavior at cold boot

# /startupz — 503 only during the brief window before MarkStarted fires
# (initialisers + optional GATEWAY_STARTUP_NETCHECK_HOSTS probe), then 200
HTTP/1.1 503 Service Unavailable
{"status":"degraded","checks":{"control_plane_reachable":"ok"}}
/startupz and /readyz go to 200 as soon as the gateway has finished its startup initialisers and (if configured) the network-check probe has succeeded. They do NOT block on the auth cache observing a VK — a cold pod with no traffic and a fresh control-plane install with zero VKs will still go ready, and auth resolution happens on demand at request time.
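Under Kubernetes the three endpoints map naturally onto the three probe types. A sketch, assuming the gateway listens on port 8080; the port and timings are assumptions, not shipped defaults:

```shell
# Written to a temp file for readability; merge into your Deployment spec.
cat > /tmp/gateway-probes.yaml <<'EOF'
startupProbe:                 # tolerates the brief pre-MarkStarted 503 window
  httpGet: { path: /startupz, port: 8080 }
  periodSeconds: 2
  failureThreshold: 30
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  periodSeconds: 5
livenessProbe:                # /healthz is always 200 once the process is up
  httpGet: { path: /healthz, port: 8080 }
  periodSeconds: 10
EOF
grep -c 'httpGet' /tmp/gateway-probes.yaml
```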

Graceful drain on SIGTERM

INFO msg="gateway_draining" pre_drain_wait="5s"
INFO msg="gateway_shutting_down" timeout="15s"
INFO msg="gateway_stopped"
Matches the 4-phase drain documented in self-hosting/helm § Graceful drain. If your CLI sees one of these response shapes, the gateway is healthy. End-to-end completion through a real VK + provider requires the per-CLI config on each integration page.
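The 5 s pre-drain wait plus the 15 s shutdown timeout in the log above imply a pod termination grace period comfortably above 20 s. A hedged sketch of the pod-level setting; the value is an inference from this transcript, not a documented default:

```shell
# Illustrative fragment for the Deployment's pod spec.
cat > /tmp/gateway-termination.yaml <<'EOF'
spec:
  terminationGracePeriodSeconds: 30   # > 5s pre-drain + 15s shutdown timeout
EOF
grep -c 'terminationGracePeriodSeconds' /tmp/gateway-termination.yaml
```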