~11 µs of gateway-side overhead at 5k RPS sustained. Written in Go with chi + in-process JWT verification. The hot path sits in front of every LLM call, it has to be close to free.
Try it in 30 seconds
Swap your OpenAIbase_url + api_key and you’re done. Don’t have a virtual key yet? Two paths:
- Developer using a coding CLI (Claude Code, Codex, Cursor, Gemini): run
langwatch login --deviceand a personal VK is auto-issued, see Personal IDE keys. - Service, app traffic (server-to-server): mint one in the LangWatch app under AI Gateway → Virtual Keys: see the Quickstart.
Why an AI Gateway?
Most teams start with “one app, one provider API key, one language SDK.” That stops working the moment a second team, a second app, or a second model provider shows up:- Credentials proliferate (developer laptops, CI environments, Kubernetes secrets) with no revocation story.
- Cost governance becomes guesswork, no one owns the team-level or project-level budget.
- Observability is fragmented, each app emits traces differently (or not at all).
- Reliability suffers, one provider outage takes down every surface that hard-coded that provider.
gateway.langwatch.ai (or your self-hosted URL) carrying a LangWatch virtual key. From the application’s perspective, nothing else changes, the OpenAI SDK still works, the Anthropic SDK still works, Claude Code and Codex still work, but every request now carries policy, budget enforcement, and observability.
What it gives you
- OpenAI-compatible endpoint at
/v1/chat/completions,/v1/embeddings,/v1/models, etc. Drop inOPENAI_BASE_URL=https://gateway.langwatch.ai/v1andOPENAI_API_KEY=vk-lw-…and every existing OpenAI-SDK codebase works unchanged. - Anthropic-compatible endpoint at
/v1/messagesso Claude Code and native Anthropic SDKs work by settingANTHROPIC_BASE_URLandANTHROPIC_AUTH_TOKENto your gateway URL and a LangWatch virtual key. - Virtual keys: LangWatch-issued credentials (
vk-lw-<ulid>) with project/team/org scope. Show-once secrets, peppered-HMAC-SHA256 hashed, rotatable, revokable within 60 seconds. - Hierarchical budgets: enforce spend limits at organisation, team, project, virtual-key, or principal scope with windowed periods (minute → total) and soft-warn or hard-block semantics.
- Per-request guardrails: invoke your LangWatch evaluators inline on the request, response, or each streaming chunk. Decisions:
allow,block,modify(redact). - Tool, MCP, URL policy: block or allowlist tool calls, MCP server references, or outbound URLs by regex before the request ever leaves your gateway.
- Caching passthrough: Anthropic
cache_controlmarkers are forwarded byte-for-byte by default (protecting your 90% cache discount). Override per-request withX-LangWatch-Cache: respect|force|disable. - Automatic fallback: per-virtual-key fallback chain (e.g. OpenAI → Anthropic → Bedrock) triggers on 5xx, timeout, 429, or circuit-breaker-open. 400/401 client errors never trigger fallback.
- Per-tenant observability: every request emits a LangWatch trace to the owning project with
langwatch.virtual_key_id,langwatch.project_id,langwatch.principal_id,langwatch.cost_usd, cache-hit token breakdown, and status. Shared multi-tenant gateway, per-tenant trace routing. - Coding-CLI integrations: preserves the tool-call streaming deltas that Claude Code and Codex depend on. Documented setup for the major coding assistants in CLI Integrations.
- Self-hostable: ships as a Helm chart alongside the LangWatch app. Horizontally scales, serves traffic during control-plane outages (with bootstrap mode).
Architecture at a glance
github.com/maximhq/bifrost/core as a library for provider-specific dispatch (Azure deployment name quirks, Bedrock regional inference profiles, Vertex OAuth refresh, streaming format divergences across providers). LangWatch owns everything above the dispatch line: auth, policy, budgets, guardrails, per-tenant observability.
Why a separate Go service? Sub-millisecond overhead on the hot path matters when the gateway sits in front of every LLM call. Go with chi + in-process JWT verification gives us ~11 µs gateway-side overhead at 5k RPS sustained. A Python/FastAPI or Node/Hono gateway cannot reach those numbers.
Why embed, not fork? Bifrost’s provider dispatch is where hundreds of engineer-hours per provider have been invested (native SDKs for every provider, streaming format normalisation, tool-call delta reassembly, reasoning-token handling). That’s the part we consume as a library. The policy, governance, observability layers on top are ours, the product.
When to use the gateway vs. the LangWatch SDK directly
Both ship in LangWatch. They solve different problems:| Need | Use |
|---|---|
| ”Trace my existing provider calls without changing credentials” | LangWatch SDK: point it at your app, it instruments. Keep your provider keys. |
| ”Give the marketing team an LLM key they can’t leak, with a $500/month cap” | AI Gateway: mint a virtual key with a budget, share the VK. |
| ”Switch all internal apps from OpenAI to Anthropic for a week” | AI Gateway: flip model_aliases on the VKs involved. Zero code change. |
| ”Ship a coding CLI to every engineer with observability” | AI Gateway + CLI integration: set OPENAI_BASE_URL or ANTHROPIC_BASE_URL on every dev machine. |
| ”Guardrail the production chat surface for PII and prompt injection” | AI Gateway guardrail attachments, inline without touching application code. |
| ”Call 50+ providers from a single eval script” | LangWatch SDK’s existing litellm path still works unchanged for playground/evaluators (not migrated to gateway in v1). |
Next steps
- Quickstart: five minutes to your first gateway request.
- Concepts: virtual keys, budgets, fallback, guardrails, caching.
- CLI Integrations: Claude Code, Codex, opencode, Cursor, Aider.
- Self-Hosting: ship the gateway with your LangWatch Helm chart.
- AI Governance: once the gateway is in place, layer on org-wide controls: anomaly detection, OCSF/SIEM export, per-user workspaces, IngestionSources for non-gateway telemetry.