Anthropic is one of the two core providers for the gateway (alongside OpenAI). The `/v1/messages` endpoint is Anthropic-native and designed to be Claude-Code-compatible.
## Configure the provider credential
Under Settings → Model Providers:
- Click Add provider → Anthropic.
- Paste your Anthropic API key (starts with `sk-ant-...`); you can sanity-check the key first, as sketched below.
- Save.
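A minimal sanity check, calling Anthropic directly (not the gateway) with the key you are about to save; the model alias is only an example:

```python
# Optional sanity check: verify the key works against Anthropic's own API.
# Requires `pip install anthropic`; the model alias below is illustrative.
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # the key being added in Settings

resp = client.messages.create(
    model="claude-3-5-haiku-latest",   # any small model works for a smoke test
    max_tokens=16,                     # Anthropic requires max_tokens on every request
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.content[0].text)
```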
## Bind to a VK
Pick Anthropic as a primary or fallback on any VK. When a VK uses Anthropic as its primary, its clients should ideally call `/v1/messages` (the Anthropic-native shape). But `/v1/chat/completions` also works: the gateway uses `bifrost/core` to translate the request shape to Anthropic's Messages API and back.
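A minimal sketch of an Anthropic-native call through the gateway. It assumes the VK is carried as a bearer token (mirroring the `ANTHROPIC_AUTH_TOKEN` setup in the Claude Code section below) and uses a placeholder key and an illustrative model alias:

```python
# Sketch: Anthropic-native request via the gateway's /v1/messages route.
# Base URL and key format follow the Claude Code section; placeholders throughout.
import anthropic

client = anthropic.Anthropic(
    auth_token="lw_vk_live_...",               # virtual key sent as a bearer token
    base_url="https://gateway.langwatch.ai",   # the SDK appends /v1/messages
)

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",          # illustrative alias; use what the VK allows
    max_tokens=256,                            # required by Anthropic (see Known quirks)
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.content[0].text)
```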
## Supported endpoints
| Gateway route | Upstream | Notes |
|---|---|---|
| `POST /v1/messages` | `POST /v1/messages` | Native shape. Preferred for Claude Code, the Anthropic SDK, and any tool-using agent. |
| `POST /v1/chat/completions` | `POST /v1/messages` (translated) | OpenAI SDKs work transparently via translation (see the sketch below). |
| `POST /v1/embeddings` | ❌ not supported | Anthropic has no embeddings endpoint. |
| `POST /v1/messages` with `stream: true` | `POST /v1/messages` streaming | SSE pass-through byte-for-byte (tool-call deltas preserved). |
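The translated route means an unmodified OpenAI SDK client can reach Anthropic through the gateway. A hedged sketch, assuming the same base URL and VK placeholders as above:

```python
# Sketch: OpenAI-compatible request that the gateway translates to Anthropic's
# Messages API and back. Model alias and key are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="lw_vk_live_...",                     # virtual key as the bearer token
    base_url="https://gateway.langwatch.ai/v1",   # the SDK appends /chat/completions
)

resp = client.chat.completions.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=128,   # still required upstream; see Known quirks
    messages=[
        {"role": "system", "content": "You are terse."},  # rewritten to top-level `system`
        {"role": "user", "content": "One-line summary of prompt caching, please."},
    ],
)
print(resp.choices[0].message.content)
```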
## `cache_control` passthrough: the 90% discount
Anthropic supports prompt caching via `cache_control` blocks on content. Ephemeral (5-minute TTL) and persistent (1-hour TTL) caches are both honoured.
The gateway's hard invariant: `cache_control` blocks are forwarded byte-for-byte when `cache.mode = respect` (the default). If cache-hit costs don't match expectations, first check the `X-LangWatch-Cache` response header: `miss` means no cache was hit; `hit` together with a non-zero `cache_read_input_tokens` means the discount applied.
Usage reporting (in the response body and trace) separates three counters:
- `cache_read_input_tokens`: tokens served from cache (priced at ~10%).
- `cache_creation_input_tokens`: tokens written to cache (priced at 125%).
- `input_tokens`: regular cold input tokens.
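Putting it together, a request that caches a long, stable system prefix and then inspects those counters could look like this sketch (placeholders as above; Anthropic only caches prefixes above a minimum token count, roughly 1,024 tokens for Sonnet-class models):

```python
# Sketch: mark a stable system prefix with cache_control and read the usage counters.
import anthropic

client = anthropic.Anthropic(
    auth_token="lw_vk_live_...",
    base_url="https://gateway.langwatch.ai",
)

# A long, unchanging preamble (tool docs, policies, schemas) is the ideal cache target.
LONG_STABLE_PROMPT = "You are a support agent. " + "Policy and schema reference. " * 400

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": LONG_STABLE_PROMPT,
            "cache_control": {"type": "ephemeral"},   # 5-minute cache on this block
        }
    ],
    messages=[{"role": "user", "content": "First question against the cached prefix."}],
)

u = resp.usage
print("cold input:", u.input_tokens)
print("written to cache (125%):", u.cache_creation_input_tokens)   # first call
print("served from cache (~10%):", u.cache_read_input_tokens)      # subsequent calls
```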
## Known quirks
- `max_tokens` required: Anthropic requires `max_tokens` on every Messages request. The gateway does not insert a default value; a client that omits it gets the upstream 400 back.
- System prompt structure: Anthropic Messages uses a top-level `system` field (a string or content blocks), not a message role. OpenAI-SDK clients that send a `system`-role message get it rewritten into the top-level field by the translator.
- Tool-call streaming deltas: Anthropic emits `content_block_delta` events carrying `input_json_delta` partial JSON. The gateway proxies these byte-for-byte after the first chunk so Claude Code (and any other agent) can reconstruct the tool call incrementally; see the sketch after this list.
- `thinking` (extended-reasoning) blocks: available on Claude Opus / Sonnet extended-thinking models. Blocks appear in the response and are priced separately. Thinking-token counts roll into `langwatch.usage.output_tokens` and debit the budget; a dedicated `langwatch.reasoning_tokens` span attribute is a v1.1 observability follow-up.
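A sketch of that reconstruction, using a raw streaming request and a made-up `get_weather` tool (placeholders as above):

```python
# Sketch: accumulate input_json_delta fragments from the streamed events into
# the tool call's arguments. Tool name and schema are invented for illustration.
import json
import anthropic

client = anthropic.Anthropic(
    auth_token="lw_vk_live_...",
    base_url="https://gateway.langwatch.ai",
)

stream = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    stream=True,                              # raw SSE events, proxied byte-for-byte
    tools=[{
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
)

fragments = []
for event in stream:
    # Tool arguments arrive as content_block_delta events carrying input_json_delta
    # partial JSON; concatenating the fragments yields the full argument object.
    if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
        fragments.append(event.delta.partial_json)

if fragments:
    print(json.loads("".join(fragments)))     # e.g. {"city": "Berlin"}
```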
## Using via Claude Code
See Claude Code: set `ANTHROPIC_BASE_URL=https://gateway.langwatch.ai` and `ANTHROPIC_AUTH_TOKEN=lw_vk_live_…`, and it works.