
Anthropic is one of the two core providers for the gateway (alongside OpenAI). The /v1/messages endpoint is Anthropic-native and designed to be Claude-Code-compatible.

Configure the provider credential

Under Settings → Model Providers:
  1. Click Add provider → Anthropic.
  2. Paste your Anthropic API key (starts with sk-ant-...).
  3. Save.
The same credential serves the gateway, the evaluators’ litellm path, and any other place in LangWatch that needs an Anthropic call.

Bind to a VK

Pick Anthropic as a primary or fallback on any VK. When a VK uses Anthropic as its primary, its clients should ideally call /v1/messages (Anthropic-native shape). But /v1/chat/completions also works — the gateway uses bifrost/core to translate the request shape to Anthropic’s Messages API and back.
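As a sketch of the native shape, a minimal /v1/messages payload can be built like this. The gateway URL, VK token, and model ID below are placeholders for illustration, not values confirmed by this page:

```python
import json

# Placeholder values; substitute your own gateway URL, VK token,
# and an Anthropic model your VK allows.
GATEWAY_BASE = "https://gateway.langwatch.ai"
VK_TOKEN = "lw_vk_live_..."

def build_messages_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Anthropic-native /v1/messages body: top-level system field,
    explicit max_tokens (Anthropic requires it on every request)."""
    return {
        "model": "claude-sonnet-latest",  # hypothetical model id
        "max_tokens": max_tokens,
        "system": "You are a concise assistant.",
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_messages_payload("Hello"))
# POST this to f"{GATEWAY_BASE}/v1/messages" with the VK as the auth
# credential (header name depends on your client; the Anthropic SDK
# sends it as x-api-key).
```

The same payload also works through the /v1/chat/completions route after translation, but the native shape avoids any mapping.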

Supported endpoints

| Gateway route | Upstream | Notes |
| --- | --- | --- |
| POST /v1/messages | POST /v1/messages | Native shape. Preferred for Claude Code, the Anthropic SDK, and any tool-using agent. |
| POST /v1/chat/completions | POST /v1/messages (translated) | OpenAI SDKs work transparently via translation. |
| POST /v1/embeddings | ❌ not supported | Anthropic has no embeddings endpoint. |
| POST /v1/messages with stream: true | POST /v1/messages (streaming) | SSE pass-through byte-for-byte (tool-call deltas preserved). |
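The chat-completions translation can be pictured with a simplified sketch. The real gateway uses bifrost/core; this illustrative function shows only the main shape change (system-role messages move to the top-level system field) and ignores tools, images, and other content types:

```python
def to_anthropic_messages(openai_body: dict) -> dict:
    """Simplified OpenAI chat/completions -> Anthropic Messages mapping:
    system-role messages become the top-level `system` field; other
    messages keep their roles."""
    system_parts = [
        m["content"] for m in openai_body["messages"] if m["role"] == "system"
    ]
    translated = {
        "model": openai_body["model"],
        "messages": [m for m in openai_body["messages"] if m["role"] != "system"],
    }
    if system_parts:
        translated["system"] = "\n".join(system_parts)
    if "max_tokens" in openai_body:
        # Anthropic requires max_tokens upstream; the gateway does not
        # invent one for you (see Known quirks).
        translated["max_tokens"] = openai_body["max_tokens"]
    return translated

example = to_anthropic_messages({
    "model": "claude-sonnet-latest",
    "max_tokens": 256,
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
})
```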

cache_control passthrough — the 90% discount

Anthropic supports prompt caching via cache_control blocks on content. Both ephemeral (5-minute TTL) and persistent (1-hour TTL) caches are honoured. The gateway’s hard invariant: in cache.mode = respect (the default), cache_control blocks are forwarded byte-for-byte. If cache-hit costs don’t match your expectations, first check the X-LangWatch-Cache response header: miss means no cache was hit; hit with a non-zero cache_read_input_tokens means the discount applied. Usage reporting (in the response body and the trace) separates three counters:
  • cache_read_input_tokens — tokens served from cache (priced at ~10%).
  • cache_creation_input_tokens — tokens written to cache (priced at 125%).
  • input_tokens — regular cold input tokens.
The gateway sums these and computes per-request cost for the budget ledger. See Caching Passthrough for the full decision matrix and override headers.
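The arithmetic behind that sum can be sketched with the multipliers quoted above (~10% for cache reads, 125% for cache writes, 100% for cold input) and a hypothetical per-million-token price; the exact ledger logic is not specified here:

```python
def input_cost_usd(usage: dict, usd_per_mtok: float) -> float:
    """Combine the three input counters at their respective rates:
    cold input at 100%, cache reads at ~10%, cache writes at 125%."""
    cold = usage.get("input_tokens", 0)
    reads = usage.get("cache_read_input_tokens", 0)
    writes = usage.get("cache_creation_input_tokens", 0)
    return (cold * 1.0 + reads * 0.10 + writes * 1.25) * usd_per_mtok / 1_000_000

# Hypothetical price of $3 per million input tokens, mostly-cached request:
cost = input_cost_usd(
    {"input_tokens": 1_000,
     "cache_read_input_tokens": 9_000,
     "cache_creation_input_tokens": 0},
    usd_per_mtok=3.0,
)
```

With 90% of the input served from cache, the blended cost is well under half of the cold-input price, which is where the headline discount comes from.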

Known quirks

  • max_tokens required — Anthropic requires max_tokens on every Messages request. The gateway does not default-insert a value; a client that omits it gets the upstream 400 back.
  • System prompt structure — Anthropic Messages uses a top-level system field (string or content blocks), not a message role. OpenAI-SDK clients that send a system role message get it rewritten into the top-level field by the translator.
  • Tool-call streaming deltas — Anthropic emits content_block_delta events with input_json_delta partial JSON. The gateway proxies these byte-for-byte after the first chunk, so Claude Code (and any other agent) can reconstruct the tool call incrementally.
  • thinking (extended-reasoning) blocks — available on Claude Opus / Sonnet extended-thinking models. Blocks appear in the response and are priced separately. Thinking-token counts roll into langwatch.usage.output_tokens and debit the budget; a dedicated langwatch.reasoning_tokens span attribute is a v1.1 observability follow-up.
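The tool-call streaming quirk above can be sketched from the client's side: concatenate each input_json_delta fragment as it arrives and parse once the block closes. Event shapes here are abbreviated to the fields that matter for reassembly:

```python
import json

def assemble_tool_input(events: list[dict]) -> dict:
    """Concatenate input_json_delta fragments (forwarded byte-for-byte
    by the gateway) and parse the completed JSON tool arguments."""
    fragments = [
        e["delta"]["partial_json"]
        for e in events
        if e.get("type") == "content_block_delta"
        and e["delta"].get("type") == "input_json_delta"
    ]
    return json.loads("".join(fragments))

# Abbreviated stream: two partial-JSON deltas, then the block closes.
stream = [
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Par'}},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": 'is"}'}},
    {"type": "content_block_stop"},
]
args = assemble_tool_input(stream)
```

Because the gateway does not rewrite these deltas, any client that understands Anthropic's native streaming format works unchanged behind it.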

Using via Claude Code

See Claude Code — set ANTHROPIC_BASE_URL=https://gateway.langwatch.ai and ANTHROPIC_AUTH_TOKEN=lw_vk_live_…, and Claude Code talks to the gateway’s native /v1/messages route with no further changes.