/v1/embeddings.
Request
input accepts a single string or an array of strings. Token pricing is per-input-token summed across the array.
Response
OpenAI-shape. Additional LangWatch headers:Provider compatibility
| Provider | Supported | Models |
|---|---|---|
| OpenAI | ✅ | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 |
| Azure OpenAI | ✅ | Same as OpenAI (via deployment names) |
| Bedrock | ✅ Titan | amazon.titan-embed-text-v2:0 |
| Vertex AI | ✅ Gemini embeddings | textembedding-gecko, gemini-embedding-001 |
| Gemini (AI Studio) | ✅ | text-embedding-004, gemini-embedding-001 |
| Anthropic | ❌ | No embeddings endpoint |
| Custom OpenAI-compatible | varies | depends on upstream |
Fallback and caching
Subject to the same fallback rules as chat (see Fallback Chains). Embedding models are less prone to provider outages than chat models but the chain is there if needed. Caching: deterministic, identical(model, input) tuples produce identical vectors, so aggressive semantic caching is feasible. V1 does not enable gateway-level caching for embeddings; use a local cache in your application layer.