Gemini via Google AI Studio (the “plain” Google API, distinct from Vertex) is the fastest way to get Gemini models behind the gateway: API-key auth, no GCP project setup.
## Configure the provider credential

Under Settings → Model Providers:

- Add provider → Google Gemini.
- Paste the AI Studio API key (a long alphanumeric string).
- Save.
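Once the credential is saved, clients send OpenAI-shape requests to the gateway and never handle the AI Studio key themselves. A minimal sketch of assembling such a request in Python; the gateway base URL and the virtual-key `Authorization` header name are assumptions for illustration, not the gateway's documented values:

```python
import json
import os

# Assumed gateway location for this sketch; substitute your deployment's URL.
GATEWAY_BASE = os.environ.get("GATEWAY_BASE", "http://localhost:8080")


def build_chat_request(model: str, user_prompt: str) -> tuple[str, dict, bytes]:
    """Assemble an OpenAI-shape /v1/chat/completions request for the gateway.

    The gateway resolves the Gemini credential configured under
    Settings -> Model Providers, so no AI Studio key appears client-side.
    """
    url = f"{GATEWAY_BASE}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # Virtual-key bearer auth is an assumption in this sketch.
        "Authorization": f"Bearer {os.environ.get('VIRTUAL_KEY', 'vk-example')}",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
    }).encode()
    return url, headers, body
```

Sending the request (e.g. with `urllib.request` or `httpx`) is left out; the point is that the payload is plain OpenAI shape even though Gemini serves it.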
## Supported models

- `gemini-2.5-flash` — fast and cheap. Default recommendation for coding CLIs on a Gemini-first VK.
- `gemini-2.5-pro` — bigger, more capable.
- `gemini-2.0-flash` — prior generation.
- `gemini-2.0-pro` — prior generation.
- Embedding models: `text-embedding-004`, `gemini-embedding-001`.
## Supported endpoints

- `POST /v1/chat/completions` — OpenAI-shape requests dispatched to Gemini’s `generateContent` / `streamGenerateContent` endpoints.
- `POST /v1/embeddings` — for Gemini embedding models.
- No `/v1/messages` equivalent; Anthropic-shape clients should use a VK that has Anthropic or Bedrock as primary.
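The dispatch rule above can be sketched as a small helper: the OpenAI-shape `stream` flag selects between Gemini's two generation methods. The base URL reflects the public Generative Language API; the helper itself is illustrative, not the gateway's actual routing code:

```python
# Public Gemini API base; the v1beta version is an assumption for this sketch.
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"


def gemini_endpoint(model: str, stream: bool) -> str:
    """Pick generateContent vs streamGenerateContent from the OpenAI-shape
    `stream` flag, mirroring the dispatch described above."""
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{GEMINI_BASE}/models/{model}:{method}"
```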
## Context caching

Gemini has an implicit context cache for prompts over 32k tokens on most models. The gateway forwards requests untouched; caching behaviour is entirely upstream-managed. For explicit caching, Gemini offers a separate `caches.create` API that the gateway does not orchestrate in v1 — but clients can call Gemini directly to create a cache and reference it in subsequent gateway calls.
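For the explicit-caching path, a client calls Gemini's `caches.create` directly and then references the resulting cache by name in later requests. A hedged sketch of the two payloads involved; field names are assumed to follow the public `cachedContents` REST resource and should be checked against current Gemini documentation:

```python
def build_cache_create_payload(model: str, big_context: str, ttl_seconds: int) -> dict:
    """Payload for creating an explicit cache directly against Gemini.

    Field names (model, contents, ttl) are assumptions based on the public
    cachedContents resource, not something the gateway defines.
    """
    return {
        "model": f"models/{model}",
        "contents": [{"role": "user", "parts": [{"text": big_context}]}],
        "ttl": f"{ttl_seconds}s",
    }


def reference_cache(request: dict, cache_name: str) -> dict:
    """Attach a previously created cache (e.g. "cachedContents/abc123")
    to a subsequent generation request."""
    return {**request, "cachedContent": cache_name}
```

The create call goes straight to Gemini (outside the gateway); only the follow-up generation requests flow through the gateway unchanged.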
## Known quirks

- Safety filters on by default. Gemini applies `safetySettings` thresholds and returns `finish_reason: SAFETY` when content crosses a threshold. Bifrost/core surfaces this as a `provider_error` with the safety metadata in the OTel trace.
- Rate limits. AI Studio’s free tier has aggressive rate limits; the paid tier is unlocked via a billing-attached Google account. A 429 triggers fallback if configured.
- Tool/function calling. Gemini has its own function-calling format that bifrost/core translates from OpenAI-shape tools. Tool-call streaming deltas are less mature on Gemini — expect byte-level differences vs OpenAI.
- Streaming chunk size. Gemini’s streaming emits larger chunks than OpenAI — fewer, bigger SSE frames. The gateway passes these through unchanged.
- No `system` role. Gemini uses `systemInstruction` at the top level of the request. OpenAI-SDK clients that send a `system` role message get it rewritten by the translator.
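The last quirk is the easiest to picture in code. A minimal sketch of the system-role rewrite, assuming Gemini's `systemInstruction` / `contents` request shapes; it mirrors what the translator does conceptually, not the gateway's actual implementation:

```python
def to_gemini_request(messages: list[dict]) -> dict:
    """Rewrite OpenAI-shape messages into a Gemini-shape request body.

    Gemini has no "system" role: system messages are hoisted into a
    top-level systemInstruction, and "assistant" maps to Gemini's
    "model" role.
    """
    system_parts = []
    contents = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append({"text": m["content"]})
        else:
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    request: dict = {"contents": contents}
    if system_parts:
        request["systemInstruction"] = {"parts": system_parts}
    return request
```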
## AI Studio vs Vertex

Google has two paths to Gemini:

- AI Studio (this page) — quickest: API-key auth, pay-as-you-go on a consumer-ish billing model.
- Vertex AI — GCP-native: IAM-governed, data-residency, enterprise SLAs. See Vertex.