Gemini via Google AI Studio (the “plain” Google API, distinct from Vertex) is the fastest way to get Gemini models behind the gateway. API-key auth, no GCP project setup.

Configure the provider credential

Under Settings → Model Providers:
  1. Add provider → Google Gemini.
  2. Paste the AI Studio API key (a long alphanumeric string, typically beginning with AIza).
  3. Save.
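
Before saving, it is worth sanity-checking the key with a direct call to AI Studio (this bypasses the gateway entirely). A minimal Python sketch; the model and prompt here are arbitrary:

import os
import requests

# Direct sanity check against AI Studio (does not go through the gateway).
api_key = os.environ["GEMINI_API_KEY"]  # the key from step 2
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent",
    params={"key": api_key},
    json={"contents": [{"parts": [{"text": "ping"}]}]},
    timeout=30,
)
resp.raise_for_status()  # 200 confirms the key works and the model is reachable
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])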

Supported models

  • gemini-2.5-flash — fast + cheap. Default recommendation for coding CLIs on a Gemini-first VK.
  • gemini-2.5-pro — bigger, more capable.
  • gemini-2.0-flash — prior generation.
  • gemini-2.0-pro — prior generation.
  • Embedding models: text-embedding-004, gemini-embedding-001.

Expose via VK model_aliases:
{
  "model_aliases": {
    "gemini-flash": "gemini/gemini-2.5-flash",
    "gemini-pro":   "gemini/gemini-2.5-pro"
  }
}
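
Once the aliases are saved, any OpenAI-SDK client can use the short names through the gateway. A minimal sketch, assuming the gateway listens on http://localhost:8080/v1 and accepts the VK as its API key (both are deployment-specific):

from openai import OpenAI

# Standard OpenAI client pointed at the gateway; the VK acts as the API key.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumption: your gateway address
    api_key="vk-...",                     # assumption: a VK carrying the aliases above
)

resp = client.chat.completions.create(
    model="gemini-flash",  # resolved by the VK to gemini/gemini-2.5-flash
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)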

Supported endpoints

  • POST /v1/chat/completions — OpenAI-shape requests dispatched to Gemini’s generateContent / streamGenerateContent endpoints.
  • POST /v1/embeddings — for Gemini embedding models.
Gemini has no /v1/messages equivalent; Anthropic-shape clients should use a VK that has Anthropic or Bedrock as primary.
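
The embeddings route takes the same OpenAI shape. A sketch under the same assumptions as above; the gemini/-prefixed model ID presumes no embedding alias is defined on the VK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="vk-...")  # as above

emb = client.embeddings.create(
    model="gemini/text-embedding-004",  # assumption: addressed directly, no alias
    input=["first document", "second document"],
)
print(len(emb.data), len(emb.data[0].embedding))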

Context caching

Gemini has an implicit context cache for prompts > 32k tokens on most models. The gateway forwards requests untouched; caching behaviour is entirely upstream-managed. For explicit caching, Gemini offers a separate caches.create API that the gateway does not orchestrate in v1 — but clients can call Gemini directly to create a cache and reference it in subsequent gateway calls.
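
As a sketch of that direct path: create the cache against Gemini’s cachedContents API, then reference it on later gateway calls. The cached_content pass-through on the gateway side is an assumption to verify against your deployment, and the file path is a placeholder:

import os
import requests
from openai import OpenAI

# 1. Create an explicit cache directly against AI Studio (bypassing the gateway).
#    Note: explicit caches enforce a minimum prompt size upstream.
api_key = os.environ["GEMINI_API_KEY"]
long_document = open("contract.txt").read()  # placeholder: a large prompt
cache = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/cachedContents",
    params={"key": api_key},
    json={
        "model": "models/gemini-2.5-flash",
        "contents": [{"role": "user", "parts": [{"text": long_document}]}],
        "ttl": "3600s",  # cache lifetime
    },
    timeout=60,
).json()

# 2. Reference the cache on later gateway calls. Assumption: the gateway
#    forwards the extra cached_content field to Gemini unchanged.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="vk-...")
resp = client.chat.completions.create(
    model="gemini-flash",
    messages=[{"role": "user", "content": "Answer from the cached document."}],
    extra_body={"cached_content": cache["name"]},  # e.g. "cachedContents/abc123"
)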

Known quirks

  • Safety filters on by default. Gemini applies safetySettings thresholds and returns finish_reason: SAFETY when content crosses them. Bifrost/core surfaces this as provider_error with the safety metadata in the OTel trace.
  • Rate limits. AI Studio’s free tier has aggressive rate limits; the paid tier is unlocked via a billing-attached Google account. A 429 triggers fallback if configured.
  • Tool/function calling. Gemini has its own function-calling format that bifrost/core translates from OpenAI-shape tools. Tool-call streaming deltas are less mature on Gemini — expect byte-level differences vs OpenAI.
  • Streaming chunk size. Gemini’s streaming emits larger chunks than OpenAI — fewer, bigger SSE frames. The gateway passes these through unchanged.
  • No system role. Gemini uses systemInstruction at the top level of the request. OpenAI-SDK clients that send a system role message get it rewritten by the translator; a sketch of that translation follows this list.
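
To make that last quirk concrete, here is an illustrative sketch of the kind of rewrite the translator performs (not bifrost/core’s actual code): system messages are hoisted into a top-level systemInstruction, assistant turns become model turns, and everything else lands in contents.

def openai_to_gemini(messages: list[dict]) -> dict:
    """Illustrative only: translate an OpenAI-shape message list into a
    Gemini-shape request body."""
    system_parts, contents = [], []
    for m in messages:
        if m["role"] == "system":
            # Gemini has no system role; collect for systemInstruction.
            system_parts.append({"text": m["content"]})
        else:
            # Gemini uses "model" where OpenAI uses "assistant".
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    body = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    return body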

AI Studio vs Vertex

Google has two paths to Gemini:
  • AI Studio (this page) — quickest, API-key auth, pay-as-you-go on a consumer-ish billing model.
  • Vertex AI — GCP-native, IAM-governed, data-residency, enterprise SLAs. See Vertex.
Use AI Studio for dev / scoping / SaaS; use Vertex for production enterprise workloads.