Gemini via Google AI Studio (the “plain” Google API, distinct from Vertex) is the fastest way to get Gemini models behind the gateway. API-key auth, no GCP project setup.

Configure the provider credential

Under Settings → Model Providers:
  1. Add provider → Google Gemini.
  2. Paste the AI Studio API key (a long alphanumeric string, typically beginning with AIza).
  3. Save.
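
Before saving, it is worth sanity-checking the key with a direct call to AI Studio (this bypasses the gateway entirely). A minimal Python sketch; the model and prompt here are arbitrary:

import os
import requests

# Direct sanity check against AI Studio (does not go through the gateway).
api_key = os.environ["GEMINI_API_KEY"]  # the key from step 2
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent",
    params={"key": api_key},
    json={"contents": [{"parts": [{"text": "ping"}]}]},
    timeout=30,
)
resp.raise_for_status()  # 200 confirms the key works and the model is reachable
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])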

Supported models

  • gemini-2.5-flash — fast + cheap. Default recommendation for coding CLIs on a Gemini-first VK.
  • gemini-2.5-pro — bigger, more capable.
  • gemini-2.0-flash — prior generation.
  • gemini-2.0-pro — prior generation.
  • Embedding models: text-embedding-004, gemini-embedding-001.

Expose via VK model_aliases:
{
  "model_aliases": {
    "gemini-flash": "gemini/gemini-2.5-flash",
    "gemini-pro":   "gemini/gemini-2.5-pro"
  }
}
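
Once the aliases are saved, any OpenAI-SDK client can use the short names through the gateway. A minimal sketch, assuming the gateway listens on http://localhost:8080/v1 and accepts the VK as its API key (both are deployment-specific):

from openai import OpenAI

# Standard OpenAI client pointed at the gateway; the VK acts as the API key.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumption: your gateway address
    api_key="vk-...",                     # assumption: a VK carrying the aliases above
)

resp = client.chat.completions.create(
    model="gemini-flash",  # resolved by the VK to gemini/gemini-2.5-flash
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)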

Supported endpoints

  • POST /v1/chat/completions — OpenAI-shape requests dispatched to Gemini’s generateContent / streamGenerateContent endpoints.
  • POST /v1/embeddings — for Gemini embedding models.
Gemini has no /v1/messages equivalent; Anthropic-shape clients should use a VK that has Anthropic or Bedrock as primary.
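
The embeddings route takes the same OpenAI shape. A sketch under the same assumptions as above; the gemini/-prefixed model ID presumes no embedding alias is defined on the VK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="vk-...")  # as above

emb = client.embeddings.create(
    model="gemini/text-embedding-004",  # assumption: addressed directly, no alias
    input=["first document", "second document"],
)
print(len(emb.data), len(emb.data[0].embedding))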

Context caching

Gemini has an implicit context cache for prompts > 32k tokens on most models. The gateway forwards requests untouched; caching behaviour is entirely upstream-managed. For explicit caching, Gemini offers a separate caches.create API that the gateway does not orchestrate in v1 — but clients can call Gemini directly to create a cache and reference it in subsequent gateway calls.
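
As a sketch of that direct path: create the cache against Gemini’s cachedContents API, then reference it on later gateway calls. The cached_content pass-through on the gateway side is an assumption to verify against your deployment, and the file path is a placeholder:

import os
import requests
from openai import OpenAI

# 1. Create an explicit cache directly against AI Studio (bypassing the gateway).
#    Note: explicit caches enforce a minimum prompt size upstream.
api_key = os.environ["GEMINI_API_KEY"]
long_document = open("contract.txt").read()  # placeholder: a large prompt
cache = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/cachedContents",
    params={"key": api_key},
    json={
        "model": "models/gemini-2.5-flash",
        "contents": [{"role": "user", "parts": [{"text": long_document}]}],
        "ttl": "3600s",  # cache lifetime
    },
    timeout=60,
).json()

# 2. Reference the cache on later gateway calls. Assumption: the gateway
#    forwards the extra cached_content field to Gemini unchanged.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="vk-...")
resp = client.chat.completions.create(
    model="gemini-flash",
    messages=[{"role": "user", "content": "Answer from the cached document."}],
    extra_body={"cached_content": cache["name"]},  # e.g. "cachedContents/abc123"
)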

Known quirks

  • Safety filters on by default. Gemini applies safetySettings thresholds and returns finish_reason: SAFETY when content crosses them. Bifrost/core surfaces this as provider_error with the safety metadata in the OTel trace.
  • Rate limits. AI Studio’s free tier has aggressive rate limits; the paid tier is unlocked via a billing-attached Google account. A 429 triggers fallback if configured.
  • Tool/function calling. Gemini has its own function-calling format that bifrost/core translates from OpenAI-shape tools. Tool-call streaming deltas are less mature on Gemini — expect byte-level differences vs OpenAI.
  • Streaming chunk size. Gemini’s streaming emits larger chunks than OpenAI — fewer, bigger SSE frames. The gateway passes these through unchanged.
  • No system role. Gemini uses systemInstruction at the top level of the request. OpenAI-SDK clients that send a system role message get it rewritten by the translator; a sketch of that translation follows this list.
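
To make that last quirk concrete, here is an illustrative sketch of the kind of rewrite the translator performs (not bifrost/core’s actual code): system messages are hoisted into a top-level systemInstruction, assistant turns become model turns, and everything else lands in contents.

def openai_to_gemini(messages: list[dict]) -> dict:
    """Illustrative only: translate an OpenAI-shape message list into a
    Gemini-shape request body."""
    system_parts, contents = [], []
    for m in messages:
        if m["role"] == "system":
            # Gemini has no system role; collect for systemInstruction.
            system_parts.append({"text": m["content"]})
        else:
            # Gemini uses "model" where OpenAI uses "assistant".
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    body = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    return body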

AI Studio vs Vertex

Google has two paths to Gemini:
  • AI Studio (this page) — quickest, API-key auth, pay-as-you-go on a consumer-ish billing model.
  • Vertex AI — GCP-native, IAM-governed, data-residency, enterprise SLAs. See Vertex.
Use AI Studio for dev / scoping / SaaS; use Vertex for production enterprise workloads.