docs/providers/google.md
The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.
The provider id is `google`, and the API key is read from `GEMINI_API_KEY` or `GOOGLE_API_KEY`. Setting `agents.defaults.agentRuntime.id: "google-gemini-cli"` reuses Gemini CLI OAuth while keeping model refs canonical as `google/*`. Choose your preferred auth method and follow the setup steps.
<Tabs>
<Tab title="API key">
**Best for:** standard Gemini API access through Google AI Studio.
<Steps>
<Step title="Run onboarding">
```bash
openclaw onboard --auth-choice gemini-api-key
```
Or pass the key directly:
```bash
openclaw onboard --non-interactive \
--mode local \
--auth-choice gemini-api-key \
--gemini-api-key "$GEMINI_API_KEY"
```
</Step>
<Step title="Set a default model">
```json5
{
agents: {
defaults: {
model: { primary: "google/gemini-3.1-pro-preview" },
},
},
}
```
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider google
```
</Step>
</Steps>
<Tip>
The environment variables `GEMINI_API_KEY` and `GOOGLE_API_KEY` are both accepted. Use whichever you already have configured.
</Tip>
</Tab>
<Tab title="Gemini CLI OAuth">
<Warning>
The `google-gemini-cli` provider is an unofficial integration. Some users
report account restrictions when using OAuth this way. Use at your own risk.
</Warning>
<Steps>
<Step title="Install the Gemini CLI">
The local `gemini` command must be available on `PATH`.
```bash
# Homebrew
brew install gemini-cli
# or npm
npm install -g @google/gemini-cli
```
OpenClaw supports both Homebrew installs and global npm installs, including
common Windows/npm layouts.
</Step>
<Step title="Log in via OAuth">
```bash
openclaw models auth login --provider google-gemini-cli --set-default
```
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider google
```
</Step>
</Steps>
- Default model: `google/gemini-3.1-pro-preview`
- Runtime: `google-gemini-cli`
- Alias: `gemini-cli`
Gemini 3.1 Pro's Gemini API model id is `gemini-3.1-pro-preview`. OpenClaw accepts the shorter `google/gemini-3.1-pro` as a convenience alias and normalizes it before provider calls.
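The alias handling can be sketched as a simple lookup before the provider call. This is an illustrative sketch, not OpenClaw's internal code; only the `gemini-3.1-pro` alias is documented, so the table shape here is an assumption:

```typescript
// Sketch of convenience-alias normalization before provider calls.
// The MODEL_ALIASES table is illustrative; only this one alias is documented.
const MODEL_ALIASES: Record<string, string> = {
  "google/gemini-3.1-pro": "google/gemini-3.1-pro-preview",
};

function normalizeModelRef(ref: string): string {
  // Unknown refs pass through unchanged.
  return MODEL_ALIASES[ref] ?? ref;
}
```

Both `google/gemini-3.1-pro` and the full `google/gemini-3.1-pro-preview` id therefore resolve to the same upstream model.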
**Environment variables:**
- `OPENCLAW_GEMINI_OAUTH_CLIENT_ID`
- `OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET`
(Or the `GEMINI_CLI_*` variants.)
<Note>
If Gemini CLI OAuth requests fail after login, set `GOOGLE_CLOUD_PROJECT` or
`GOOGLE_CLOUD_PROJECT_ID` on the gateway host and retry.
</Note>
<Note>
If login fails before the browser flow starts, make sure the local `gemini`
command is installed and on `PATH`.
</Note>
`google-gemini-cli/*` model refs are legacy compatibility aliases. New
configs should use `google/*` model refs plus the `google-gemini-cli`
runtime when they want local Gemini CLI execution.
</Tab>
</Tabs>
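Putting that together, a sketch of a config that keeps canonical `google/*` model refs while routing execution through the Gemini CLI runtime (the nesting follows the `agents.defaults.agentRuntime.id` path mentioned above):

```json5
{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
      agentRuntime: { id: "google-gemini-cli" },
    },
  },
}
```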
| Capability | Supported |
|---|---|
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |
The bundled `gemini` web-search provider uses Gemini Google Search grounding.
Configure a dedicated search key under `plugins.entries.google.config.webSearch`,
or let it fall back to `GEMINI_API_KEY` and then `models.providers.google.apiKey`:
```json5
{
  plugins: {
    entries: {
      google: {
        config: {
          webSearch: {
            apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
            baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
            model: "gemini-2.5-flash",
          },
        },
      },
    },
  },
}
```
Credential precedence is the dedicated `webSearch.apiKey`, then `GEMINI_API_KEY`,
then `models.providers.google.apiKey`. `webSearch.baseUrl` is optional and
exists for operator proxies or compatible Gemini API endpoints; when omitted,
Gemini web search reuses `models.providers.google.baseUrl`. See
Gemini search for the provider-specific tool behavior.
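The precedence chain can be sketched as a nullish-coalescing fallback. The function and type names here are descriptive placeholders, not OpenClaw internals:

```typescript
// Illustrative sketch of the web-search credential precedence:
// webSearch.apiKey, then GEMINI_API_KEY, then models.providers.google.apiKey.
interface WebSearchConfig { apiKey?: string }
interface GoogleProviderConfig { apiKey?: string }

function resolveWebSearchKey(
  webSearch: WebSearchConfig,
  env: Record<string, string | undefined>,
  provider: GoogleProviderConfig,
): string | undefined {
  return webSearch.apiKey ?? env["GEMINI_API_KEY"] ?? provider.apiKey;
}
```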
`/think adaptive` keeps Google's dynamic thinking semantics instead of choosing
a fixed OpenClaw level. Gemini 3 and Gemini 3.1 omit a fixed `thinkingLevel` so
Google can choose the level; Gemini 2.5 sends Google's dynamic sentinel
`thinkingBudget: -1`.
<Tip>
Gemma 4 models (for example `gemma-4-26b-a4b-it`) support thinking mode. OpenClaw
rewrites `thinkingBudget` to a supported Google `thinkingLevel` for Gemma 4.
Setting thinking to `off` preserves thinking disabled instead of mapping to
`MINIMAL`.
</Tip>
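The per-family mapping described above can be sketched as follows. This is an illustration of the documented behavior, not OpenClaw's actual code; the return shape and the specific `"HIGH"` level for Gemma 4 are assumptions:

```typescript
// Sketch of the adaptive-thinking mapping: the result is an illustrative
// fragment of the per-request thinking config sent to Google.
type ThinkingConfig =
  | Record<string, never>            // omit entirely: let Google pick (Gemini 3 / 3.1)
  | { thinkingBudget: number }       // Gemini 2.5 dynamic sentinel
  | { thinkingLevel: string }        // Gemma 4 fixed level (value here is illustrative)
  | { disabled: true };              // "off" stays off, never MINIMAL

function adaptiveThinking(model: string, mode: "adaptive" | "off"): ThinkingConfig {
  if (mode === "off") return { disabled: true };
  if (model.startsWith("gemma-4")) return { thinkingLevel: "HIGH" };
  if (model.startsWith("gemini-2.5")) return { thinkingBudget: -1 };
  return {}; // Gemini 3 / 3.1: no fixed level, Google chooses
}
```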
The bundled `google` image-generation provider defaults to
`google/gemini-3.1-flash-image-preview`; `google/gemini-3-pro-image-preview` is
also available. Requests support `size`, `aspectRatio`, and `resolution`. To use
Google as the default image provider:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```
The bundled `google` plugin also registers video generation through the shared
`video_generate` tool. The default model is `google/veo-3.1-fast-generate-preview`,
and requests support `aspectRatio`, `resolution`, and `audio`. To use Google as
the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```
The bundled `google` plugin also registers music generation through the shared
`music_generate` tool. Available models are `google/lyria-3-clip-preview` and
`google/lyria-3-pro-preview`. Requests support `lyrics` and `instrumental`, and
output is `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`; the tool
also accepts `action: "status"`. To use Google as the default music provider:
```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```
The bundled `google` speech provider uses the Gemini API TTS path with
`gemini-3.1-flash-tts-preview`. The default voice is `Kore`, and the API key
resolves from `messages.tts.providers.google.apiKey`,
`models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`; audio
conversion requires `ffmpeg`. To use Google as the default TTS provider:
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
```
Gemini API TTS uses natural-language prompting for style control. Set
`audioProfile` to prepend a reusable style prompt before the spoken text. Set
`speakerName` when your prompt text refers to a named speaker.
Gemini API TTS also accepts expressive square-bracket audio tags in the text,
such as `[whispers]` or `[laughs]`. To keep tags out of the visible chat reply
while sending them to TTS, put them inside a `[[tts:text]]...[[/tts:text]]`
block:
```
Here is the clean reply text.
[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
```
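The split between visible and spoken text can be illustrated with a small helper. This is a sketch of the behavior described above, not OpenClaw's actual parser:

```typescript
// Illustrative split of a reply into the visible chat text and the text
// forwarded to TTS, based on the [[tts:text]]...[[/tts:text]] block.
function splitTtsReply(reply: string): { visible: string; spoken: string } {
  const re = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/;
  const m = reply.match(re);
  // Without a block, the same text is shown and spoken.
  if (!m) return { visible: reply.trim(), spoken: reply.trim() };
  return {
    visible: reply.replace(re, "").trim(), // tags never reach the chat reply
    spoken: m[1].trim(),                   // audio tags like [whispers] go to TTS only
  };
}
```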
The bundled google plugin registers a realtime voice provider backed by the
Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
| Setting | Config path | Default |
|---|---|---|
| Model | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025` |
| Voice | `...google.voice` | `Kore` |
| Temperature | `...google.temperature` | (unset) |
| VAD start sensitivity | `...google.startSensitivity` | (unset) |
| VAD end sensitivity | `...google.endSensitivity` | (unset) |
| Silence duration | `...google.silenceDurationMs` | (unset) |
| Activity handling | `...google.activityHandling` | Google default (`start-of-activity-interrupts`) |
| Turn coverage | `...google.turnCoverage` | Google default (`only-activity`) |
| Disable auto VAD | `...google.automaticActivityDetectionDisabled` | `false` |
| Session resumption | `...google.sessionResumption` | `true` |
| Context compression | `...google.contextWindowCompression` | `true` |
| API key | `...google.apiKey` | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |
Example Voice Call realtime config:
```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}
```
For maintainer live verification, run:
```bash
OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts
```
The Google leg mints the same constrained Live API token shape used by Control
UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload,
and waits for `setupComplete`.
- Configure per-model or global params with either
`cachedContent` or legacy `cached_content`
- If both are present, `cachedContent` wins
- Example value: `cachedContents/prebuilt-context`
- Gemini cache-hit usage is normalized into OpenClaw `cacheRead` from
upstream `cachedContentTokenCount`
```json5
{
agents: {
defaults: {
models: {
"google/gemini-2.5-pro": {
params: {
cachedContent: "cachedContents/prebuilt-context",
},
},
},
},
},
}
```
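The two rules above (legacy-key precedence and cache-hit normalization) can be sketched as follows. The helper names and the OpenClaw-side field shape are illustrative; `usageMetadata.cachedContentTokenCount` is the upstream Gemini API field:

```typescript
// Sketch of cachedContent param resolution: the camelCase key wins when
// both spellings are present.
function resolveCachedContent(params: Record<string, unknown>): string | undefined {
  return (params.cachedContent ?? params.cached_content) as string | undefined;
}

// Sketch of cache-hit usage normalization: upstream cachedContentTokenCount
// becomes OpenClaw's cacheRead.
interface GeminiUsageMetadata { promptTokenCount?: number; cachedContentTokenCount?: number }

function normalizeCacheUsage(meta: GeminiUsageMetadata): { cacheRead: number } {
  return { cacheRead: meta.cachedContentTokenCount ?? 0 };
}
```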
- Reply text comes from the CLI JSON `response` field.
- Usage falls back to `stats` when the CLI leaves `usage` empty.
- `stats.cached` is normalized into OpenClaw `cacheRead`.
- If `stats.input` is missing, OpenClaw derives input tokens from
`stats.input_tokens - stats.cached`.
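The `stats` fallback described in the bullets above can be sketched like this. The field names follow the bullets; the surrounding shape is an illustration, not OpenClaw's internal types:

```typescript
// Sketch of CLI stats -> usage normalization: stats.cached maps to cacheRead,
// and input falls back to input_tokens minus cached when stats.input is absent.
interface CliStats { input?: number; input_tokens?: number; cached?: number }
interface Usage { input: number; cacheRead: number }

function usageFromStats(stats: CliStats): Usage {
  const cacheRead = stats.cached ?? 0;
  const input = stats.input ?? Math.max(0, (stats.input_tokens ?? 0) - cacheRead);
  return { input, cacheRead };
}
```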