Google Gemini supports text chat, image understanding, and image generation (the Nano Banana series). A single gemini_api_key enables all three capabilities.
{
  "model": "gemini-3.5-flash",
  "gemini_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
|---|---|
| model | Recommended: gemini-3.5-flash; also supports gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview, gemini-3-flash-preview, gemini-3-pro-preview, etc. See official docs |
| gemini_api_key | Create one in Google AI Studio |
| gemini_api_base | Optional, defaults to https://generativelanguage.googleapis.com. Can be changed to a third-party proxy |
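
The default for gemini_api_base described in the table can be illustrated with a short Python sketch. The helper name `resolve_gemini_settings` and the use of a plain dict for the parsed config are assumptions made for illustration. This is not the project's actual configuration loader.

```python
# Default taken from the parameter table above.
DEFAULT_GEMINI_API_BASE = "https://generativelanguage.googleapis.com"


def resolve_gemini_settings(config: dict) -> dict:
    """Return the Gemini settings with the documented default applied.

    Hypothetical helper: it mirrors the parameter table, not the
    project's real loader.
    """
    api_key = config.get("gemini_api_key")
    if not api_key:
        raise ValueError("gemini_api_key is required")

    # Fall back to the official endpoint when no proxy is configured.
    api_base = config.get("gemini_api_base") or DEFAULT_GEMINI_API_BASE

    return {
        "model": config["model"],
        "gemini_api_key": api_key,
        # Drop a trailing slash so later path joins stay predictable
        # (a convenience assumed here, not documented behavior).
        "gemini_api_base": api_base.rstrip("/"),
    }
```

Passing a config without gemini_api_base yields the official endpoint, while a config that sets it (for example to a third-party proxy URL) keeps the custom value.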
All Gemini models natively support vision. Once gemini_api_key is configured, the Agent's Vision tool automatically uses the main model to recognize images, with no extra setup required.
To manually specify a Vision model:
{
  "tools": {
    "vision": {
      "model": "gemini-3.1-flash-lite-preview"
    }
  }
}
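
The fallback described above (the Vision tool uses the main model unless one is set explicitly) could be sketched as follows. The function name `resolve_vision_model` and the exact lookup order are assumptions for illustration, based only on the config keys shown here.

```python
def resolve_vision_model(config: dict) -> str:
    """Pick the model the Vision tool would use.

    Assumed lookup order: tools.vision.model if present,
    otherwise the top-level model.
    """
    vision = config.get("tools", {}).get("vision", {})
    # An empty or missing override falls through to the main model.
    return vision.get("model") or config["model"]


main_only = {"model": "gemini-3.5-flash"}
with_override = {
    "model": "gemini-3.5-flash",
    "tools": {"vision": {"model": "gemini-3.1-flash-lite-preview"}},
}

print(resolve_vision_model(main_only))      # gemini-3.5-flash
print(resolve_vision_model(with_override))  # gemini-3.1-flash-lite-preview
```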
To specify the image generation model:
{
  "skills": {
    "image-generation": {
      "model": "gemini-3.1-flash-image-preview"
    }
  }
}
| Model ID | Alias |
|---|---|
| gemini-3.1-flash-image-preview | Nano Banana 2 |
| gemini-3-pro-image-preview | Nano Banana Pro |
| gemini-2.5-flash-image | Nano Banana |
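
As a convenience, the alias table can be turned into a small lookup that emits the matching config fragment. This is a minimal sketch: the names `NANO_BANANA_MODELS` and `image_generation_config` are hypothetical, and only the model IDs and the `skills.image-generation.model` key come from this page.

```python
import json

# Alias -> model ID, copied from the table above.
NANO_BANANA_MODELS = {
    "Nano Banana 2": "gemini-3.1-flash-image-preview",
    "Nano Banana Pro": "gemini-3-pro-image-preview",
    "Nano Banana": "gemini-2.5-flash-image",
}


def image_generation_config(alias: str) -> dict:
    """Build the image-generation config fragment for a given alias."""
    try:
        model_id = NANO_BANANA_MODELS[alias]
    except KeyError:
        known = ", ".join(NANO_BANANA_MODELS)
        raise ValueError(f"Unknown alias {alias!r}; expected one of: {known}")
    return {"skills": {"image-generation": {"model": model_id}}}


# Print a fragment that can be pasted into the configuration file.
print(json.dumps(image_generation_config("Nano Banana Pro"), indent=2))
```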