docs/models/google.md
`GoogleModel` uses the `google-genai` package under the hood to access Google's Gemini models via both the Gemini API and Google Cloud (formerly known as Vertex AI).
Two providers wrap those endpoints:

- [`GoogleProvider`][pydantic_ai.providers.google.GoogleProvider] — the Gemini API (Google AI Studio), surfaced under the `'google:'` prefix.
- [`GoogleCloudProvider`][pydantic_ai.providers.google_cloud.GoogleCloudProvider] — Google Cloud (formerly known as Vertex AI), surfaced under the `'google-cloud:'` prefix.

!!! note "Renamed prefixes (1.x → v2)"
    The `'google-gla:'` and `'google-vertex:'` prefixes still work in 1.x but emit a `DeprecationWarning`. Use `'google:'` and `'google-cloud:'` instead. Likewise, `GoogleProvider(...)` with any Google Cloud-only argument (`vertexai=True`, `location`, `project`, or `credentials`) is deprecated in favor of `GoogleCloudProvider(...)`.

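As a minimal migration sketch (the argument names follow the deprecation note above; the region is illustrative):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

# Previously: GoogleProvider(vertexai=True, location='us-central1'), now deprecated.
provider = GoogleCloudProvider(location='us-central1')
agent = Agent(GoogleModel('gemini-3-pro-preview', provider=provider))
```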
To use `GoogleModel`, you need to either install `pydantic-ai` or install `pydantic-ai-slim` with the `google` optional group:

```bash
pip/uv-add "pydantic-ai-slim[google]"
```
GoogleModel lets you use Google's Gemini models through their Gemini API (generativelanguage.googleapis.com) or Google Cloud (*-aiplatform.googleapis.com, formerly known as Vertex AI).
To use Gemini via the Gemini API, go to aistudio.google.com and create an API key.
Once you have the API key, set it as an environment variable:
```bash
export GOOGLE_API_KEY=your-api-key
```
You can then use `GoogleModel` by name:

```python
from pydantic_ai import Agent

agent = Agent('google:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
If you are an enterprise user, you can also use `GoogleModel` to access Gemini via Google Cloud (formerly known as Vertex AI). This interface has a number of advantages over the Gemini API.

You can authenticate using application default credentials, a service account, or an API key. Whichever way you authenticate, you'll need to have the Vertex AI API (now branded as Google Cloud AI) enabled in your Google Cloud account.
If you have the `gcloud` CLI installed and configured, you can use `GoogleCloudProvider` by name:

```python
from pydantic_ai import Agent

agent = Agent('google-cloud:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider and model:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider()
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
To use a service account JSON file, explicitly create the provider and model:
```python
from google.oauth2 import service_account

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

credentials = service_account.Credentials.from_service_account_file(
    'path/to/service-account.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)
provider = GoogleCloudProvider(credentials=credentials, project='your-project-id')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)
...
```
To use Google Cloud with an API key, create a key and set it as an environment variable:
```bash
export GOOGLE_API_KEY=your-api-key
```
You can then use `GoogleModel` via `GoogleCloudProvider` by name:

```python
from pydantic_ai import Agent

agent = Agent('google-cloud:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider and model:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
You can specify the location and/or project when using Google Cloud:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='asia-east1', project='your-google-cloud-project-id')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
The unified [`service_tier`][pydantic_ai.settings.ModelSettings.service_tier] field works on both Google subsystems, with [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] available for finer Google Cloud routing control. The provider-specific field wins when both are set.
Gemini API — sent as the request's service_tier field:
service_tier | Sent to Gemini API |
|---|---|
'auto' | (omitted — server default) |
'default' | 'standard' |
'flex' | 'flex' |
'priority' | 'priority' |
Google Cloud — sent as HTTP routing headers. `'flex'` and `'priority'` always pick the PT-with-spillover variant, so customers with Provisioned Throughput (PT) keep using their reserved capacity first:

| `service_tier` | Google Cloud routing headers | Effective behavior |
|---|---|---|
| `'auto'` / `'default'` | (none) | PT first, then standard on-demand spillover |
| `'flex'` | `X-Vertex-AI-LLM-Shared-Request-Type: flex` | PT first, then Flex PayGo spillover |
| `'priority'` | `X-Vertex-AI-LLM-Shared-Request-Type: priority` | PT first, then Priority PayGo spillover |
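As a minimal sketch of the unified field (assuming the `service_tier` mapping described above; the model name is illustrative):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

model = GoogleModel('gemini-3-flash-preview')
# 'flex' maps to the Gemini API's 'flex' tier, or to PT-then-Flex spillover on Google Cloud.
agent = Agent(model, model_settings=GoogleModelSettings(service_tier='flex'))
```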
To bypass PT entirely (or use it exclusively, or any of the other Google Cloud-specific routing combinations), set [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] directly — the unified field is intentionally limited to the safe PT-with-spillover variants.
The full set of [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] routing values maps to these HTTP headers:

- `'pt_only'`: PT only (`X-Vertex-AI-LLM-Request-Type: dedicated`).
- `'pt_then_flex'`: PT when quota allows, then Flex PayGo spillover (`X-Vertex-AI-LLM-Shared-Request-Type: flex`).
- `'pt_then_priority'`: PT when quota allows, then Priority PayGo spillover (`X-Vertex-AI-LLM-Shared-Request-Type: priority`).
- `'on_demand'`: Standard on-demand only (`X-Vertex-AI-LLM-Request-Type: shared`).
- `'flex_only'`: Flex PayGo only (`X-Vertex-AI-LLM-Request-Type: shared` and `X-Vertex-AI-LLM-Shared-Request-Type: flex`).
- `'priority_only'`: Priority PayGo only (`X-Vertex-AI-LLM-Request-Type: shared` and `X-Vertex-AI-LLM-Shared-Request-Type: priority`).

For example:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='global')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)

result = agent.run_sync(
    'Hello!',
    model_settings=GoogleModelSettings(google_cloud_service_tier='pt_then_flex'),
)
```
Swap 'pt_then_flex' for any [GoogleCloudServiceTier][pydantic_ai.models.google.GoogleCloudServiceTier] value — e.g. 'pt_then_priority' for Priority PayGo spillover, or 'flex_only' / 'priority_only' to bypass PT entirely.
The [google_service_tier][pydantic_ai.models.google.GoogleModelSettings.google_service_tier] field is deprecated in favor of these more specific fields.
After the request, inspect `provider_details.get('traffic_type')` on the [`ModelResponse`][pydantic_ai.messages.ModelResponse] (e.g. `ON_DEMAND_FLEX`, `ON_DEMAND_PRIORITY`) to see which tier served the request, when the API returns it.
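A minimal sketch of that check, reusing the run from the example above (whether `traffic_type` is present depends on the API response):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='global')
agent = Agent(GoogleModel('gemini-3-flash-preview', provider=provider))

result = agent.run_sync(
    'Hello!',
    model_settings=GoogleModelSettings(google_cloud_service_tier='pt_then_flex'),
)
# provider_details may be None or lack 'traffic_type' if the API doesn't report it.
details = result.response.provider_details or {}
print(details.get('traffic_type'))  # e.g. 'ON_DEMAND_FLEX' or 'ON_DEMAND_PRIORITY'
```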
You can access models from the Model Garden that support the `generateContent` API and are available under your Google Cloud project, including but not limited to Gemini, using one of the following `model_name` patterns:

- `{model_id}` for Gemini models
- `{publisher}/{model_id}`
- `publishers/{publisher}/models/{model_id}`
- `projects/{project}/locations/{location}/publishers/{publisher}/models/{model_id}`

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(
    project='your-google-cloud-project-id',
    location='us-central1',  # the region where the model is available
)
model = GoogleModel('meta/llama-3.3-70b-instruct-maas', provider=provider)
agent = Agent(model)
...
```
You can customize `GoogleProvider` with a custom `httpx.AsyncClient`:

```python
from httpx import AsyncClient

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

custom_http_client = AsyncClient(timeout=30)
model = GoogleModel(
    'gemini-3-pro-preview',
    provider=GoogleProvider(api_key='your-api-key', http_client=custom_http_client),
)
agent = Agent(model)
...
```
GoogleModel supports multi-modal input, including documents, images, audio, and video.
YouTube video URLs can be passed directly to Google models:
```python
from pydantic_ai import Agent, VideoUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-3-flash-preview'))

result = agent.run_sync(
    [
        'What is this video about?',
        VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
    ]
)
print(result.output)
```
Files can be uploaded via the Files API and passed as URLs:
```python
from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider()
file = provider.client.files.upload(file='pydantic-ai-logo.png')
assert file.uri is not None

agent = Agent(GoogleModel('gemini-3-flash-preview', provider=provider))

result = agent.run_sync(
    [
        'What company is this logo from?',
        DocumentUrl(url=file.uri, media_type=file.mime_type),
    ]
)
print(result.output)
```
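The same pattern applies to other media types; here is a minimal sketch using `ImageUrl` (the image URL is a placeholder):

```python
from pydantic_ai import Agent, ImageUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-3-flash-preview'))

result = agent.run_sync(
    [
        'Describe this image.',
        ImageUrl(url='https://example.com/logo.png'),  # placeholder URL
    ]
)
print(result.output)
```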
See the input documentation for more details and examples.
You can customize model behavior using [GoogleModelSettings][pydantic_ai.models.google.GoogleModelSettings]:
```python
from google.genai.types import HarmBlockThreshold, HarmCategory

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

settings = GoogleModelSettings(
    temperature=0.2,
    max_tokens=1024,
    google_thinking_config={'thinking_level': 'low'},
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ],
)
model = GoogleModel('gemini-3-pro-preview')
agent = Agent(model, model_settings=settings)
...
```
Gemini 3 models use `thinking_level` to control thinking behavior:

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

# Set the thinking level for Gemini 3 models: 'low' or 'high'
model_settings = GoogleModelSettings(google_thinking_config={'thinking_level': 'low'})
model = GoogleModel('gemini-3-flash-preview')
agent = Agent(model, model_settings=model_settings)
...
```
For older models (pre-Gemini 3), you can use `thinking_budget` instead:

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

# Disable thinking on older models by setting the budget to 0
model_settings = GoogleModelSettings(google_thinking_config={'thinking_budget': 0})
model = GoogleModel('gemini-2.5-flash')  # Older model
agent = Agent(model, model_settings=model_settings)
...
```
Check out the Gemini API docs for more on thinking.
You can customize the safety settings by setting the `google_safety_settings` field:

```python
from google.genai.types import HarmBlockThreshold, HarmCategory

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

model_settings = GoogleModelSettings(
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ],
)
model = GoogleModel('gemini-3-flash-preview')
agent = Agent(model, model_settings=model_settings)
...
```
See the Gemini API docs for more on safety settings.
You can return logprobs from the model in your response by setting google_logprobs and google_top_logprobs in the [GoogleModelSettings][pydantic_ai.models.google.GoogleModelSettings].
This feature is only supported for non-streaming requests and Google Cloud.
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

model_settings = GoogleModelSettings(
    google_logprobs=True,
    google_top_logprobs=2,
)
model = GoogleModel(
    model_name='gemini-2.5-flash',
    provider=GoogleCloudProvider(location='europe-west1'),
)
agent = Agent(model, model_settings=model_settings)

result = agent.run_sync('Your prompt here')

# Access logprobs from provider_details
logprobs = result.response.provider_details.get('logprobs')
avg_logprobs = result.response.provider_details.get('avg_logprobs')
```
See the Google Dev Blog for more information.
!!! warning "Cancellation limitations"
    The `google-genai` SDK exposes streaming responses only as an async iterator, with no separate handle for closing the underlying HTTP transport. Because of a Python language rule on async generators, [`cancel()`][pydantic_ai.result.StreamedRunResult.cancel] cannot interrupt an in-flight chunk read while another coroutine is iterating the stream. Pydantic AI marks the response with `state='interrupted'`, but upstream generation may continue until the surrounding `async with agent.run_stream(...)` block exits.
For reliable cancellation, either pass `debounce_by=None` to [`stream_text()`][pydantic_ai.result.StreamedRunResult.stream_text], [`stream_output()`][pydantic_ai.result.StreamedRunResult.stream_output], or [`stream_response()`][pydantic_ai.result.StreamedRunResult.stream_response] and call `cancel()` from the same task that's iterating:
```python {title="cancel_google.py" test="skip"}
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
return len(chunk) > 100
async def main():
async with agent.run_stream('Write a long essay about Python') as result:
async for chunk in result.stream_text(debounce_by=None):
if should_stop(chunk):
await result.cancel()
break
```
Or, if you need to keep debouncing, wrap the stream with [`contextlib.aclosing`](https://docs.python.org/3/library/contextlib.html#contextlib.aclosing) so the iterator is closed before `cancel()` runs:
```python {title="cancel_google_aclosing.py" test="skip"}
from contextlib import aclosing
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
return len(chunk) > 100
async def main():
async with agent.run_stream('Write a long essay about Python') as result:
async with aclosing(result.stream_text()) as stream:
async for chunk in stream:
if should_stop(chunk):
break
await result.cancel()
```
Calling `cancel()` from a different task while iteration is in progress is not currently reliable on this provider.