docs/models/google.md
`GoogleModel` uses the `google-genai` package under the hood to access Google's Gemini models via both the Gemini API and Google Cloud (formerly known as Vertex AI).
Two providers wrap those endpoints:

- [`GoogleProvider`][pydantic_ai.providers.google.GoogleProvider] — the Gemini API (Google AI Studio), surfaced under the `'google:'` prefix.
- [`GoogleCloudProvider`][pydantic_ai.providers.google_cloud.GoogleCloudProvider] — Google Cloud (formerly known as Vertex AI), surfaced under the `'google-cloud:'` prefix.

!!! note "Renamed prefixes (1.x → v2)"
    The `'google-gla:'` and `'google-vertex:'` prefixes still work in 1.x but emit a `DeprecationWarning`. Use `'google:'` and `'google-cloud:'` instead. Likewise, `GoogleProvider(...)` with any Google Cloud-only argument (`vertexai=True`, `location`, `project`, or `credentials`) is deprecated in favor of `GoogleCloudProvider(...)`.

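As a minimal migration sketch (the argument names follow the deprecation note above; the region is illustrative):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

# Previously: GoogleProvider(vertexai=True, location='us-central1'), now deprecated.
provider = GoogleCloudProvider(location='us-central1')
agent = Agent(GoogleModel('gemini-3-pro-preview', provider=provider))
```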
To use `GoogleModel`, you need to either install `pydantic-ai` or install `pydantic-ai-slim` with the `google` optional group:

```bash
pip/uv-add "pydantic-ai-slim[google]"
```
GoogleModel lets you use Google's Gemini models through their Gemini API (generativelanguage.googleapis.com) or Google Cloud (*-aiplatform.googleapis.com, formerly known as Vertex AI).
To use Gemini via the Gemini API, go to aistudio.google.com and create an API key.
Once you have the API key, set it as an environment variable:
```bash
export GOOGLE_API_KEY=your-api-key
```
You can then use `GoogleModel` by name:

```python
from pydantic_ai import Agent

agent = Agent('google:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
If you are an enterprise user, you can also use `GoogleModel` to access Gemini via Google Cloud (formerly known as Vertex AI). This interface has a number of advantages over the Gemini API.

You can authenticate using application default credentials, a service account, or an API key. Whichever way you authenticate, you'll need to have the Vertex AI API (now branded as Google Cloud AI) enabled in your Google Cloud account.
If you have the `gcloud` CLI installed and configured, you can use `GoogleCloudProvider` by name:

```python
from pydantic_ai import Agent

agent = Agent('google-cloud:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider and model:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider()
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
To use a service account JSON file, explicitly create the provider and model:
```python
from google.oauth2 import service_account

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

credentials = service_account.Credentials.from_service_account_file(
    'path/to/service-account.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)
provider = GoogleCloudProvider(credentials=credentials, project='your-project-id')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)
...
```
To use Google Cloud with an API key, create a key and set it as an environment variable:
```bash
export GOOGLE_API_KEY=your-api-key
```
You can then use `GoogleModel` via `GoogleCloudProvider` by name:

```python
from pydantic_ai import Agent

agent = Agent('google-cloud:gemini-3-pro-preview')
...
```
Or you can explicitly create the provider and model:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
You can specify the location and/or project when using Google Cloud:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='asia-east1', project='your-google-cloud-project-id')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
```
The unified [`service_tier`][pydantic_ai.settings.ModelSettings.service_tier] field works on both Google subsystems, with [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] available for finer Google Cloud routing control. The provider-specific field wins when both are set.
Gemini API — sent as the request's service_tier field:
service_tier | Sent to Gemini API |
|---|---|
'auto' | (omitted — server default) |
'default' | 'standard' |
'flex' | 'flex' |
'priority' | 'priority' |
Google Cloud — sent as HTTP routing headers. `'flex'` and `'priority'` always pick the PT-with-spillover variant, so customers with Provisioned Throughput (PT) keep using their reserved capacity first:

| `service_tier` | Google Cloud routing headers | Effective behavior |
|---|---|---|
| `'auto'` / `'default'` | (none) | PT first, then standard on-demand spillover |
| `'flex'` | `X-Vertex-AI-LLM-Shared-Request-Type: flex` | PT first, then Flex PayGo spillover |
| `'priority'` | `X-Vertex-AI-LLM-Shared-Request-Type: priority` | PT first, then Priority PayGo spillover |
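As a minimal sketch of the unified field (assuming the `service_tier` mapping described above; the model name is illustrative):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

model = GoogleModel('gemini-3-flash-preview')
# 'flex' maps to the Gemini API's 'flex' tier, or to PT-then-Flex spillover on Google Cloud.
agent = Agent(model, model_settings=GoogleModelSettings(service_tier='flex'))
```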
To bypass PT entirely (or use it exclusively, or any of the other Google Cloud-specific routing combinations), set [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] directly — the unified field is intentionally limited to the safe PT-with-spillover variants.
The full set of [`google_cloud_service_tier`][pydantic_ai.models.google.GoogleModelSettings.google_cloud_service_tier] routing values maps to these HTTP headers:

- `'pt_only'`: PT only (`X-Vertex-AI-LLM-Request-Type: dedicated`).
- `'pt_then_flex'`: PT when quota allows, then Flex PayGo spillover (`X-Vertex-AI-LLM-Shared-Request-Type: flex`).
- `'pt_then_priority'`: PT when quota allows, then Priority PayGo spillover (`X-Vertex-AI-LLM-Shared-Request-Type: priority`).
- `'on_demand'`: Standard on-demand only (`X-Vertex-AI-LLM-Request-Type: shared`).
- `'flex_only'`: Flex PayGo only (`X-Vertex-AI-LLM-Request-Type: shared` and `X-Vertex-AI-LLM-Shared-Request-Type: flex`).
- `'priority_only'`: Priority PayGo only (`X-Vertex-AI-LLM-Request-Type: shared` and `X-Vertex-AI-LLM-Shared-Request-Type: priority`).

For example:
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='global')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)

result = agent.run_sync(
    'Hello!',
    model_settings=GoogleModelSettings(google_cloud_service_tier='pt_then_flex'),
)
```
Swap 'pt_then_flex' for any [GoogleCloudServiceTier][pydantic_ai.models.google.GoogleCloudServiceTier] value — e.g. 'pt_then_priority' for Priority PayGo spillover, or 'flex_only' / 'priority_only' to bypass PT entirely.
The [google_service_tier][pydantic_ai.models.google.GoogleModelSettings.google_service_tier] field is deprecated in favor of these more specific fields.
After the request, inspect `provider_details.get('traffic_type')` on the [`ModelResponse`][pydantic_ai.messages.ModelResponse] (e.g. `ON_DEMAND_FLEX`, `ON_DEMAND_PRIORITY`) to see which tier served the request, when the API returns it.
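A minimal sketch of that check, reusing the run from the example above (whether `traffic_type` is present depends on the API response):

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(location='global')
agent = Agent(GoogleModel('gemini-3-flash-preview', provider=provider))

result = agent.run_sync(
    'Hello!',
    model_settings=GoogleModelSettings(google_cloud_service_tier='pt_then_flex'),
)
# provider_details may be None or lack 'traffic_type' if the API doesn't report it.
details = result.response.provider_details or {}
print(details.get('traffic_type'))  # e.g. 'ON_DEMAND_FLEX' or 'ON_DEMAND_PRIORITY'
```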
You can access models from the Model Garden that support the `generateContent` API and are available under your Google Cloud project, including but not limited to Gemini, using one of the following `model_name` patterns:

- `{model_id}` for Gemini models
- `{publisher}/{model_id}`
- `publishers/{publisher}/models/{model_id}`
- `projects/{project}/locations/{location}/publishers/{publisher}/models/{model_id}`

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

provider = GoogleCloudProvider(
    project='your-google-cloud-project-id',
    location='us-central1',  # the region where the model is available
)
model = GoogleModel('meta/llama-3.3-70b-instruct-maas', provider=provider)
agent = Agent(model)
...
```
You can customize `GoogleProvider` with a custom `httpx.AsyncClient`:

```python
from httpx import AsyncClient

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

custom_http_client = AsyncClient(timeout=30)
model = GoogleModel(
    'gemini-3-pro-preview',
    provider=GoogleProvider(api_key='your-api-key', http_client=custom_http_client),
)
agent = Agent(model)
...
```
GoogleModel supports multi-modal input, including documents, images, audio, and video.
YouTube video URLs can be passed directly to Google models:
```python
from pydantic_ai import Agent, VideoUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-3-flash-preview'))

result = agent.run_sync(
    [
        'What is this video about?',
        VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
    ]
)
print(result.output)
```
Files can be uploaded via the Files API and passed as URLs:
```python
from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider()
file = provider.client.files.upload(file='pydantic-ai-logo.png')
assert file.uri is not None

agent = Agent(GoogleModel('gemini-3-flash-preview', provider=provider))

result = agent.run_sync(
    [
        'What company is this logo from?',
        DocumentUrl(url=file.uri, media_type=file.mime_type),
    ]
)
print(result.output)
```
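The same pattern applies to other media types; here is a minimal sketch using `ImageUrl` (the image URL is a placeholder):

```python
from pydantic_ai import Agent, ImageUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-3-flash-preview'))

result = agent.run_sync(
    [
        'Describe this image.',
        ImageUrl(url='https://example.com/logo.png'),  # placeholder URL
    ]
)
print(result.output)
```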
See the input documentation for more details and examples.
You can customize model behavior using [GoogleModelSettings][pydantic_ai.models.google.GoogleModelSettings]:
```python
from google.genai.types import HarmBlockThreshold, HarmCategory

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

settings = GoogleModelSettings(
    temperature=0.2,
    max_tokens=1024,
    google_thinking_config={'thinking_level': 'low'},
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ],
)
model = GoogleModel('gemini-3-pro-preview')
agent = Agent(model, model_settings=settings)
...
```
Gemini 3 models use `thinking_level` to control thinking behavior:

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

# Set the thinking level for Gemini 3 models: 'low' or 'high'
model_settings = GoogleModelSettings(google_thinking_config={'thinking_level': 'low'})
model = GoogleModel('gemini-3-flash-preview')
agent = Agent(model, model_settings=model_settings)
...
```
For older models (pre-Gemini 3), you can use `thinking_budget` instead:

```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

# Disable thinking on older models by setting the budget to 0
model_settings = GoogleModelSettings(google_thinking_config={'thinking_budget': 0})
model = GoogleModel('gemini-2.5-flash')  # Older model
agent = Agent(model, model_settings=model_settings)
...
```
Check out the Gemini API docs for more on thinking.
You can customize the safety settings by setting the `google_safety_settings` field:

```python
from google.genai.types import HarmBlockThreshold, HarmCategory

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

model_settings = GoogleModelSettings(
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ],
)
model = GoogleModel('gemini-3-flash-preview')
agent = Agent(model, model_settings=model_settings)
...
```
See the Gemini API docs for more on safety settings.
You can return logprobs from the model in your response by setting google_logprobs and google_top_logprobs in the [GoogleModelSettings][pydantic_ai.models.google.GoogleModelSettings].
This feature is only supported for non-streaming requests and Google Cloud.
```python
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider

model_settings = GoogleModelSettings(
    google_logprobs=True,
    google_top_logprobs=2,
)
model = GoogleModel(
    model_name='gemini-2.5-flash',
    provider=GoogleCloudProvider(location='europe-west1'),
)
agent = Agent(model, model_settings=model_settings)

result = agent.run_sync('Your prompt here')

# Access logprobs from provider_details
logprobs = result.response.provider_details.get('logprobs')
avg_logprobs = result.response.provider_details.get('avg_logprobs')
```
See the Google Dev Blog for more information.
!!! warning "Cancellation limitations"
    The `google-genai` SDK exposes streaming responses only as an async iterator, with no separate handle for closing the underlying HTTP transport. Because of a Python language rule on async generators, [`cancel()`][pydantic_ai.result.StreamedRunResult.cancel] cannot interrupt an in-flight chunk read while another coroutine is iterating the stream. Pydantic AI marks the response with `state='interrupted'`, but upstream generation may continue until the surrounding `async with agent.run_stream(...)` block exits.
For reliable cancellation, either pass `debounce_by=None` to [`stream_text()`][pydantic_ai.result.StreamedRunResult.stream_text], [`stream_output()`][pydantic_ai.result.StreamedRunResult.stream_output], or [`stream_response()`][pydantic_ai.result.StreamedRunResult.stream_response] and call `cancel()` from the same task that's iterating:
```python {title="cancel_google.py" test="skip"}
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
return len(chunk) > 100
async def main():
async with agent.run_stream('Write a long essay about Python') as result:
async for chunk in result.stream_text(debounce_by=None):
if should_stop(chunk):
await result.cancel()
break
```
Or, if you need to keep debouncing, wrap the stream with [`contextlib.aclosing`](https://docs.python.org/3/library/contextlib.html#contextlib.aclosing) so the iterator is closed before `cancel()` runs:
```python {title="cancel_google_aclosing.py" test="skip"}
from contextlib import aclosing
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
return len(chunk) > 100
async def main():
async with agent.run_stream('Write a long essay about Python') as result:
async with aclosing(result.stream_text()) as stream:
async for chunk in stream:
if should_stop(chunk):
break
await result.cancel()
```
Calling `cancel()` from a different task while iteration is in progress is not currently reliable on this provider.