docs/integrations/model-providers/gcp-vertex-ai-gemini.mdx
This guide shows how to set up a minimal deployment to use the TensorZero Gateway with GCP Vertex AI Gemini.
For this minimal setup, you'll need just two files in your project directory:
- `config/`
  - `tensorzero.toml`
- `docker-compose.yml`
You can also find the complete code for this example on GitHub.
For production deployments, see our Deployment Guide.
Create a minimal configuration file that defines a model and a simple chat function:
```toml
[models.gemini_2_5_flash]
routing = ["gcp_vertex_gemini"]

[models.gemini_2_5_flash.providers.gcp_vertex_gemini]
type = "gcp_vertex_gemini"
model_id = "gemini-2.5-flash" # or endpoint_id = "..." for fine-tuned models and custom endpoints
location = "us-central1"
project_id = "your-project-id" # change this

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "gemini_2_5_flash"
```
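Because `routing` is an ordered list, you can list multiple providers for the same model and the gateway will try them in order. As a sketch, you could extend the configuration above with a hypothetical fallback provider in a second region (the provider name and region below are illustrative):

```toml
[models.gemini_2_5_flash]
# providers are tried in the order listed
routing = ["gcp_vertex_gemini", "gcp_vertex_gemini_backup"]

# hypothetical fallback provider in a second region
[models.gemini_2_5_flash.providers.gcp_vertex_gemini_backup]
type = "gcp_vertex_gemini"
model_id = "gemini-2.5-flash"
location = "us-east1"
project_id = "your-project-id"
```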
See the list of models available on GCP Vertex AI Gemini.
Alternatively, you can use the short-hand `gcp_vertex_gemini::model_name` to use a GCP Vertex AI Gemini model with TensorZero if you don't need advanced features like fallbacks or custom credentials:

```
gcp_vertex_gemini::projects/<PROJECT_ID>/locations/<REGION>/publishers/google/models/<MODEL_ID>
gcp_vertex_gemini::projects/<PROJECT_ID>/locations/<REGION>/endpoints/<ENDPOINT_ID>
```
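For example, a sketch of calling such a model through the gateway's OpenAI-compatible endpoint without defining it in the configuration file (this assumes the `tensorzero::model_name::` prefix; substitute your own project ID and region):

```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::model_name::gcp_vertex_gemini::projects/<PROJECT_ID>/locations/<REGION>/publishers/google/models/gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "What is the capital of Japan?" }]
  }'
```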
By default, TensorZero reads the path to your GCP service account JSON file from the `GCP_VERTEX_CREDENTIALS_PATH` environment variable (using `path_from_env::GCP_VERTEX_CREDENTIALS_PATH`). You must generate a GCP service account key in JSON format as described here.
You can customize the credential location using:
- `sdk`: use the Google Cloud SDK to auto-discover credentials
- `path::/path/to/credentials.json`: use a specific file path
- `path_from_env::YOUR_ENVIRONMENT_VARIABLE`: read the file path from an environment variable (default behavior)
- `dynamic::ARGUMENT_NAME`: provide credentials dynamically at inference time
- `{ default = ..., fallback = ... }`: configure credential fallbacks

See the Credential Management guide and Configuration Reference for more information.
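As a sketch, pointing the provider at a specific key file might look like the following (this assumes the provider accepts a `credential_location` field; check the Configuration Reference for the exact field name, and adjust the path for your setup):

```toml
[models.gemini_2_5_flash.providers.gcp_vertex_gemini]
type = "gcp_vertex_gemini"
model_id = "gemini-2.5-flash"
location = "us-central1"
project_id = "your-project-id"
# assumption: override the default environment-variable lookup with a fixed file path
credential_location = "path::/etc/tensorzero/gcp-credentials.json"
```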
Create a minimal Docker Compose configuration:
```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/deployment/tensorzero-gateway

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
      - ${GCP_VERTEX_CREDENTIALS_PATH:-/dev/null}:/app/gcp-credentials.json:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      GCP_VERTEX_CREDENTIALS_PATH: ${GCP_VERTEX_CREDENTIALS_PATH:+/app/gcp-credentials.json}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
You can start the gateway with `docker compose up`.
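Since the Compose file above mounts the file at `${GCP_VERTEX_CREDENTIALS_PATH}` into the container, you can point it at your service account key when starting the stack (the path below is illustrative):

```bash
# mount the key file into the container and start the gateway
GCP_VERTEX_CREDENTIALS_PATH=/absolute/path/to/gcp-credentials.json docker compose up
```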
Make an inference request to the gateway:
```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::my_function_name",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of Japan?"
      }
    ]
  }'
```
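The gateway also exposes a native inference endpoint. A sketch of the equivalent request (this assumes the default `/inference` route and the native input format; see the API reference for details):

```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        { "role": "user", "content": "What is the capital of Japan?" }
      ]
    }
  }'
```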
GCP Vertex AI Gemini supports batch inference using Google Cloud Storage for input and output files.
To enable batch inference, configure the `provider_types.gcp_vertex_gemini.batch` section in your configuration file:
```toml
[provider_types.gcp_vertex_gemini.batch]
storage_type = "cloud_storage"
input_uri_prefix = "gs://my-bucket/batch-inputs/"
output_uri_prefix = "gs://my-bucket/batch-outputs/"
```
The service account used by the gateway must have read/write access to the specified GCS buckets.
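For example, one way to grant that access is with `gsutil` (the bucket name and service account email below are illustrative; adjust them for your project):

```bash
# grant the gateway's service account read/write access to the batch bucket
gsutil iam ch \
  serviceAccount:my-service-account@your-project-id.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my-bucket
```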
See the Batch Inference guide and Configuration Reference for more details.
Gemini supports two thinking parameters:

- `reasoning_effort` maps to `thinkingConfig.thinkingLevel`
- `thinking_budget_tokens` maps to `thinkingConfig.thinkingBudget` (legacy)
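As a sketch, enabling thinking might look like the following (this assumes these parameters are set at the `chat_completion` variant level; check the Configuration Reference for where they belong and the accepted values):

```toml
[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "gemini_2_5_flash"
# assumption: maps to thinkingConfig.thinkingLevel on the Gemini API
reasoning_effort = "low"
```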