docs/adapters/nlp/vertex.md
The Vertex AI Service Adapter provides integration with Google Cloud's Vertex AI platform, supporting both Anthropic Claude models and Google Gemini models through their respective APIs. This adapter implements the Parlant NLP service interface for text generation, embeddings, and tokenization.
```shell
# Required
VERTEX_AI_PROJECT_ID=your-gcp-project-id
VERTEX_AI_REGION=us-central1  # Your GCP region
VERTEX_AI_MODEL=claude-opus-4
```
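For illustration, the configuration resolution can be sketched as below. The variable names match the list above, but the defaulting and validation logic is an assumption for illustration, not the adapter's actual implementation (which lives in `VertexAIService.__init__`).

```python
import os

# Hypothetical sketch of configuration loading; the real adapter performs
# its own validation inside VertexAIService.__init__.
def load_vertex_config() -> dict[str, str]:
    try:
        project_id = os.environ["VERTEX_AI_PROJECT_ID"]
        model = os.environ["VERTEX_AI_MODEL"]
    except KeyError as exc:
        raise RuntimeError(f"Missing required environment variable: {exc}") from exc
    return {
        "project_id": project_id,
        # Defaulting the region is an assumption; the docs only show an example value.
        "region": os.environ.get("VERTEX_AI_REGION", "us-central1"),
        "model": model,
    }
```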
The adapter uses Google Application Default Credentials (ADC):
```shell
# For local development
gcloud auth application-default login

# For production, use a service account key or workload identity
```
Claude models (via the Anthropic Vertex API):

| Short Name | Full Model Name | Description |
|---|---|---|
| claude-opus-4 | claude-opus-4@20250514 | Most capable Claude model |
| claude-sonnet-4 | claude-sonnet-4@20250514 | Balanced performance and speed |
| claude-sonnet-3.5 | claude-3-5-sonnet-v2@20241022 | Previous generation Sonnet |
| claude-haiku-3.5 | claude-3-5-haiku@20241022 | Fastest Claude model |
Gemini models (via the Google Gen AI API):

| Short Name | Full Model Name | Description |
|---|---|---|
| gemini-2.5-flash | gemini-2.5-flash | Latest fast Gemini model |
| gemini-2.5-pro | gemini-2.5-pro | Latest pro Gemini model |
| gemini-2.0-flash | gemini-2.0-flash | Previous generation flash |
| gemini-1.5-flash | gemini-1.5-flash | 1M token context |
| gemini-1.5-pro | gemini-1.5-pro | 2M token context |
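The short-name resolution implied by the tables above can be sketched as a lookup table. The mapping values come straight from the tables, but the `resolve_model_name` helper itself is hypothetical, not part of the adapter's public API.

```python
# Hypothetical alias table mirroring the Claude model table above.
CLAUDE_MODEL_ALIASES = {
    "claude-opus-4": "claude-opus-4@20250514",
    "claude-sonnet-4": "claude-sonnet-4@20250514",
    "claude-sonnet-3.5": "claude-3-5-sonnet-v2@20241022",
    "claude-haiku-3.5": "claude-3-5-haiku@20241022",
}

def resolve_model_name(short_name: str) -> str:
    # Gemini short names are already the full model names, so they pass through.
    return CLAUDE_MODEL_ALIASES.get(short_name, short_name)
```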
```python
import parlant.sdk as p
from parlant.sdk import NLPServices

async with p.Server(nlp_service=NLPServices.vertex) as server:
    agent = await server.create_agent(
        name="Healthcare Agent",
        description="Is empathetic and calming to the patient.",
    )
```
```python
from parlant.adapters.nlp.vertex_service import VertexAIService
from parlant.core.loggers import Logger

# Initialize service
logger = Logger()
service = VertexAIService(logger=logger)

# Get schematic generator
generator = await service.get_schematic_generator(YourSchemaClass)

# Generate content
result = await generator.generate(
    prompt="Your prompt here",
    hints={"temperature": 0.7, "max_tokens": 1000},
)
```
**`VertexAIService`**

Main service class implementing the `NLPService` interface.

```python
def __init__(self, logger: Logger) -> None
```

Initializes the service from the environment variables `VERTEX_AI_PROJECT_ID`, `VERTEX_AI_REGION`, and `VERTEX_AI_MODEL`.

```python
async def get_schematic_generator(self, t: type[T]) -> SchematicGenerator[T]
```

Returns the appropriate generator for the configured model.

```python
async def get_embedder(self) -> Embedder
```

Returns a `VertexTextEmbedding004` embedder instance.

```python
async def get_moderation_service(self) -> ModerationService
```

Returns a `NoModeration` service (moderation is not yet implemented).
Schematic generator for Claude models via the Anthropic Vertex API.

Supported hints:

- `temperature`: Controls randomness (0.0-1.0)
- `max_tokens`: Maximum output tokens
- `top_p`: Nucleus sampling parameter
- `top_k`: Top-k sampling parameter

Properties:

- `id`: Returns `vertex-ai/{model_name}`
- `tokenizer`: Returns a `VertexAIEstimatingTokenizer` instance
- `max_tokens`: Returns 200,000 (Claude context limit)

```python
async def generate(
    self,
    prompt: str | PromptBuilder,
    hints: Mapping[str, Any] = {},
) -> SchematicGenerationResult[T]
```

Generates structured content using Claude models.
Schematic generator for Gemini models via the Google Gen AI API.

Supported hints:

- `temperature`: Controls randomness (0.0-1.0)
- `thinking_config`: Configuration for reasoning models

Properties:

- `id`: Returns `vertex-ai/{model_name}`
- `tokenizer`: Returns a `VertexAIEstimatingTokenizer` instance
- `max_tokens`: Returns 1M (Flash) or 2M (Pro) tokens

```python
async def generate(
    self,
    prompt: str | PromptBuilder,
    hints: Mapping[str, Any] = {},
) -> SchematicGenerationResult[T]
```

Generates structured content using Gemini models.
**`VertexTextEmbedding004`**

Text embedding service using Google's `text-embedding-004` model.

Properties:

- `id`: Returns `vertex-ai/text-embedding-004`
- `dimensions`: Returns 768 (embedding dimensions)
- `max_tokens`: Returns 8,192 (input token limit)

Supported hints:

- `title`: Document title for better embeddings
- `task_type`: Embedding task type (default: `"RETRIEVAL_DOCUMENT"`)

```python
async def embed(
    self,
    texts: list[str],
    hints: Mapping[str, Any] = {},
) -> EmbeddingResult
```

Generates embeddings for the input texts with batch-processing support.
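The batch-processing behavior can be sketched as simple chunking of the input list. The batch size of 32 and the `batched` helper are assumptions for illustration, not the adapter's actual values.

```python
def batched(texts: list[str], batch_size: int = 32) -> list[list[str]]:
    # Split inputs into fixed-size batches before sending each batch
    # to the embedding API; batch_size=32 is an illustrative choice.
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
```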
**`VertexAIEstimatingTokenizer`**

Token counting service supporting both Claude and Gemini models.

```python
async def estimate_token_count(self, prompt: str) -> int
```

Estimates the token count for the given prompt.
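As a rough mental model, an *estimating* tokenizer can be approximated with a characters-per-token heuristic. The class below is a hypothetical stand-in, not `VertexAIEstimatingTokenizer` itself, which presumably uses model-specific counting.

```python
class RoughEstimatingTokenizer:
    # Hypothetical stand-in: a ~4 characters-per-token heuristic.
    # The real VertexAIEstimatingTokenizer likely counts per-model.
    async def estimate_token_count(self, prompt: str) -> int:
        return max(1, len(prompt) // 4)
```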
**`VertexAIAuthError`**

```python
class VertexAIAuthError(Exception):
    """Raised when there are authentication issues with Vertex AI."""
```

Common causes and solutions:

```shell
gcloud auth application-default login
```

The adapter implements comprehensive retry policies.
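The retry behavior can be sketched as exponential backoff with jitter. The attempt count, delays, and catch-all exception handling below are illustrative assumptions, not the adapter's actual policy.

```python
import asyncio
import random

async def with_retries(operation, max_attempts: int = 3, base_delay: float = 1.0):
    # Retry an async operation with exponential backoff plus random jitter.
    # Attempt counts and delays here are illustrative, not the adapter's values.
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
```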
The adapter provides detailed error messages for common issues:

```
Vertex AI rate limit exceeded. Possible reasons:
1. Your GCP project may have insufficient quota.
2. The model may not be enabled in Vertex AI Model Garden.
3. You might have exceeded the requests-per-minute limit.

Recommended actions:
- Check your Vertex AI quotas in the GCP Console.
- Ensure the model is enabled in Vertex AI Model Garden.
- Review IAM permissions for the service account.
- Visit: https://console.cloud.google.com/vertex-ai/model-garden
```
```
Permission denied accessing Vertex AI. Ensure:
1. ADC is properly configured (run 'gcloud auth application-default login')
2. The service account has 'Vertex AI User' role
3. The {model_name} model is enabled in Vertex AI Model Garden
```
| Model Type | Context Limit | Recommended Usage |
|---|---|---|
| Claude Models | 200K tokens | Long documents, complex reasoning |
| Gemini Flash | 1M tokens | Large context processing |
| Gemini Pro | 2M tokens | Maximum context requirements |
```shell
export VERTEX_AI_PROJECT_ID=your-project-id
export VERTEX_AI_REGION=us-central1
export VERTEX_AI_MODEL=claude-sonnet-3.5
```
```python
from parlant.adapters.nlp.vertex_service import VertexAIAuthError

try:
    service = VertexAIService(logger=logger)
    generator = await service.get_schematic_generator(MySchema)
    result = await generator.generate(prompt)
except VertexAIAuthError as e:
    logger.error(f"Authentication failed: {e}")
    # Handle auth setup
except Exception as e:
    logger.error(f"Generation failed: {e}")
    # Handle other errors
```
**Authentication Failures**

Verify that credentials are available:

```shell
gcloud auth application-default print-access-token
```

**Model Access Denied**

Ensure the model is enabled in Vertex AI Model Garden and that the service account has the required IAM roles.

**Rate Limiting**

Check usage from the playground UI by inspecting the generated message.
When migrating from other NLP adapters:
**Update Environment Variables**

```shell
# Remove old variables
unset OPENAI_API_KEY ANTHROPIC_API_KEY

# Set Vertex AI variables
export VERTEX_AI_PROJECT_ID=your-project-id
export VERTEX_AI_REGION=us-central1
export VERTEX_AI_MODEL=claude-opus-4
```
**Model Name Mapping**

- `gpt-4` → `claude-opus-4`
- `gpt-3.5-turbo` → `gemini-2.5-flash`
- `claude-3-sonnet` → `claude-opus-4`

To use the Vertex AI Service Adapter with Parlant, you need to install the appropriate optional dependencies:

```shell
pip install "parlant[vertex]"
```
This installation includes support for both Claude and Gemini models through the Vertex AI platform.
> ⚠️ **Claude 3.5 Sonnet Models Deprecation:** Claude Sonnet 3.5 models (claude-3-5-sonnet-20240620 and claude-3-5-sonnet-20241022) will be retired on October 22, 2025. We recommend migrating to Claude Sonnet 4 (claude-sonnet-4-20250514) for improved performance and capabilities.
Before using the adapter, ensure you have proper authentication configured:
```shell
# For local development
gcloud auth application-default login

# Verify authentication
gcloud auth application-default print-access-token
```
Ensure your service account or user has the following IAM roles:

- **Vertex AI User** - for accessing Vertex AI services
- **AI Platform User** - for model access (legacy role, may be needed for some models)

Licensed under the Apache License, Version 2.0. See the source file header for full license text.
Agam Dubey - [email protected]