# @mastra/voice-google

Google Cloud Voice integration for Mastra, providing both Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.

> **Note**: This package replaces the deprecated `@mastra/speech-google` package, combining both speech synthesis and recognition capabilities.
## Installation

```bash
npm install @mastra/voice-google
```
## Authentication

The module supports multiple authentication methods:

### API Key

Use an API key from Google Cloud Console:

```bash
GOOGLE_API_KEY=your_api_key
```

### Service Account

Use a service account key file:

```bash
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```

### OAuth (Vertex AI)

Use OAuth authentication with Google Cloud Platform for enterprise deployments:

```bash
# Set project ID
GOOGLE_CLOUD_PROJECT=your_project_id

# Optional: Set location (defaults to us-central1)
GOOGLE_CLOUD_LOCATION=us-central1

# Authenticate via gcloud CLI
gcloud auth application-default login
```

Or use a service account:

```bash
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your_project_id
```
## Usage

```typescript
import { GoogleVoice } from '@mastra/voice-google';

// Initialize with configuration
const voice = new GoogleVoice({
  speechModel: {
    apiKey: 'your-api-key', // Optional, can rely on GOOGLE_API_KEY or ADC
    keyFilename: '/path/to/service-account.json', // Optional, can rely on GOOGLE_APPLICATION_CREDENTIALS
  },
  listeningModel: {
    keyFilename: '/path/to/service-account.json', // Optional, can rely on ADC
  },
  speaker: 'en-US-Standard-F', // Default voice
});

// List available voices
const voices = await voice.getSpeakers();

// Generate speech
const audioStream = await voice.speak('Hello from Mastra!', {
  speaker: 'en-US-Standard-F',
  languageCode: 'en-US',
});

// Transcribe speech
const text = await voice.listen(audioStream);
```
## Vertex AI Mode

For enterprise deployments, use Vertex AI mode, which provides better integration with Google Cloud infrastructure:

```typescript
import { GoogleVoice } from '@mastra/voice-google';

// Initialize with Vertex AI
const voice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  location: 'us-central1', // Optional, defaults to 'us-central1'
  speaker: 'en-US-Studio-O',
});

// Works the same as standard mode
const audioStream = await voice.speak('Hello from Vertex AI!');
const text = await voice.listen(audioStream);

// Check if using Vertex AI
console.log(voice.isUsingVertexAI()); // true
console.log(voice.getProject()); // 'your-gcp-project'
console.log(voice.getLocation()); // 'us-central1'
```
Vertex AI mode also works with explicit service account key files:

```typescript
import { GoogleVoice } from '@mastra/voice-google';

const voice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  location: 'us-central1',
  speechModel: {
    keyFilename: '/path/to/service-account.json',
  },
  listeningModel: {
    keyFilename: '/path/to/service-account.json',
  },
});
```
Or with in-memory credentials:

```typescript
import { GoogleVoice } from '@mastra/voice-google';

const voice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  speechModel: {
    credentials: {
      client_email: 'service-account@your-project.iam.gserviceaccount.com',
      private_key: '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----',
    },
  },
});
```
## Configuration

### Constructor Options

| Option | Type | Description |
|---|---|---|
| `speechModel` | `GoogleModelConfig` | Configuration for TTS |
| `listeningModel` | `GoogleModelConfig` | Configuration for STT |
| `speaker` | `string` | Default voice ID (default: `'en-US-Casual-K'`) |
| `vertexAI` | `boolean` | Enable Vertex AI mode (default: `false`) |
| `project` | `string` | Google Cloud project ID (required for Vertex AI) |
| `location` | `string` | Google Cloud region (default: `'us-central1'`) |
### GoogleModelConfig

| Option | Type | Description |
|---|---|---|
| `apiKey` | `string` | Google Cloud API key |
| `keyFilename` | `string` | Path to service account JSON key file |
| `credentials` | `object` | In-memory service account credentials |
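All of these fields are optional; when none is set, the underlying Google client falls back to Application Default Credentials. A small helper (hypothetical, not part of the package) can assemble a config object from the environment variables described above:

```typescript
// Build a GoogleModelConfig-shaped object from environment variables,
// preferring an explicit API key over a key file. Returning an empty
// object leaves authentication to Application Default Credentials.
type ModelConfig = { apiKey?: string; keyFilename?: string };

export function modelConfigFromEnv(
  env: Record<string, string | undefined> = process.env,
): ModelConfig {
  if (env.GOOGLE_API_KEY) return { apiKey: env.GOOGLE_API_KEY };
  if (env.GOOGLE_APPLICATION_CREDENTIALS) {
    return { keyFilename: env.GOOGLE_APPLICATION_CREDENTIALS };
  }
  return {};
}
```

You could then pass `modelConfigFromEnv()` as either `speechModel` or `listeningModel` when constructing `GoogleVoice`.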
## API Reference

### speak(input, options?)

Converts text to speech.

- `input`: `string | NodeJS.ReadableStream` - Text to convert
- `options.speaker`: Override default voice
- `options.languageCode`: Language code (e.g., `'en-US'`)
- `options.audioConfig`: Audio encoding options
- Returns: `Promise<NodeJS.ReadableStream>` - Audio stream
### listen(audioStream, options?)

Converts speech to text.

- `audioStream`: `NodeJS.ReadableStream` - Audio to transcribe
- `options.config`: Recognition configuration
- Returns: `Promise<string>` - Transcribed text
### getSpeakers(options?)

Lists available voices.

- `options.languageCode`: Filter by language (default: `'en-US'`)
- Returns: `Promise<Array<{ voiceId: string, languageCodes: string[] }>>`
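The result is plain data, so it can be filtered client-side. As a sketch, a helper (hypothetical, not part of the package) that keeps only the voice IDs matching a language and a family pattern such as `'Wavenet'`:

```typescript
type SpeakerInfo = { voiceId: string; languageCodes: string[] };

// From a getSpeakers() result, keep voice IDs that support `language`
// and whose name contains `pattern`. Google encodes the voice family
// ('Standard', 'Wavenet', 'Neural2', 'Studio', ...) in the voice ID.
export function voicesMatching(
  voices: SpeakerInfo[],
  language: string,
  pattern: string,
): string[] {
  return voices
    .filter(v => v.languageCodes.includes(language) && v.voiceId.includes(pattern))
    .map(v => v.voiceId);
}
```

For example: `voicesMatching(await voice.getSpeakers(), 'en-US', 'Wavenet')`.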
### isUsingVertexAI()

Returns `true` if Vertex AI mode is enabled.

### getProject()

Returns the configured Google Cloud project ID.

### getLocation()

Returns the configured Google Cloud location/region.
## Required Permissions

When using Vertex AI, ensure your service account or user has the appropriate IAM roles and OAuth scopes.

For Text-to-Speech:

- `roles/texttospeech.admin` - Text-to-Speech Admin (full access)
- `roles/texttospeech.editor` - Text-to-Speech Editor (create and manage)
- `roles/texttospeech.viewer` - Text-to-Speech Viewer (read-only)

For Speech-to-Text:

- `roles/speech.client` - Speech-to-Text Client

For synchronous Text-to-Speech synthesis:

- `https://www.googleapis.com/auth/cloud-platform` - Full access to Google Cloud Platform services

For long-audio Text-to-Speech operations:

- `locations.longAudioSynthesize` - Create long-audio synthesis operations
- `operations.get` - Get operation status
- `operations.list` - List operations

View the complete list of available voices using the `getSpeakers()` method or Google Cloud's documentation.