docs/src/content/en/reference/voice/google.mdx
The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, advanced audio configuration options, and both standard API key authentication and Vertex AI mode for enterprise deployments.
```typescript
import { GoogleVoice } from '@mastra/voice-google'

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice()

// Text-to-Speech
const audioStream = await voice.speak('Hello, world!', {
  languageCode: 'en-US',
  audioConfig: {
    audioEncoding: 'LINEAR16',
  },
})

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: 'LINEAR16',
    languageCode: 'en-US',
  },
})

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: 'en-US' })
```
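Because `speak()` resolves to a Node.js readable stream rather than a buffer, callers usually drain it before writing a file or sending a response. The following is a minimal sketch of that step. `streamToBuffer` is a hypothetical helper, not part of `@mastra/voice-google`, and `fakeAudio` stands in for the result of `voice.speak(...)` so the example runs without credentials.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper (not part of @mastra/voice-google):
// drains a readable stream into one Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    // Chunks arrive as Buffers or strings depending on how the stream was created.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk)
  }
  return Buffer.concat(chunks)
}

// Stand-in for `await voice.speak(...)` so the sketch runs without credentials.
const fakeAudio: NodeJS.ReadableStream = Readable.from([Buffer.from('abc'), Buffer.from('def')])

streamToBuffer(fakeAudio).then((audio) => {
  console.log(audio.length) // number of bytes collected
})
```

The resulting buffer can then be passed to `fs.promises.writeFile` or returned from an HTTP handler.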
<PropertiesTable content={[ { name: 'speechModel', type: 'GoogleModelConfig', description: 'Configuration for text-to-speech functionality', isOptional: true, defaultValue: '{ apiKey: process.env.GOOGLE_API_KEY }', properties: [ { type: 'GoogleModelConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Google Cloud API key. Falls back to GOOGLE_API_KEY environment variable. Not used when vertexAI is true.', isOptional: true, }, { name: 'keyFilename', type: 'string', description: 'Path to service account JSON key file. Falls back to GOOGLE_APPLICATION_CREDENTIALS environment variable.', isOptional: true, }, { name: 'credentials', type: 'object', description: 'In-memory service account credentials object with client_email and private_key properties.', isOptional: true, }, ], }, ], }, { name: 'listeningModel', type: 'GoogleModelConfig', description: 'Configuration for speech-to-text functionality', isOptional: true, defaultValue: '{ apiKey: process.env.GOOGLE_API_KEY }', properties: [ { type: 'GoogleModelConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Google Cloud API key. Falls back to GOOGLE_API_KEY environment variable. Not used when vertexAI is true.', isOptional: true, }, { name: 'keyFilename', type: 'string', description: 'Path to service account JSON key file. Falls back to GOOGLE_APPLICATION_CREDENTIALS environment variable.', isOptional: true, }, { name: 'credentials', type: 'object', description: 'In-memory service account credentials object with client_email and private_key properties.', isOptional: true, }, ], }, ], }, { name: 'speaker', type: 'string', description: 'Default voice ID to use for text-to-speech', isOptional: true, defaultValue: "'en-US-Casual-K'", }, { name: 'vertexAI', type: 'boolean', description: "Enable Vertex AI mode for enterprise deployments. Uses project-based authentication instead of API keys. Requires 'project' to be set.", isOptional: true, defaultValue: 'false', }, { name: 'project', type: 'string', description: 'Google Cloud project ID (required when vertexAI is true). Falls back to GOOGLE_CLOUD_PROJECT environment variable.', isOptional: true, }, { name: 'location', type: 'string', description: 'Google Cloud region for Vertex AI. Falls back to GOOGLE_CLOUD_LOCATION environment variable.', isOptional: true, defaultValue: "'us-central1'", }, ]} />
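The fallback rules in the table above can be summarized in code. This is a sketch of the documented precedence only, not the library's actual implementation. `resolveConfig`, `VoiceOptions`, and `ResolvedConfig` are hypothetical names, and throwing when `vertexAI` is set without a project is an assumption based on the "Requires 'project' to be set" note.

```typescript
// Hypothetical types mirroring a subset of the documented constructor options.
interface VoiceOptions {
  vertexAI?: boolean
  project?: string
  location?: string
  model?: { apiKey?: string } // stands in for speechModel or listeningModel
}

interface ResolvedConfig {
  mode: 'api-key' | 'vertex-ai'
  apiKey?: string
  project?: string
  location: string
}

type Env = Record<string, string | undefined>

// Hypothetical helper: applies explicit option > environment variable > default.
function resolveConfig(options: VoiceOptions = {}, env: Env = {}): ResolvedConfig {
  const project = options.project ?? env.GOOGLE_CLOUD_PROJECT
  const location = options.location ?? env.GOOGLE_CLOUD_LOCATION ?? 'us-central1'

  if (options.vertexAI) {
    // Assumption: a missing project is treated as a configuration error.
    if (!project) {
      throw new Error('vertexAI mode requires a project (option or GOOGLE_CLOUD_PROJECT)')
    }
    // The API key is documented as unused in Vertex AI mode.
    return { mode: 'vertex-ai', project, location }
  }

  return {
    mode: 'api-key',
    apiKey: options.model?.apiKey ?? env.GOOGLE_API_KEY,
    project,
    location,
  }
}

console.log(resolveConfig({}, { GOOGLE_API_KEY: 'example-key' }))
```

Passing the environment in as an argument, rather than reading `process.env` directly, keeps the sketch easy to test.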
### speak()

Converts text to speech using the Google Cloud Text-to-Speech service.
<PropertiesTable content={[ { name: 'input', type: 'string | NodeJS.ReadableStream', description: 'Text to convert to speech. If a stream is provided, it will be converted to text first.', isOptional: false, }, { name: 'options', type: 'object', description: 'Speech synthesis options', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'speaker', type: 'string', description: 'Voice ID to use for this request', isOptional: true, }, { name: 'languageCode', type: 'string', description: "Language code for the voice (e.g., 'en-US'). Defaults to the language code from the speaker ID or 'en-US'", isOptional: true, }, { name: 'audioConfig', type: "ISynthesizeSpeechRequest['audioConfig']", description: 'Audio configuration options from Google Cloud Text-to-Speech API', isOptional: true, defaultValue: "{ audioEncoding: 'LINEAR16' }", }, ], }, ], }, ]} />
Returns: `Promise<NodeJS.ReadableStream>`
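The `languageCode` option is described as defaulting to the language code from the speaker ID, or `'en-US'`. One plausible way to derive that default is sketched below. `languageCodeFor` is a hypothetical helper, and the sketch assumes voice IDs start with a `<language>-<REGION>` prefix, as `'en-US-Casual-K'` does; the library may derive the value differently.

```typescript
// Hypothetical helper: derive a language code from a voice ID.
// Assumes IDs look like '<language>-<REGION>-<family>-<variant>'.
function languageCodeFor(speaker?: string): string {
  const parts = (speaker ?? '').split('-')
  // 'en-US-Casual-K' -> ['en', 'US', 'Casual', 'K'] -> 'en-US'
  if (parts.length >= 2 && parts[0] && parts[1]) {
    return `${parts[0]}-${parts[1]}`
  }
  // Documented fallback when no language can be derived.
  return 'en-US'
}

console.log(languageCodeFor('en-US-Casual-K'))
```

Passing `languageCode` explicitly to `speak()` avoids relying on any such derivation.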
### listen()

Converts speech to text using the Google Cloud Speech-to-Text service.
<PropertiesTable content={[ { name: 'audioStream', type: 'NodeJS.ReadableStream', description: 'Audio stream to transcribe', isOptional: false, }, { name: 'options', type: 'object', description: 'Recognition options', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'stream', type: 'boolean', description: 'Whether to use streaming recognition', isOptional: true, }, { name: 'config', type: 'IRecognitionConfig', description: 'Recognition configuration from Google Cloud Speech-to-Text API', isOptional: true, defaultValue: "{ encoding: 'LINEAR16', languageCode: 'en-US' }", }, ], }, ], }, ]} />
Returns: `Promise<string>`
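Since `listen()` takes a stream, audio that is already held in memory has to be wrapped first. The sketch below shows one way to do that. `transcribeBuffer` and the `Listener` interface are hypothetical: the interface only restates the signature documented above so the example compiles without the real package, and its option types are an assumed simplification of `IRecognitionConfig`.

```typescript
import { Readable } from 'node:stream'

// Structural type restating the documented listen() signature (simplified).
interface Listener {
  listen(
    audio: NodeJS.ReadableStream,
    options?: { stream?: boolean; config?: { encoding?: string; languageCode?: string } },
  ): Promise<string>
}

// Hypothetical helper: transcribe audio that is already held in memory.
async function transcribeBuffer(
  voice: Listener,
  audio: Buffer,
  languageCode = 'en-US',
): Promise<string> {
  // Wrap the buffer in an array so it is emitted as a single chunk.
  const stream = Readable.from([audio])
  return voice.listen(stream, { config: { encoding: 'LINEAR16', languageCode } })
}
```

In real use, `voice` would be a `GoogleVoice` instance and `audio` would hold LINEAR16 data matching the `encoding` passed in `config`.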
### getSpeakers()

Returns an array of available voice options, optionally filtered by `languageCode`. Each entry contains:
<PropertiesTable content={[ { name: 'voiceId', type: 'string', description: 'Unique identifier for the voice', isOptional: false, }, { name: 'languageCodes', type: 'string[]', description: 'List of language codes supported by this voice', isOptional: false, }, ]} />
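Given the entry shape above, selecting voices for a language on the client side is a simple filter. This is a sketch with made-up sample data: `voicesForLanguage` is a hypothetical helper, and `'de-DE-Example-A'` is an illustrative ID rather than a real voice.

```typescript
// Shape of each entry as documented for getSpeakers().
interface VoiceOption {
  voiceId: string
  languageCodes: string[]
}

// Hypothetical helper: IDs of all voices that support the given language.
function voicesForLanguage(voices: VoiceOption[], languageCode: string): string[] {
  return voices
    .filter((voice) => voice.languageCodes.includes(languageCode))
    .map((voice) => voice.voiceId)
}

// Illustrative sample data, standing in for `await voice.getSpeakers()`.
const sample: VoiceOption[] = [
  { voiceId: 'en-US-Casual-K', languageCodes: ['en-US'] },
  { voiceId: 'de-DE-Example-A', languageCodes: ['de-DE'] },
]

console.log(voicesForLanguage(sample, 'en-US'))
```

When only one language is needed, passing `languageCode` to `getSpeakers()` directly avoids fetching the full list.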
### isUsingVertexAI()

Checks if Vertex AI mode is enabled.

Returns: `boolean` - `true` if using Vertex AI, `false` otherwise

### getProject()

Gets the configured Google Cloud project ID.

Returns: `string | undefined` - The project ID, or `undefined` if not set

### getLocation()

Gets the configured Google Cloud location/region.

Returns: `string` - The location (default: `'us-central1'`)
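The three accessors above are convenient for logging which mode an instance ended up in. A minimal sketch follows; `describeAuth` and the `VertexAware` interface are hypothetical, and the interface only restates the documented return types so the example runs without the real package.

```typescript
// Structural type restating the three documented accessors.
interface VertexAware {
  isUsingVertexAI(): boolean
  getProject(): string | undefined
  getLocation(): string
}

// Hypothetical helper: one-line summary of the active authentication mode.
function describeAuth(voice: VertexAware): string {
  if (!voice.isUsingVertexAI()) {
    return 'API key mode'
  }
  const project = voice.getProject() ?? 'unset'
  return `Vertex AI mode (project: ${project}, location: ${voice.getLocation()})`
}
```

Logging this summary at startup makes it easier to spot a deployment that silently fell back to API key mode.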
The Google Voice provider supports two authentication methods: API key authentication and Vertex AI mode.

API key authentication uses a Google Cloud API key. It is suitable for development and basic use cases.
```typescript
// Using environment variable (GOOGLE_API_KEY)
const voice = new GoogleVoice()

// Using explicit API key
const voiceWithKey = new GoogleVoice({
  speechModel: { apiKey: 'your-api-key' },
  listeningModel: { apiKey: 'your-api-key' },
  speaker: 'en-US-Casual-K',
})
```
Vertex AI mode uses Google Cloud project-based authentication with service accounts. It is recommended for production and enterprise deployments.
Configuration Options:
```typescript
// Using Application Default Credentials (ADC)
// Set GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT env vars
const adcVoice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  location: 'us-central1', // Optional, defaults to 'us-central1'
})

// Using service account key file
const keyFileVoice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  speechModel: {
    keyFilename: '/path/to/service-account.json',
  },
  listeningModel: {
    keyFilename: '/path/to/service-account.json',
  },
})

// Using in-memory credentials
const inMemoryVoice = new GoogleVoice({
  vertexAI: true,
  project: 'your-gcp-project',
  speechModel: {
    credentials: {
      // Placeholder service account email
      client_email: 'service-account@your-gcp-project.iam.gserviceaccount.com',
      private_key: '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----',
    },
  },
})
```
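In-memory credentials are often supplied as a JSON string from a secret store or environment variable. The sketch below parses such a string into the `client_email` / `private_key` shape used above. `parseServiceAccount` is a hypothetical helper, and the newline repair is an assumption about a common failure mode in which the key's line breaks arrive double-escaped.

```typescript
// Shape of the in-memory credentials object described above.
interface ServiceAccountCredentials {
  client_email: string
  private_key: string
}

// Hypothetical helper: validate and normalize a service account JSON string.
function parseServiceAccount(json: string): ServiceAccountCredentials {
  const parsed = JSON.parse(json) as Partial<ServiceAccountCredentials>
  if (typeof parsed.client_email !== 'string' || typeof parsed.private_key !== 'string') {
    throw new Error('service account JSON must contain client_email and private_key')
  }
  return {
    client_email: parsed.client_email,
    // Turn literal backslash-n sequences back into real line breaks.
    private_key: parsed.private_key.replace(/\\n/g, '\n'),
  }
}
```

The returned object could then be passed as `credentials` under `speechModel` and `listeningModel`. Which environment variable holds the JSON is up to the deployment; no particular name is defined by the provider.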
For Text-to-Speech:

- `roles/texttospeech.admin` - Text-to-Speech Admin (full access)
- `roles/texttospeech.editor` - Text-to-Speech Editor (create and manage)
- `roles/texttospeech.viewer` - Text-to-Speech Viewer (read-only)

For Speech-to-Text:

- `roles/speech.client` - Speech-to-Text Client

For synchronous Text-to-Speech synthesis:

- `https://www.googleapis.com/auth/cloud-platform` - Full access to Google Cloud Platform services

For long-audio Text-to-Speech operations:

- `locations.longAudioSynthesize` - Create long-audio synthesis operations
- `operations.get` - Get operation status
- `operations.list` - List operations

Environment variables:

- `GOOGLE_API_KEY` - API key for standard mode
- `GOOGLE_CLOUD_PROJECT` - Project ID for Vertex AI mode
- `GOOGLE_CLOUD_LOCATION` - Location for Vertex AI mode (defaults to `'us-central1'`)
- `GOOGLE_APPLICATION_CREDENTIALS` - Path to service account key file

Notes:

- The default voice is `'en-US-Casual-K'`.
- The `speak()` method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
- The `listen()` method supports various recognition configurations through the Google Cloud Speech-to-Text API.
- Available voices can be filtered by language code using the `getSpeakers()` method.