docs/src/content/en/reference/voice/azure.mdx
The AzureVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.
This requires Azure Speech Services credentials that can be provided through environment variables or directly in the configuration:
```typescript
import { AzureVoice } from '@mastra/voice-azure'

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: 'your-azure-speech-api-key', // Or use AZURE_API_KEY env var
    region: 'eastus', // Or use AZURE_REGION env var
    voiceName: 'en-US-AriaNeural', // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: 'your-azure-speech-api-key', // Or use AZURE_API_KEY env var
    region: 'eastus', // Or use AZURE_REGION env var
    language: 'en-US', // Optional: recognition language for STT
  },
  speaker: 'en-US-JennyNeural', // Optional: default voice
})

// Convert text to speech
const audioStream = await voice.speak('Hello, how can I help you?', {
  speaker: 'en-US-GuyNeural', // Optional: override default voice
})

// Convert speech to text
const text = await voice.listen(audioStream)
```
<PropertiesTable content={[ { name: 'speechModel', type: 'AzureSpeechConfig', description: 'Configuration for text-to-speech synthesis.', isOptional: true, properties: [ { type: 'AzureSpeechConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Azure Speech Services API key (NOT Azure OpenAI key). Falls back to AZURE_API_KEY environment variable.', isOptional: true, }, { name: 'region', type: 'string', description: "Azure region (e.g., 'eastus', 'westeurope'). Falls back to AZURE_REGION environment variable.", isOptional: true, }, { name: 'voiceName', type: 'string', description: "Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in speechModel. See voice list below.", isOptional: true, }, { name: 'language', type: 'string', description: "Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in listeningModel.", isOptional: true, }, ], }, ], }, { name: 'listeningModel', type: 'AzureSpeechConfig', description: 'Configuration for speech-to-text recognition.', isOptional: true, properties: [ { type: 'AzureSpeechConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Azure Speech Services API key (NOT Azure OpenAI key). Falls back to AZURE_API_KEY environment variable.', isOptional: true, }, { name: 'region', type: 'string', description: "Azure region (e.g., 'eastus', 'westeurope'). Falls back to AZURE_REGION environment variable.", isOptional: true, }, { name: 'voiceName', type: 'string', description: "Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in speechModel. See voice list below.", isOptional: true, }, { name: 'language', type: 'string', description: "Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in listeningModel.", isOptional: true, }, ], }, ], }, { name: 'speaker', type: 'string', description: 'Default voice ID for speech synthesis.', isOptional: true, }, ]} />
### speak()

Converts text to speech using Azure's neural text-to-speech service.
<PropertiesTable content={[ { name: 'input', type: 'string | NodeJS.ReadableStream', description: 'Text or text stream to convert to speech.', isOptional: false, }, { name: 'options', type: 'Options', description: 'Configuration options.', isOptional: true, properties: [ { type: 'Options', parameters: [ { name: 'speaker', type: 'string', description: "Voice ID to use for speech synthesis (e.g., 'en-US-JennyNeural'). Overrides the default voice.", isOptional: true, defaultValue: "Constructor's speaker value", }, ], }, ], }, ]} />
Returns: `Promise<NodeJS.ReadableStream>` - Audio stream in WAV format
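If you need the audio on disk rather than as a live stream, the returned stream can be piped to a file. This is a minimal sketch; the text, voice, and `greeting.wav` path are illustrative, not part of the API:

```typescript
import { createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'

// Synthesize speech and write the WAV audio to disk
// ('greeting.wav' is an illustrative output path)
const audio = await voice.speak('Your order has shipped.', {
  speaker: 'en-US-JennyNeural',
})
await pipeline(audio, createWriteStream('greeting.wav'))
```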
### listen()

Transcribes audio using Azure's speech-to-text service.
<PropertiesTable content={[ { name: 'audioStream', type: 'NodeJS.ReadableStream', description: 'Audio stream to transcribe. Must be in WAV format.', isOptional: false, }, ]} />
Returns: `Promise<string>` - The recognized text from the audio
Note: Language and recognition settings are configured via the `listeningModel` options at initialization, not passed as options to this method.
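For example, a pre-recorded WAV file can be read from disk and passed in as a stream. This is a minimal sketch; the `recording.wav` path is illustrative:

```typescript
import { createReadStream } from 'node:fs'

// Transcribe a pre-recorded WAV file ('recording.wav' is an illustrative path).
// The recognition language comes from the listeningModel set in the constructor.
const transcript = await voice.listen(createReadStream('recording.wav'))
console.log(transcript)
```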
### getSpeakers()

Returns an array of available voice options (200+ voices), where each entry contains:
<PropertiesTable content={[ { name: 'voiceId', type: 'string', description: "Unique identifier for the voice (e.g., 'en-US-JennyNeural', 'fr-FR-DeniseNeural')", isOptional: false, }, { name: 'language', type: 'string', description: "Language code extracted from voice ID (e.g., 'en', 'fr')", isOptional: false, }, { name: 'region', type: 'string', description: "Region code extracted from voice ID (e.g., 'US', 'GB', 'FR')", isOptional: false, }, ]} />
Returns: `Promise<Array<{ voiceId: string; language: string; region: string }>>`
⚠️ Critical: This package uses Azure Speech Services, which is different from Azure OpenAI Services. Do not use AZURE_OPENAI_API_KEY for this package.

API keys and regions can be provided via constructor options or environment variables:

- `AZURE_API_KEY` - Your Azure Speech Services subscription key
- `AZURE_REGION` - Your Azure region (e.g., 'eastus', 'westeurope')
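For example, when both environment variables are set, the constructor can be called without inline credentials. This is a minimal sketch; the chosen voice and language are illustrative:

```typescript
import { AzureVoice } from '@mastra/voice-azure'

// With AZURE_API_KEY and AZURE_REGION set in the environment,
// credentials can be omitted from the constructor entirely
const voice = new AzureVoice({
  speechModel: { voiceName: 'en-US-AriaNeural' },
  listeningModel: { language: 'en-US' },
})
```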
Voice IDs follow the format `{language}-{region}-{name}Neural` (e.g., 'en-US-JennyNeural'). Azure provides 200+ neural voices across many languages. Some popular English voices include:

**US English:**

- `en-US-AriaNeural` (Female, default)
- `en-US-JennyNeural` (Female)
- `en-US-GuyNeural` (Male)
- `en-US-DavisNeural` (Male)
- `en-US-AvaNeural` (Female)
- `en-US-AndrewNeural` (Male)

**British English:**

- `en-GB-SoniaNeural` (Female)
- `en-GB-RyanNeural` (Male)
- `en-GB-LibbyNeural` (Female)

**Australian English:**

- `en-AU-NatashaNeural` (Female)
- `en-AU-WilliamNeural` (Male)

To get a complete list of all 200+ voices:
```typescript
const voices = await voice.getSpeakers()
console.log(voices) // Array of { voiceId, language, region }
```
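Because the result is a plain array, it can be narrowed client-side, for example to keep only British English voices. This is an illustrative filter, not a library API:

```typescript
const voices = await voice.getSpeakers()

// Narrow the full list to British English voices by voice ID prefix
const britishVoices = voices.filter((v) => v.voiceId.startsWith('en-GB'))
console.log(britishVoices.map((v) => v.voiceId))
```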
For more information, see the Azure Neural TTS documentation.