# @mastra/voice-azure
Azure Voice integration for Mastra, providing both Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities using Azure's Cognitive Services Speech SDK.
## Installation

```bash
npm install @mastra/voice-azure
```
## Configuration

The module requires Azure Speech Services credentials, which can be provided through environment variables or directly in the configuration:

```bash
AZURE_API_KEY=your_speech_service_key
AZURE_REGION=your_azure_region
```
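The fallback behavior described above (an explicit config value wins, otherwise the environment variable is used) might look like the sketch below; `resolveApiKey` is an illustrative name, not part of the package API:

```typescript
// Illustrative sketch of the credential fallback: an explicit value
// wins, otherwise the AZURE_API_KEY environment variable is used,
// and a missing key fails fast with a clear message.
function resolveApiKey(explicit?: string): string {
  const key = explicit ?? process.env.AZURE_API_KEY;
  if (!key) {
    throw new Error('Provide apiKey in the config or set AZURE_API_KEY');
  }
  return key;
}
```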
To get these credentials:

1. Sign in to the [Azure Portal](https://portal.azure.com)
2. Create a Speech Services resource (or open an existing one)
3. Copy the key and region from the resource's **Keys and Endpoint** page
## Usage

```typescript
import { AzureVoice } from '@mastra/voice-azure';

// Create a voice instance with both speech and listening capabilities
const voice = new AzureVoice({
  speechModel: {
    apiKey: 'your-api-key', // Optional, can use AZURE_API_KEY env var
    region: 'your-region', // Optional, can use AZURE_REGION env var
    voiceName: 'en-US-AriaNeural', // Optional, default voice
  },
  listeningModel: {
    apiKey: 'your-api-key', // Optional, can use AZURE_API_KEY env var
    region: 'your-region', // Optional, can use AZURE_REGION env var
    language: 'en-US', // Optional, recognition language
  },
});

// List available voices
const voices = await voice.getSpeakers();

// Generate speech
const audioStream = await voice.speak('Hello from Mastra!', {
  speaker: 'en-US-JennyNeural', // Optional: override the default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
```
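Once `getSpeakers()` has returned, you will often want just the voices for one locale. A minimal sketch, assuming the result can be reduced to a flat list of voice ID strings (the actual return shape may differ); `voicesForLocale` is an illustrative helper, not part of the package:

```typescript
// Illustrative helper: keep only the voice IDs for a given locale,
// relying on the {language}-{region}-{name}Neural naming convention.
function voicesForLocale(voiceIds: string[], locale: string): string[] {
  return voiceIds.filter((id) => id.startsWith(`${locale}-`));
}

voicesForLocale(
  ['en-US-AriaNeural', 'en-GB-SoniaNeural', 'fr-FR-DeniseNeural'],
  'en-US',
); // → ['en-US-AriaNeural']
```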
## Voices

Azure provides numerous neural voices across many languages. Some popular English voices include:

- `en-US-AriaNeural`
- `en-US-JennyNeural`
- `en-US-GuyNeural`
- `en-GB-SoniaNeural`
Each voice ID follows the format `{language}-{region}-{name}Neural`.
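Because an invalid voice name only fails once a request is made, a quick shape check against this format can catch typos early. `isValidVoiceId` below is a hypothetical helper, not part of the package:

```typescript
// Hypothetical guard: checks a string against the
// {language}-{region}-{name}Neural pattern before using it.
function isValidVoiceId(id: string): boolean {
  return /^[a-z]{2,3}-[A-Za-z]+-[A-Za-z]+Neural$/.test(id);
}

isValidVoiceId('en-US-AriaNeural'); // → true
isValidVoiceId('neural'); // → false
```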
For a complete list of supported voices, you can:

- Call the `getSpeakers()` method
- Consult the Azure Speech Service voice documentation

## Common Mistakes

❌ Don't use generic names:

```typescript
voiceName: 'neural', // WRONG - not a valid voice
```

✅ Use specific voice IDs:

```typescript
voiceName: 'en-US-AriaNeural', // CORRECT
```
❌ Don't use wrong property names:

```typescript
listeningModel: {
  voiceName: 'whisper', // WRONG - use the 'language' property instead
}
```

✅ Use correct properties:

```typescript
listeningModel: {
  language: 'en-US', // CORRECT
}
```
❌ Don't use Azure OpenAI credentials:

```typescript
apiKey: process.env.AZURE_OPENAI_API_KEY, // WRONG
```

✅ Use Azure Speech Services credentials:

```typescript
apiKey: process.env.AZURE_API_KEY, // CORRECT
```