# @mastra/voice-azure

Azure Voice integration for Mastra, providing both Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities using Azure's Cognitive Services Speech SDK.

## Installation

```bash
npm install @mastra/voice-azure
```

## Configuration

### Environment Variables

The module requires Azure Speech Services credentials, which can be provided through environment variables or directly in the configuration:

```bash
AZURE_API_KEY=your_speech_service_key
AZURE_REGION=your_azure_region
```

To get these credentials:

1. Go to the Azure Portal
2. Create a "Speech Services" resource (NOT an "Azure OpenAI" resource)
3. Navigate to the "Keys and Endpoint" section
4. Copy your subscription key and region

## Usage

```typescript
import { AzureVoice } from '@mastra/voice-azure';

// Create a voice instance with both speech and listening capabilities
const voice = new AzureVoice({
  speechModel: {
    apiKey: 'your-api-key', // Optional, can use AZURE_API_KEY env var
    region: 'your-region', // Optional, can use AZURE_REGION env var
    voiceName: 'en-US-AriaNeural', // Optional, default voice
  },
  listeningModel: {
    apiKey: 'your-api-key', // Optional, can use AZURE_API_KEY env var
    region: 'your-region', // Optional, can use AZURE_REGION env var
    language: 'en-US', // Optional, recognition language
  },
});

// List available voices
const voices = await voice.getSpeakers();

// Generate speech
const audioStream = await voice.speak('Hello from Mastra!', {
  speaker: 'en-US-JennyNeural', // Optional: override default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
```

## Features

- High-quality neural Text-to-Speech synthesis
- Accurate Speech-to-Text recognition
- 200+ neural voices across multiple languages
- SSML support
- Real-time audio streaming
- Multiple audio format support
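Because SSML is supported, you can control prosody (rate, pitch, emphasis) with markup rather than plain text. Below is a minimal sketch of a helper that wraps text in an Azure-style SSML envelope; `buildSsml` is an illustrative assumption, not part of the package API, and whether `speak()` accepts raw SSML may depend on the package version you are using:

```typescript
// Wrap plain text in a minimal SSML envelope for a given neural voice.
// This is plain string construction; pass the result where SSML is accepted.
function buildSsml(text: string, voiceName: string, rate = '0%'): string {
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">`,
    `  <voice name="${voiceName}">`,
    `    <prosody rate="${rate}">${text}</prosody>`,
    `  </voice>`,
    `</speak>`,
  ].join('\n');
}

// Example: slow Aria down by 10%
const ssml = buildSsml('Hello from Mastra!', 'en-US-AriaNeural', '-10%');
```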

## Voice Options

Azure provides numerous neural voices across different languages. Here are some popular English voices:

- `en-US-JennyNeural` (Female)
- `en-US-GuyNeural` (Male)
- `en-US-AriaNeural` (Female)
- `en-US-DavisNeural` (Male)
- `en-GB-SoniaNeural` (Female)
- `en-GB-RyanNeural` (Male)
- `en-AU-NatashaNeural` (Female)
- `en-AU-WilliamNeural` (Male)

Each voice ID follows the format `{language}-{region}-{name}Neural`.

For a complete list of supported voices, you can:

1. Call the `getSpeakers()` method
2. View the Azure Neural TTS documentation
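The `{language}-{region}-{name}Neural` convention can also be checked programmatically, which is handy for filtering a list of voice IDs (for example, ones returned by `getSpeakers()`). The `parseVoiceId` helper below is not part of the package; it is a sketch that just demonstrates the naming scheme:

```typescript
interface VoiceId {
  language: string; // e.g. 'en'
  region: string;   // e.g. 'US'
  name: string;     // e.g. 'Aria'
}

// Parse an Azure neural voice ID such as 'en-US-AriaNeural'.
// Returns null when the ID does not follow the documented pattern.
function parseVoiceId(id: string): VoiceId | null {
  const match = /^([a-z]{2,3})-([A-Za-z]+)-(.+)Neural$/.exec(id);
  if (!match) return null;
  return { language: match[1], region: match[2], name: match[3] };
}

// Example: keep only US English voices from a list of IDs
const usVoices = ['en-US-AriaNeural', 'en-GB-RyanNeural', 'fr-FR-DeniseNeural']
  .filter((id) => parseVoiceId(id)?.region === 'US');
```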

## Common Mistakes to Avoid

Don't use generic names:

```typescript
voiceName: 'neural', // WRONG - not a valid voice
```

Use specific voice IDs:

```typescript
voiceName: 'en-US-AriaNeural', // CORRECT
```

Don't use wrong property names:

```typescript
listeningModel: {
  voiceName: 'whisper', // WRONG - use the 'language' property instead
}
```

Use correct properties:

```typescript
listeningModel: {
  language: 'en-US', // CORRECT
}
```

Don't use Azure OpenAI credentials:

```typescript
apiKey: process.env.AZURE_OPENAI_API_KEY, // WRONG
```

Use Azure Speech Services credentials:

```typescript
apiKey: process.env.AZURE_API_KEY, // CORRECT
```