docs/src/content/en/reference/voice/azure.mdx
The AzureVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.
This requires Azure Speech Services credentials that can be provided through environment variables or directly in the configuration:
```typescript
import { AzureVoice } from '@mastra/voice-azure'

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: 'your-azure-speech-api-key', // Or use AZURE_API_KEY env var
    region: 'eastus', // Or use AZURE_REGION env var
    voiceName: 'en-US-AriaNeural', // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: 'your-azure-speech-api-key', // Or use AZURE_API_KEY env var
    region: 'eastus', // Or use AZURE_REGION env var
    language: 'en-US', // Optional: recognition language for STT
  },
  speaker: 'en-US-JennyNeural', // Optional: default voice
})

// Convert text to speech
const audioStream = await voice.speak('Hello, how can I help you?', {
  speaker: 'en-US-GuyNeural', // Optional: override default voice
})

// Convert speech to text
const text = await voice.listen(audioStream)
```
<PropertiesTable content={[ { name: 'speechModel', type: 'AzureSpeechConfig', description: 'Configuration for text-to-speech synthesis.', isOptional: true, properties: [ { type: 'AzureSpeechConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Azure Speech Services API key (NOT Azure OpenAI key). Falls back to AZURE_API_KEY environment variable.', isOptional: true, }, { name: 'region', type: 'string', description: "Azure region (e.g., 'eastus', 'westeurope'). Falls back to AZURE_REGION environment variable.", isOptional: true, }, { name: 'voiceName', type: 'string', description: "Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in speechModel. See voice list below.", isOptional: true, }, { name: 'language', type: 'string', description: "Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in listeningModel.", isOptional: true, }, ], }, ], }, { name: 'listeningModel', type: 'AzureSpeechConfig', description: 'Configuration for speech-to-text recognition.', isOptional: true, properties: [ { type: 'AzureSpeechConfig', parameters: [ { name: 'apiKey', type: 'string', description: 'Azure Speech Services API key (NOT Azure OpenAI key). Falls back to AZURE_API_KEY environment variable.', isOptional: true, }, { name: 'region', type: 'string', description: "Azure region (e.g., 'eastus', 'westeurope'). Falls back to AZURE_REGION environment variable.", isOptional: true, }, { name: 'voiceName', type: 'string', description: "Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in speechModel. See voice list below.", isOptional: true, }, { name: 'language', type: 'string', description: "Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in listeningModel.", isOptional: true, }, ], }, ], }, { name: 'speaker', type: 'string', description: 'Default voice ID for speech synthesis.', isOptional: true, }, ]} />
### speak()

Converts text to speech using Azure's neural text-to-speech service.
<PropertiesTable content={[ { name: 'input', type: 'string | NodeJS.ReadableStream', description: 'Text or text stream to convert to speech.', isOptional: false, }, { name: 'options', type: 'Options', description: 'Configuration options.', isOptional: true, properties: [ { type: 'Options', parameters: [ { name: 'speaker', type: 'string', description: "Voice ID to use for speech synthesis (e.g., 'en-US-JennyNeural'). Overrides the default voice.", isOptional: true, defaultValue: "Constructor's speaker value", }, ], }, ], }, ]} />
Returns: `Promise<NodeJS.ReadableStream>` - Audio stream in WAV format
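If you need the audio on disk rather than as a live stream, the returned stream can be piped to a file. This is a minimal sketch; the text, voice, and `greeting.wav` path are illustrative, not part of the API:

```typescript
import { createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'

// Synthesize speech and write the WAV audio to disk
// ('greeting.wav' is an illustrative output path)
const audio = await voice.speak('Your order has shipped.', {
  speaker: 'en-US-JennyNeural',
})
await pipeline(audio, createWriteStream('greeting.wav'))
```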
### listen()

Transcribes audio using Azure's speech-to-text service.
<PropertiesTable content={[ { name: 'audioStream', type: 'NodeJS.ReadableStream', description: 'Audio stream to transcribe. Must be in WAV format.', isOptional: false, }, ]} />
Returns: `Promise<string>` - The recognized text from the audio
Note: Language and recognition settings are configured via the `listeningModel` options at initialization, not passed as options to this method.
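For example, a pre-recorded WAV file can be read from disk and passed in as a stream. This is a minimal sketch; the `recording.wav` path is illustrative:

```typescript
import { createReadStream } from 'node:fs'

// Transcribe a pre-recorded WAV file ('recording.wav' is an illustrative path).
// The recognition language comes from the listeningModel set in the constructor.
const transcript = await voice.listen(createReadStream('recording.wav'))
console.log(transcript)
```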
### getSpeakers()

Returns an array of available voice options (200+ voices), where each entry contains:
<PropertiesTable content={[ { name: 'voiceId', type: 'string', description: "Unique identifier for the voice (e.g., 'en-US-JennyNeural', 'fr-FR-DeniseNeural')", isOptional: false, }, { name: 'language', type: 'string', description: "Language code extracted from voice ID (e.g., 'en', 'fr')", isOptional: false, }, { name: 'region', type: 'string', description: "Region code extracted from voice ID (e.g., 'US', 'GB', 'FR')", isOptional: false, }, ]} />
Returns: `Promise<Array<{ voiceId: string; language: string; region: string }>>`
⚠️ Critical: This package uses Azure Speech Services, which is different from Azure OpenAI Services. Do not use AZURE_OPENAI_API_KEY for this package.

API keys and regions can be provided via constructor options or environment variables:

- `AZURE_API_KEY` - Your Azure Speech Services subscription key
- `AZURE_REGION` - Your Azure region (e.g., 'eastus', 'westeurope')
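For example, when both environment variables are set, the constructor can be called without inline credentials. This is a minimal sketch; the chosen voice and language are illustrative:

```typescript
import { AzureVoice } from '@mastra/voice-azure'

// With AZURE_API_KEY and AZURE_REGION set in the environment,
// credentials can be omitted from the constructor entirely
const voice = new AzureVoice({
  speechModel: { voiceName: 'en-US-AriaNeural' },
  listeningModel: { language: 'en-US' },
})
```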
Voice IDs follow the format `{language}-{region}-{name}Neural` (e.g., 'en-US-JennyNeural'). Azure provides 200+ neural voices across many languages. Some popular English voices include:

**US English:**

- `en-US-AriaNeural` (Female, default)
- `en-US-JennyNeural` (Female)
- `en-US-GuyNeural` (Male)
- `en-US-DavisNeural` (Male)
- `en-US-AvaNeural` (Female)
- `en-US-AndrewNeural` (Male)

**British English:**

- `en-GB-SoniaNeural` (Female)
- `en-GB-RyanNeural` (Male)
- `en-GB-LibbyNeural` (Female)

**Australian English:**

- `en-AU-NatashaNeural` (Female)
- `en-AU-WilliamNeural` (Male)

To get a complete list of all 200+ voices:
```typescript
const voices = await voice.getSpeakers()
console.log(voices) // Array of { voiceId, language, region }
```
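Because the result is a plain array, it can be narrowed client-side, for example to keep only British English voices. This is an illustrative filter, not a library API:

```typescript
const voices = await voice.getSpeakers()

// Narrow the full list to British English voices by voice ID prefix
const britishVoices = voices.filter((v) => v.voiceId.startsWith('en-GB'))
console.log(britishVoices.map((v) => v.voiceId))
```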
For more information, see the Azure Neural TTS documentation.