The ElevenLabs voice implementation in Mastra provides high-quality text-to-speech (TTS) and speech-to-text (STT) capabilities using the ElevenLabs API.
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'

// Initialize with default configuration (uses ELEVENLABS_API_KEY environment variable)
const voice = new ElevenLabsVoice()

// Initialize with custom configuration
// (a separate variable name avoids redeclaring `voice` in the same scope)
const customVoice = new ElevenLabsVoice({
  speechModel: {
    name: 'eleven_multilingual_v2',
    apiKey: 'your-api-key',
  },
  speaker: 'custom-speaker-id',
})

// Text-to-Speech
const audioStream = await voice.speak('Hello, world!')

// Get available speakers
const speakers = await voice.getSpeakers()
<PropertiesTable content={[ { name: 'speechModel', type: 'ElevenLabsVoiceConfig', description: 'Configuration for text-to-speech functionality.', isOptional: true, defaultValue: "{ name: 'eleven_multilingual_v2' }", properties: [ { type: 'ElevenLabsVoiceConfig', parameters: [ { name: 'name', type: 'ElevenLabsModel', description: 'The ElevenLabs model to use', isOptional: true, defaultValue: "'eleven_multilingual_v2'", }, { name: 'apiKey', type: 'string', description: 'ElevenLabs API key. Falls back to ELEVENLABS_API_KEY environment variable', isOptional: true, }, ], }, ], }, { name: 'speaker', type: 'string', description: 'ID of the speaker to use for text-to-speech', isOptional: true, defaultValue: "'9BWtsMINqrJLrRacOk9x' (Aria voice)", }, ]} />
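The `apiKey` fallback described in the table can be pictured with a small helper. This is only a sketch of the lookup order (explicit value first, then the `ELEVENLABS_API_KEY` environment variable), not the library's actual code. The name `resolveApiKey` and the choice to throw when no key is found are assumptions made for illustration.

```typescript
// Hypothetical helper: resolves the API key in the order the table describes.
// Throwing on a missing key is a choice made for this sketch, not documented behavior.
function resolveApiKey(
  explicit?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // Prefer the value passed in the constructor config, then the environment.
  const key = explicit ?? env.ELEVENLABS_API_KEY
  if (!key) {
    throw new Error(
      'Missing ElevenLabs API key: set ELEVENLABS_API_KEY or pass speechModel.apiKey',
    )
  }
  return key
}
```

A check like this at application startup can surface a missing key before the first request is made.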
speak()

Converts text to speech using the configured speech model and voice.
<PropertiesTable content={[ { name: 'input', type: 'string | NodeJS.ReadableStream', description: 'Text to convert to speech. If a stream is provided, it will be converted to text first.', isOptional: false, }, { name: 'options', type: 'object', description: 'Additional options for speech synthesis', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'speaker', type: 'string', description: 'Override the default speaker ID for this request', isOptional: true, }, ], }, ], }, ]} />
Returns: `Promise<NodeJS.ReadableStream>`
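Because `speak()` resolves to a stream rather than a finished buffer, the audio has to be consumed before it can be stored or sent elsewhere. The sketch below shows one way to do that with standard Node.js APIs. The helper names `streamToBuffer` and `saveAudio` are hypothetical, and the audio format of the stream is not assumed here.

```typescript
import { writeFile } from 'node:fs/promises'

// Collect every chunk from a readable stream into a single Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers; normalize to Buffer.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk)
  }
  return Buffer.concat(chunks)
}

// Write the collected audio to disk. The file path is supplied by the caller.
async function saveAudio(stream: NodeJS.ReadableStream, filePath: string): Promise<void> {
  await writeFile(filePath, await streamToBuffer(stream))
}
```

With a configured `voice` instance, this could be used as `await saveAudio(await voice.speak('Hello, world!'), 'hello.mp3')`, where the `.mp3` extension is an assumption about the output format. For long outputs, piping the stream straight to a file avoids holding the whole clip in memory.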
getSpeakers()

Returns an array of available voice options, where each entry contains:
<PropertiesTable content={[ { name: 'voiceId', type: 'string', description: 'Unique identifier for the voice', isOptional: false, }, { name: 'name', type: 'string', description: 'Display name of the voice', isOptional: false, }, { name: 'language', type: 'string', description: 'Language code for the voice', isOptional: false, }, { name: 'gender', type: 'string', description: 'Gender of the voice', isOptional: false, }, ]} />
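The fields above make it straightforward to choose a voice programmatically. The following sketch picks the first speaker matching a language and, optionally, a gender. `SpeakerInfo` and `pickSpeaker` are names invented for this example and are not exports of the library; the exact values that `language` and `gender` take are not specified here and should be checked against real output.

```typescript
// Shape of one entry, mirroring the fields in the table above.
// `SpeakerInfo` is a name chosen for this sketch, not a library type.
interface SpeakerInfo {
  voiceId: string
  name: string
  language: string
  gender: string
}

// Return the voiceId of the first speaker matching the requested language
// (and gender, if given), or undefined when nothing matches.
function pickSpeaker(
  speakers: SpeakerInfo[],
  language: string,
  gender?: string,
): string | undefined {
  const match = speakers.find(
    s => s.language === language && (gender === undefined || s.gender === gender),
  )
  return match?.voiceId
}
```

The returned ID can then be passed as the `speaker` option of `speak()` to override the default voice for a single request.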
listen()

Converts audio input to text using the ElevenLabs Speech-to-Text API.
<PropertiesTable content={[ { name: 'input', type: 'NodeJS.ReadableStream', description: 'A readable stream containing the audio data to transcribe', isOptional: false, }, { name: 'options', type: 'object', description: 'Configuration options for the transcription', isOptional: true, }, ]} />
The options object supports the following properties:
<PropertiesTable content={[ { name: 'language_code', type: 'string', description: "ISO language code (e.g., 'en', 'fr', 'es')", isOptional: true, }, { name: 'tag_audio_events', type: 'boolean', description: 'Whether to tag audio events like [MUSIC], [LAUGHTER], etc.', isOptional: true, }, { name: 'num_speakers', type: 'number', description: 'Number of speakers to detect in the audio', isOptional: true, }, { name: 'filetype', type: 'string', description: "Audio file format (e.g., 'mp3', 'wav', 'ogg')", isOptional: true, }, { name: 'timeoutInSeconds', type: 'number', description: 'Request timeout in seconds', isOptional: true, }, { name: 'maxRetries', type: 'number', description: 'Maximum number of retry attempts', isOptional: true, }, { name: 'abortSignal', type: 'AbortSignal', description: 'Signal to abort the request', isOptional: true, }, ]} />
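As an illustration of how these options combine, the sketch below wraps `listen()` with a hard deadline by supplying an `abortSignal`. `ListenOptions`, `Transcriber`, and `transcribeWithDeadline` are hypothetical names that simply mirror the documented fields and signature; how the library reacts to an aborted signal (for example, which error it throws) is not covered here.

```typescript
// Option fields copied from the table above; the interface name is made up.
interface ListenOptions {
  language_code?: string
  tag_audio_events?: boolean
  num_speakers?: number
  filetype?: string
  timeoutInSeconds?: number
  maxRetries?: number
  abortSignal?: AbortSignal
}

// Minimal structural type for any object exposing the documented listen() signature.
interface Transcriber {
  listen(input: NodeJS.ReadableStream, options?: ListenOptions): Promise<string>
}

// Call listen() and abort the request if it has not settled within deadlineMs.
async function transcribeWithDeadline(
  voice: Transcriber,
  audio: NodeJS.ReadableStream,
  deadlineMs: number,
  options: ListenOptions = {},
): Promise<string> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), deadlineMs)
  try {
    // Caller-supplied options are kept; only abortSignal is overridden.
    return await voice.listen(audio, { ...options, abortSignal: controller.signal })
  } finally {
    clearTimeout(timer) // avoid a dangling timer once the call settles
  }
}
```

A call such as `transcribeWithDeadline(voice, audioStream, 30_000, { language_code: 'en', filetype: 'mp3' })` would forward the language and file type unchanged while enforcing a 30-second limit.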
Returns: `Promise<string>` - A Promise that resolves to the transcribed text
An ElevenLabs API key is required. Set the ELEVENLABS_API_KEY environment variable or pass it in the constructor.

Available speakers can be retrieved using the getSpeakers() method, which returns detailed information about each voice including language and gender.