Back to Mastra

Reference: ElevenLabs | Voice

docs/src/content/en/reference/voice/elevenlabs.mdx

2025-12-185.0 KB
Original Source

ElevenLabs

The ElevenLabs voice implementation in Mastra provides high-quality text-to-speech (TTS) and speech-to-text (STT) capabilities using the ElevenLabs API.

Usage example

typescript
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'

// Initialize with default configuration (uses ELEVENLABS_API_KEY environment variable)
const voice = new ElevenLabsVoice()

// Initialize with custom configuration
const voice = new ElevenLabsVoice({
  speechModel: {
    name: 'eleven_multilingual_v2',
    apiKey: 'your-api-key',
  },
  speaker: 'custom-speaker-id',
})

// Text-to-Speech
const audioStream = await voice.speak('Hello, world!')

// Get available speakers
const speakers = await voice.getSpeakers()

Constructor parameters

<PropertiesTable content={[ { name: 'speechModel', type: 'ElevenLabsVoiceConfig', description: 'Configuration for text-to-speech functionality.', isOptional: true, defaultValue: "{ name: 'eleven_multilingual_v2' }", properties: [ { type: 'ElevenLabsVoiceConfig', parameters: [ { name: 'name', type: 'ElevenLabsModel', description: 'The ElevenLabs model to use', isOptional: true, defaultValue: "'eleven_multilingual_v2'", }, { name: 'apiKey', type: 'string', description: 'ElevenLabs API key. Falls back to ELEVENLABS_API_KEY environment variable', isOptional: true, }, ], }, ], }, { name: 'speaker', type: 'string', description: 'ID of the speaker to use for text-to-speech', isOptional: true, defaultValue: "'9BWtsMINqrJLrRacOk9x' (Aria voice)", }, ]} />

Methods

speak()

Converts text to speech using the configured speech model and voice.

<PropertiesTable content={[ { name: 'input', type: 'string | NodeJS.ReadableStream', description: 'Text to convert to speech. If a stream is provided, it will be converted to text first.', isOptional: false, }, { name: 'options', type: 'object', description: 'Additional options for speech synthesis', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'speaker', type: 'string', description: 'Override the default speaker ID for this request', isOptional: true, }, ], }, ], }, ]} />

Returns: Promise<NodeJS.ReadableStream>

getSpeakers()

Returns an array of available voice options, where each node contains:

<PropertiesTable content={[ { name: 'voiceId', type: 'string', description: 'Unique identifier for the voice', isOptional: false, }, { name: 'name', type: 'string', description: 'Display name of the voice', isOptional: false, }, { name: 'language', type: 'string', description: 'Language code for the voice', isOptional: false, }, { name: 'gender', type: 'string', description: 'Gender of the voice', isOptional: false, }, ]} />

listen()

Converts audio input to text using ElevenLabs Speech-to-Text API.

<PropertiesTable content={[ { name: 'input', type: 'NodeJS.ReadableStream', description: 'A readable stream containing the audio data to transcribe', isOptional: false, }, { name: 'options', type: 'object', description: 'Configuration options for the transcription', isOptional: true, }, ]} />

The options object supports the following properties:

<PropertiesTable content={[ { name: 'language_code', type: 'string', description: "ISO language code (e.g., 'en', 'fr', 'es')", isOptional: true, }, { name: 'tag_audio_events', type: 'boolean', description: 'Whether to tag audio events like [MUSIC], [LAUGHTER], etc.', isOptional: true, }, { name: 'num_speakers', type: 'number', description: 'Number of speakers to detect in the audio', isOptional: true, }, { name: 'filetype', type: 'string', description: "Audio file format (e.g., 'mp3', 'wav', 'ogg')", isOptional: true, }, { name: 'timeoutInSeconds', type: 'number', description: 'Request timeout in seconds', isOptional: true, }, { name: 'maxRetries', type: 'number', description: 'Maximum number of retry attempts', isOptional: true, }, { name: 'abortSignal', type: 'AbortSignal', description: 'Signal to abort the request', isOptional: true, }, ]} />

Returns: Promise<string> - A Promise that resolves to the transcribed text

Important notes

  1. An ElevenLabs API key is required. Set it via the ELEVENLABS_API_KEY environment variable or pass it in the constructor.
  2. The default speaker is set to Aria (ID: '9BWtsMINqrJLrRacOk9x').
  3. Speech-to-text functionality isn't supported by ElevenLabs.
  4. Available speakers can be retrieved using the getSpeakers() method, which returns detailed information about each voice including language and gender.