provider-elevenlabs/tts (ElevenLabs Text-to-Speech)

You can run this example with:

bash

npx promptfoo@latest init --example provider-elevenlabs/tts
cd provider-elevenlabs/tts

Test and compare ElevenLabs TTS models and voice settings.

What this tests

Model comparison: Flash v2.5, Turbo v2.5, Multilingual v2
Streaming vs. non-streaming performance
Voice quality across different text inputs
Cost and latency metrics

Setup

Set your ElevenLabs API key:

bash

export ELEVENLABS_API_KEY=your_api_key_here
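Before running the eval, it can help to confirm that the key is actually visible in the current shell. The following is a minimal sketch; the function name `check_api_key` is hypothetical and not part of promptfoo or ElevenLabs.

```bash
# Hypothetical helper: fail early if the API key is missing from the environment.
check_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
  echo "ELEVENLABS_API_KEY is set"
}
```

Calling `check_api_key && npx promptfoo@latest eval -c ./promptfooconfig.yaml` would then skip the eval when the key is missing.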

Run the example

bash

npx promptfoo@latest eval -c ./promptfooconfig.yaml

To view the results in the UI, run the eval and then open the viewer:

bash

npx promptfoo@latest eval -c ./promptfooconfig.yaml
npx promptfoo@latest view

What to look for

Model differences: Flash v2.5 has the lowest latency (~200 ms); Multilingual v2 has the best quality
Streaming benefits: The first chunk arrives in ~75 ms, which gives a real-time feel
Cost tracking: ~$0.02 per 1000 characters
Audio metadata: Duration, size, and format info in the response
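To put the cost figure in context, here is a rough estimator for a given character count. It is only a sketch: the default rate of 0.02 is the approximate figure quoted above, not an official price, and the function name `estimate_tts_cost` is hypothetical.

```bash
# Hypothetical helper: estimate TTS cost in USD from a character count.
# The second argument overrides the assumed rate per 1000 characters.
estimate_tts_cost() {
  chars="$1"
  rate_per_1k="${2:-0.02}"   # assumption: ~$0.02 per 1000 characters
  LC_ALL=C awk -v c="$chars" -v r="$rate_per_1k" \
    'BEGIN { printf "%.4f\n", c / 1000 * r }'
}

estimate_tts_cost 2500   # prints 0.0500
```

Actual pricing depends on your ElevenLabs plan and model, so compare the estimate with the cost reported in the eval results.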

Available voices

This example uses Rachel (21m00Tcm4TlvDq8ikWAM). You can swap in another popular voice by its ID:

Rachel: Calm, clear female voice (21m00Tcm4TlvDq8ikWAM, default)
Clyde: Warm, grounded male voice (2EiwWnXFnvU5JabPnv8n)
Drew: Well-rounded male voice (29vD33N1CtxCmqQRPOHJ)
Paul: Casual male voice (5Q0t7uMcjvnagumLfvZi)

Voice settings

Customize the voice output:

yaml

voiceSettings:
  stability: 0.5 # 0 (more variable) to 1 (more stable)
  similarity_boost: 0.75 # 0 (low) to 1 (high)
  style: 0.0 # 0 to 1 (only for v2 models)
  use_speaker_boost: true # Enhance clarity
  speed: 1.0 # 0.25 to 4.0

Output formats

Available formats:

mp3_22050_32 - Smallest size, lower quality
mp3_44100_128 - Balanced (default)
mp3_44100_192 - High quality
pcm_16000 - Raw PCM for processing
pcm_44100 - High quality PCM
ulaw_8000 - Phone quality
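The format names listed above appear to follow a `codec_samplerate[_bitrate]` pattern. As an illustration, the sketch below splits a name into those parts. The pattern is inferred from the names in this list, and `parse_output_format` is a hypothetical helper, not part of any tool.

```bash
# Hypothetical helper: split a format name such as mp3_44100_128 into its parts.
# Assumes the pattern codec_samplerate[_bitrate] seen in the list above.
parse_output_format() {
  fmt="$1"
  codec="${fmt%%_*}"          # text before the first underscore
  rest="${fmt#*_}"            # text after the first underscore
  sample_rate="${rest%%_*}"   # sample rate in Hz
  case "$rest" in
    *_*) bitrate="${rest#*_}" ;;   # MP3 names carry a bitrate in kbps
    *)   bitrate="n/a" ;;          # PCM and u-law names have no bitrate part
  esac
  printf 'codec=%s sample_rate=%s bitrate_kbps=%s\n' \
    "$codec" "$sample_rate" "$bitrate"
}

parse_output_format mp3_44100_128   # prints codec=mp3 sample_rate=44100 bitrate_kbps=128
```

For `pcm_16000`, the same call reports `bitrate_kbps=n/a`, since the name has no bitrate component.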
