Back to Promptfoo

ElevenLabs

site/docs/providers/elevenlabs.md

0.121.921.2 KB
Original Source

ElevenLabs

The ElevenLabs provider integrates multiple AI audio capabilities for comprehensive voice AI testing and evaluation.

:::tip

For a comprehensive step-by-step tutorial, see the Evaluating ElevenLabs voice AI guide.

:::

Quick Start

Get started with ElevenLabs in 3 steps:

  1. Install and authenticate:

    sh
    npm install -g promptfoo
    export ELEVENLABS_API_KEY=your_api_key_here
    
  2. Create a config file (promptfooconfig.yaml):

    yaml
    prompts:
      - 'Welcome to our customer service. How can I help you today?'
    
    providers:
      - id: elevenlabs:tts:rachel
    
    tests:
      - description: Generate welcome message
        assert:
          - type: cost
            threshold: 0.01
          - type: latency
            threshold: 2000
    
  3. Run your first eval:

    sh
    promptfoo eval
    

    View results with promptfoo view or in the web UI.

Setup

Set your ElevenLabs API key as an environment variable:

sh
export ELEVENLABS_API_KEY=your_api_key_here

Alternatively, specify the API key directly in your configuration:

yaml
providers:
  - id: elevenlabs:tts
    config:
      apiKey: your_api_key_here

:::tip Get your API key from ElevenLabs Settings. Free tier includes 10,000 characters/month. :::

Capabilities

The ElevenLabs provider supports multiple capabilities:

Text-to-Speech (TTS)

Generate high-quality voice synthesis with multiple models and voices:

  • elevenlabs:tts:<voice_name> - TTS with specified voice (e.g., elevenlabs:tts:rachel)
  • elevenlabs:tts - TTS with default voice

Models available:

  • eleven_flash_v2_5 - Fastest, lowest latency (~200ms)
  • eleven_turbo_v2_5 - High quality, fast
  • eleven_multilingual_v2 - Best for non-English languages
  • eleven_monolingual_v1 - English only, high quality

Example:

yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.5
        similarity_boost: 0.75
        speed: 1.0

Speech-to-Text (STT)

Transcribe audio with speaker diarization and accuracy metrics:

  • elevenlabs:stt - Speech-to-text transcription

Features:

  • Speaker diarization (identify multiple speakers)
  • Word Error Rate (WER) calculation
  • Multiple language support

Example:

yaml
providers:
  - id: elevenlabs:stt
    config:
      modelId: scribe_v1
      diarization: true
      maxSpeakers: 3

Conversational Agents

Test voice AI agents with LLM backends and evaluation criteria:

  • elevenlabs:agents - Voice AI agent testing

Features:

  • Multi-turn conversation simulation
  • Automated evaluation criteria
  • Tool calling and mocking
  • LLM cascading for cost optimization
  • Custom LLM endpoints
  • Multi-voice conversations
  • Phone integration (Twilio, SIP)

Example:

yaml
providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        name: Customer Support Agent
        prompt: You are a helpful support agent
        voiceId: 21m00Tcm4TlvDq8ikWAM
        llmModel: gpt-4o
      evaluationCriteria:
        - name: helpfulness
          description: Agent provides helpful responses
          weight: 1.0
          passingThreshold: 0.8

Supporting APIs

Additional audio processing capabilities:

  • elevenlabs:history - Retrieve agent conversation history
  • elevenlabs:isolation - Remove background noise from audio
  • elevenlabs:alignment - Generate time-aligned subtitles

Configuration Parameters

All providers support these common parameters:

ParameterDescription
apiKeyYour ElevenLabs API key
apiKeyEnvarEnvironment variable containing the API key
baseUrlCustom base URL for API (default: ElevenLabs API)
timeoutRequest timeout in milliseconds
cacheEnable response caching
cacheTTLCache time-to-live in seconds
enableLoggingEnable debug logging
retriesNumber of retry attempts for failed requests

TTS-Specific Parameters

ParameterDescription
modelIdTTS model (e.g., eleven_flash_v2_5)
voiceIdVoice ID or name (e.g., 21m00Tcm4TlvDq8ikWAM or rachel)
voiceSettingsVoice customization (stability, similarity, style, speed)
outputFormatAudio format (e.g., mp3_44100_128, pcm_44100)
seedSeed for deterministic output
streamingEnable WebSocket streaming for low latency
pronunciationDictionaryCustom pronunciation rules
voiceDesignGenerate voice from text description
voiceRemixModify voice characteristics (gender, accent, age)

STT-Specific Parameters

ParameterDescription
modelIdSTT model (default: scribe_v1)
languageISO 639-1 language code (e.g., en, es)
diarizationEnable speaker diarization
maxSpeakersExpected number of speakers (hint)
audioFormatInput audio format

Agent-Specific Parameters

ParameterDescription
agentIdUse existing agent ID
agentConfigEphemeral agent configuration
simulatedUserAutomated user simulation settings
evaluationCriteriaEvaluation criteria for agent performance
toolMockConfigMock tool responses for testing
maxTurnsMaximum conversation turns (default: 10)
llmCascadeLLM fallback configuration
customLLMCustom LLM endpoint configuration
mcpConfigModel Context Protocol integration
multiVoiceMulti-voice conversation configuration
postCallWebhookWebhook notification after conversation
phoneConfigTwilio or SIP phone integration

Examples

Text-to-Speech: Voice Comparison

yaml
prompts:
  - 'Welcome to ElevenLabs. Our AI voice technology delivers natural-sounding speech.'

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:tts:clyde
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Audio generation succeeds
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 5000

Speech-to-Text: Accuracy Testing

yaml
prompts:
  - file://audio/test-recording.mp3

providers:
  - id: elevenlabs:stt
    config:
      diarization: true

tests:
  - description: WER is acceptable
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          return result.wer < 0.05; // Less than 5% error

Conversational Agents: Evaluation

yaml
prompts:
  - |
    User: I need help with my order
    Agent: I'd be happy to help! What's your order number?
    User: ORDER-12345

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a helpful customer support agent
        llmModel: gpt-4o
      evaluationCriteria:
        - name: greeting
          weight: 0.8
          passingThreshold: 0.8
        - name: understanding
          weight: 1.0
          passingThreshold: 0.9

tests:
  - description: Agent meets evaluation criteria
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const passed = result.analysis.evaluation_criteria_results.filter(r => r.passed);
          return passed.length >= 2;

Audio Processing: Pipeline

yaml
# 1. Remove noise from audio
providers:
  - id: elevenlabs:isolation

# 2. Transcribe cleaned audio
providers:
  - id: elevenlabs:stt

# 3. Generate subtitles
providers:
  - id: elevenlabs:alignment

Advanced Features

Pronunciation Dictionaries

Customize pronunciation for technical terms:

yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      pronunciationDictionary:
        - word: 'API'
          pronunciation: 'A P I'
        - word: 'OAuth'
          phoneme: 'əʊɔːθ'

Voice Design

Generate custom voices from descriptions:

yaml
providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        name: Custom Voice
        description: A middle-aged American male with a deep, authoritative tone
        gender: male
        age: middle_aged
        accent: american

LLM Cascading

Optimize costs with automatic fallback:

yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o
        fallback:
          - gpt-4o-mini
          - gpt-3.5-turbo
        cascadeOnError: true
        cascadeOnLatency:
          enabled: true
          maxLatencyMs: 5000

Multi-voice Conversations

Different voices for different characters:

yaml
providers:
  - id: elevenlabs:agents
    config:
      multiVoice:
        characters:
          - name: Agent
            voiceId: 21m00Tcm4TlvDq8ikWAM
            role: Customer support representative
          - name: Customer
            voiceId: 2EiwWnXFnvU5JabPnv8n
            role: Customer seeking help

Phone Integration

Test agents with real phone calls:

yaml
providers:
  - id: elevenlabs:agents
    config:
      phoneConfig:
        provider: twilio
        twilioAccountSid: ${TWILIO_ACCOUNT_SID}
        twilioAuthToken: ${TWILIO_AUTH_TOKEN}
        twilioPhoneNumber: +1234567890

Cost Tracking

ElevenLabs usage is tracked automatically:

TTS Costs:

  • Flash v2.5: ~$0.015 per 1,000 characters
  • Turbo v2.5: ~$0.02 per 1,000 characters
  • Multilingual v2: ~$0.03 per 1,000 characters

STT Costs:

  • ~$0.10 per minute of audio

Agent Costs:

  • Based on conversation duration (~$0.10-0.50 per minute depending on LLM)

Supporting API Costs:

  • Audio Isolation: ~$0.10 per minute
  • Forced Alignment: ~$0.05 per minute

View costs in eval results:

yaml
tests:
  - assert:
      - type: cost
        threshold: 0.50 # Max $0.50 per test

Common voice IDs and names:

NameIDDescription
Rachel21m00Tcm4TlvDq8ikWAMCalm, clear female
Clyde2EiwWnXFnvU5JabPnv8nWarm male
Drew29vD33N1CtxCmqQRPOHJWell-rounded male
Paul5Q0t7uMcjvnagumLfvZiCasual male
DomiAZnzlk1XvdvUeBnXmlldEnergetic female
BellaEXAVITQu4vr4xnSDxMaLExpressive female
AntoniErXwobaYiN019PkySvjVDeep male
ElliMF3mGyEYCl7XYWbV9V6OYoung female

Common Workflows

Voice Quality Testing

Compare voice quality across models and voices:

yaml
prompts:
  - 'The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.'

providers:
  - id: flash-model
    label: Flash Model (Fastest)
    config:
      modelId: eleven_flash_v2_5
      voiceId: rachel

  - id: turbo-model
    label: Turbo Model (Best Quality)
    config:
      modelId: eleven_turbo_v2_5
      voiceId: rachel

tests:
  - description: Flash model completes quickly
    provider: flash-model
    assert:
      - type: latency
        threshold: 1000

  - description: Turbo model has better quality
    provider: turbo-model
    assert:
      - type: cost
        threshold: 0.01

Transcription Accuracy Pipeline

Test end-to-end TTS → STT accuracy:

yaml
prompts:
  - |
    The meeting is scheduled for Thursday at 2 PM in conference room B.
    Please bring your laptop and quarterly report.

providers:
  - id: tts-generator
    label: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - id: stt-transcriber
    label: elevenlabs:stt
    config:
      calculateWER: true

tests:
  - vars:
      referenceText: 'The meeting is scheduled for Thursday at 2 PM in conference room B. Please bring your laptop and quarterly report.'
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          if (result.wer_result) {
            return result.wer_result.wer < 0.03; // Less than 3% error
          }
          return true;

Agent Regression Testing

Ensure agent improvements don't degrade performance:

yaml
prompts:
  - |
    User: I need to cancel my subscription
    User: Yes, I'm sure
    User: Account email is [email protected]

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a customer service agent. Always confirm cancellations.
        llmModel: gpt-4o
      evaluationCriteria:
        - name: confirmation_requested
          description: Agent asks for confirmation before canceling
          weight: 1.0
          passingThreshold: 0.9
        - name: professional_tone
          description: Agent maintains professional tone
          weight: 0.8
          passingThreshold: 0.8

tests:
  - description: Agent handles cancellation properly
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const criteria = result.analysis.evaluation_criteria_results;
          return criteria.every(c => c.passed);

Best Practices

1. Choose the Right Model

  • Flash v2.5: Use for real-time applications, live streaming, or when latency is critical (<200ms)
  • Turbo v2.5: Use for high-quality pre-recorded content where quality matters more than speed
  • Multilingual v2: Use for non-English languages or when switching between languages
  • Monolingual v1: Use for English-only content requiring the highest quality

2. Optimize Voice Settings

For natural conversation:

yaml
voiceSettings:
  stability: 0.5 # More variation
  similarity_boost: 0.75
  speed: 1.0

For consistent narration:

yaml
voiceSettings:
  stability: 0.8 # Less variation
  similarity_boost: 0.85
  speed: 0.95

For expressiveness:

yaml
voiceSettings:
  stability: 0.3 # High variation
  similarity_boost: 0.5
  style: 0.8 # Amplify style
  speed: 1.1

3. Cost Optimization

Use caching for repeated phrases:

yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      cache: true
      cacheTTL: 86400 # 24 hours

Implement LLM cascading for agents:

yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o-mini # Cheaper first
        fallback:
          - gpt-4o # Better fallback
        cascadeOnError: true

Test with shorter prompts during development:

yaml
providers:
  - id: elevenlabs:tts:rachel
tests:
  - vars:
      shortPrompt: 'Test' # Use during dev
      fullPrompt: 'Full production message'

4. Agent Testing Strategy

Start simple, add complexity incrementally:

yaml
# Phase 1: Basic functionality
evaluationCriteria:
  - name: responds
    description: Agent responds to user
    weight: 1.0

# Phase 2: Add quality checks
evaluationCriteria:
  - name: responds
    weight: 0.8
  - name: accurate
    description: Response is factually correct
    weight: 1.0

# Phase 3: Add conversation flow
evaluationCriteria:
  - name: responds
    weight: 0.6
  - name: accurate
    weight: 1.0
  - name: natural_flow
    description: Conversation feels natural
    weight: 0.8

5. Audio Quality Assurance

Always test on target platforms:

yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      outputFormat: mp3_44100_128 # Good for web
      # outputFormat: pcm_44100      # Better for phone systems
      # outputFormat: mp3_22050_32   # Smaller files for mobile

Test with diverse content:

yaml
prompts:
  # Numbers and dates
  - 'Your appointment is on March 15th at 3:30 PM. Confirmation number: 4829.'

  # Technical terms
  - 'The API returns a JSON response with OAuth2 authentication tokens.'

  # Multi-language
  - 'Bonjour! Welcome to our multilingual support.'

  # Edge cases
  - 'Hello... um... can you hear me? Testing, 1, 2, 3.'

6. Monitoring and Observability

Track key metrics:

yaml
tests:
  - assert:
      # Latency thresholds
      - type: latency
        threshold: 2000

      # Cost budgets
      - type: cost
        threshold: 0.50

      # Quality metrics
      - type: javascript
        value: |
          // Track custom metrics
          const result = JSON.parse(output);
          if (result.audio) {
            console.log('Audio size:', result.audio.sizeBytes);
            console.log('Format:', result.audio.format);
          }
          return true;

Use labels for organized results:

yaml
providers:
  - label: v1-baseline
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - label: v2-improved
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.6 # Tweaked setting

Troubleshooting

API Key Issues

Error: ELEVENLABS_API_KEY environment variable is not set

Solution: Ensure your API key is properly set:

sh
# Check if key is set
echo $ELEVENLABS_API_KEY

# Set it if missing
export ELEVENLABS_API_KEY=your_key_here

# Or add to your shell profile
echo 'export ELEVENLABS_API_KEY=your_key' >> ~/.zshrc
source ~/.zshrc

Authentication Errors

Error: 401 Unauthorized

Solution: Verify your API key is valid:

sh
# Test API key directly
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

If this fails, regenerate your API key at ElevenLabs Settings.

Rate Limiting

Error: 429 Too Many Requests

Solution: Add retry logic and respect rate limits:

yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      retries: 3 # Retry failed requests
      timeout: 30000 # Allow time for retries

For high-volume testing, consider:

  • Spreading tests over time
  • Upgrading to a paid plan
  • Using caching to avoid redundant requests

Audio File Issues

Error: Failed to read audio file or Unsupported audio format

Solution: Ensure audio files are accessible and in supported formats:

yaml
providers:
  - id: elevenlabs:stt
    config:
      audioFormat: mp3 # Supported: mp3, wav, flac, ogg, webm, m4a

Verify file exists:

sh
ls -lh /path/to/audio.mp3
file /path/to/audio.mp3

Agent Conversation Timeouts

Error: Conversation timeout after X turns

Solution: Adjust conversation limits:

yaml
providers:
  - id: elevenlabs:agents
    config:
      maxTurns: 20 # Increase if needed
      timeout: 120000 # 2 minutes

Memory Issues with Large Evals

Error: JavaScript heap out of memory

Solution: Increase Node.js memory:

sh
export NODE_OPTIONS="--max-old-space-size=4096"
promptfoo eval

Or run fewer concurrent tests:

sh
promptfoo eval --max-concurrency 2

Voice Not Found

Error: Voice ID not found

Solution: Use correct voice ID or name:

yaml
providers:
  # Use official voice ID (preferred)
  - id: elevenlabs:tts:21m00Tcm4TlvDq8ikWAM

  # Or use voice name (case-sensitive)
  - id: elevenlabs:tts:Rachel

List available voices:

sh
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

Cost Tracking Inaccuracies

Issue: Cost estimates don't match billing

Solution: Cost tracking is estimated based on:

  • TTS: Character count × model rate
  • STT: Audio duration × per-minute rate
  • Agents: Conversation duration × LLM rates

For exact costs, check your ElevenLabs billing dashboard.

Examples

Complete working examples:

Learn More

Promptfoo Resources

ElevenLabs Resources