provider-elevenlabs/alignment (ElevenLabs Forced Alignment)

Generate time-aligned subtitles (SRT/VTT) from audio and transcripts using ElevenLabs forced alignment.

Quick Start

bash

npx promptfoo@latest init --example provider-elevenlabs/alignment
cd provider-elevenlabs/alignment
export ELEVENLABS_API_KEY=your_api_key_here
npx promptfoo@latest eval

What this tests

Subtitle generation: Create SRT and VTT subtitle files
Word-level alignment: Precise timestamp data for each word
Multiple formats: JSON (raw data), SRT (video players), VTT (web players)
Accuracy: Verify alignment matches audio timing

How it works

Forced alignment takes two inputs:

Audio file: Speech recording (MP3, WAV, etc.)
Transcript: Text of what was spoken

It returns precise timestamps showing when each word was spoken, formatted as subtitles.

Use Cases

Video subtitles: Generate SRT files for video editing software
Web captions: Create VTT files for HTML5 video players
Karaoke apps: Word-level timing for synchronized highlighting
Accessibility: Auto-generate captions for spoken content
Translation sync: Time-align translations to original audio

Output Formats

JSON (Raw alignment data)

json

{
  "alignment": [
    { "char": "T", "start": 0.0, "end": 0.1 },
    { "char": "h", "start": 0.1, "end": 0.15 }
  ],
  "characters": "That's one small step..."
}

SRT (Standard video subtitles)

text

1
00:00:00,000 --> 00:00:02,500
That's one small step for man

2
00:00:02,500 --> 00:00:05,000
one giant leap for mankind

VTT (WebVTT for web players)

text

WEBVTT

1
00:00:00.000 --> 00:00:02.500
That's one small step for man

2
00:00:02.500 --> 00:00:05.000
one giant leap for mankind

Configuration

Basic alignment (JSON output)

yaml

providers:
  - id: elevenlabs:alignment:json
    label: Alignment (JSON)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: json

SRT subtitles

yaml

providers:
  - id: elevenlabs:alignment:srt
    label: Alignment (SRT Subtitles)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: srt

VTT subtitles

yaml

providers:
  - id: elevenlabs:alignment:vtt
    label: Alignment (VTT Subtitles)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: vtt

Testing Assertions

yaml

tests:
  # Verify alignment succeeds
  - assert:
      - type: javascript
        value: output.includes('words') # JSON format
      - type: not-contains
        value: error

  # Verify SRT format
  - assert:
      - type: javascript
        value: output.includes('-->') && output.includes('small step')

Best Practices

Transcript accuracy: Ensure transcript exactly matches spoken audio
Include punctuation: Better subtitle chunking and timing
Audio quality: Clear audio produces more accurate timestamps
Format selection:
- Use SRT for video editing (Premiere, Final Cut, DaVinci)
- Use VTT for web players (HTML5 <video> tag)
- Use JSON for custom processing

Cost Information

Forced alignment pricing is based on audio duration:

~$0.05 per minute of audio

The provider automatically tracks costs in evaluation results.

ElevenLabs STT - Speech-to-text transcription
ElevenLabs Isolation - Audio noise removal

provider-elevenlabs/alignment (ElevenLabs Forced Alignment)

provider-elevenlabs/alignment (ElevenLabs Forced Alignment)

Quick Start

What this tests

How it works

Use Cases

Output Formats

JSON (Raw alignment data)

SRT (Standard video subtitles)

VTT (WebVTT for web players)

Configuration

Basic alignment (JSON output)

SRT subtitles

VTT subtitles

Testing Assertions

Best Practices

Cost Information

Related Examples

Resources