packages/docs/docs/elevenlabs/elevenlabs-transcript-to-captions.mdx
Turns the output from the ElevenLabs Speech to Text API into an array of Caption objects.
This function can be used in any JavaScript environment, but you should not use the ElevenLabs API in the browser because your API key will be exposed.
When calling the ElevenLabs Speech to Text API, you must set timestamps_granularity to "word" to include word-level timing in the response.
import fs from 'fs';
import {elevenLabsTranscriptToCaptions} from '@remotion/elevenlabs';

const form = new FormData();
form.append('file', new Blob([fs.readFileSync('audio.mp3')]));
form.append('model_id', 'scribe_v2');
form.append('timestamps_granularity', 'word');

const response = await fetch('https://api.elevenlabs.io/v1/speech-to-text', {
  method: 'POST',
  headers: {
    'xi-api-key': process.env.ELEVENLABS_API_KEY!,
  },
  body: form,
});

const transcript = await response.json();
const {captions} = elevenLabsTranscriptToCaptions({transcript});
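Because the conversion depends on word-level timing being present, it can help to check the parsed response before passing it on. The following is a minimal sketch of such a check. `hasWordTimings` and `WordEntry` are hypothetical names and are not part of `@remotion/elevenlabs`; the sketch also assumes that only entries of type "word" need numeric `start` and `end` values.

```typescript
// Hypothetical shape of one entry in the `words` array (illustration only).
type WordEntry = {
  text: string;
  type: string;
  start: number;
  end: number;
};

// Hypothetical type guard, not part of @remotion/elevenlabs.
// Returns true if `transcript` has a `words` array in which every
// entry of type "word" carries a text and numeric start/end times.
const hasWordTimings = (
  transcript: unknown,
): transcript is {words: WordEntry[]} => {
  if (typeof transcript !== 'object' || transcript === null) {
    return false;
  }

  const {words} = transcript as {words?: unknown};
  if (!Array.isArray(words)) {
    return false;
  }

  return words.every((entry) => {
    if (typeof entry !== 'object' || entry === null) {
      return false;
    }

    const word = entry as Partial<WordEntry>;
    // Assumption: only "word" entries are used, so only they are validated.
    if (word.type !== 'word') {
      return true;
    }

    return (
      typeof word.text === 'string' &&
      typeof word.start === 'number' &&
      typeof word.end === 'number'
    );
  });
};
```

If the guard returns `false`, a likely cause is that the request was sent without `timestamps_granularity` set to "word", or that the API returned an error body instead of a transcript.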
Accepts an object with the following property:

transcript
The response from the ElevenLabs Speech to Text API.
Must include a words array with word-level timing. Ensure the API is called with timestamps_granularity set to "word".
The words array should contain objects with the following fields:

- text: The word text
- type: "word", "spacing", or "audio_event" (only "word" entries are used)
- start: Start time in seconds
- end: End time in seconds

Returns an object with the following property:

captions
An array of Caption objects.
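To make the relationship between the input fields and the output concrete, here is a rough sketch of how word entries could map to captions. It is an illustration under stated assumptions, not the library's actual implementation: `wordsToCaptionsSketch` is a hypothetical name, the `Caption` shape is assumed to match the one from `@remotion/captions`, and setting `timestampMs` and `confidence` to `null` is an assumption. The real function may also treat whitespace differently.

```typescript
// Assumed shape of a Caption (as in @remotion/captions).
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// One entry of the `words` array, using the fields listed above.
type WordEntry = {
  text: string;
  type: 'word' | 'spacing' | 'audio_event';
  start: number; // seconds
  end: number; // seconds
};

// Hypothetical sketch: keep only "word" entries and convert
// their start/end times from seconds to milliseconds.
const wordsToCaptionsSketch = (words: WordEntry[]): Caption[] => {
  return words
    .filter((entry) => entry.type === 'word')
    .map((entry) => ({
      text: entry.text,
      startMs: Math.round(entry.start * 1000),
      endMs: Math.round(entry.end * 1000),
      // Assumption: these fields are not derived from the transcript here.
      timestampMs: null,
      confidence: null,
    }));
};

// Usage with a small hand-written sample (not real API output).
const sampleWords: WordEntry[] = [
  {text: 'Hello', type: 'word', start: 0, end: 0.5},
  {text: ' ', type: 'spacing', start: 0.5, end: 0.6},
  {text: 'world', type: 'word', start: 0.6, end: 1.25},
];

console.log(wordsToCaptionsSketch(sampleWords));
```

The main point is the unit conversion: the API reports times in seconds, while the assumed `Caption` shape stores them in milliseconds.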