To generate captions, the Remotion Recorder uses Whisper.cpp for fast and accurate transcription. Each time you record a clip with the Remotion Recorder, captions are automatically generated and saved to the same folder as your recordings.
The very first time you finish recording a clip, Whisper.cpp and a 1.5GB model will be installed on your computer. This may take a few minutes.
Once installed, captions for the webcam clip will be generated.
:::note
Captions are only generated for files with the webcam prefix.
:::
If the AI has made a mistake, there are several ways to correct the transcription manually. See here how to do this.
For external recordings, you can also generate captions via the CLI.
bun sub.ts
Note that the names of the files you want to transcribe must start with the prefix webcam. All other files will be ignored.
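The prefix rule can be pictured with a short sketch. This is only an illustration of the filtering described above, not the Recorder's actual implementation; the helper name `selectTranscribableFiles` and the sample file names are hypothetical.

```typescript
// Hypothetical helper, for illustration only:
// keep the files whose name starts with the "webcam" prefix.
const WEBCAM_PREFIX = "webcam";

const selectTranscribableFiles = (fileNames: string[]): string[] => {
  return fileNames.filter((name) => name.startsWith(WEBCAM_PREFIX));
};

// Made-up file names: only the first one would be picked up for transcription.
const exampleFiles = [
  "webcam1700000000.mp4",
  "display1700000000.mp4",
  "notes.txt",
];

console.log(selectTranscribableFiles(exampleFiles));
```

If an external recording is skipped, renaming the file so that it starts with the prefix is the first thing to check.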
The JSON files containing the captions will be generated and saved under public/<composition-id>/sub[timestamp].json.
If you do not record in English, edit the config/whisper.ts file.
Set the language to a supported value and change the model to a supported value that does not end in .en.
We advise choosing a larger model if you are transcribing a non-English language.
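As a rough sketch of the rule above, the check below flags an English-only model paired with a non-English language. The type `WhisperConfig`, the function `checkWhisperConfig`, and the example values are assumptions made for illustration; the actual exports and accepted values in config/whisper.ts may differ.

```typescript
// Assumed shape, for illustration only. The real config/whisper.ts
// may use different names and stricter types.
type WhisperConfig = {
  model: string; // e.g. "medium.en" or "large-v3"
  language: string; // e.g. "en" or "de"
};

// Returns a list of problems; an empty list means the combination looks fine.
const checkWhisperConfig = ({ model, language }: WhisperConfig): string[] => {
  const problems: string[] = [];
  // Models whose name ends in ".en" are English-only.
  const isEnglishOnlyModel = model.endsWith(".en");

  if (language !== "en" && isEnglishOnlyModel) {
    problems.push(
      `Model "${model}" is English-only. Choose a model that does not end in ".en" for language "${language}".`,
    );
  }

  return problems;
};

// An English-only model combined with German is flagged.
console.log(checkWhisperConfig({ model: "medium.en", language: "de" }));

// A multilingual model combined with German passes the check.
console.log(checkWhisperConfig({ model: "large-v3", language: "de" }));
```

The check only covers the .en suffix rule. Whether a given model or language value is supported still has to be verified against the values the configuration accepts.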