packages/docs/docs/editor-starter/captioning.mdx
The Editor Starter includes a built-in way to generate captions for video and audio assets.
It uses the OpenAI Whisper API by default.
For implementation details, refer to the source code in src/editor/captioning.
In the Editor Starter, captions are treated as a first-class item type, similar to videos, images, or audio. This allows them to be manipulated like any other layer in the timeline and canvas.
To generate captions using OpenAI's Whisper model, add your OpenAI API key to the .env file:
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
This enables server-side transcription if the /api/captions backend route is present.
Click "Generate Captions" on a video or audio layer:
The audio is sent to /api/captions and transcribed via OpenAI (note: a limit of 25MB applies).
The result is converted to the Caption type and added to the timeline as a CaptionsItem.
The inspector allows users to edit the following properties of captions by default:
Captions are automatically split into "pages" for easier management. Pages are timed groups of words or sentences that fit nicely on screen. This is done with createTikTokStyleCaptions() from the @remotion/captions package.
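To make the idea of pages concrete, here is a simplified, self-contained sketch of grouping word-level captions into timed pages. It is not the implementation of createTikTokStyleCaptions(). The Caption type is declared locally and assumed to mirror the shape used by @remotion/captions, and the Page type, groupIntoPages(), and windowMs are hypothetical names used only for illustration.

```typescript
// Local declaration, assumed to mirror the Caption shape of @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical page type for this sketch.
type Page = {
  text: string;
  startMs: number;
  endMs: number;
  captions: Caption[];
};

// A caption joins the current page as long as it starts within `windowMs`
// of the page's start time; otherwise a new page begins.
function groupIntoPages(captions: Caption[], windowMs: number): Page[] {
  const pages: Page[] = [];
  for (const caption of captions) {
    const current = pages[pages.length - 1];
    if (current && caption.startMs - current.startMs < windowMs) {
      current.captions.push(caption);
      current.text += caption.text;
      current.endMs = caption.endMs;
    } else {
      pages.push({
        // Drop leading whitespace at the start of a page.
        text: caption.text.replace(/^\s+/, ''),
        startMs: caption.startMs,
        endMs: caption.endMs,
        captions: [caption],
      });
    }
  }
  return pages;
}

const words: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: null},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: null},
  {text: ' again', startMs: 1500, endMs: 2000, timestampMs: 1750, confidence: null},
];

const pages = groupIntoPages(words, 1200);
```

A smaller window produces shorter pages with fewer words on screen at once, and a larger window produces longer ones.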
By default, captioning uses the OpenAI Whisper API, which has a limit of 25MB per request.
At a 16 kHz sample rate, this is about 13.4 minutes of mono audio.
By default, the Editor Starter disables the captioning feature if the audio is longer than that.
Review the logic of MAX_DURATION_ALLOWING_CAPTIONING_IN_SEC to tweak it.
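As a rough sketch of what such a duration gate might look like: the value below is an assumption derived from the 13.4 minute figure, and canGenerateCaptions() is a hypothetical helper. The actual constant and the logic around it in the Editor Starter may differ, so check the source before relying on this.

```typescript
// Assumed value based on the ~13.4 minute figure; the real constant in the
// Editor Starter may be defined differently.
const MAX_DURATION_ALLOWING_CAPTIONING_IN_SEC = 13.4 * 60;

// Hypothetical helper: returns true if an asset is short enough to caption.
function canGenerateCaptions(durationInSec: number): boolean {
  return durationInSec <= MAX_DURATION_ALLOWING_CAPTIONING_IN_SEC;
}

const shortClip = canGenerateCaptions(5 * 60); // 5-minute asset
const longClip = canGenerateCaptions(20 * 60); // 20-minute asset
```

If you switch to a transcription backend without the 25MB limit, raising or removing this threshold is the corresponding change.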
@remotion/whisper-web
You can replace the OpenAI Whisper API with @remotion/whisper-web for local, in-browser transcription.
This eliminates the need for an OpenAI key and S3 fetches for transcription, but you'll still need to handle audio loading locally.
Caveats:
@remotion/install-whisper-cpp
You can use @remotion/install-whisper-cpp to transcribe audio on a Node.js server.
Caveats:
Any transcription method can be used.
We recommend that you convert the captions to the Caption shape so that rendering and editing the captions does not need to be refactored.
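A minimal sketch of such a conversion, assuming your transcription backend returns word-level results with start and end times in seconds. The TranscribedWord type and wordsToCaptions() are hypothetical names, and the Caption type is declared locally on the assumption that it mirrors the shape used by @remotion/captions; adjust the field mapping to whatever your provider actually returns.

```typescript
// Local declaration, assumed to mirror the Caption shape of @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical output of a custom transcription backend (times in seconds).
type TranscribedWord = {word: string; start: number; end: number};

function wordsToCaptions(words: TranscribedWord[]): Caption[] {
  return words.map((w, i) => {
    const startMs = Math.round(w.start * 1000);
    const endMs = Math.round(w.end * 1000);
    return {
      // Prefix every word except the first with a space, so that joining
      // the caption texts yields a readable sentence.
      text: i === 0 ? w.word.trim() : ` ${w.word.trim()}`,
      startMs,
      endMs,
      // Use the midpoint of the word as its timestamp.
      timestampMs: Math.round((startMs + endMs) / 2),
      // This hypothetical backend reports no confidence score.
      confidence: null,
    };
  });
}

const captions = wordsToCaptions([
  {word: 'Hello', start: 0, end: 0.42},
  {word: 'world', start: 0.42, end: 0.9},
]);
```

Keeping this mapping in one function means the rest of the editor only ever sees Caption objects, regardless of which transcription backend produced them.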