Subtitle Edit can automatically transcribe audio to text using Whisper-based speech recognition engines.
| Engine | Platform | GPU Support |
|---|---|---|
| Whisper.cpp | Windows, Linux, macOS | CPU only |
| Whisper.cpp (cuBLAS) | Windows | NVIDIA CUDA |
| Whisper.cpp (Vulkan) | Windows | Vulkan GPU |
| Purfview's Faster Whisper XXL | Windows, Linux | NVIDIA CUDA |
| Whisper CTranslate2 | Windows, Linux, macOS | CPU or NVIDIA CUDA (CUDA 12.x required) |
| Const-me's Whisper | Windows | DirectX |
| OpenAI Whisper | All (Python required) | NVIDIA CUDA |
| Chat LLM cpp | Windows, Linux | CPU/GPU |
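
Engines and models are downloaded automatically on first use.

Each engine has its own set of models. Common model sizes, from fastest to most accurate, are `tiny`, `base`, `small`, `medium`, and `large`; larger models are slower and need more memory.

Models ending in `.en` (for example, `base.en`) are English-only and usually perform better on English audio.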
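
The naming convention can be illustrated with a small Python sketch. The helpers `model_name` and `is_english_only` are hypothetical and not part of Subtitle Edit. The assumption that `.en` variants exist only for `tiny` through `medium` follows OpenAI's Whisper release; individual engines may offer different models. Falling back to the multilingual model when no English-only variant exists is a choice made for this sketch.

```python
# Hypothetical helpers illustrating the Whisper model naming convention:
# "<size>" is multilingual, "<size>.en" is English-only.
# Assumption: .en variants exist for tiny, base, small and medium only,
# as in OpenAI's Whisper release; other engines may name models differently.

SIZES = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size: str, english_only: bool = False) -> str:
    """Return the model name for a size, optionally the English-only variant."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # No English-only variant requested or available: use the multilingual model.
    return size


def is_english_only(name: str) -> bool:
    """Models whose name ends in '.en' only transcribe English."""
    return name.endswith(".en")


print(model_name("base", english_only=True))   # base.en
print(model_name("large", english_only=True))  # large
print(is_english_only("small.en"))             # True
```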

You can also transcribe multiple video files at once; the results are saved as `.srt` files next to the video files.

Click the Advanced button to configure custom command-line arguments for the Whisper engine.
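
A minimal Python sketch of how a custom argument string could become part of a process command line, assuming the custom arguments are simply appended after the engine's own arguments. The function `build_command`, the executable name `whisper-cli`, and every flag shown are placeholders; the options an engine actually accepts are listed in its own help output. `shlex.split` follows POSIX shell quoting rules, so on Windows the real quoting behavior may differ.

```python
import shlex


def build_command(executable: str, base_args: list[str], custom_args: str) -> list[str]:
    """Combine base arguments with a user-supplied custom argument string.

    shlex.split tokenizes the string like a POSIX shell, so a quoted value
    containing spaces stays a single argument.
    """
    return [executable, *base_args, *shlex.split(custom_args)]


cmd = build_command(
    "whisper-cli",                        # placeholder executable name
    ["--model", "base.en", "input.wav"],  # placeholder base arguments
    '--language en --output_dir "My Subtitles"',  # placeholder custom arguments
)
print(cmd)
# ['whisper-cli', '--model', 'base.en', 'input.wav', '--language', 'en', '--output_dir', 'My Subtitles']
```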

Click the Post-processing button to configure how the transcribed subtitles are processed after recognition.

The console log at the bottom shows real-time output from the Whisper process, which is useful for debugging issues.
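
The sketch below shows one generic way a frontend can capture a child process's output line by line, as a console log does. It is not Subtitle Edit's implementation: `run_and_stream` is a hypothetical name, and a short Python one-liner stands in for the Whisper process. Lines only arrive in real time if the child process flushes its output; a program that buffers its output when writing to a pipe will appear to pause and then print in bursts.

```python
import subprocess
import sys


def run_and_stream(cmd: list[str]) -> tuple[int, list[str]]:
    """Run a command, echo its combined stdout/stderr line by line, and collect it."""
    lines: list[str] = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so progress messages are captured too
        text=True,
        bufsize=1,  # line-buffered on the reading side (text mode)
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:  # each line is yielded once the child has flushed it
            line = line.rstrip("\n")
            print(line)  # a GUI would append this to its log view instead
            lines.append(line)
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, lines


# Stand-in for a Whisper process: a tiny Python script that prints two lines.
code, output = run_and_stream(
    [sys.executable, "-c", "print('loading model'); print('transcribing')"]
)
```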

The `--standard` parameter is added automatically when using Purfview's Faster Whisper XXL.
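
The rule above can be expressed as a short Python sketch. The engine identifier string and the function `with_engine_defaults` are hypothetical, and not adding the flag a second time when it is already present is an assumption made here, not documented behavior.

```python
XXL_ENGINE = "Purfview's Faster Whisper XXL"  # hypothetical engine identifier


def with_engine_defaults(engine: str, args: list[str]) -> list[str]:
    """Return a copy of args with the engine-specific '--standard' flag applied.

    Assumption: the flag is skipped if the user already supplied it.
    """
    if engine == XXL_ENGINE and "--standard" not in args:
        return [*args, "--standard"]
    return list(args)


# '--language en' is a placeholder for whatever custom arguments were entered.
print(with_engine_defaults(XXL_ENGINE, ["--language", "en"]))
# ['--language', 'en', '--standard']
```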