docs/features/speech-to-text.md
Subtitle Edit can automatically transcribe audio to text using Whisper-based and other modern speech recognition engines.
| Engine | Platform | Notes |
|---|---|---|
| Whisper CPP | Windows, Linux, macOS | Local CPU engine. On Windows the cuBLAS (NVIDIA CUDA) and Vulkan GPU backends can also be selected from the Whisper CPP backend dropdown. |
| Purfview Faster Whisper XXL | Windows, Linux | Fast local engine, often used with NVIDIA CUDA |
| Whisper CTranslate2 | Windows, Linux (x64), macOS (Apple Silicon) | CPU / NVIDIA CUDA depending on installation; CUDA requires CUDA 12.x |
| Whisper Const-me | Windows | DirectX-based engine |
| Whisper OpenAI | All | Python-based OpenAI Whisper workflow |
| OpenAI Compatible Server | All | Connect to any OpenAI-compatible speech-to-text endpoint |
| Qwen3 ASR CPP | Windows, Linux | Local Qwen3 ASR engine with downloadable GGUF models |
| Crisp ASR | Windows, Linux, macOS | Single engine with selectable backends: Parakeet, Canary, Cohere, Fire Red, GLM, Granite, Qwen3, Mega, Omni, Kyutai |
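
For the OpenAI Compatible Server entry, it may help to see roughly what such an endpoint expects. The sketch below builds, but does not send, a multipart request in the shape used by the OpenAI audio transcription API (`POST /v1/audio/transcriptions` with `file`, `model`, and `response_format` fields). The function name, the localhost URL, the default model name `whisper-1`, and the choice of `srt` output are assumptions for illustration. This is not necessarily how Subtitle Edit issues the call, and a given server may expect different model names.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, audio_bytes, filename,
                                model="whisper-1", response_format="srt"):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model, "response_format": response_format}

    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The audio file itself.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example: prepare a request against a server assumed to run locally.
request = build_transcription_request(
    "http://localhost:8000", "test-key", b"RIFF....", "clip.wav"
)
```

Passing the returned object to `urllib.request.urlopen` would send it. With `response_format="srt"`, servers that implement that option return subtitle text directly.
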
Engines and models are downloaded automatically on first use.
Each engine has its own set of models. Common model sizes:
Models ending in `.en` are English-only and tend to perform better on English audio than the multilingual model of the same size.
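
The `.en` naming rule can be sketched as a small selection helper. The function names are hypothetical, and the set of sizes with an English-only variant (`tiny`, `base`, `small`, `medium`) follows the original OpenAI Whisper release. Because each engine ships its own model list, treat that set as an assumption to check against the engine you use.

```python
# Sizes that have an English-only ".en" variant in the original OpenAI
# Whisper release. Assumption: other engines may offer a different set.
SIZES_WITH_EN_VARIANT = {"tiny", "base", "small", "medium"}


def is_english_only(model_name: str) -> bool:
    """Return True for English-only models, identified by the ".en" suffix."""
    return model_name.endswith(".en")


def pick_model(size: str, audio_language: str) -> str:
    """Prefer the English-only variant when the audio is English and one exists."""
    if audio_language == "en" and size in SIZES_WITH_EN_VARIANT:
        return f"{size}.en"
    return size
```

For example, `pick_model("small", "en")` selects `small.en`, while German audio or a `large` model falls back to the multilingual name.
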
Transcribe multiple video files at once:
Subtitles are saved as `.srt` files next to the video files.

Click the Advanced button to configure custom command-line arguments for the selected engine:
Advanced settings are stored per engine, so you can keep separate parameters for Whisper CPP, Qwen3 ASR, Crisp ASR, and other engines.
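
Per-engine storage can be pictured as a mapping from engine name to an argument string that is split into tokens before the engine is launched. This is only a sketch: the dictionary, the function name, and the example argument strings are assumptions, and Subtitle Edit's actual settings format may differ. `--language` and `--threads` are existing Whisper CPP options, used here purely as sample values.

```python
import shlex

# Hypothetical settings store: one argument string per engine.
advanced_args = {
    "Whisper CPP": "--language en --threads 4",
    "Purfview Faster Whisper XXL": "--language en",
}


def args_for(engine: str) -> list[str]:
    """Return the stored arguments for an engine as a token list.

    Engines without stored settings get an empty list.
    """
    return shlex.split(advanced_args.get(engine, ""))
```

Keeping one entry per engine means that changing the Whisper CPP arguments leaves the other engines' settings untouched.
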
Click the Post-processing button to configure:
The console log at the bottom shows real-time output from the engine process, which is useful for debugging.
The `--standard` parameter is added automatically for Purfview Faster Whisper XXL.
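
The effect of this automatic addition can be illustrated as follows. This is a guess at the behavior, not Subtitle Edit's implementation: the function name is hypothetical, and skipping the flag when the user has already supplied it is an assumption.

```python
def with_engine_defaults(engine: str, args: list[str]) -> list[str]:
    """Return a copy of args with engine-specific defaults applied."""
    result = list(args)  # Copy so the caller's list is not modified.
    # Assumption: the flag is appended only if it is not already present.
    if engine == "Purfview Faster Whisper XXL" and "--standard" not in result:
        result.append("--standard")
    return result
```

Under these assumptions, user-supplied arguments for Purfview Faster Whisper XXL gain a trailing `--standard`, and arguments for every other engine pass through unchanged.
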