Speech to Text

Subtitle Edit can automatically transcribe audio to text using Whisper-based speech recognition engines.

Supported Engines

Engine	Platform	GPU Support
Whisper.cpp	Windows, Linux, macOS	CPU only
Whisper.cpp (cuBLAS)	Windows	NVIDIA CUDA
Whisper.cpp (Vulkan)	Windows	Vulkan GPU
Purfview's Faster Whisper XXL	Windows, Linux	NVIDIA CUDA
Whisper CTranslate2	Windows, Linux, macOS	CPU only / NVIDIA CUDA (requires CUDA 12.x)
Const-me's Whisper	Windows	DirectX
OpenAI Whisper	All (Python required)	NVIDIA CUDA
Chat LLM cpp	Windows, Linux	CPU/GPU

Engines and models are downloaded automatically on first use.

1. Open a video file in Subtitle Edit.
2. Go to Video → Speech to text (Whisper)...
3. Select an Engine from the dropdown.
4. Select a Model (larger models are more accurate but slower).
5. Select the Language of the audio.
6. Optionally enable:
   - Translate to English: translate non-English audio to English
   - Adjust timings: post-process timing using waveform data
   - Post-processing: fix casing, merge lines, add periods, etc.
7. Click Transcribe.
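
The dialog choices above correspond roughly to a command line handed to the selected engine. As a minimal sketch, assuming the OpenAI Whisper engine and its `whisper` command-line tool, the helper below assembles a comparable invocation. The name `build_whisper_command` is hypothetical and not part of Subtitle Edit, and the exact arguments Subtitle Edit generates may differ.

```python
from typing import List, Optional


def build_whisper_command(
    media_path: str,
    model: str = "small",
    language: Optional[str] = None,
    translate: bool = False,
) -> List[str]:
    """Build an argument list for the OpenAI Whisper CLI (hypothetical helper)."""
    # Ask for SubRip output so the result can be opened as a subtitle file.
    cmd = ["whisper", media_path, "--model", model, "--output_format", "srt"]
    if language:
        # Omitting --language lets the tool detect the language itself.
        cmd += ["--language", language]
    # "translate" produces English text; "transcribe" keeps the spoken language.
    cmd += ["--task", "translate" if translate else "transcribe"]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_whisper_command("video.mp4", "medium", "de", translate=True)))
```

The list form can be passed directly to `subprocess.run`, which avoids shell-quoting problems with file names that contain spaces.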

Each engine has its own set of models. Common model sizes:

Models ending in .en are English-only and perform better for English audio.
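
The `.en` naming rule can be expressed as a small lookup. The sketch below assumes the original OpenAI Whisper model family, where the tiny, base, small, and medium sizes have English-only variants and the large models are multilingual only. The helper name `pick_model` is hypothetical, and the model list offered by a given engine may differ.

```python
# Sizes that have an English-only ".en" variant in the original OpenAI Whisper
# model family (assumption: the selected engine follows the same naming).
SIZES_WITH_EN_VARIANT = {"tiny", "base", "small", "medium"}


def pick_model(size: str, language: str) -> str:
    """Return the model name to use for a given size and audio language."""
    is_english = language.lower() in ("en", "english")
    if is_english and size in SIZES_WITH_EN_VARIANT:
        return f"{size}.en"
    # Non-English audio, or a size without an ".en" variant: use the
    # multilingual model.
    return size
```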

Transcribe multiple video files at once:

Click the Advanced button to configure custom command-line arguments for the Whisper engine:

Click the Post-processing button to configure:

The console log at the bottom shows real-time output from the Whisper process, useful for debugging issues.
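
When reading the console log, the segment lines are usually the most informative part. As a sketch, assuming the engine prints segments in the `[HH:MM:SS.mmm --> HH:MM:SS.mmm] text` form that Whisper.cpp typically uses (other engines may format their output differently), the hypothetical `parse_segment` helper below extracts the timings and text from one line and ignores everything else.

```python
import re
from typing import Optional, Tuple

# Assumed segment line format: "[00:00:01.500 --> 00:00:04.000]   Hello world"
SEGMENT_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert timestamp parts to milliseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_segment(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text), or None for non-segment log lines."""
    match = SEGMENT_RE.match(line.strip())
    if match is None:
        return None
    groups = match.groups()
    return _to_ms(*groups[0:4]), _to_ms(*groups[4:8]), groups[8].strip()
```

Lines that return `None` are typically status or error messages, which are the ones worth reading when a transcription fails.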

- NVIDIA GPU users should choose Whisper.cpp (cuBLAS) or Purfview's Faster Whisper XXL for the fastest transcription.
- If you get "CUDA out of memory" errors, try a smaller model.
- The --standard parameter is automatically added for Purfview's Faster Whisper XXL.
- You can re-download an engine by right-clicking the engine area.
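
The out-of-memory tip amounts to stepping down through the model sizes until one fits in GPU memory. A minimal sketch, assuming the common size names tiny, base, small, medium, and large (the model list of the engine in use may differ); `next_smaller_model` is a hypothetical helper, not part of Subtitle Edit:

```python
from typing import Optional

# Assumed size order, smallest first.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def next_smaller_model(model: str) -> Optional[str]:
    """Return the next smaller model name, or None if there is none."""
    base, _, suffix = model.partition(".")
    # Drop version tags such as "-v3" so "large-v3" is treated as "large".
    size = base.split("-")[0]
    if size not in MODEL_SIZES:
        return None
    index = MODEL_SIZES.index(size)
    if index == 0:
        return None  # already the smallest model
    smaller = MODEL_SIZES[index - 1]
    # Keep the English-only suffix when the current model has one.
    return f"{smaller}.en" if suffix == "en" else smaller
```

Retrying with the returned name after each "CUDA out of memory" error gives a simple fallback chain that ends when the function returns `None`.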