pyVideoTrans

A Powerful Open Source Video Translation / Audio Transcription / AI Dubbing / Subtitle Translation Tool



pyVideoTrans is dedicated to seamlessly converting videos from one language to another, offering a complete workflow that includes speech recognition, subtitle translation, multi-role dubbing, and audio-video synchronization. It supports both local offline deployment and a wide variety of mainstream online APIs.

✨ Core Features

For the technical architecture and principles, see architecture.md in the GitHub Wiki.

🎥 Fully Automatic Video Translation: A one-click workflow covering Speech Recognition (ASR) → Subtitle Translation → Speech Synthesis (TTS) → Video Synthesis.
🎙️ Audio Transcription / Subtitle Generation: Batch-convert audio/video to SRT subtitles, with speaker diarization to tell different speakers apart.
🗣️ Multi-Role AI Dubbing: Assign a different AI dubbing voice to each speaker.
🧬 Voice Cloning: Integrates models such as F5-TTS, CosyVoice, and GPT-SoVITS for zero-shot voice cloning.
🧠 Powerful Model Support:
- ASR: Faster-Whisper (Local), OpenAI Whisper, Alibaba Qwen, ByteDance Volcano, Azure, Google, etc.
- LLM Translation: DeepSeek, ChatGPT, Claude, Gemini, MiniMax, Ollama (Local), Alibaba Bailian, etc.
- TTS: Edge-TTS (Free), OpenAI, Azure, Minimaxi, ChatTTS, ChatterBox, etc.
🖥️ Interactive Editing: Supports pausing and manual proofreading at each stage (recognition, translation, dubbing) to ensure accuracy.
🛠️ Utility Toolkit: Includes auxiliary tools such as vocal separation, video/subtitle merging, audio-video alignment, and transcript matching.
💻 Command Line Interface (CLI): Supports headless operation, convenient for server deployment or batch processing.
🌐 Web Interface (WebUI): Browser-based interface for remote access or internal network deployment.
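
To make the subtitle-generation feature above more concrete, here is a minimal Python sketch of how recognized, speaker-labelled segments could be rendered as SRT text. It is not pyVideoTrans's own code: the `Segment` class, the `spk0`-style labels, and the bracketed speaker prefix are assumptions made for illustration. Only the SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the standard format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical structure, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str  # label from speaker diarization, e.g. "spk0"
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT text, prefixing each cue with its speaker label."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    # Joining with "\n" leaves one blank line between consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "spk0", "Hello, and welcome."),
        Segment(2.5, 5.04, "spk1", "Thanks for having me."),
    ]
    print(to_srt(demo))
```

Keeping the speaker label on each cue is what would let a later dubbing step pick a different voice per speaker.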

🚀 Quick Start (Windows Users)

We provide a pre-packaged .exe version for Windows 10/11 users, requiring no Python environment configuration.

Download: Get the latest pre-packaged version.
Unzip: Extract the compressed file to a path without Chinese characters or spaces (e.g., D:\pyVideoTrans).
Run: Double-click sp.exe inside the folder to launch.

Note:

Do not run directly from within the compressed archive.

To use GPU acceleration, ensure CUDA 12.8 and cuDNN 9.11 are installed.
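
If you want to check an extraction path against the rule in the Unzip step before moving files around, a small helper along these lines may be useful. This is a sketch, not part of pyVideoTrans: the function name is hypothetical, and it flags any non-ASCII character, which is stricter than the "no Chinese characters" rule stated above.

```python
def path_problems(path: str) -> list[str]:
    """Return reasons why `path` may be a poor install location (empty list if none found)."""
    problems = []
    if " " in path:
        problems.append("contains spaces")
    # Assumption: treat every non-ASCII character as risky, not only Chinese ones.
    if not path.isascii():
        problems.append("contains non-ASCII characters")
    return problems


if __name__ == "__main__":
    for candidate in (r"D:\pyVideoTrans", r"C:\Program Files\pyVideoTrans"):
        print(candidate, "->", path_problems(candidate) or "looks fine")
```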

🛠️ Source Deployment (macOS / Linux / Windows Developers)

We recommend uv for package management because it installs dependencies faster and isolates the environment better.

1. Prerequisites

Python: Recommended version 3.10
FFmpeg: Must be installed and configured in the environment variables.
- macOS: brew install ffmpeg libsndfile git
- Linux (Ubuntu/Debian): sudo apt-get install ffmpeg libsndfile1-dev
- Windows: Download FFmpeg and configure Path, or place ffmpeg.exe and ffprobe.exe directly in the project directory.
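
The prerequisites above can be checked with a few lines of Python before going further. This is a minimal sketch using only the standard library; the function name and the returned keys are made up for illustration. Note that it only looks at PATH, so on Windows it may report FFmpeg as missing even when ffmpeg.exe and ffprobe.exe sit in the project directory.

```python
import shutil
import sys


def check_prerequisites(recommended=(3, 10)) -> dict[str, bool]:
    """Report whether the prerequisites listed above appear to be satisfied."""
    return {
        # True when the running interpreter matches the recommended major.minor version.
        "python_recommended": sys.version_info[:2] == tuple(recommended),
        # shutil.which searches the directories on PATH for an executable.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "ffprobe_on_path": shutil.which("ffprobe") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or different'}")
```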

2. Install uv (If not installed)

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

3. Clone and Install

```bash
git clone https://github.com/jianchang512/pyvideotrans.git
cd pyvideotrans
uv sync
```

By default, the qwen-tts, qwen-asr, moss-tts, and chatterbox channels are not installed locally.

To install all optional channels: uv sync --all-extras

To install them individually: uv sync --extra qwentts / uv sync --extra qwenasr / uv sync --extra mosstts / uv sync --extra chatterbox

4. Launch Software

GUI:

```bash
uv run sp.py
```

CLI:

```bash
# Video Translation
uv run cli.py --task vtv --name "./video.mp4" --source_language_code zh-cn --target_language_code en --voice_role "en-US-GuyNeural"

# Audio to Subtitle
uv run cli.py --task stt --name "./audio.wav" --model_name large-v3

# Subtitle Translation
uv run cli.py --task sts --name "./subs.srt" --target_language_code en

# Text to Speech
uv run cli.py --task tts --name "./subs.srt" --voice_role "zh-CN-YunyangNeural"
```

For all CLI parameters, see cli.md in the GitHub Wiki.
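
For batch processing, the "Audio to Subtitle" command above can be wrapped in a short Python driver. The sketch below uses only the flags shown in that example (`--task stt`, `--name`, `--model_name`); the helper names, the `./audio` folder, and the `*.wav` pattern are assumptions, and it assumes the script is started from the repository root. Whether cli.py signals failure through a non-zero exit status is also an assumption, so treat `check=True` as a best-effort guard.

```python
import subprocess
from pathlib import Path


def build_stt_command(media: Path, model_name: str = "large-v3") -> list[str]:
    """Build the 'Audio to Subtitle' command shown above for one file."""
    return [
        "uv", "run", "cli.py",
        "--task", "stt",
        "--name", str(media),
        "--model_name", model_name,
    ]


def transcribe_folder(folder: Path, pattern: str = "*.wav", dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one command per matching file, in sorted order."""
    commands = [build_stt_command(p) for p in sorted(folder.glob(pattern))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the command exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: print the commands instead of executing them.
    for cmd in transcribe_folder(Path("./audio")):
        print(" ".join(cmd))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names.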

WebUI (for remote/internal network access):

```bash
uv sync --extra webui
uv run webui.py
```

Docker (containerized deployment):

```bash
# Build
docker build -t pyvideotrans-webui .

# Run
docker run -d -p 7860:7860 --name pyvideotrans pyvideotrans-webui

# With persistent config and output
docker run -d -p 7860:7860 \
  -v ./data/output:/app/output \
  -v ./data/config:/app/videotrans \
  --name pyvideotrans pyvideotrans-webui
```

For WebUI details, see webui.md in the GitHub Wiki.
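
After starting the container, it can take a moment before the WebUI accepts connections. A small readiness check like the following sketch can help in deployment scripts. It assumes the WebUI listens on port 7860, as in the `-p 7860:7860` mapping above; the function name and defaults are hypothetical, and a successful TCP connection only shows that something is listening, not that the application is fully loaded.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not accepting connections yet; retry shortly
    return False
```

For example, `wait_for_port(timeout=60.0)` can be called right after `docker run` and before opening the browser.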

5. (Optional) GPU Acceleration Configuration

If you have an NVIDIA graphics card, execute the following commands to install the CUDA-supported PyTorch version:

```bash
# Uninstall CPU version
uv remove torch torchaudio

# Install CUDA version (example for CUDA 12.8, using the cu128 wheel index)
uv add torch==2.7 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
uv add nvidia-cublas-cu12 nvidia-cudnn-cu12
```
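
To confirm that the CUDA build was picked up, a quick check with PyTorch's own API can be run inside the project environment. This is a minimal sketch; the function name is made up, and the import is wrapped so the script also runs where PyTorch is absent.

```python
def cuda_status() -> str:
    """Describe whether the installed PyTorch build can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is available"
    # torch.version.cuda is the CUDA version the wheel was built against, e.g. "12.8".
    return (
        f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) "
        f"sees {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(cuda_status())
```

Saved under a name of your choice (for example check_cuda.py, a hypothetical file), it can be run with `uv run check_cuda.py` from the project directory.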

AMD GPU acceleration via Whisper.NET

🧩 Supported Channels & Models (Partial)

| Category | Channel/Model | Description |
| --- | --- | --- |
| ASR (Speech Recognition) | Faster-Whisper (Local) | Recommended; fast and accurate |
| | WhisperX / Parakeet | Supports timestamp alignment and speaker diarization |
| | Alibaba Qwen3-ASR / ByteDance Volcano | Online API, excellent for Chinese |
| Translation (LLM/MT) | DeepSeek / ChatGPT | Context-aware, more natural translation |
| | MiniMax AI | MiniMax M3 LLM, latest flagship model, OpenAI-compatible |
| | Google / Microsoft | Traditional machine translation, fast |
| | Ollama / M2M100 | Fully local offline translation |
| TTS (Speech Synthesis) | Edge-TTS | Free Microsoft interface, natural-sounding output |
| | F5-TTS / CosyVoice | Supports voice cloning, requires local deployment |
| | GPT-SoVITS / ChatTTS | High-quality open-source TTS |
| | 302.AI / OpenAI / Azure | High-quality commercial API |

📚 Documentation & Support

Official Documentation: https://pyvideotrans.com (Includes detailed tutorials, API configuration guides, FAQ)
Online Q&A Community: https://bbs.pyvideotrans.com (Submit error logs for automated AI analysis and answers)
GitHub Wiki: architecture.md | cli.md | webui.md | Synchronize.md | faq.md

⚠️ Disclaimer

This software is an open-source, free, non-commercial project. Users are solely responsible for any legal consequences arising from the use of this software (including but not limited to calling third-party APIs or processing copyrighted video content). Please comply with local laws and regulations and the terms of use of relevant service providers.

🙏 Acknowledgements

This project mainly relies on the following open-source projects (partial):

Created by jianchang512