optional-skills/mlops/whisper/references/languages.md
Complete guide to Whisper's multilingual capabilities.
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Moldavian, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba
import whisper
model = whisper.load_model("turbo")
# Auto-detect language
result = model.transcribe("audio.mp3")
print(f"Detected language: {result['language']}")
print(f"Text: {result['text']}")
# Specify language for faster transcription
result = model.transcribe("audio.mp3", language="es") # Spanish
result = model.transcribe("audio.mp3", language="fr") # French
result = model.transcribe("audio.mp3", language="ja") # Japanese
# Translate any language to English
result = model.transcribe(
"spanish_audio.mp3",
task="translate" # Translates to English
)
print(f"Original language: {result['language']}")
print(f"English translation: {result['text']}")
# Chinese works well with larger models
model = whisper.load_model("large")
result = model.transcribe(
"chinese_audio.mp3",
language="zh",
initial_prompt="这是一段关于技术的讨论" # Context helps
)
# Japanese benefits from initial prompt
result = model.transcribe(
"japanese_audio.mp3",
language="ja",
initial_prompt="これは技術的な会議の録音です"
)
# Arabic: Use large model for best results
model = whisper.load_model("large")
result = model.transcribe(
"arabic_audio.mp3",
language="ar"
)
| Language Tier | Recommended Model | WER |
|---|---|---|
| Top-tier (en, es, fr, de) | base/turbo | < 10% |
| Good (ar, tr, vi) | medium/large | 10-20% |
| Lower-resource | large | 20-30% |
# Detect language only (no transcription)
import whisper
model = whisper.load_model("base")
# Load audio
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# Make log-Mel spectrogram
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# Detect language
_, probs = model.detect_language(mel)
detected_language = max(probs, key=probs.get)
print(f"Detected language: {detected_language}")
print(f"Confidence: {probs[detected_language]:.2%}")