notebooks/Tutorial_1_use-pretrained-TTS.ipynb
💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡
🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server that you can open in your favorite web browser and 🗣️ .
In this notebook, we will:
1. List available pre-trained 🐸 TTS models
2. Run a 🐸 TTS model
3. Listen to the synthesized wave 📣
4. Run a multispeaker 🐸 TTS model
So, let's jump right in!
! pip install -U pip
! pip install TTS
Coqui 🐸TTS comes with a list of pretrained models for different model types (e.g. TTS, vocoder), languages, training datasets, and architectures.
You can either use your own model or one of the released models under 🐸TTS.
Use `tts --list_models` to find the available models.
! tts --list_models
You can simply copy the full model name from the list above and use it:
!tts --text "hello world" \
--model_name "tts_models/en/ljspeech/glow-tts" \
--out_path output.wav
import IPython
IPython.display.Audio("output.wav")
🔶 A TTS model can be trained either on a single speaker's voice or on multiple speakers' voices. This training choice is directly reflected in the model's inference ability and in the speaker voices that are available for synthesizing speech.

🔶 If you want to run a multispeaker model from the released models list, first check the speaker IDs using the --list_speaker_idxs flag, then pick one of those speaker voices to synthesize speech.
# list the possible speaker IDs.
!tts --model_name "tts_models/en/vctk/vits" \
--list_speaker_idxs
!tts --text "Trying out specific speaker voice" \
--out_path spkr-out.wav --model_name "tts_models/en/vctk/vits" \
--speaker_idx "p341"
import IPython
IPython.display.Audio("spkr-out.wav")
🔶 If you want to use an external speaker's voice to synthesize speech, supply the --speaker_wav flag along with an external speaker encoder path and its config file, as follows:

First we need to get the speaker encoder model, its config, and a reference speaker_wav:
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar
!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav
!tts --model_name tts_models/multilingual/multi-dataset/your_tts \
--encoder_path model_se.pth.tar \
--encoder_config config_se.json \
--speaker_wav LJ001-0001.wav \
--text "Are we not allowed to dim the lights so people can see that a bit better?" \
--out_path spkr-out.wav \
--language_idx "en"
import IPython
IPython.display.Audio("spkr-out.wav")
Follow up with the next tutorials to learn more advanced material.