notebooks/Tutorial_1_use-pretrained-TTS.ipynb
💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡
🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server that you can open in your favorite web browser and 🗣️ .
In this notebook, we will:
1. List available pre-trained 🐸 TTS models
2. Run a 🐸 TTS model
3. Listen to the synthesized wave 📣
4. Run a multispeaker 🐸 TTS model
So, let's jump right in!
! pip install -U pip
! pip install TTS
Coqui 🐸TTS comes with a list of pretrained models for different model types (e.g. TTS, vocoder), languages, training datasets, and architectures.
You can either use your own model or one of the released models under 🐸TTS.
Use `tts --list_models` to find the available models.
! tts --list_models
You can simply copy the full model name from the list above and use it:
!tts --text "hello world" \
--model_name "tts_models/en/ljspeech/glow-tts" \
--out_path output.wav
import IPython
IPython.display.Audio("output.wav")
🔶 A TTS model can be trained either on a single speaker's voice or on multiple speakers' voices. This training choice is directly reflected in the model's inference ability and in the speaker voices that are available for synthesizing speech.

🔶 If you want to run a multispeaker model from the released models list, first check the speaker IDs using the --list_speaker_idxs flag, then pick one of those speaker voices to synthesize speech.
# list the possible speaker IDs.
!tts --model_name "tts_models/en/vctk/vits" \
--list_speaker_idxs
!tts --text "Trying out specific speaker voice" \
--out_path spkr-out.wav --model_name "tts_models/en/vctk/vits" \
--speaker_idx "p341"
import IPython
IPython.display.Audio("spkr-out.wav")
🔶 If you want to use an external speaker's voice to synthesize speech, supply the --speaker_wav flag along with an external speaker encoder path and its config file, as follows:

First we need to get the speaker encoder model, its config, and a reference speaker_wav:
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar
!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav
!tts --model_name tts_models/multilingual/multi-dataset/your_tts \
--encoder_path model_se.pth.tar \
--encoder_config config_se.json \
--speaker_wav LJ001-0001.wav \
--text "Are we not allowed to dim the lights so people can see that a bit better?" \
--out_path spkr-out.wav \
--language_idx "en"
import IPython
IPython.display.Audio("spkr-out.wav")
Follow up with the next tutorials to learn more advanced material.