# Unit to Speech Model (unit2speech)
The unit to speech model is a modified Tacotron2 model that learns to synthesize speech from discrete speech units. All models are trained on quantized LJSpeech.
| Upstream Units | Download Link |
|---|---|
| Log Mel Filterbank + KM50 | download |
| Log Mel Filterbank + KM100 | download |
| Log Mel Filterbank + KM200 | download |
| Log Mel Filterbank + KM500 | download |
| Modified CPC + KM50 | download |
| Modified CPC + KM100 | download |
| Modified CPC + KM200 | download |
| Modified CPC + KM500 | download |
| HuBERT Base + KM50 | download |
| HuBERT Base + KM100 | download |
| HuBERT Base + KM200 | download |
| HuBERT Base + KM500 | download |
| wav2vec 2.0 Large + KM50 | download |
| wav2vec 2.0 Large + KM100 | download |
| wav2vec 2.0 Large + KM200 | download |
| wav2vec 2.0 Large + KM500 | download |
Install the required dependencies:
```
pip install librosa unidecode inflect
```
Below is a sample command to run inference using trained unit2speech models. Note that the quantized audio to be synthesized must use the same units that the unit2speech model was trained with.
```
FAIRSEQ_ROOT=<path_to_your_fairseq_repo_root>
TTS_MODEL_PATH=<unit2speech_model_file_path>
QUANTIZED_UNIT_PATH=<quantized_audio_file_path>
OUT_DIR=<dir_to_dump_synthesized_audio_files>
WAVEGLOW_PATH=<path_where_you_have_downloaded_waveglow_checkpoint>

PYTHONPATH=${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech python ${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech/synthesize_audio_from_units.py \
    --tts_model_path $TTS_MODEL_PATH \
    --quantized_unit_path $QUANTIZED_UNIT_PATH \
    --out_audio_dir $OUT_DIR \
    --waveglow_path $WAVEGLOW_PATH \
    --max_decoder_steps 2000
```