# Unit to Speech Model (unit2speech)
The unit-to-speech model is a modified Tacotron 2 model that learns to synthesize speech from discrete speech units. All models are trained on quantized LJSpeech.

| Upstream Units | Download Link |
|---|---|
| Log Mel Filterbank + KM50 | download |
| Log Mel Filterbank + KM100 | download |
| Log Mel Filterbank + KM200 | download |
| Log Mel Filterbank + KM500 | download |
| Modified CPC + KM50 | download |
| Modified CPC + KM100 | download |
| Modified CPC + KM200 | download |
| Modified CPC + KM500 | download |
| HuBERT Base + KM50 | download |
| HuBERT Base + KM100 | download |
| HuBERT Base + KM200 | download |
| HuBERT Base + KM500 | download |
| wav2vec 2.0 Large + KM50 | download |
| wav2vec 2.0 Large + KM100 | download |
| wav2vec 2.0 Large + KM200 | download |
| wav2vec 2.0 Large + KM500 | download |

## Run inference using a unit2speech model

  • Install `librosa`, `unidecode`, and `inflect` using `pip install librosa unidecode inflect`.
  • Download the WaveGlow checkpoint. This is the vocoder.

Sample command to run inference using trained unit2speech models. Note that the quantized audio to be synthesized must use the same units the unit2speech model was trained with.

```bash
FAIRSEQ_ROOT=<path_to_your_fairseq_repo_root>
TTS_MODEL_PATH=<unit2speech_model_file_path>
QUANTIZED_UNIT_PATH=<quantized_audio_file_path>
OUT_DIR=<dir_to_dump_synthesized_audio_files>
WAVEGLOW_PATH=<path_where_you_have_downloaded_waveglow_checkpoint>

PYTHONPATH=${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech python ${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech/synthesize_audio_from_units.py \
    --tts_model_path $TTS_MODEL_PATH \
    --quantized_unit_path $QUANTIZED_UNIT_PATH \
    --out_audio_dir $OUT_DIR \
    --waveglow_path $WAVEGLOW_PATH \
    --max_decoder_steps 2000
```
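As a sketch of what `QUANTIZED_UNIT_PATH` should point to: the GSLM quantization step emits one line per utterance in a `<name>|<space-separated units>` format. The helper below, which only illustrates that assumed format (the function name and sample data are not part of this README), writes such a file by hand:

```python
# Sketch: build a quantized-unit input file for synthesize_audio_from_units.py.
# Assumption: each line pairs an utterance name with its space-separated
# discrete unit IDs, "<name>|<units>", matching the GSLM quantization output.
from pathlib import Path


def write_quantized_units(path, utterances):
    """utterances: dict mapping utterance name -> list of int unit IDs."""
    lines = [
        f"{name}|{' '.join(str(u) for u in units)}"
        for name, units in utterances.items()
    ]
    Path(path).write_text("\n".join(lines) + "\n")


# Hypothetical example: one utterance with units from a KM100 codebook.
write_quantized_units(
    "quantized_units.txt",
    {"sample_0001": [12, 12, 47, 47, 47, 3, 88]},
)
print(Path("quantized_units.txt").read_text(), end="")
```

The unit IDs must come from the same k-means codebook (KM50/KM100/KM200/KM500) and upstream feature extractor that the chosen unit2speech checkpoint was trained on.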