# VCTK

VCTK is an open English speech corpus. We provide examples for building Transformer models on this dataset.

## Data preparation

Download data, create splits and generate audio manifests with

```bash
python -m examples.speech_synthesis.preprocessing.get_vctk_audio_manifest \
  --output-data-root ${AUDIO_DATA_ROOT} \
  --output-manifest-root ${AUDIO_MANIFEST_ROOT}
```
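Each generated split manifest (e.g. `train.audio.tsv`) is a tab-separated file mapping utterance IDs to audio files. A minimal sketch of reading one — the column names and paths shown here are hypothetical, so check the actual files under `${AUDIO_MANIFEST_ROOT}`:

```python
import csv
import io

# Hypothetical manifest excerpt; the real columns and paths may differ.
manifest_tsv = (
    "id\taudio\tn_frames\tspeaker\n"
    "p225_001\t/data/vctk/wav/p225_001.wav\t52800\tp225\n"
    "p226_001\t/data/vctk/wav/p226_001.wav\t61440\tp226\n"
)

# DictReader keys rows by the header line, so code stays readable
# even if column order changes.
rows = list(csv.DictReader(io.StringIO(manifest_tsv), delimiter="\t"))
for r in rows:
    print(r["id"], r["speaker"], int(r["n_frames"]))
```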

Then, extract log-Mel spectrograms, generate feature manifest and create data configuration YAML with

```bash
python -m examples.speech_synthesis.preprocessing.get_feature_manifest \
  --audio-manifest-root ${AUDIO_MANIFEST_ROOT} \
  --output-root ${FEATURE_MANIFEST_ROOT} \
  --ipa-vocab --use-g2p
```

where we use phoneme inputs (`--ipa-vocab --use-g2p`) as an example.
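With phoneme inputs, the text of each utterance is converted to a phoneme sequence before the vocabulary is built. A toy sketch of the idea using a hard-coded lexicon — the actual pipeline uses a trained grapheme-to-phoneme model, and these ARPAbet-style entries are illustrative only:

```python
# Toy lexicon for illustration; a real G2P model covers open vocabulary.
LEXICON = {
    "please": ["P", "L", "IY1", "Z"],
    "call": ["K", "AO1", "L"],
    "stella": ["S", "T", "EH1", "L", "AH0"],
}

def phonemize(text):
    """Map each word to its phoneme sequence; unknown words get <unk>."""
    phones = []
    for word in text.lower().split():
        phones.extend(LEXICON.get(word, ["<unk>"]))
    return phones

# "Please call Stella" opens the passage read by every VCTK speaker.
tokens = phonemize("Please call Stella")
print(" ".join(tokens))
```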

To denoise audio and trim leading/trailing silence using signal processing based VAD, run

```bash
for SPLIT in dev test train; do
    python -m examples.speech_synthesis.preprocessing.denoise_and_vad_audio \
      --audio-manifest ${AUDIO_MANIFEST_ROOT}/${SPLIT}.audio.tsv \
      --output-dir ${PROCESSED_DATA_ROOT} \
      --denoise --vad --vad-agg-level 3
done
```
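The VAD step removes non-speech at clip boundaries; `--vad-agg-level 3` selects the most aggressive setting. A toy energy-threshold version of edge trimming, just to show the effect — the real preprocessing applies denoising plus a proper signal-processing VAD:

```python
# Toy edge trimming by per-frame energy; illustrative only.
def trim_silence(samples, frame_len=160, threshold=0.01):
    # Split into fixed-size frames and compute mean absolute amplitude.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(abs(s) for s in f) / len(f) for f in frames]
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return []
    # Keep everything between the first and last voiced frame.
    start, end = voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
    return samples[start:end]

# Synthetic clip: silence, a burst of signal, silence.
clip = [0.0] * 800 + [0.5, -0.5] * 400 + [0.0] * 800
trimmed = trim_silence(clip)
print(len(clip), len(trimmed))
```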

## Training

(Please refer to the LJSpeech example.)

## Inference

(Please refer to the LJSpeech example.)

## Automatic Evaluation

(Please refer to the LJSpeech example.)

## Results

| --arch | Params | Test MCD | Model |
|---|---|---|---|
| tts_transformer | 54M | 3.4 | Download |
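Test MCD is mel cepstral distortion in dB between synthesized and reference speech (lower is better). A sketch of the per-frame computation, assuming the two cepstral sequences are already time-aligned — real evaluation typically aligns them with dynamic time warping first:

```python
import math

def mcd(ref_frames, syn_frames):
    """Mean mel cepstral distortion over aligned frame pairs, in dB."""
    # Standard MCD scaling constant: 10 * sqrt(2) / ln(10) ~= 6.142.
    k = 10.0 * math.sqrt(2.0) / math.log(10.0)
    dists = []
    for r, s in zip(ref_frames, syn_frames):
        # Skip the 0th coefficient (overall energy), as is conventional.
        d = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        dists.append(k * math.sqrt(d))
    return sum(dists) / len(dists)

# Identical cepstra give zero distortion.
ref = [[1.0, 0.5, 0.2], [1.0, 0.4, 0.1]]
syn = [[1.0, 0.5, 0.2], [1.0, 0.4, 0.1]]
print(mcd(ref, syn))
```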
