Steps to reproduce results on Librispeech

recipes/seq2seq_tds/librispeech/README.md

Dependencies

Check out the following commits:

Instructions

Run preparation of the data and auxiliary files (lexicon, token set, etc.). Replace [...] with the necessary paths: data_dst is the path where the data will be stored, model_dst is the path where the auxiliary files will be stored.

pip install sentencepiece==0.1.82
python3 prepare.py --data_dst [...] --model_dst [...]

Besides the data itself, the auxiliary files for acoustic and language model training/evaluation will be generated:

cd $MODEL_DST
tree -L 2
.
├── am
│   ├── librispeech-train+dev-unigram-10000-nbest10.lexicon
│   └── librispeech-train-all-unigram-10000.tokens
└── decoder
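Each line of the generated lexicon maps a word to one of its word-piece decompositions (the nbest10 suffix in the file name indicates up to 10 alternatives per word). A minimal parsing sketch, assuming the common wav2letter lexicon convention of a tab between the word and its space-separated pieces (the sample entry is hypothetical, not taken from the real file):

```python
# Parse a wav2letter-style lexicon line: "word<TAB>piece1 piece2 ..."
def parse_lexicon_line(line):
    word, spelling = line.rstrip("\n").split("\t", 1)
    return word, spelling.split()

# Hypothetical example entry for illustration only.
word, pieces = parse_lexicon_line("hello\t_he ll o\n")
print(word, pieces)  # hello ['_he', 'll', 'o']
```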

To run training/decoding:

  • Fix the paths inside *.cfg
  • Run training with train.cfg. The parameters and settings in train.cfg are for running experiments on a single node with 8 GPUs (--enable_distributed=true). Distributed jobs can be launched using Open MPI.
  • Run decoding with decode*.cfg
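Before running, the flags file needs the paths pointed at your own data and model directories. A hypothetical excerpt of what the edited flags might look like (the run name and all paths below are placeholders; take the exact flag set from the shipped train.cfg):

```
# hypothetical excerpt of train.cfg -- replace all paths with your own
--rundir=/checkpoint/my_runs
--runname=seq2seq_tds_librispeech
--tokens=librispeech-train-all-unigram-10000.tokens
--lexicon=/path/to/model_dst/am/librispeech-train+dev-unigram-10000-nbest10.lexicon
--enable_distributed=true
```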

Pre-trained acoustic and language models

Below is information about the pre-trained acoustic and language models, which can be used, for example, to reproduce the decoding step.

Acoustic Models

File                 Dataset      Dev Set    Architecture  Lexicon  Tokens
baseline_dev-clean   LibriSpeech  dev-clean  Archfile      Lexicon  Tokens
baseline_dev-other   LibriSpeech  dev-other  Archfile      Lexicon  Tokens

Here the architecture files are the same as network.arch, and the tokens and lexicon files generated in $MODEL_DST/am/ are identical to those listed in the table.

Language Models

Convolutional language models (ConvLM) are trained with the fairseq toolkit. n-gram language models are trained with the KenLM toolkit. The below language models are converted into a binary format compatible with the wav2letter++ decoder.
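As an illustration of what an n-gram model stores, a toy maximum-likelihood 4-gram estimator can be sketched as below. This is not KenLM's actual implementation, which additionally applies smoothing and backoff so unseen n-grams get nonzero probability:

```python
from collections import Counter

# Toy maximum-likelihood 4-gram model:
#   P(w4 | w1 w2 w3) = count(w1 w2 w3 w4) / count(w1 w2 w3)
# Real toolkits like KenLM add smoothing and backoff on top of this.
def train_4gram(tokens):
    four = Counter(tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3))
    tri = Counter(tuple(tokens[i:i + 3]) for i in range(len(tokens) - 3))
    return lambda ctx, w: four[(*ctx, w)] / tri[ctx] if tri[ctx] else 0.0

corpus = "the cat sat on the mat the cat sat on the rug".split()
p = train_4gram(corpus)
print(p(("cat", "sat", "on"), "the"))  # 1.0
```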

Name                       Dataset      Type        Vocab
lm_librispeech_convlm_14B  LibriSpeech  ConvLM 14B  LM Vocab
lm_librispeech_kenlm_4g    LibriSpeech  4-gram      -

To reproduce the decoding step from the paper, download these models into $MODEL_DST/decoder/.