recipes/seq2seq_tds/librispeech/README.md
Check out the following commits:
Run the data and auxiliary file preparation (lexicon, token set, etc.). Replace `[...]` with the necessary paths: `--data_dst` is the path where the data will be stored, and `--model_dst` is the path where the auxiliary files will be stored.
```sh
pip install sentencepiece==0.1.82
python3 prepare.py --data_dst [...] --model_dst [...]
```
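For concreteness, a minimal sketch of the preparation step with hypothetical destination paths (the directory names below are placeholders, not part of the recipe):

```shell
# Hypothetical destination paths -- substitute your own storage locations.
DATA_DST="$HOME/librispeech/data"
MODEL_DST="$HOME/librispeech/model"

# Build the preparation command; run it once the paths exist.
PREP_CMD="python3 prepare.py --data_dst $DATA_DST --model_dst $MODEL_DST"
echo "$PREP_CMD"
```

The same two paths are reused by the training and decoding configs, so keeping them in environment variables avoids repeating them.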
Besides the data, auxiliary files for acoustic and language model training/evaluation will be generated:
```sh
cd $MODEL_DST
tree -L 2
.
├── am
│   ├── librispeech-train+dev-unigram-10000-nbest10.lexicon
│   └── librispeech-train-all-unigram-10000.tokens
└── decoder
```
To run training/decoding:
- Fix the paths inside `*.cfg`.
- Run training with `train.cfg`. The parameters and settings in `train.cfg` are for running experiments on a single node with 8 GPUs (`--enable_distributed=true`). Distributed jobs can be launched using Open MPI.
- Run decoding with `decode*.cfg`.

Below is information about pre-trained acoustic and language models, which one can use, for example, to reproduce the decoding step.
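As an illustration, a single-node Open MPI launch could look like the sketch below. The `W2L` path is a placeholder; `Train` with `--flagsfile` follows the wav2letter++ convention, but check your build for the exact binary location:

```shell
# Sketch of a single-node, 8-GPU distributed launch with Open MPI.
# W2L is a hypothetical path to your wav2letter++ checkout/build.
W2L="$HOME/wav2letter"
NGPU=8

# One MPI process per GPU; train.cfg already sets --enable_distributed=true.
LAUNCH="mpirun -n $NGPU $W2L/build/Train train --flagsfile train.cfg"
echo "$LAUNCH"
```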
| File | Dataset | Dev Set | Architecture | Lexicon | Tokens |
|---|---|---|---|---|---|
| baseline_dev-clean | LibriSpeech | dev-clean | Archfile | Lexicon | Tokens |
| baseline_dev-other | LibriSpeech | dev-other | Archfile | Lexicon | Tokens |
Here the architecture files are the same as `network.arch`, and the tokens and lexicon files generated in `$MODEL_DST/am/` are the same as those in the table.
Convolutional language models (ConvLM) are trained with the fairseq toolkit, and n-gram language models are trained with the KenLM toolkit. The language models below are converted into a binary format compatible with the wav2letter++ decoder.
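The models themselves come from the toolkits above; purely to illustrate what a 4-gram model counts, here is a toy, toolkit-free sketch (this is not how KenLM works internally -- KenLM additionally applies modified Kneser-Ney smoothing and stores counts in a compact binary trie):

```python
from collections import Counter

def ngram_counts(tokens, n=4):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

words = "the cat sat on the mat the cat sat on the hat".split()
counts = ngram_counts(words, n=4)
# The 4-gram ("the", "cat", "sat", "on") occurs twice in this toy corpus.
print(counts[("the", "cat", "sat", "on")])  # → 2
```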
| Name | Dataset | Type | Vocab |
|---|---|---|---|
| lm_librispeech_convlm_14B | LibriSpeech | ConvLM 14B | LM Vocab |
| lm_librispeech_kenlm_4g | LibriSpeech | 4-gram | - |
To reproduce the decoding step from the paper, download these models into `$MODEL_DST/decoder/`.