recipes/seq2seq_tds/librispeech/README.md
Check out the following commits:
Run the data and auxiliary file preparation (lexicon, token set, etc.). Replace `[...]` with the necessary paths: `--data_dst` is the path where the data will be stored, and `--model_dst` is the path where the auxiliary files will be stored.
```sh
pip install sentencepiece==0.1.82
python3 prepare.py --data_dst [...] --model_dst [...]
```
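For concreteness, a minimal sketch of the preparation step with hypothetical destination paths (the directory names below are placeholders, not part of the recipe):

```shell
# Hypothetical destination paths -- substitute your own storage locations.
DATA_DST="$HOME/librispeech/data"
MODEL_DST="$HOME/librispeech/model"

# Build the preparation command; run it once the paths exist.
PREP_CMD="python3 prepare.py --data_dst $DATA_DST --model_dst $MODEL_DST"
echo "$PREP_CMD"
```

The same two paths are reused by the training and decoding configs, so keeping them in environment variables avoids repeating them.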
Besides the data, auxiliary files for acoustic and language model training/evaluation will be generated:
```sh
cd $MODEL_DST
tree -L 2
.
├── am
│   ├── librispeech-train+dev-unigram-10000-nbest10.lexicon
│   └── librispeech-train-all-unigram-10000.tokens
└── decoder
```
To run training/decoding:
- Fix the paths inside `*.cfg`.
- Run training with `train.cfg`. The parameters and settings in `train.cfg` are for running experiments on a single node with 8 GPUs (`--enable_distributed=true`). Distributed jobs can be launched using Open MPI.
- Run decoding with `decode*.cfg`.

Below is information about pre-trained acoustic and language models, which one can use, for example, to reproduce the decoding step.
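As an illustration, a single-node Open MPI launch could look like the sketch below. The `W2L` path is a placeholder; `Train` with `--flagsfile` follows the wav2letter++ convention, but check your build for the exact binary location:

```shell
# Sketch of a single-node, 8-GPU distributed launch with Open MPI.
# W2L is a hypothetical path to your wav2letter++ checkout/build.
W2L="$HOME/wav2letter"
NGPU=8

# One MPI process per GPU; train.cfg already sets --enable_distributed=true.
LAUNCH="mpirun -n $NGPU $W2L/build/Train train --flagsfile train.cfg"
echo "$LAUNCH"
```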
| File | Dataset | Dev Set | Architecture | Lexicon | Tokens |
|---|---|---|---|---|---|
| baseline_dev-clean | LibriSpeech | dev-clean | Archfile | Lexicon | Tokens |
| baseline_dev-other | LibriSpeech | dev-other | Archfile | Lexicon | Tokens |
Here the architecture files are the same as `network.arch`, and the tokens and lexicon files generated in `$MODEL_DST/am/` are the same as those in the table.
Convolutional language models (ConvLM) are trained with the fairseq toolkit, and n-gram language models are trained with the KenLM toolkit. The language models below are converted into a binary format compatible with the wav2letter++ decoder.
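The models themselves come from the toolkits above; purely to illustrate what a 4-gram model counts, here is a toy, toolkit-free sketch (this is not how KenLM works internally -- KenLM additionally applies modified Kneser-Ney smoothing and stores counts in a compact binary trie):

```python
from collections import Counter

def ngram_counts(tokens, n=4):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

words = "the cat sat on the mat the cat sat on the hat".split()
counts = ngram_counts(words, n=4)
# The 4-gram ("the", "cat", "sat", "on") occurs twice in this toy corpus.
print(counts[("the", "cat", "sat", "on")])  # → 2
```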
| Name | Dataset | Type | Vocab |
|---|---|---|---|
| lm_librispeech_convlm_14B | LibriSpeech | ConvLM 14B | LM Vocab |
| lm_librispeech_kenlm_4g | LibriSpeech | 4-gram | - |
To reproduce the decoding step from the paper, download these models into `$MODEL_DST/decoder/`.