recipes/sota/2019/librivox/README.md
train*.cfg
The ResNet CTC model is trained with a total batch size of 256 for approximately 55 epochs. The optimizer is SGD with momentum.
[...]/wav2letter/build/Train train --flagsfile train_am_resnet_ctc.cfg --minloglevel=0 --logtostderr=1
# after 40 epochs
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --lr=0.03 --lrcrit=0.03 --minloglevel=0 --logtostderr=1
The ResNet S2S model is trained with a total batch size of 256 for approximately 50 epochs. The optimizer is SGD with momentum.
[...]/wav2letter/build/Train train --flagsfile train_am_resnet_s2s.cfg --minloglevel=0 --logtostderr=1
# after 36 epochs
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --lr=0.03 --lrcrit=0.03 --minloglevel=0 --logtostderr=1
The TDS CTC model is trained with a total batch size of 256 for approximately 30 epochs. The optimizer is SGD with momentum.
[...]/wav2letter/build/Train train --flagsfile train_am_tds_ctc.cfg --minloglevel=0 --logtostderr=1
# after 26 epochs
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --lr=0.15 --minloglevel=0 --logtostderr=1
The TDS S2S model is trained with a total batch size of 256 for approximately 50 epochs. The optimizer is SGD with momentum.
[...]/wav2letter/build/Train train --flagsfile train_am_tds_s2s.cfg --minloglevel=0 --logtostderr=1
# after 37 epochs
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --lr=0.025 --lrcrit=0.025 --minloglevel=0 --logtostderr=1
The transformer CTC model is trained with a total batch size of 320 for approximately 23 epochs with Adagrad. Training starts with a warmup stage: SpecAugment is activated only after warmup, and the learning rate is increased linearly to 0.02 over the first 30000 updates. It is then halved at epoch 5 and every 4 epochs after that.
[...]/wav2letter/build/Train train --flagsfile train_am_transformer_ctc.cfg --minloglevel=0 --logtostderr=1
# last 12 epochs are done with
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --maxisz=33000 --minloglevel=0 --logtostderr=1
The transformer S2S model is trained with a total batch size of 320 for approximately 22 epochs with Adagrad, followed by SGD finetuning. Training starts with a warmup stage: SpecAugment is activated only after warmup, and the learning rate is increased linearly to 0.02 over the first 40000 updates. It is then halved at epoch 4 and every 3 epochs after that.
[...]/wav2letter/build/Train train --flagsfile train_am_transformer_s2s.cfg --minloglevel=0 --logtostderr=1
# after 9 epochs we continue with
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --maxisz=33000 --minloglevel=0 --logtostderr=1
# after 15 epochs we finetune with SGD
[...]/wav2letter/build/Train continue [path/to/am/model.bin] --minloglevel=0 --logtostderr=1 --lr=0.01 --lrcrit=0.01 --lr_decay=5 --lr_decay_step=5 --netoptim=sgd --critoptim=sgd --warmup=0 --pretrainWindow=0 --enable_distributed
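The two transformer learning-rate schedules described above (linear warmup to 0.02, then repeated halving) can be sketched in Python as follows. This is only an illustration of the prose, not the wav2letter implementation: the function name and its parameters are hypothetical, and it assumes that epochs are counted from 1, that `update` counts completed optimizer updates, and that the rate is halved at the start of the first decay epoch and again at each later interval.

```python
def transformer_lr(update, epoch, peak_lr=0.02, warmup_updates=30000,
                   first_decay_epoch=5, decay_every=4):
    """Learning rate under the schedule described in the text.

    Assumptions (not confirmed by the source): 1-based epochs, warmup
    starting from 0, and halving applied from the start of the decay epoch.
    Defaults match the transformer CTC description.
    """
    if update < warmup_updates:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * update / warmup_updates
    if epoch < first_decay_epoch:
        # Warmup finished, no decay yet.
        return peak_lr
    # One halving at first_decay_epoch, then one more every decay_every epochs.
    halvings = (epoch - first_decay_epoch) // decay_every + 1
    return peak_lr * 0.5 ** halvings


# Transformer CTC settings: 30000 warmup updates, halve at epoch 5, then every 4.
ctc_lr = transformer_lr(update=200000, epoch=9)

# Transformer S2S settings: 40000 warmup updates, halve at epoch 4, then every 3.
s2s_lr = transformer_lr(update=200000, epoch=7, warmup_updates=40000,
                        first_decay_epoch=4, decay_every=3)
```

The later `--lr=0.01 --lrcrit=0.01` SGD finetuning stage of the S2S model overrides this schedule and is not modelled here.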
Replace the placeholders ([path/to/am/model.bin], [MODEL_DST], [DATA_DST]) with your own paths, for example for test-other:
[...]/wav2letter/build/Test \
--am=[path/to/am/model.bin] \
--tokensdir=[MODEL_DST]/am \
--tokens=librispeech-train-all-unigram-10000.tokens \
--lexicon=[MODEL_DST]/am/librispeech-train+dev-unigram-10000-nbest10.lexicon \
--uselexicon=false \
--datadir=[DATA_DST]/lists \
--test=test-other.lst \
--minloglevel=0 --logtostderr=1 \
--maxtsz=1000000000 --maxisz=1000000000 --minisz=0 --mintsz=0 \
--emission_dir=''
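When the same evaluation has to be repeated for several list files, it may help to assemble the command programmatically. The sketch below only rebuilds the argument list shown above; `build_test_command` and the example paths are hypothetical, and the flags are copied from the command in this README rather than checked against the binary.

```python
import shlex


def build_test_command(build_dir, am_path, model_dst, data_dst,
                       test_list="test-other.lst"):
    """Return the Test invocation from this README as an argv list."""
    return [
        f"{build_dir}/Test",
        f"--am={am_path}",
        f"--tokensdir={model_dst}/am",
        "--tokens=librispeech-train-all-unigram-10000.tokens",
        f"--lexicon={model_dst}/am/librispeech-train+dev-unigram-10000-nbest10.lexicon",
        "--uselexicon=false",
        f"--datadir={data_dst}/lists",
        f"--test={test_list}",
        "--minloglevel=0",
        "--logtostderr=1",
        "--maxtsz=1000000000",
        "--maxisz=1000000000",
        "--minisz=0",
        "--mintsz=0",
        "--emission_dir=",  # same as --emission_dir='' after shell quoting
    ]


# Hypothetical paths, for illustration only.
cmd = build_test_command("/path/to/wav2letter/build", "/path/to/am/model.bin",
                         "/path/to/model_dst", "/path/to/data_dst")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the paths point at a real build and model.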
decode*.cfg
[...]/wav2letter/build/Decoder --flagsfile=path/to/necessary/decode/config --minloglevel=0 --logtostderr=1 --maxtsz=1000000000 --maxisz=1000000000 --minisz=0 --mintsz=0 --emission_dir=''