Beam candidates dumping and rescoring

recipes/sota/2019/rescoring/README.md

This recipe describes how to dump the beam candidates and rescore them with GCNN and Transformer LMs, as we did in the paper.

Dependencies

  • fairseq, commit d80ad54: copy forward_lm.py into the fairseq directory (otherwise its fairseq imports do not work)

Beam candidates dumping

  • Fix the paths inside decode*.cfg
  • Run beam dumping with decode*.cfg for each model:
    • CTC Librispeech Transformer, ngram and GCNN decoding
    • S2S Librispeech Transformer, ngram and GCNN decoding
    • CTC Librivox Transformer, ngram and GCNN decoding
    • S2S Librivox Transformer, ngram and GCNN decoding
[...]/wav2letter/build/Decoder --flagsfile path/to/necessary/decode/other/config --minloglevel=0 --logtostderr=1 --emission_dir='' --test=dev-other.lst
[...]/wav2letter/build/Decoder --flagsfile path/to/necessary/decode/other/config --minloglevel=0 --logtostderr=1 --emission_dir='' --test=test-other.lst
[...]/wav2letter/build/Decoder --flagsfile path/to/necessary/decode/clean/config --minloglevel=0 --logtostderr=1 --emission_dir='' --test=dev-clean.lst
[...]/wav2letter/build/Decoder --flagsfile path/to/necessary/decode/clean/config --minloglevel=0 --logtostderr=1 --emission_dir='' --test=test-clean.lst
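
The four runs above differ only in the config file and the test list, so they can be generated from a small table. The following is a minimal sketch, not part of the recipe: `build_decoder_cmd` and `RUNS` are hypothetical names, and the binary and config paths are the same placeholders used in the commands above.

```python
from typing import List, Tuple

# Placeholder path, kept exactly as in the commands above; replace with your build.
DECODER = "[...]/wav2letter/build/Decoder"

def build_decoder_cmd(flagsfile: str, test_list: str) -> List[str]:
    """Return the argument list for one beam-dump run (hypothetical helper)."""
    return [
        DECODER,
        "--flagsfile", flagsfile,
        "--minloglevel=0",
        "--logtostderr=1",
        "--emission_dir=",  # same as --emission_dir='' on a shell command line
        f"--test={test_list}",
    ]

# (config placeholder, test list) pairs taken from the commands above.
RUNS: List[Tuple[str, str]] = [
    ("path/to/necessary/decode/other/config", "dev-other.lst"),
    ("path/to/necessary/decode/other/config", "test-other.lst"),
    ("path/to/necessary/decode/clean/config", "dev-clean.lst"),
    ("path/to/necessary/decode/clean/config", "test-clean.lst"),
]

commands = [build_decoder_cmd(cfg, lst) for cfg, lst in RUNS]
for cmd in commands:
    print(" ".join(cmd))
    # To actually launch the decoder: subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check the paths before starting a long decoding job.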

Also run a dedicated beam dump for the S2S Librispeech model with GCNN decoding (the best one) to obtain a more diverse beam (the beam-search decoder was optimized with --beamsize=50 --beamthreshold=10 --beamsizetoken=10):

  • copy the files from src/* into wav2letter/src/libraries/decoder to overwrite the lexicon-free S2S decoder algorithm
  • rebuild Decode.cpp with these updates
  • run the following beam dump
[...]/wav2letter/build/Decoder --flagsfile decode_transformer_s2s_gcnn_other_ls_completed_hyps.cfg --minloglevel=0 --logtostderr=1 --emission_dir='' --test=dev-other.lst
[...]/wav2letter/build/Decoder --flagsfile decode_transformer_s2s_gcnn_other_ls_completed_hyps.cfg --minloglevel=0 --logtostderr=1 --emission_dir='' --test=test-other.lst

Generate perplexity for each candidate in the beam

We rescore with a word-based GCNN and a word-based Transformer LM, so we first compute their perplexities (in practice, the loss for the sentence) for each candidate in the beam.

cd [FAIRSEQ]
# convlm
python forward_lm.py --model [MODEL_PATH]/checkpoint_best.pt --dict [MODEL_DATA_PATH]/dict.txt --text [BEAM DUMP].lst.hyp --out convlm.ppl --model-type convlm --max-tokens 1024 --skip 1
# transformer
python forward_lm.py --model [MODEL_PATH]/checkpoint_best.pt --dict [MODEL_DATA_PATH]/dict.txt --text [BEAM DUMP].lst.hyp --out transformer.ppl --model-type transformer --max-tokens 1024 --skip 1
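
The difference between a sentence loss and a perplexity can be illustrated with a short sketch. The exact values and file format written by forward_lm.py are not shown here, so the code below is only an illustration under assumptions: `sentence_loss` and `perplexity` are hypothetical helpers, natural-log token probabilities are assumed, and the numbers are made up.

```python
import math
from typing import Sequence

def sentence_loss(token_logprobs: Sequence[float]) -> float:
    """Negative log-likelihood of the whole sentence (sum over tokens)."""
    return -sum(token_logprobs)

def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the average per-token negative log-likelihood."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(sentence_loss(token_logprobs) / len(token_logprobs))

# Illustrative log-probabilities for a 4-token candidate (not real model output).
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]

print("sentence loss:", sentence_loss(logprobs))
print("perplexity:   ", perplexity(logprobs))
```

The sentence loss grows with the number of tokens while the perplexity is length-normalized, which is one reason a separate length term is useful during rescoring.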

Rescoring

  • Run random search for S2S models
# for name in dev-other dev-clean
python rescore.py --hyp "[PATH_TO_$name].hyp" --list "[DATA_DST]/lists/$name.lst" --convlm="[PATH]/convlm_$name.ppl" --tr="[PATH]/transformer_$name.ppl" --in_wts=0,0,0 --search
# then evaluate the best weights found on test-other and test-clean
python rescore.py --hyp "[PATH_TO_$name].hyp" --list "[DATA_DST]/lists/$name.lst" --convlm="[PATH]/convlm_$name.ppl" --tr="[PATH]/transformer_$name.ppl" --in_wts=w1,w2,w3
  • Run grid search for CTC models (for the n-gram LM, also add --top=large to evaluate the 2500 beam)
# for name in dev-other dev-clean
python rescore.py --hyp "[PATH_TO_$name].hyp" --list "[DATA_DST]/lists/$name.lst" --convlm="[PATH]/convlm_$name.ppl" --tr="[PATH]/transformer_$name.ppl" --in_wts=0,0,0 --search --gridsearch
# then evaluate the best weights found on test-other and test-clean
python rescore.py --hyp "[PATH_TO_$name].hyp" --list "[DATA_DST]/lists/$name.lst" --convlm="[PATH]/convlm_$name.ppl" --tr="[PATH]/transformer_$name.ppl" --in_wts=w1,w2,w3
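
To make the two search modes concrete, here is a minimal sketch of weighted rescoring with a grid search and a random search over the three weights. It is not the code in rescore.py: the scoring formula (LM losses as penalties, word count as a length bonus), the search ranges, and all names are assumptions, and the beams are toy data.

```python
import itertools
import random
from typing import List, NamedTuple, Sequence, Tuple

Weights = Tuple[float, float, float]  # (Transformer LM, GCNN LM, length)

class Candidate(NamedTuple):
    text: str
    am_score: float   # score from the beam dump, higher is better (assumed)
    tr_loss: float    # Transformer LM sentence loss, lower is better
    gcnn_loss: float  # GCNN LM sentence loss, lower is better

def combined_score(c: Candidate, w: Weights) -> float:
    # Assumed combination: LM losses are penalties, word count is a bonus.
    n_words = len(c.text.split())
    return c.am_score - w[0] * c.tr_loss - w[1] * c.gcnn_loss + w[2] * n_words

def rescore(beam: Sequence[Candidate], w: Weights) -> str:
    """Return the text of the best candidate under weights w."""
    return max(beam, key=lambda c: combined_score(c, w)).text

def word_errors(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance between hypothesis and reference."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (hw != rw)))
        prev = cur
    return prev[-1]

def total_errors(beams, refs, w: Weights) -> int:
    return sum(word_errors(rescore(b, w), r) for b, r in zip(beams, refs))

def grid_weights(step: float, upper: float) -> List[Weights]:
    """All weight triples on a regular grid in [0, upper]."""
    axis = [i * step for i in range(int(round(upper / step)) + 1)]
    return list(itertools.product(axis, repeat=3))

def random_weights(n: int, low: float, high: float, seed: int = 0) -> List[Weights]:
    """n weight triples drawn uniformly from [low, high] (assumed range)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for _ in range(3)) for _ in range(n)]

def search(beams, refs, candidates: Sequence[Weights]) -> Weights:
    """Pick the weights with the fewest word errors on the dev beams."""
    return min(candidates, key=lambda w: total_errors(beams, refs, w))

# Toy dev data: two utterances with two beam candidates each.
beams = [
    [Candidate("the cat sat", -5.0, 10.0, 11.0), Candidate("the cat sad", -4.5, 14.0, 13.0)],
    [Candidate("a dog", -3.0, 6.0, 6.5), Candidate("a", -2.8, 5.0, 5.5)],
]
refs = ["the cat sat", "a dog"]

best_grid = search(beams, refs, grid_weights(step=0.5, upper=1.0))
best_random = search(beams, refs, random_weights(200, -1.0, 1.0))
print("grid:  ", best_grid, total_errors(beams, refs, best_grid))
print("random:", best_random, total_errors(beams, refs, best_random))
```

In this sketch the weights are tuned on dev data only and then applied unchanged to the test sets, mirroring the two-step commands above.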

Optimal rescoring weights for Librispeech models (Transformer LM, GCNN LM, transcription length):

  • CTC ngram
    • clean 0.4,0.2,0.1
    • other 0.8,0,0.5
  • CTC gcnn
    • clean 0.5,0.1,0.2
    • other 0.8,0,0.5
  • S2S ngram
    • clean 0.24980175230211288,0.04919538965148296,0.27051900934773476 (top-3)
    • other 0.43353711959171454,0.02376409689162373,0.36902974241819764 (top-40)
  • S2S gcnn
    • clean 0.44292736054463794,0,0.5589561526817741 (top-3)
    • other 0.8121556103144534,-0.13482344019156423,0.5317810935135496 (top-6)
    • other (with completed hyps) 0.8530679672776722,-0.351434501804079,0.5151134574867755 (top-10)
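
For the evaluation step, a weight triple from the list above can be turned into the `--in_wts` argument. The sketch below assumes that `--in_wts` takes the weights in the same order as the list (Transformer LM, GCNN LM, transcription length); `LIBRISPEECH_CTC_WEIGHTS` and `in_wts_flag` are hypothetical names, and only the CTC entries are copied here.

```python
from typing import Dict, Tuple

Weights = Tuple[float, float, float]  # (Transformer LM, GCNN LM, length)

# CTC weights for Librispeech models, copied from the list above.
LIBRISPEECH_CTC_WEIGHTS: Dict[Tuple[str, str], Weights] = {
    ("ngram", "clean"): (0.4, 0.2, 0.1),
    ("ngram", "other"): (0.8, 0.0, 0.5),
    ("gcnn", "clean"): (0.5, 0.1, 0.2),
    ("gcnn", "other"): (0.8, 0.0, 0.5),
}

def in_wts_flag(weights: Weights) -> str:
    """Format a weight triple as a comma-separated --in_wts argument."""
    # repr keeps full float precision, which matters for the long S2S weights.
    return "--in_wts=" + ",".join(repr(w) for w in weights)

print(in_wts_flag(LIBRISPEECH_CTC_WEIGHTS[("ngram", "other")]))
# prints: --in_wts=0.8,0.0,0.5
```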

Optimal rescoring weights for Librivox models (Transformer LM, GCNN LM, transcription length):

  • CTC ngram
    • clean 0.5,0,0.4
    • other 0.4,0,0.3
  • CTC gcnn
    • clean 0.4,0,0.2
    • other 0.4,0,0.3
  • S2S ngram
    • clean 0.24980175230211288,0.04919538965148296,0.27051900934773476 (top-3)
    • other 0.510874298897509,-0.1756293581052848,0.39350304055842433 (top-50)
  • S2S gcnn
    • clean 0.4769702057546221,-0.26001972366117654,0.3727554580971921 (top-250)
    • other 0.459099594595537,0.038747686960546535,0.33754803186788784 (top-9)