# End-to-end NLU

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device.

This page releases the code for reproducing the results in *STOP: A dataset for Spoken Task Oriented Semantic Parsing*.

The dataset can be downloaded here: download link

The low-resource splits can be downloaded here: download link

## Pretrained End-to-end NLU Models

| Speech Pretraining | ASR Pretraining | Test EM Accuracy | Test EM-Tree Accuracy | Link |
| ------------------ | --------------- | ---------------- | --------------------- | ---- |
| None | None | 36.54 | 57.01 | link |
| Wav2Vec | None | 68.05 | 82.53 | link |
| HuBERT | None | 68.40 | 82.85 | link |
| Wav2Vec | STOP | 68.70 | 82.78 | link |
| HuBERT | STOP | 69.23 | 82.87 | link |
| Wav2Vec | Librispeech | 68.47 | 82.49 | link |
| HuBERT | Librispeech | 68.70 | 82.78 | link |
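EM (exact-match) accuracy counts an utterance as correct only when the predicted semantic parse is identical to the reference (EM-Tree is a relaxed, structure-level variant). A minimal sketch of the stricter metric, using a hypothetical `exact_match_accuracy` helper rather than fairseq's own scorer:

```python
def exact_match_accuracy(predictions, references):
    """Percentage of utterances whose predicted parse string is
    identical to the reference parse (exact-match accuracy)."""
    assert len(predictions) == len(references)
    matches = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)
```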

## Pretrained ASR Models

| Speech Pre-training | ASR Dataset | STOP Eval WER | STOP Test WER | dev_other WER | dev_clean WER | test_clean WER | test_other WER | Link |
| ------------------- | ----------- | ------------- | ------------- | ------------- | ------------- | -------------- | -------------- | ---- |
| HuBERT | Librispeech | 8.47 | 2.99 | 3.25 | 8.06 | 25.68 | 26.19 | link |
| Wav2Vec | Librispeech | 9.215 | 3.204 | 3.334 | 9.006 | 27.257 | 27.588 | link |
| HuBERT | STOP | 46.31 | 31.30 | 31.52 | 47.16 | 4.29 | 4.26 | link |
| Wav2Vec | STOP | 43.103 | 27.833 | 28.479 | 28.479 | 4.679 | 4.667 | link |
| HuBERT | Librispeech + STOP | 9.015 | 3.211 | 3.372 | 8.635 | 5.133 | 5.056 | link |
| Wav2Vec | Librispeech + STOP | 9.549 | 3.537 | 3.625 | 9.514 | 5.59 | 5.562 | link |
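The WER columns above are standard word error rate: word-level edit distance (substitutions + insertions + deletions) divided by reference length. A minimal sketch of the computation (not fairseq's scorer):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """Word error rate via word-level Levenshtein distance,
    normalized by the reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / len(r)
```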

## Creating the fairseq datasets from STOP

First, create the audio file manifests and label files:

```shell
python examples/audio_nlp/nlu/generate_manifests.py --stop_root $STOP_DOWNLOAD_DIR/stop --output $FAIRSEQ_DATASET_OUTPUT/
```
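The manifests follow fairseq's usual audio-manifest convention: a TSV whose first line is the audio root directory, with each subsequent line holding a relative path and its frame count, tab-separated. As a rough sketch of that format, using a hypothetical `write_manifest` helper (not the `generate_manifests.py` script itself):

```python
import wave
from pathlib import Path

def write_manifest(audio_root: str, out_tsv: str) -> int:
    """Write a fairseq-style audio manifest: the first line is the
    root directory; each following line is `relative_path<TAB>frames`.
    Returns the number of audio files listed."""
    root = Path(audio_root)
    count = 0
    with open(out_tsv, "w") as out:
        out.write(f"{root.resolve()}\n")
        for wav_path in sorted(root.rglob("*.wav")):
            with wave.open(str(wav_path), "rb") as w:
                frames = w.getnframes()
            out.write(f"{wav_path.relative_to(root)}\t{frames}\n")
            count += 1
    return count
```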

Then generate the fairseq dictionaries:

```shell
./examples/audio_nlp/nlu/create_dict_stop.sh $FAIRSEQ_DATASET_OUTPUT
```
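Fairseq dictionaries are plain-text files with one `token count` pair per line, most frequent token first. A minimal sketch of what such a dictionary build computes, assuming whitespace-tokenized label files (a hypothetical `build_dict` helper, not the actual script):

```python
from collections import Counter

def build_dict(label_lines, out_path):
    """Write a fairseq-style dictionary: one `token count` pair per
    line, sorted by descending frequency."""
    counts = Counter(tok for line in label_lines for tok in line.split())
    with open(out_path, "w") as f:
        for tok, n in counts.most_common():
            f.write(f"{tok} {n}\n")
```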

## Training an End-to-end NLU Model

Download a wav2vec or HuBERT model from link or link.

```shell
python fairseq_cli/hydra_train.py --config-dir examples/audio_nlp/nlu/configs/ --config-name nlu_finetuning \
  task.data=$FAIRSEQ_DATA_OUTPUT model.w2v_path=$PRETRAINED_MODEL_PATH
```