
Quantization Aware Training for NLP Models

Description

This project includes quantization aware training (QAT) code for NLP models. The examples show how to apply the Model Optimization Toolkit's quantization aware training API. Compared to post-training quantization (PTQ), QAT minimizes the quality loss from quantization while still achieving the speed-up of integer quantization.
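To make the API concrete, here is a minimal, hedged sketch of quantization aware training with the Model Optimization Toolkit on a toy Keras model. The toy model and dataset are stand-ins, not part of this project; the MobileBERT pipeline below drives the same API through experiment configs rather than calling it directly:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy float model standing in for a pre-trained network.
float_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization ops simulate INT8 arithmetic
# during the fine-tuning forward pass.
qat_model = tfmot.quantization.keras.quantize_model(float_model)

# Fine-tune as usual; quantization error now contributes to the loss,
# which is what lets QAT recover accuracy that PTQ loses.
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# qat_model.fit(train_ds, epochs=1)  # train_ds: your fine-tuning dataset
```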

Currently, we support a limited number of NLP tasks and models. We will keep adding support for more tasks and models in future releases.

Maintainers

Requirements

Results

MobileBERT

| Model name | SQuAD F1 (float) | SQuAD F1 (PTQ) | SQuAD F1 (QAT) | Download | Links |
| ---------- | ---------------- | -------------- | -------------- | -------- | ----- |
| MobileBERT-EdgeTPU-XS | 88.02% | 84.96% | 85.42% | FP32 \| INT8 \| QAT INT8 (ckpt) | tensorboard |

Please follow the MobileBERT QAT Tutorial Colab notebook to try the exported models.
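Outside of the Colab, the exported INT8 model can also be exercised with the TFLite interpreter. A minimal sketch follows; the file name is a placeholder, and real inputs must be tokenized to match the model's signature rather than zero-filled:

```python
import numpy as np
import tensorflow as tf

# Placeholder path; point this at the downloaded INT8 TFLite model.
interpreter = tf.lite.Interpreter(model_path="mobilebert_qat_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Zero-filled dummy inputs with the declared shapes/dtypes; replace with
# real tokenized SQuAD features for meaningful outputs.
for detail in input_details:
    interpreter.set_tensor(
        detail["index"], np.zeros(detail["shape"], dtype=detail["dtype"]))

interpreter.invoke()
logits = interpreter.get_tensor(output_details[0]["index"])
print(logits.shape)
```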

Training

Training can run on Google Cloud Platform using Cloud TPU. Here are the instructions for using Cloud TPU. Follow the steps below to set up a Cloud TPU and launch training, using MobileBERT as an example:

```shell

# First, download the pre-trained floating point model; QAT fine-tunes from it.
gsutil cp gs://tf_model_garden/nlp/qat/mobilebert/mobilebert_fp32_ckpt.tar.gz /tmp/qat/

# Extract the checkpoint.
tar -xvzf /tmp/qat/mobilebert_fp32_ckpt.tar.gz -C /tmp/qat

# Convert the float checkpoint to a QAT checkpoint.
python3 pretrained_checkpoint_converter.py \
  --experiment=bert/squad \
  --config_file=<fill in>/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xs.yaml \
  --config_file=<fill in>/edgetpu/nlp/experiments/downstream_tasks/squad_v1.yaml \
  --experiment_qat=bert/squad_qat \
  --config_file_qat=<fill in>/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xs.yaml \
  --config_file_qat=<fill in>/qat/nlp/configs/experiments/squad_v1_mobilebert_xs_qat_1gpu.yaml \
  --pretrained_checkpoint=<fill in> \  # Example: /tmp/qat/mobilebert_fp32_ckpt
  --output_checkpoint=<fill in>  # Example: /tmp/qat/mobilebert_fp32_ckpt_qat

# Launch training. Note that we override the checkpoint path in the config
# file via "params_override" to supply the converted QAT checkpoint.
PARAMS_OVERRIDE="task.quantization.pretrained_original_checkpoint=/tmp/qat/mobilebert_fp32_ckpt_qat"
EXPERIMENT=bert/squad_qat  # Experiment type according to the subtask. Example: 'bert/squad_qat'
TPU_NAME="<tpu-name>"  # The name assigned while creating a Cloud TPU.
MODEL_DIR="gs://<path-to-model-directory>"  # Model artifacts directory for the training run.
python3 train.py \
  --experiment=${EXPERIMENT} \
  --config_file=<fill in>/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xs.yaml \
  --config_file=<fill in>/qat/nlp/configs/experiments/squad_v1_mobilebert_xs_qat_1gpu.yaml \
  --model_dir=${MODEL_DIR} \
  --tpu=${TPU_NAME} \
  --params_override=${PARAMS_OVERRIDE} \
  --mode=train
```
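After training converges, the QAT model can be lowered to an integer TFLite flatbuffer. Below is a hedged sketch of that step on a toy model, assuming the standard TF-MOT-to-TFLite path; the repo's own export tooling for MobileBERT may package things differently:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy stand-in for the fine-tuned QAT model; in practice, restore the
# trained MobileBERT QAT checkpoint instead.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10),
])
qat_model = tfmot.quantization.keras.quantize_model(model)

# DEFAULT optimization quantizes weights and activations using the ranges
# learned during quantization aware training.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("/tmp/qat/model_int8.tflite", "wb") as f:
    f.write(tflite_bytes)
```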

Evaluation

Please run the command below for evaluation.

```shell
EXPERIMENT=bert/squad_qat  # Experiment type according to the subtask. Example: 'bert/squad_qat'
TPU_NAME="<tpu-name>"  # The name assigned while creating a Cloud TPU.
MODEL_DIR="gs://<path-to-model-directory>"  # Model artifacts directory from the training run.
python3 train.py \
  --experiment=${EXPERIMENT} \
  --config_file=<fill in>/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xs.yaml \
  --config_file=<fill in>/qat/nlp/configs/experiments/squad_v1_mobilebert_xs_qat_1gpu.yaml \
  --model_dir=${MODEL_DIR} \
  --tpu=${TPU_NAME} \
  --mode=eval
```

License

This project is licensed under the terms of the Apache License 2.0.