# BERT Quantization Examples
Based on link.

Original README: link.

This directory contains examples for BERT post-training quantization (PTQ) and quantization-aware training (QAT) on SQuAD.
## Setup

The Docker image `nvcr.io/nvidia/tensorflow:20.03-tf1-py3` (TensorFlow 1.15.2) is used for testing.

Setup steps:

```bash
# Install the quantization toolkit shipped with this example
pip install ft-tensorflow-quantization/
# Disable TensorFlow's automatic mixed precision graph rewrite
export TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE=0
```
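As a quick check that the setup worked, the toolkit should import cleanly inside the container. A minimal sketch, assuming the installed module is named `ft_tensorflow_quantization` (matching the pip directory name above):

```python
# Smoke test for the environment; the module name ft_tensorflow_quantization
# is an assumption based on the pip package directory above.
import tensorflow as tf
import ft_tensorflow_quantization  # noqa: F401

print(tf.__version__)  # expect 1.15.2 inside the 20.03-tf1-py3 container
```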
## Download the pretrained BERT checkpoint and SQuAD dataset

Download the pretrained BERT checkpoint:

```bash
wget https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip -O uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip -d squad_model
```
Download the SQuAD v1.1 dataset:

```bash
mkdir squad_data
wget -P squad_data https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget -P squad_data https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```
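Optionally, sanity-check the downloads. This snippet is illustrative and relies only on the standard SQuAD v1.1 JSON schema:

```python
# Count the questions in the training split as a quick integrity check.
import json

with open("squad_data/train-v1.1.json") as f:
    squad = json.load(f)

num_questions = sum(
    len(paragraph["qas"])
    for article in squad["data"]
    for paragraph in article["paragraphs"]
)
print(squad["version"], num_questions)  # SQuAD v1.1 train has ~88k questions
```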
## Post Training Quantization

Finetune a high-precision model first, on 4 GPUs via Horovod:

```bash
mpirun -np 4 -H localhost:4 \
--allow-run-as-root -bind-to none -map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH -mca pml ob1 -mca btl ^openib \
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/bert_model.ckpt \
--output_dir=squad_model/finetuned_base \
--do_train=True \
--do_predict=True \
--if_quant=False \
--train_batch_size=8 \
--learning_rate=1e-5 \
--num_train_epochs=2.0 \
--save_checkpoints_steps 1000 \
--horovod
```
Then evaluate:

```bash
python ../../sample/tensorflow/tensorflow_bert/squad_evaluate_v1_1.py squad_data/dev-v1.1.json squad_model/finetuned_base/predictions.json
```

The results would be like:

```
{"exact_match": 82.44, "f1": 89.57}
```
`quant_mode` is unified with `int8_mode` in FasterTransformer; it can be one of `ft1`, `ft2`, or `ft3`. Calibrate the finetuned model and run PTQ prediction:
```bash
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/finetuned_base/model.ckpt-5474 \
--output_dir=squad_model/PTQ_mode_2 \
--do_train=False \
--do_predict=True \
--do_calib=True \
--if_quant=True \
--train_batch_size=16 \
--calib_batch=16 \
--calib_method=percentile \
--percentile=99.999 \
--quant_mode=ft2
```

Then evaluate:

```bash
python ../../sample/tensorflow/tensorflow_bert/squad_evaluate_v1_1.py squad_data/dev-v1.1.json squad_model/PTQ_mode_2/predictions.json
```
The results would be like:

```
{"exact_match": 81.67, "f1": 88.94}  # for mode 1
{"exact_match": 80.44, "f1": 88.30}  # for mode 2
```
## Quantization Aware Fine-tuning

If PTQ does not yield an acceptable result, you can finetune with quantization to recover accuracy. We recommend calibrating the pretrained model and then finetuning, to avoid overfitting. As above, `quant_mode` can be one of `ft1`, `ft2`, or `ft3`. First calibrate:
```bash
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/bert_model.ckpt \
--output_dir=squad_model/QAT_calibrated_mode_2 \
--do_train=False \
--do_calib=True \
--train_batch_size=16 \
--calib_batch=16 \
--calib_method=percentile \
--percentile=99.99 \
--quant_mode=ft2
```
Then finetune with quantization on 4 GPUs:

```bash
mpirun -np 4 -H localhost:4 \
--allow-run-as-root -bind-to none -map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH -mca pml ob1 -mca btl ^openib \
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/QAT_calibrated_mode_2/model.ckpt-calibrated \
--output_dir=squad_model/QAT_mode_2 \
--do_train=True \
--do_predict=True \
--if_quant=True \
--train_batch_size=8 \
--learning_rate=1e-5 \
--num_train_epochs=2.0 \
--save_checkpoints_steps 1000 \
--quant_mode=ft2 \
--horovod
```
Then evaluate:

```bash
python ../../sample/tensorflow/tensorflow_bert/squad_evaluate_v1_1.py squad_data/dev-v1.1.json squad_model/QAT_mode_2/predictions.json
```

The results would be like:

```
{"exact_match": 82.11, "f1": 89.39}  # for mode 1
{"exact_match": 81.74, "f1": 89.12}  # for mode 2
```
The results of quantization may differ if different seeds are provided.
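For intuition on what quantization-aware finetuning changes in the forward pass: tensors are fake-quantized (quantized to int8 and immediately dequantized), so the network learns to tolerate the rounding and clipping error. A minimal sketch of symmetric fake quantization; real QAT implementations typically pass gradients through the rounding via a straight-through estimator:

```python
# Symmetric fake quantization: round to the int8 grid defined by amax, then
# dequantize back to float. Illustrative sketch only.
import numpy as np

def fake_quantize(x, amax, num_bits=8):
    bound = 2 ** (num_bits - 1) - 1   # 127 for int8
    scale = amax / bound
    q = np.clip(np.round(x / scale), -bound, bound)
    return q * scale

x = np.linspace(-2.0, 2.0, 9)
print(fake_quantize(x, amax=1.0))  # values snapped to the grid and clipped to [-1, 1]
```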
## Post Training Quantization with Knowledge Distillation

Knowledge distillation can achieve better results; we usually start from a PTQ checkpoint. First calibrate:
```bash
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/finetuned_base/model.ckpt-5474 \
--output_dir=squad_model/PTQ_mode_2_for_KD \
--do_train=False \
--do_predict=False \
--do_calib=True \
--if_quant=True \
--train_batch_size=16 \
--calib_batch=16 \
--calib_method=percentile \
--percentile=99.99 \
--quant_mode=ft2
```
Then finetune with quantization, distilling from the high-precision teacher:

```bash
mpirun -np 4 -H localhost:4 \
--allow-run-as-root -bind-to none -map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH -mca pml ob1 -mca btl ^openib \
python run_squad.py \
--bert_config_file=squad_model/bert_config.json \
--vocab_file=squad_model/vocab.txt \
--train_file=squad_data/train-v1.1.json \
--predict_file=squad_data/dev-v1.1.json \
--init_checkpoint=squad_model/PTQ_mode_2_for_KD/model.ckpt-calibrated \
--output_dir=squad_model/QAT_KD_mode_2 \
--do_train=True \
--do_predict=True \
--if_quant=True \
--train_batch_size=8 \
--learning_rate=2e-5 \
--num_train_epochs=10.0 \
--save_checkpoints_steps 1000 \
--quant_mode=ft2 \
--horovod \
--distillation=True \
--teacher=squad_model/finetuned_base/model.ckpt-5474
```
Then evaluate:

```bash
python ../../sample/tensorflow/tensorflow_bert/squad_evaluate_v1_1.py squad_data/dev-v1.1.json squad_model/QAT_KD_mode_2/predictions.json
```

The results would be like:

```
{"exact_match": 84.06, "f1": 90.63}  # for mode 1
{"exact_match": 84.02, "f1": 90.56}  # for mode 2
```