tools/tensorflow-quantization/README.md
Note: TensorFlow Quantization development has transitioned to the TensorRT Model Optimizer. All developers are encouraged to use the TensorRT Model Optimizer to benefit from the latest advancements in quantization and compression. While the TensorFlow Quantization code will remain available, it will no longer receive further development.
This TensorFlow 2.x Quantization toolkit quantizes (inserts Q/DQ nodes) TensorFlow 2.x Keras models for Quantization-Aware Training (QAT). We follow NVIDIA's QAT recipe, which leads to optimal model acceleration with TensorRT on NVIDIA GPUs and hardware accelerators.
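For context, a Q/DQ (quantize/dequantize) node pair maps a float tensor to int8 codes and back, so that training sees the rounding and saturation error it will incur at inference time. The following is a minimal pure-Python sketch of what one Q/DQ pair computes per element; it is illustrative only, not the toolkit's implementation, and the per-tensor scale value is an assumed example:

```python
def quantize(x, scale, qmin=-128, qmax=127):
    """Map a float value to a signed int8 code (the Q node)."""
    q = round(x / scale)
    return max(qmin, min(qmax, q))  # saturate to the int8 range

def dequantize(q, scale):
    """Map the int8 code back to float (the DQ node)."""
    return q * scale

# Example with an assumed per-tensor scale of 0.1:
scale = 0.1
values = [0.05, 1.23, -3.0, 20.0]  # 20.0 saturates at 127 * scale
roundtrip = [dequantize(quantize(v, scale), scale) for v in values]
```

Values within range round-trip to within half a quantization step (scale / 2); out-of-range values clamp to the int8 limits, which is the error QAT lets the model learn to tolerate.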
- Python >= 3.8
- TensorFlow >= 2.8
- tf2onnx >= 1.10.1
- onnx-graphsurgeon
- pytest
- pytest-html
- TensorRT >= 8.4 GA (optional)
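The version constraints above can be checked programmatically. A small helper (hypothetical, not part of the toolkit) that compares dotted version strings numerically rather than lexically:

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` satisfies a `>= minimum` constraint.

    Compares dotted versions numerically, so "2.10" >= "2.8" holds
    (a plain string comparison would get this wrong).
    """
    def parts(v):
        return [int(p) for p in v.split(".") if p.isdigit()]
    a, b = parts(installed), parts(minimum)
    # Pad the shorter version with zeros before comparing.
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    return a >= b

# e.g. TensorFlow >= 2.8 and tf2onnx >= 1.10.1:
assert meets_minimum("2.10.0", "2.8")
assert not meets_minimum("1.9.3", "1.10.1")
```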
The latest TensorFlow 2.x docker image from NGC is recommended.
```bash
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
$ docker run -it --runtime=nvidia --gpus all --net host -v ~/TensorRT/tools/tensorflow-quantization:/home/tensorflow-quantization nvcr.io/nvidia/tensorflow:22.03-tf2-py3 /bin/bash
```
After the last command, you will be placed in the /workspace directory inside the running docker container, while the tensorflow-quantization repo is mounted in the /home directory.
```bash
$ cd /home/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
```
If all tests pass, installation is successful.
```bash
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
```
If all tests pass, installation is successful.
TensorFlow 2.x Quantization toolkit user guide.
TensorFlow < 2.8: `DepthwiseConv2D` support was added in TF 2.8.
`Conv2DTranspose` is not yet supported by TF (see the open bug here).
However, there's a workaround if you do not need the TF2 SavedModel file and only need the ONNX file: use `Conv2DTransposeQuantizeWrapper` and convert the model with `convert_keras_model_to_onnx`. See our user guide for more information on how to do that.