This example presents an end-to-end QAT workflow (TF2-to-ONNX) for Inception models in `tf.keras.applications`.
## Requirements

Install the base requirements and prepare the data as described in the examples' README. The workflow is similar to the ResNet example, with a different model and different input pre-processing.
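The pre-processing difference can be made concrete. As a rough NumPy sketch (not the code this repository uses): Keras's Inception models expect `'tf'`-mode inputs scaled to `[-1, 1]`, while ResNet's default `preprocess_input` is `'caffe'`-mode (RGB-to-BGR plus ImageNet channel-mean subtraction):

```python
import numpy as np

def inception_preprocess(x):
    # 'tf' mode, as used by tf.keras.applications.inception_v3.preprocess_input:
    # scale uint8 pixel values [0, 255] to the range [-1, 1].
    return x.astype(np.float32) / 127.5 - 1.0

def resnet_preprocess(x):
    # 'caffe' mode, as used by tf.keras.applications.resnet50.preprocess_input:
    # convert RGB to BGR, then subtract the ImageNet channel means.
    x = x.astype(np.float32)[..., ::-1]
    return x - np.array([103.939, 116.779, 123.68], dtype=np.float32)
```

In practice you would call the `preprocess_input` function bundled with each model rather than re-implementing it; the sketch only illustrates why the two examples cannot share a data pipeline.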
## Workflow

Run the following to quantize the model, fine-tune it, and save the final graph in SavedModel format (checkpoints are also saved):

```bash
python run_qat_workflow.py
```
Step 1 already converts the SavedModel to ONNX automatically. For the manual conversion steps, please see step 3 in EfficientNet's README.
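For reference, the usual manual route from SavedModel to ONNX is the `tf2onnx` command-line converter. A sketch, assuming the workflow wrote its SavedModel to `saved_model/` and that opset 13 is acceptable (the paths and opset are illustrative, not taken from this repository):

```shell
# Convert the fine-tuned SavedModel to ONNX with tf2onnx.
# `saved_model/` and `--opset 13` are assumptions for illustration.
python -m tf2onnx.convert \
    --saved-model saved_model \
    --output model_qat.onnx \
    --opset 13
```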
## Results

Please refer to the examples' README.
The results below were obtained on an NVIDIA A100 GPU with TensorRT 8.4.2.4 (GA Update 1).
| Model | TF accuracy (%) | TF latency (ms, bs=1) | TRT accuracy (%) | TRT latency (ms, bs=1) |
|---|---|---|---|---|
| Baseline | 77.86 | 9.01 | 77.86 | 1.39 |
| PTQ | - | - | 77.73 | 0.82 |
| QAT | 78.11 | 101.97 | 78.08 | 0.82 |
QAT fine-tuning hyperparameters: `bs=64`, `ep=10`, `lr=0.001`, `steps_per_epoch=500`, optimizer `piecewise_sgd` with `lr_schedule=[(1.0, 1), (0.1, 2), (0.01, 7)]` (default).
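The schedule above appears to scale the base learning rate by a multiplier once training reaches each listed epoch. A minimal sketch of that interpretation (the function name and exact boundary semantics are assumptions, not this repository's implementation):

```python
def piecewise_lr(epoch, base_lr=0.001,
                 schedule=((1.0, 1), (0.1, 2), (0.01, 7))):
    """Piecewise-constant learning rate.

    Assumed semantics: each (multiplier, start_epoch) pair applies
    `multiplier` to `base_lr` from 1-indexed epoch `start_epoch` onward,
    with later pairs overriding earlier ones.
    """
    factor = schedule[0][0]
    for multiplier, start_epoch in schedule:
        if epoch >= start_epoch:
            factor = multiplier
    return base_lr * factor
```

With the defaults above, epochs 1 uses `lr=1e-3`, epochs 2-6 use `1e-4`, and epochs 7-10 use `1e-5`.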