tools/quantization/README_EN.md
Quantization speeds up model inference by converting the floating point computations in the original model into int8 computations. It also compresses the original model by approximately 4x by quantizing the float32 weights into int8 weights.
To build the quantization tool, enable the MNN_BUILD_QUANTOOLS CMake option when compiling:
cd MNN
mkdir build
cd build
cmake -DMNN_BUILD_QUANTOOLS=ON ..
make -j4
After the build finishes, run the quantization tool as follows:
./quantized.out origin.mnn quan.mnn preprocessConfig.json
The first argument is the path of the floating point model to be quantized.
The second argument is the path where the quantized model is saved.
The third argument is the path of the JSON config file, for example:
{
"format":"RGB",
"mean":[
127.5,
127.5,
127.5
],
"normal":[
0.00784314,
0.00784314,
0.00784314
],
"width":224,
"height":224,
"path":"path/to/images/",
"used_image_num":500,
"feature_quantize_method":"KL",
"weight_quantize_method":"MAX_ABS"
}
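The config above can also be generated from a script, which makes it easier to catch a misspelled option before running the tool. The following is a minimal sketch: `build_config` and the `VALID_*` sets are hypothetical helper names, and the allowed values are simply the options listed in this document.

```python
import json

# Allowed values as listed in this README (helper names are hypothetical).
VALID_FORMATS = {"RGB", "BGR", "RGBA", "GRAY"}
VALID_FEATURE_METHODS = {"KL", "ADMM"}
VALID_WEIGHT_METHODS = {"MAX_ABS", "ADMM"}


def build_config(image_dir, width, height, used_image_num=500,
                 image_format="RGB", feature_method="KL",
                 weight_method="MAX_ABS"):
    """Return a dict with the same keys as the example config."""
    if image_format not in VALID_FORMATS:
        raise ValueError("unknown format: %s" % image_format)
    if feature_method not in VALID_FEATURE_METHODS:
        raise ValueError("unknown feature_quantize_method: %s" % feature_method)
    if weight_method not in VALID_WEIGHT_METHODS:
        raise ValueError("unknown weight_quantize_method: %s" % weight_method)
    return {
        "format": image_format,
        "mean": [127.5, 127.5, 127.5],
        "normal": [0.00784314, 0.00784314, 0.00784314],
        "width": width,
        "height": height,
        "path": image_dir,
        "used_image_num": used_image_num,
        "feature_quantize_method": feature_method,
        "weight_quantize_method": weight_method,
    }


if __name__ == "__main__":
    config = build_config("path/to/images/", 224, 224)
    # Write the file that is passed as the third argument to the tool.
    with open("preprocessConfig.json", "w") as f:
        json.dump(config, f, indent=4)
```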
format: input images are read as RGBA and then converted to the target format specified by format.
Options: "RGB", "BGR", "RGBA", "GRAY"
mean, normal: the same as in the ImageProcess config. Each channel is normalized as
$dst = (src - mean) * normal$
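As a quick check of the formula, the values in the example config (mean 127.5, normal 0.00784314, which is approximately 1/127.5) map pixel values from [0, 255] to roughly [-1, 1]. The sketch below applies the formula to a single pixel in plain Python; `normalize_pixel` is a hypothetical helper, not part of MNN.

```python
MEAN = [127.5, 127.5, 127.5]
NORMAL = [0.00784314, 0.00784314, 0.00784314]


def normalize_pixel(src, mean=MEAN, normal=NORMAL):
    """Apply dst = (src - mean) * normal to each channel of one pixel."""
    return [(s - m) * n for s, m, n in zip(src, mean, normal)]


if __name__ == "__main__":
    # Darkest, middle, and brightest values, one per channel.
    print(normalize_pixel([0, 127.5, 255]))
```

If the floating point model was trained with a different normalization, mean and normal should be changed to match it.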
width, height: the input width and height of the floating point model.
path: the directory of images used to calibrate the feature quantization scale factors.
used_image_num: the number of images used for calibration.
Default: use all the images under path.
Note: make sure that the images, after the preprocessing described above, are exactly the data that is fed into the model input.
feature_quantize_method: the method used to compute the feature quantization scale factors.
Options:
"KL": use the KL divergence method; generally needs 100 to 1000 images.
"ADMM": use the ADMM (Alternating Direction Method of Multipliers) method to iteratively search for the optimal feature quantization scale factors; generally needs one batch of images.
Default: "KL"
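To give an idea of what a KL divergence based calibration does, here is a simplified sketch of the threshold search commonly used for int8 calibration: build a histogram of absolute feature values, and for each candidate clipping threshold compare the clipped distribution with its coarsely quantized version. This is an illustration under stated assumptions, not MNN's implementation; the function names, the bin counts, and the `eps` handling of empty bins are all assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps keeps the result finite where q is zero but p is not
    (a simplification; real implementations usually smooth instead).
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def normalize(hist):
    total = float(sum(hist))
    return [h / total for h in hist]


def kl_threshold(values, num_bins=2048, num_levels=128):
    """Return a clipping threshold for |values| with minimal KL divergence."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0.0
    bin_width = max_abs / num_bins
    hist = [0] * num_bins
    for v in values:
        hist[min(int(abs(v) / bin_width), num_bins - 1)] += 1

    best_kl, best_i = float("inf"), num_bins
    for i in range(num_levels, num_bins + 1):
        kept = hist[:i]
        if sum(kept) == 0:
            continue
        # Reference P: first i bins, clipped outliers folded into the last bin.
        ref = list(kept)
        ref[-1] += sum(hist[i:])
        # Candidate Q: merge the i bins into num_levels groups, then spread
        # each group's count evenly over the bins that were non-empty.
        cand = [0.0] * i
        for level in range(num_levels):
            start = level * i // num_levels
            end = (level + 1) * i // num_levels
            nonzero = [j for j in range(start, end) if kept[j] > 0]
            group_sum = sum(kept[start:end])
            for j in nonzero:
                cand[j] = group_sum / len(nonzero)
        kl = kl_divergence(normalize(ref), normalize(cand))
        if kl < best_kl:
            best_kl, best_i = kl, i
    return best_i * bin_width


if __name__ == "__main__":
    # Mostly small values plus a single large outlier.
    values = [k / 1000.0 for k in range(1000)] + [50.0]
    threshold = kl_threshold(values, num_bins=512, num_levels=128)
    print("threshold:", threshold, "scale:", threshold / 127.0)
```

The histogram needs enough samples to be meaningful, which is consistent with the advice to use 100 to 1000 calibration images for "KL".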
weight_quantize_method: the method used to quantize the weights.
Options:
"MAX_ABS": use the maximum absolute value of the weights to do symmetric quantization.
"ADMM": use the ADMM method to iteratively find the optimal quantization of the weights.
Default: "MAX_ABS"
Try different combinations of the feature and weight quantization methods above and choose the one that works best for your model.
The quantized model is used in the same way as the floating point model. Its inputs and outputs are still floating point.
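The idea behind max-absolute-value symmetric quantization can be sketched in a few lines: the largest absolute weight is mapped to the largest int8 magnitude, and every other weight is scaled by the same factor. The sketch below assumes a single scale for the whole weight list and a target range of [-127, 127]; whether the tool uses per-channel scales or a different range is not stated here, and the function names are hypothetical.

```python
def quantize_max_abs(weights, num_bits=8):
    """Symmetric quantization using the max absolute value as the range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8 (assumed range)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, int(round(w / scale)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floating point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.0, 0.25, 1.0]
    q, scale = quantize_max_abs(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Because the range is set by the single largest weight, one outlier can reduce the resolution available to all other weights, which is one reason to compare "MAX_ABS" against "ADMM" on your model.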