[TOC]
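When a quantized model is produced, debugging it usually requires tedious, manual custom code, for example to find which layers introduce quantization error or to compare the quantized model with the original float model.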
This is now feasible using the TensorFlow Lite Quantization Debugger, as shown below.
Note: Currently, this workflow is only supported for full integer (int8) quantization. The debug model produced by this workflow should be used for debugging only, not for inference.

Modify the TFLite full integer (int8) quantization steps as shown below to produce a debug model.
With the help of the MLIR quantizer's debug mode feature, the resulting debug model contains both the original float operators (ops) and the quantized ops. It also contains NumericVerify ops, which compare the outputs of the original float ops and the quantized ops and collect statistics. Each NumericVerify op is named in the format `NumericVerify/{original tensor name}:{original tensor id}`.
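
To make the comparison concrete, here is a minimal, stdlib-only sketch of the per-element difference a NumericVerify op could report for one tensor. It assumes standard int8 affine quantization (`q = clamp(round(x / scale) + zero_point, -128, 127)`, dequantized as `(q - zero_point) * scale`), rounding half away from zero, and a difference taken as dequantized minus float. The helper names are hypothetical and not part of TFLite.

```python
import math


def round_half_away_from_zero(v):
  """Round to the nearest integer; ties go away from zero (unlike round())."""
  return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_int8(x, scale, zero_point):
  """Float -> int8 with affine quantization, clamped to [-128, 127]."""
  q = round_half_away_from_zero(x / scale) + zero_point
  return max(-128, min(127, q))


def dequantize_int8(q, scale, zero_point):
  """int8 -> float: (q - zero_point) * scale."""
  return (q - zero_point) * scale


def numeric_verify_diffs(float_values, scale, zero_point):
  """Per-element (dequantized - float) error for one tensor."""
  return [
      dequantize_int8(quantize_int8(x, scale, zero_point), scale, zero_point) - x
      for x in float_values
  ]


# 0.013 falls between two quantization steps; 5.0 is clipped to 127 * 0.02.
print(numeric_verify_diffs([0.0, 0.013, 5.0], scale=0.02, zero_point=0))
```

Values inside the representable range stay within half a quantization step (`scale / 2`) of the float value, while clipped values such as 5.0 produce much larger errors.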
```python
import tensorflow as tf
# for mlir_quantize
from tensorflow.lite.python import convert

# `model` (a Keras model) and `calibration_gen` (a representative dataset
# generator) are assumed to be defined already.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Set full-integer quantization parameters as usual.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = calibration_gen

# Create a TFLite model with the new quantizer and NumericVerify ops. Rather
# than calling convert() only, calibrate the model first and then call
# `mlir_quantize` to run the actual quantization, with
# `enable_numeric_verify` set to `True`.
converter._experimental_calibrate_only = True
calibrated = converter.convert()
quant_debug_model = convert.mlir_quantize(calibrated, enable_numeric_verify=True)
```
Initialize the debugger with the debug model. This can be done in two ways: from the model content in memory, or from the path to a saved model file.
```python
from tensorflow.lite.tools.optimize.debugging.python import debugger

# `debug_dataset` accepts the same type as `converter.representative_dataset`.
quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    debug_dataset=data_gen)

# OR

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_path='/path/to/debug_model.tflite',
    debug_dataset=data_gen)

quant_debugger.run()
```
When you call `quant_debugger.run()`, `quant_debugger.layer_statistics` is filled with aggregated statistics for each NumericVerify op. Some metrics (e.g. standard deviation and mean squared error) are calculated by default.
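
As a rough sketch of what these default statistics measure, the function below summarizes one list of NumericVerify differences. The metric names match those in the sample output in this section; the formulas (in particular population standard deviation, which is what `np.std` computes by default) are assumptions, and `default_layer_metrics` is a hypothetical helper, not part of the debugger.

```python
import statistics


def default_layer_metrics(diffs):
  """Summarize one NumericVerify output, given as a flat list of diffs."""
  return {
      'num_elements': float(len(diffs)),
      'stddev': statistics.pstdev(diffs),
      'mean_error': statistics.fmean(diffs),
      'max_abs_error': max(abs(d) for d in diffs),
      'mean_squared_error': statistics.fmean(d * d for d in diffs),
  }


print(default_layer_metrics([1.0, -1.0, 3.0, -3.0]))
```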
```python
import pprint

# Each layer's metrics are stored in a defaultdict; convert them to a dict
# for readable output.
for layer_name, metrics in quant_debugger.layer_statistics.items():
  print(layer_name)
  pprint.pprint(dict(metrics))
```
Sample output:

```
# ...
NumericVerify/sequential/dense/MatMul;sequential/dense/BiasAdd3:77
{'max_abs_error': 0.05089309,
 'mean_error': -0.00017149668,
 'mean_squared_error': 0.00040816222,
 'num_elements': 256.0,
 'stddev': 0.02009948}
NumericVerify/sequential/dense_1/MatMul;sequential/dense_1/BiasAdd3:81
{'max_abs_error': 0.09744112,
 'mean_error': 0.0048679365,
 'mean_squared_error': 0.0036721828,
 'num_elements': 10.0,
 'stddev': 0.055745363}
NumericVerify/Identity2:85
{'max_abs_error': 0.0036417267,
 'mean_error': -0.00068773015,
 'mean_squared_error': 3.439951e-06,
 'num_elements': 10.0,
 'stddev': 0.0016223773}
# ...
```
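
Once the statistics are collected, a natural next step is to find the layers with the largest error. A minimal sketch, using numbers copied from the sample output above as a plain dict (with a real debugger you would pass `quant_debugger.layer_statistics`). `parse_numeric_verify_name` and `rank_layers` are hypothetical helpers that rely only on the documented `NumericVerify/{original tensor name}:{original tensor id}` naming format.

```python
# Subset of the sample output above; with a real debugger, use
# `quant_debugger.layer_statistics` instead of this literal.
layer_statistics = {
    'NumericVerify/sequential/dense/MatMul;sequential/dense/BiasAdd3:77':
        {'mean_squared_error': 0.00040816222, 'max_abs_error': 0.05089309},
    'NumericVerify/sequential/dense_1/MatMul;sequential/dense_1/BiasAdd3:81':
        {'mean_squared_error': 0.0036721828, 'max_abs_error': 0.09744112},
    'NumericVerify/Identity2:85':
        {'mean_squared_error': 3.439951e-06, 'max_abs_error': 0.0036417267},
}

PREFIX = 'NumericVerify/'


def parse_numeric_verify_name(name):
  """Split 'NumericVerify/{tensor name}:{tensor id}' into (name, id)."""
  if not name.startswith(PREFIX):
    raise ValueError(f'not a NumericVerify op name: {name!r}')
  # rpartition keeps any ':' inside the tensor name itself.
  tensor_name, _, tensor_id = name[len(PREFIX):].rpartition(':')
  return tensor_name, int(tensor_id)


def rank_layers(stats, metric='mean_squared_error'):
  """Return (tensor name, tensor id, value) tuples, worst layer first."""
  ranked = sorted(stats.items(), key=lambda item: item[1][metric], reverse=True)
  return [(*parse_numeric_verify_name(name), metrics[metric])
          for name, metrics in ranked]


for tensor_name, tensor_id, value in rank_layers(layer_statistics):
  print(f'{tensor_id:>4}  {value:.3e}  {tensor_name}')
```

With these numbers, the second dense layer (tensor 81) has the largest mean squared error, so it is the first candidate to inspect.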
More metrics can be added by passing `QuantizationDebugOptions` to the initializer. For example, to add mean absolute error, use the following snippet:
```python
import numpy as np

debug_options = debugger.QuantizationDebugOptions(
    layer_debug_metrics={
        'mean_abs_error': lambda diffs: np.mean(np.abs(diffs))
    })

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    debug_dataset=data_gen,
    debug_options=debug_options
)
quant_debugger.run()
```
Now `quant_debugger.layer_statistics` also includes the mean absolute error for each layer.
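
A minimal stdlib sketch of how per-layer metrics such as `mean_abs_error` could be aggregated over `debug_dataset`, assuming each metric function is applied to one sample's diffs and the results are then averaged across samples (this README does not spell out the aggregation rule). The real debugger passes NumPy arrays to the metric functions; plain lists and `statistics.fmean` are used here so the sketch runs without NumPy. `aggregate_layer_statistics` and the sample data are hypothetical.

```python
import statistics
from collections import defaultdict

# Same shape as `layer_debug_metrics`: metric name -> function of one diffs list.
layer_debug_metrics = {
    'mean_squared_error': lambda diffs: statistics.fmean(d * d for d in diffs),
    'mean_abs_error': lambda diffs: statistics.fmean(abs(d) for d in diffs),
}


def aggregate_layer_statistics(samples, metrics):
  """Average each per-sample metric value over all debug samples.

  Each sample maps a NumericVerify op name to that sample's list of diffs.
  """
  collected = defaultdict(lambda: defaultdict(list))
  for sample in samples:
    for layer_name, diffs in sample.items():
      for metric_name, metric_fn in metrics.items():
        collected[layer_name][metric_name].append(metric_fn(diffs))
  return {
      layer_name: {name: statistics.fmean(values)
                   for name, values in per_metric.items()}
      for layer_name, per_metric in collected.items()
  }


# Two hypothetical debug samples for a single layer.
samples = [
    {'NumericVerify/dense:1': [0.1, -0.1]},
    {'NumericVerify/dense:1': [0.3, -0.3]},
]
print(aggregate_layer_statistics(samples, layer_debug_metrics))
```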
In addition to analyzing a single model, you can compare the outputs of the original float model and the quantized model when both are given. To do so, provide the float model and metrics that compare the two models' outputs. For classification models, a metric can be as simple as comparing argmax results; more complex models, such as object detection models, need more complicated comparison logic.
```python
import numpy as np
import tensorflow as tf

# Functions in `model_debug_metrics` receive all output tensors of the float
# and quantized models, and return a single metric value.
debug_options = debugger.QuantizationDebugOptions(
    model_debug_metrics={
        'argmax_accuracy': lambda f, q: np.argmax(f[0]) == np.argmax(q[0])
    })

# Convert the float model with a fresh converter, without any optimizations
# (the converter above is set to calibrate only). `model` is assumed.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = float_converter.convert()

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    float_model_content=float_model,  # can pass `float_model_path` instead.
    debug_dataset=data_gen,
    debug_options=debug_options
)
quant_debugger.run()
```
The result is a single number per metric, so it's easier to inspect.
```python
>>> quant_debugger.model_statistics
{'argmax_accuracy': 0.89}
```
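
A value like 0.89 is consistent with the per-sample booleans returned by the `argmax_accuracy` lambda being averaged over `debug_dataset`. A stdlib sketch of that aggregation, assuming this averaging (the README does not state it) and representing each model's outputs as a list of output tensors, each a plain list of scores. The helpers and the sample outputs are hypothetical.

```python
import statistics


def argmax(values):
  """Index of the largest value (first one on ties), like np.argmax."""
  return max(range(len(values)), key=values.__getitem__)


def argmax_accuracy(float_outputs, quant_outputs):
  """1.0 if both models pick the same top class for output 0, else 0.0."""
  return float(argmax(float_outputs[0]) == argmax(quant_outputs[0]))


def aggregate_model_statistics(pairs, metrics):
  """Average each model-level metric over (float_outputs, quant_outputs) pairs."""
  return {
      name: statistics.fmean(fn(f, q) for f, q in pairs)
      for name, fn in metrics.items()
  }


# Hypothetical outputs for four debug samples (one output tensor each).
pairs = [
    ([[0.1, 0.7, 0.2]], [[0.1, 0.6, 0.3]]),  # agree: class 1
    ([[0.5, 0.4, 0.1]], [[0.4, 0.5, 0.1]]),  # disagree
    ([[0.0, 0.1, 0.9]], [[0.0, 0.2, 0.8]]),  # agree: class 2
    ([[0.3, 0.3, 0.4]], [[0.2, 0.3, 0.5]]),  # agree: class 2
]
print(aggregate_model_statistics(pairs, {'argmax_accuracy': argmax_accuracy}))
```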
The `quant_debugger.layer_statistics_dump` function accepts a file-like object and exports the layer statistics to CSV, which can be imported into other tools, such as pandas, for further processing. The exported data also includes the name of each op, its originating tensor ID, and the quantization parameters (scales and zero points) of each quantized layer.
Note: Scales and zero points are lists, so pandas imports them as text by default. Parse them before further processing.
```python
import pandas as pd
import yaml  # used to parse lists

with open('/path/to/stats.csv', 'w') as f:
  quant_debugger.layer_statistics_dump(f)

data = pd.read_csv(
    '/path/to/stats.csv',
    converters={
        'scales': yaml.safe_load,
        'zero_points': yaml.safe_load
    })
```
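
Without pandas or PyYAML, the same parsing can be done with the standard library. A minimal sketch, assuming the list columns are written in Python list syntax such as `[0.0235, 0.0117]`; `ast.literal_eval` is swapped in for `yaml.safe_load`, and the sample CSV and the `load_layer_statistics` helper are hypothetical. To read a real dump, open it with `open('/path/to/stats.csv', newline='')` and pass that file object instead of `io.StringIO`.

```python
import ast
import csv
import io

# Hypothetical stats CSV: only the `scales` and `zero_points` column names come
# from this README; the other columns and all values are made up. The second
# row shows how empty cells are handled.
SAMPLE_CSV = (
    'tensor_name,mean_squared_error,scales,zero_points\n'
    'dense,0.0004,"[0.0235, 0.0117]","[-128, -128]"\n'
    'identity,3.4e-06,,\n'
)


def load_layer_statistics(csv_file):
  """Read a stats CSV, parsing list-valued columns with ast.literal_eval."""
  rows = []
  for row in csv.DictReader(csv_file):
    for column in ('scales', 'zero_points'):
      if row.get(column):  # Leave empty cells as ''.
        row[column] = ast.literal_eval(row[column])
    rows.append(row)
  return rows


rows = load_layer_statistics(io.StringIO(SAMPLE_CSV))
print(rows[0]['scales'], rows[0]['zero_points'])
```

Unlike `yaml.safe_load`, `ast.literal_eval` only accepts Python literals, so it fails loudly if the column format differs from this assumption.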