tensorflow/lite/g3doc/guide/faq.md
If you don't find an answer to your question here, please look through our detailed documentation for the topic or file a GitHub issue.
The supported formats are listed here
To keep TensorFlow Lite lightweight, only certain TensorFlow operators (listed in the allowlist) are supported in TFLite.
Since the set of TensorFlow Lite operations is smaller than TensorFlow's, some models may fail to convert. Some common errors are listed here.
For conversion issues not related to missing operations or control flow ops, search our GitHub issues or file a new one.
The best way to test is to compare the outputs of the TensorFlow and the TensorFlow Lite models for the same inputs (test data or random inputs) as shown here.
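As a sketch of that comparison, the core check is element-wise closeness between the two models' outputs. The helper name and tolerance below are illustrative; in practice `tf_output` would come from running the TensorFlow model and `tflite_output` from a `tf.lite.Interpreter` on the same input:

```python
import numpy as np

def outputs_match(tf_output, tflite_output, atol=1e-5):
    """Return True if the two model outputs agree within tolerance.

    In practice, tf_output comes from the TensorFlow model and
    tflite_output from tf.lite.Interpreter, fed the same input tensor.
    """
    return np.allclose(tf_output, tflite_output, atol=atol)

# Illustrative stand-ins for real model outputs.
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 10)).astype(np.float32)
candidate = reference + 1e-7  # tiny numerical drift, as conversion may introduce

assert outputs_match(reference, candidate)
```

Running the check on several test inputs (or random inputs) gives more confidence than a single sample.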
The easiest way to inspect a graph from a .pb file is to use
Netron, an open-source viewer for
machine learning models.
If Netron cannot open the graph, you can try the summarize_graph tool.
If the summarize_graph tool yields an error, you can visualize the GraphDef with
TensorBoard and
look for the inputs and outputs in the graph. To visualize a .pb file, use the
import_pb_to_tensorboard.py
script as follows:

```shell
python import_pb_to_tensorboard.py --model_dir <model path> --log_dir <log dir path>
```
Netron is the easiest way to visualize a TensorFlow Lite model.
If Netron cannot open your TensorFlow Lite model, you can try the visualize.py script in our repository.
If you're using TF 2.5 or a later version:

```shell
python -m tensorflow.lite.tools.visualize model.tflite visualized_model.html
```

Otherwise, you can run the visualize.py script with Bazel:

```shell
bazel run //tensorflow/lite/tools:visualize model.tflite visualized_model.html
```
Post-training quantization can be used during conversion to TensorFlow Lite to reduce the size of the model. Post-training quantization quantizes weights from floating point to 8 bits of precision and dequantizes them at runtime to perform floating-point computations. However, note that this can have some accuracy implications.

If retraining the model is an option, consider quantization-aware training. However, note that quantization-aware training is only available for a subset of convolutional neural network architectures.
For a deeper understanding of different optimization methods, look at Model optimization.
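As a minimal sketch, post-training quantization is enabled with a single converter flag. The tiny Keras model below is only there to make the example self-contained; in practice you would load your own SavedModel or Keras model:

```python
import tensorflow as tf

# A throwaway stand-in model so the example runs on its own.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# This flag enables post-training quantization: weights are stored in
# 8 bits and dequantized at runtime for floating-point computation.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()  # the converted model, as bytes
```

The resulting bytes can be written to a `.tflite` file and loaded by the interpreter as usual.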
The high-level process to optimize TensorFlow Lite performance looks something like this: one common step is to run the interpreter with multiple threads; use SetNumThreads() in the C++ API to do this. However, increasing the number of threads results in performance variability depending on the environment.

For a more in-depth discussion of how to optimize performance, take a look at Best Practices.
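In Python, the counterpart of SetNumThreads() is the interpreter's `num_threads` argument. A minimal sketch follows; the tiny model is only there to make the snippet self-contained:

```python
import numpy as np
import tensorflow as tf

# Build and convert a throwaway model so the snippet runs on its own.
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(4)])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# num_threads is the Python equivalent of SetNumThreads() in the C++ API.
interpreter = tf.lite.Interpreter(model_content=tflite_model, num_threads=4)
interpreter.allocate_tensors()

input_index = interpreter.get_input_details()[0]["index"]
interpreter.set_tensor(input_index, np.zeros((1, 8), dtype=np.float32))
interpreter.invoke()
```

Benchmark with different thread counts on the target device, since the optimal setting varies with the environment.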