tensorflow/lite/g3doc/guide/index.md
TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and edge devices.
Key Point: The TensorFlow Lite binary is ~1MB when all 125+ supported operators are linked (for 32-bit ARM builds), and less than 300KB when using only the operators needed for supporting the common image classification models InceptionV3 and MobileNet.
The following guide walks through each step of the workflow and provides links to further instructions:
Note: Refer to the performance best practices guide for an ideal balance of performance, model size, and accuracy.
A TensorFlow Lite model is represented in an efficient, portable format known as FlatBuffers (identified by the .tflite file extension). This format provides several advantages over TensorFlow's protocol buffer model format, such as reduced size (small code footprint) and faster inference (data is accessed directly, without an extra parsing or unpacking step). These properties enable TensorFlow Lite to execute efficiently on devices with limited compute and memory resources.
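As a small illustration of the format described above, the following sketch checks whether a byte buffer looks like a TensorFlow Lite FlatBuffers file. It assumes the file carries the `TFL3` file identifier declared in the TensorFlow Lite schema, stored at bytes 4 to 8 as FlatBuffers does for file identifiers. The helper names `looks_like_tflite` and `file_looks_like_tflite` are hypothetical and not part of any TensorFlow API.

```python
# "TFL3" is the FlatBuffers file identifier declared in the TensorFlow Lite schema.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(data: bytes) -> bool:
    """Return True if the buffer carries the TensorFlow Lite file identifier.

    A FlatBuffers file begins with a 4-byte offset to the root table,
    optionally followed by a 4-byte file identifier at bytes 4..8.
    This is a cheap sanity check, not a full validation of the model.
    """
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER


def file_looks_like_tflite(path: str) -> bool:
    """Apply the same check to a file on disk (hypothetical helper)."""
    with open(path, "rb") as model_file:
        # Only the first 8 bytes are needed for the identifier check.
        return looks_like_tflite(model_file.read(8))
```

Because the check reads a fixed offset rather than parsing the file, it mirrors the direct-access property of the format: no unpacking step is needed before the data can be inspected.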
A TensorFlow Lite model can optionally include metadata that has human-readable model description and machine-readable data for automatic generation of pre- and post-processing pipelines during on-device inference. Refer to Add metadata for more details.
You can generate a TensorFlow Lite model in the following ways:
Use an existing TensorFlow Lite model: Refer to TensorFlow Lite Examples to pick an existing model. Models may or may not contain metadata.
Create a TensorFlow Lite model: Use the TensorFlow Lite Model Maker to create a model with your own custom dataset. By default, all models contain metadata.
Convert a TensorFlow model into a TensorFlow Lite model: Use the TensorFlow Lite Converter to convert a TensorFlow model into a TensorFlow Lite model. During conversion, you can apply optimizations such as quantization to reduce model size and latency with minimal or no loss in accuracy. By default, converted models do not contain metadata.
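The conversion option above can be sketched in Python as follows. This is a minimal example, assuming a tiny untrained Keras model as a stand-in for your own trained model; the output file name `model.tflite` is also an assumption. It uses `tf.lite.TFLiteConverter` with the default optimization setting, which enables quantization.

```python
import tensorflow as tf

# A tiny stand-in model (hypothetical); substitute your own trained model.
keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Create a converter from the in-memory Keras model.
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# Optional: let the converter apply its default optimizations (quantization).
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# convert() returns the serialized FlatBuffers model as bytes.
tflite_model = converter.convert()

# Write the model to disk with the .tflite extension.
with open("model.tflite", "wb") as model_file:
    model_file.write(tflite_model)
```

Whether quantization affects accuracy depends on the model, so it is worth evaluating the converted model before deploying it.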
Inference refers to the process of executing a TensorFlow Lite model on-device to make predictions based on input data. You can run inference in the following ways based on the model type:
Models without metadata: Use the TensorFlow Lite Interpreter API. Supported on multiple platforms and languages such as Java, Swift, C++, Objective-C and Python.
Models with metadata: You can either use the out-of-the-box APIs in the TensorFlow Lite Task Library or build custom inference pipelines with the TensorFlow Lite Support Library. On Android devices, you can automatically generate code wrappers using the Android Studio ML Model Binding or the TensorFlow Lite Code Generator. These options are currently supported only in Java (Android); Swift (iOS) and C++ support is a work in progress.
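For a model without metadata, inference with the Interpreter API can be sketched in Python as shown below. The helper `run_inference` is a hypothetical name, and the tiny Keras model exists only so that the example is self-contained; in practice you would load your own .tflite file, for example with the `model_path` argument of `tf.lite.Interpreter`.

```python
import numpy as np
import tensorflow as tf


def run_inference(model_content: bytes, input_data: np.ndarray) -> np.ndarray:
    """Run a single-input, single-output model once (hypothetical helper)."""
    interpreter = tf.lite.Interpreter(model_content=model_content)
    # Tensors must be allocated before any input can be set.
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    # Cast the input to the dtype the model expects, then copy it in.
    interpreter.set_tensor(
        input_details["index"], input_data.astype(input_details["dtype"])
    )
    interpreter.invoke()

    # get_tensor returns a copy of the output tensor's data.
    return interpreter.get_tensor(output_details["index"])


# Stand-in model (assumption) so the sketch runs end to end.
keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model_content = tf.lite.TFLiteConverter.from_keras_model(keras_model).convert()

probabilities = run_inference(model_content, np.zeros((1, 4), dtype=np.float32))
print(probabilities.shape)
```

Without metadata, the caller is responsible for all pre- and post-processing, such as resizing images or mapping output indices to labels.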
On Android and iOS devices, you can improve performance using hardware acceleration. On either platform, you can use a GPU Delegate. On Android, you can also use the NNAPI Delegate (for newer devices) or the Hexagon Delegate (for older devices), and on iOS you can use the Core ML Delegate. To add support for new hardware accelerators, you can define your own delegate.
You can refer to the following guides based on your target device:
Android and iOS: Explore the Android quickstart and iOS quickstart.
Embedded Linux: Explore the Python quickstart for embedded devices such as Raspberry Pi and Coral devices with Edge TPU, or C++ build instructions for ARM.
Microcontrollers: Explore the TensorFlow Lite for Microcontrollers library for microcontrollers and DSPs that contain only a few kilobytes of memory.
Not all TensorFlow models can be converted into TensorFlow Lite models. Refer to Operator compatibility.
On-device training is not supported; however, it is on our Roadmap.
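Regarding operator compatibility, one possible workaround when a model uses operators outside the TensorFlow Lite builtin set is to allow the converter to fall back to select TensorFlow ops. The sketch below shows that converter setting under stated assumptions: the tiny Keras model is a stand-in that does not actually need the fallback, and the resulting model would require a runtime that includes the select TensorFlow ops, which increases binary size.

```python
import tensorflow as tf

# Stand-in model (assumption); a real case would use ops missing from the builtin set.
keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# Prefer TensorFlow Lite builtin ops, and fall back to select TensorFlow ops
# for anything the builtin set does not cover.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]

try:
    tflite_model = converter.convert()
except Exception as error:  # Conversion can still fail for unsupported models.
    tflite_model = None
    print(f"Conversion failed: {error}")
```

This setting does not make every model convertible, so checking the operator compatibility guide remains the first step.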