LiteRT (short for Lite Runtime) is Google's high-performance runtime for on-device AI. It is the next generation and the new name for TensorFlow Lite (TFLite), and it runs the same .tflite model format. With LiteRT, a single exported Ultralytics YOLO model deploys across mobile, embedded, edge, and the browser — covering everything that the older tflite and tfjs export formats handled separately, now under one umbrella.
The LiteRT export format optimizes your models for tasks like object detection, segmentation, pose estimation, and classification so they run fast and offline on a wide range of devices.
LiteRT is an open-source framework designed for on-device inference, also known as edge computing. It gives developers the tools to execute trained models on mobile, embedded, and IoT devices, traditional computers, and — through LiteRT.js — directly in web browsers and Node.js.
One model format, every target:
- LiteRT.js runs the same `.tflite` model on the web with WebGPU/WASM acceleration, replacing the need for a separate TensorFlow.js export.
- LiteRT supports static INT8 (`quantize=8`, int8 weights + int8 activations), static INT16-activation (`quantize="w8a16"`, int8 weights + int16 activations for higher accuracy), and dynamic INT8 (`quantize="w8a32"`, int8 weights + FP32 activations, no calibration data needed) to compress models and speed up inference with minimal accuracy loss.

You can improve on-device execution efficiency and broaden deployment options by converting your models to the LiteRT format.
To install the required package, run:
!!! tip "Installation"
=== "CLI"
```bash
# Install the required package for YOLO
pip install ultralytics
```
For detailed instructions and best practices, check our Ultralytics Installation guide. If you encounter any difficulties, consult our Common Issues guide.
!!! note "Platform support"
LiteRT **export** is currently supported on **Linux x86_64** and **macOS**. The exported `.tflite` model itself runs on all LiteRT-supported platforms (mobile, embedded, edge, and the browser).
All Ultralytics YOLO models support export out of the box. The LiteRT format supports the Export, Predict, and Validate modes, so you can export a model, then load it to run inference or validate its accuracy locally.
!!! example "Export"
=== "Python"
```python
from ultralytics import YOLO
# Load a YOLO26 model
model = YOLO("yolo26n.pt")
# Export the model to LiteRT format
model.export(format="litert") # creates 'yolo26n.tflite'
```
=== "CLI"
```bash
# Export a YOLO26n PyTorch model to LiteRT format
yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'
```
!!! example "Quantized export"
=== "Python"
```python
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
# Dynamic INT8: int8 weights, FP32 activations - no calibration data needed
model.export(format="litert", quantize="w8a32") # creates 'yolo26n_w8a32.tflite'
# Static INT8: int8 weights + int8 activations - needs calibration data
model.export(format="litert", quantize=8, data="coco8.yaml") # creates 'yolo26n_int8.tflite'
# Static w8a16: int8 weights + int16 activations (higher accuracy) - needs calibration data
model.export(format="litert", quantize="w8a16", data="coco8.yaml") # creates 'yolo26n_w8a16.tflite'
```
=== "CLI"
```bash
# Dynamic INT8 (no calibration data needed)
yolo export model=yolo26n.pt format=litert quantize=w8a32
# Static INT8 (needs calibration data)
yolo export model=yolo26n.pt format=litert quantize=8 data=coco8.yaml
# Static w8a16: int8 weights + int16 activations (needs calibration data)
yolo export model=yolo26n.pt format=litert quantize=w8a16 data=coco8.yaml
```
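The precision modes above differ mainly in weight and activation bit widths. As a rough, back-of-the-envelope sketch (an estimate, not a measurement), weight storage scales with the weight bit width, so INT8 weights take about a quarter of the space of FP32 weights. `PrecisionMode`, `PRECISION_MODES`, and `estimate_weight_mb` are hypothetical illustrations, not part of the Ultralytics API, and the estimate ignores metadata, graph structure, and quantization scale tensors stored in a real `.tflite` file.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

QuantizeArg = Optional[Union[int, str]]


@dataclass(frozen=True)
class PrecisionMode:
    """Bit widths of one LiteRT export precision, as described in this guide."""

    weight_bits: int
    activation_bits: int
    needs_calibration: bool


# Hypothetical lookup table keyed by the `quantize` export argument.
PRECISION_MODES: Dict[QuantizeArg, PrecisionMode] = {
    None: PrecisionMode(32, 32, False),  # unset: FP32
    32: PrecisionMode(32, 32, False),  # FP32
    8: PrecisionMode(8, 8, True),  # static INT8
    "w8a16": PrecisionMode(8, 16, True),  # static, int16 activations
    "w8a32": PrecisionMode(8, 32, False),  # dynamic INT8
}


def estimate_weight_mb(num_params: int, quantize: QuantizeArg = None) -> float:
    """Rough weight storage in MB (1 MB = 1e6 bytes); ignores all file overhead."""
    bits = PRECISION_MODES[quantize].weight_bits
    return num_params * bits / 8 / 1e6


if __name__ == "__main__":
    # 2.5M parameters is an illustrative count, not the size of any particular YOLO model.
    for q in (None, "w8a32", 8, "w8a16"):
        print(f"quantize={q!r}: ~{estimate_weight_mb(2_500_000, q):.1f} MB of weights")
```

Under these assumptions all three INT8 variants store weights in roughly a quarter of the FP32 size; they differ in activation precision and in whether calibration data is needed, not in weight size. Actual exported file sizes will differ.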
!!! example "Predict"
=== "Python"
```python
from ultralytics import YOLO
# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")
# Run inference
results = model("https://ultralytics.com/images/bus.jpg")
```
=== "CLI"
```bash
# Run inference with the exported LiteRT model
yolo predict model=yolo26n.tflite source='https://ultralytics.com/images/bus.jpg'
```
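If loading a model fails, a quick sanity check is whether the file really is a LiteRT/TFLite FlatBuffer. The TFLite FlatBuffer schema declares the file identifier `TFL3`; the sketch below is a heuristic built on that fact. The helper names are hypothetical, and the check says nothing about whether the rest of the model is valid.

```python
from pathlib import Path

TFLITE_FILE_IDENTIFIER = b"TFL3"


def has_tflite_identifier(header: bytes) -> bool:
    """Return True if `header` carries the TFLite FlatBuffer identifier at bytes 4..8.

    A FlatBuffer finished with a file identifier begins with a 4-byte root-table
    offset followed by the 4-byte identifier. Size-prefixed buffers (identifier
    at bytes 8..12) are deliberately not handled in this sketch.
    """
    return len(header) >= 8 and header[4:8] == TFLITE_FILE_IDENTIFIER


def looks_like_tflite(path: str) -> bool:
    """Read only the first 8 bytes of a file and check the identifier."""
    with Path(path).open("rb") as f:
        return has_tflite_identifier(f.read(8))
```

For example, `looks_like_tflite("yolo26n.tflite")` should return `True` for the file created by the export step before you pass it to `YOLO("yolo26n.tflite")`.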
!!! example "Validate"
=== "Python"
```python
from ultralytics import YOLO
# Load the exported LiteRT model
model = YOLO("yolo26n.tflite")
# Validate accuracy on the COCO8 dataset
metrics = model.val(data="coco8.yaml")
```
=== "CLI"
```bash
# Validate the exported LiteRT model
yolo val model=yolo26n.tflite data=coco8.yaml
```
| Argument | Type | Default | Description |
|---|---|---|---|
| `format` | `str` | `'litert'` | Target format for the exported model, defining compatibility with various deployment environments. |
| `imgsz` | `int` or `tuple` | `640` | Desired image size for the model input. Can be an integer for square images or a tuple `(height, width)` for specific dimensions. |
| `quantize` | `int` or `str` | `None` | Quantization precision: `8` (static INT8, int8 weights + int8 activations; needs calibration data/fraction), `'w8a16'` (static, int8 weights + int16 activations; needs calibration data/fraction), `'w8a32'` (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or `32`/unset (FP32). FP16 is not exported separately (see the FP16 note below). Replaces the deprecated `half`/`int8` flags. |
| `batch` | `int` | `1` | Batch size of the exported model, i.e. the maximum number of images it processes at once in predict mode. |
| `data` | `str` | `'coco8.yaml'` | Dataset YAML used for INT8 calibration. If omitted with `quantize=8`, Ultralytics selects the default calibration dataset for the model task. |
| `device` | `str` | `None` | Device used for exporting. LiteRT export runs on CPU (`device=cpu`). |
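Static quantization maps floating-point activations onto an integer grid using scales estimated from calibration data. The sketch below is a simplified illustration of why int16 activations (`w8a16`) can preserve more accuracy than int8 activations: it applies plain symmetric per-tensor quantization to synthetic data. This is an assumption for illustration, not necessarily the scheme the LiteRT converter applies, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence


def fake_quantize(values: Sequence[float], bits: int) -> List[float]:
    """Symmetric per-tensor quantize -> dequantize round trip (illustration only)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 32767 for int16
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    # Range taken from the data itself, standing in for a calibrated range.
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_abs_error(values: Sequence[float], bits: int) -> float:
    """Largest absolute difference between original and round-tripped values."""
    return max(abs(v - q) for v, q in zip(values, fake_quantize(values, bits)))


if __name__ == "__main__":
    random.seed(0)
    activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic stand-in data
    print("int8  activations, max error:", max_abs_error(activations, 8))
    print("int16 activations, max error:", max_abs_error(activations, 16))
```

With the same range, the int16 grid is about 258 times finer than the int8 grid (32767 vs. 127 steps per side), so the worst-case rounding error, at most half a step, shrinks accordingly.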
!!! note "FP16 precision"
Unlike the legacy `tflite` export, LiteRT does not require a separate FP16 export. An FP32 `.tflite` model runs in **half precision at runtime** when using a GPU delegate (WebGPU, OpenCL, Metal) — this is the official LiteRT approach to FP16 inference.
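To get a feel for what running an FP32 model in half precision means numerically, the sketch below rounds a few values to IEEE 754 half precision (binary16) using the `'e'` format of Python's `struct` module. It only shows the rounding of individual numbers; it does not reproduce what a GPU delegate does internally, and the sample values are arbitrary.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to IEEE 754 half precision (binary16) and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# binary16 stores 10 mantissa bits, so round-to-nearest has a relative error of
# at most 2**-11 (about 4.9e-4) for values in the normal range.
FP16_UNIT_ROUNDOFF = 2.0 ** -11

if __name__ == "__main__":
    for w in (0.1234567, -1.9876543, 3.14159265):  # arbitrary sample values
        h = to_fp16(w)
        print(f"{w:+.7f} -> {h:+.7f}  relative error {abs(h - w) / abs(w):.1e}")
```

A relative error below about 0.05% per value is why FP16 execution of an FP32 model is usually close in accuracy to FP32, although errors can still accumulate across layers.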
For more details about the export process, visit the Ultralytics documentation page on exporting.
After exporting your Ultralytics YOLO model to LiteRT, you can deploy it across platforms. The quickest way to verify it locally is the YOLO("yolo26n.tflite") method shown above. For deployment in other environments, see the following resources:
- **LiteRT.js**: Run the same `.tflite` model directly in the browser with WebGPU/WASM acceleration, eliminating server-side computation and keeping data on the user's device.

In this guide, we covered how to export Ultralytics YOLO models to the LiteRT format. By consolidating mobile/edge (formerly TFLite) and browser (formerly TF.js) deployment into a single `.tflite` model, LiteRT makes your YOLO models faster on-device, smaller when quantized, and portable across virtually every on-device target.
For further details, visit the LiteRT official documentation.
Also, if you're curious about other Ultralytics YOLO integrations, check out our integration guide page for plenty of helpful resources.
Use the Ultralytics library to export a YOLO model to LiteRT (`.tflite`). First, install the package:

```bash
pip install ultralytics
```

Then export your model:

```python
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'
```

For CLI users:

```bash
yolo export model=yolo26n.pt format=litert  # creates 'yolo26n.tflite'
```

For more details, visit the Ultralytics export guide.
LiteRT is the new name for TensorFlow Lite — same .tflite model format, same runtime lineage, rebranded by Google. In Ultralytics, the single litert export format now covers both use cases that previously required two separate formats:
- `tflite` format → mobile, embedded, and edge deployment.
- `tfjs` format → browser and Node.js deployment, now handled by LiteRT.js running the same `.tflite` file.

If you have an existing `.tflite` file, you can load it directly with `YOLO("model.tflite")` and it will run through the LiteRT backend.
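If you have scripts that still call the legacy formats, their arguments can be rewritten mechanically. The helper below is a hypothetical sketch, not part of Ultralytics. It assumes that legacy `int8=True` corresponds to static INT8 (`quantize=8`) and that `half=True` can simply be dropped, because an FP32 LiteRT model runs in FP16 on a GPU delegate. Verify both assumptions against your own accuracy requirements.

```python
from typing import Any, Dict

LEGACY_FORMATS = {"tflite", "tfjs"}


def migrate_export_args(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite legacy `tflite`/`tfjs` export kwargs for `format="litert"`.

    Assumptions made for this sketch (not taken from Ultralytics source code):
      * `int8=True` corresponds to static INT8, i.e. `quantize=8`.
      * `half=True` is dropped: export FP32 and let a GPU delegate run it in FP16.
    """
    args = dict(legacy)  # never mutate the caller's dict
    if args.get("format") not in LEGACY_FORMATS:
        return args  # formats other than tflite/tfjs are left untouched
    args["format"] = "litert"
    args.pop("half", None)
    if args.pop("int8", False):
        args["quantize"] = 8
    return args
```

Usage: `model.export(**migrate_export_args({"format": "tflite", "int8": True, "data": "coco8.yaml"}))` becomes `model.export(format="litert", quantize=8, data="coco8.yaml")`.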
Yes. Export your model to LiteRT format and run the exported `.tflite` model on the Raspberry Pi for faster on-device inference. For further optimization, consider a Coral Edge TPU. For detailed steps, refer to our Raspberry Pi deployment guide.
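To check whether LiteRT actually speeds things up on your Pi, time the PyTorch and LiteRT models on the same image. Below is a minimal, framework-agnostic timing sketch; `benchmark` is a hypothetical helper, and the default warm-up and run counts are arbitrary choices.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time a zero-argument callable and report latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):  # warm-up calls are excluded from the statistics
        fn()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(times_ms),
        "median_ms": statistics.median(times_ms),
        "min_ms": min(times_ms),
    }
```

For example, compare `benchmark(lambda: YOLO("yolo26n.pt")("bus.jpg"))` against the same call with `"yolo26n.tflite"`, loading each model once outside the lambda so that loading time is not measured. The median is usually a steadier comparison than the mean on a busy device.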
Yes. LiteRT.js runs the same exported .tflite model directly in a web browser or Node.js application, with WebGPU/WASM acceleration. This replaces the previous TensorFlow.js workflow — there is no separate browser export, just deploy your LiteRT model with the LiteRT.js runtime.
Yes — at runtime. An FP32 LiteRT model automatically runs in FP16 when executed on a GPU delegate (WebGPU, OpenCL, or Metal), which is the official LiteRT approach. You therefore don't need a dedicated FP16 export; for further compression, use INT8 quantization with quantize=8.
If you encounter errors while exporting YOLO models to LiteRT, common solutions include:
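- For static quantization (`quantize=8` or `quantize="w8a16"`), provide calibration data through the `data` parameter.

For additional troubleshooting tips, visit our Common Issues guide.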
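Two conditions described in this guide can be checked before you start an export: the host platform (LiteRT export is supported on Linux x86_64 and macOS) and calibration data for the static modes. The function below is a hypothetical pre-flight sketch, not an Ultralytics API; the `system`/`machine` overrides exist only so the logic can be tested off-device.

```python
import platform
from typing import List, Optional, Union

STATIC_QUANT_MODES = (8, "w8a16")


def litert_export_preflight(
    quantize: Optional[Union[int, str]] = None,
    data: Optional[str] = None,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> List[str]:
    """Hypothetical pre-export check; returns warnings (empty list = no issues found)."""
    system = system or platform.system()  # e.g. "Linux", "Darwin" (macOS), "Windows"
    machine = machine or platform.machine()  # e.g. "x86_64", "arm64", "aarch64"
    issues = []
    if not (system == "Darwin" or (system == "Linux" and machine == "x86_64")):
        issues.append(f"LiteRT export is supported on Linux x86_64 and macOS, not {system} {machine}.")
    if quantize in STATIC_QUANT_MODES and data is None:
        issues.append(
            f"quantize={quantize!r} is static and calibrates on a dataset; "
            "pass data=... to choose it instead of relying on the task default."
        )
    return issues
```

Running `litert_export_preflight(quantize=8)` on the export machine returns an empty list when neither condition applies.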