Export YOLO to LiteRT (TFLite) for Edge and Web Deployment

docs/en/integrations/litert.md

Export YOLO Models to LiteRT for Edge and Web Deployment

LiteRT (short for Lite Runtime) is Google's high-performance runtime for on-device AI. It is the successor to TensorFlow Lite (TFLite) under a new name, and it runs the same .tflite model format. With LiteRT, a single exported Ultralytics YOLO model deploys across mobile, embedded, edge, and browser targets, which the older tflite and tfjs export formats handled separately.

The LiteRT export format optimizes your models for tasks like object detection, segmentation, pose estimation, and classification so they run fast and offline on a wide range of devices.

Why Should You Export to LiteRT?

LiteRT is an open-source framework for on-device inference, a form of edge computing. It gives developers the tools to run trained models on mobile, embedded, and IoT devices, on traditional computers, and, through LiteRT.js, directly in web browsers and Node.js.

One model format, every target:

  • Mobile & Embedded: Android, iOS, embedded Linux, and microcontrollers (MCUs).
  • Edge accelerators: Compatible with the Coral Edge TPU for further acceleration.
  • Browser & Node.js: LiteRT.js runs the same .tflite model on the web with WebGPU/WASM acceleration — replacing the need for a separate TensorFlow.js export.

Key Features of LiteRT Models

  • On-device Optimization: Reduces latency by processing data locally, enhances privacy by not transmitting personal data, and minimizes model size to save space.
  • Multiple Platform Support: Runs on Android, iOS, embedded Linux, microcontrollers, and modern web browsers.
  • Hardware Acceleration: Leverages XNNPACK on CPU, and GPU acceleration via OpenCL, Metal, and WebGPU. The GPU delegate runs in FP16 by default for additional speed.
  • Quantization: Supports FP32, static INT8 (quantize=8, int8 weights + int8 activations), static INT16-activation (quantize="w8a16", int8 weights + int16 activations for higher accuracy), and dynamic INT8 (quantize="w8a32", int8 weights + FP32 activations, no calibration data needed) to compress models and speed up inference with minimal accuracy loss.
  • Diverse Language Support: Compatible with Java/Kotlin, Swift, Objective-C, C++, Python, and JavaScript.
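
To make the quantization bullet more concrete, the sketch below shows the basic idea behind storing weights as int8: pick a scale, round each float to an integer in [-127, 127], and multiply back by the scale when the value is needed. This is a simplified symmetric per-tensor scheme written for illustration only; the helper names are hypothetical and the quantizer used by the LiteRT exporter is more elaborate.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0  # one float step per integer step
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weight values, chosen only to illustrate the arithmetic
weights = [0.82, -0.41, 0.05, -1.27, 0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 codes:", codes)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

The round-trip error of each value is bounded by half the scale, which is why a tensor with a narrow value range loses little accuracy when stored as int8.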

Export to LiteRT: Converting Your YOLO Model

You can improve on-device execution efficiency and broaden deployment options by converting your models to the LiteRT format.

Installation

To install the required package, run:

!!! tip "Installation"

=== "CLI"

    ```bash
    # Install the required package for YOLO
    pip install ultralytics
    ```

For detailed instructions and best practices, check our Ultralytics Installation guide. If you encounter any difficulties, consult our Common Issues guide.

!!! note "Platform support"

LiteRT **export** is currently supported on **Linux x86_64** and **macOS**. The exported `.tflite` model itself runs on all LiteRT-supported platforms (mobile, embedded, edge, and the browser).
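
If you script exports across several machines, a small pre-flight check can fail early on unsupported hosts. The sketch below encodes only the platform rule stated in the note above; the function name is hypothetical, and accepting `amd64` as an alias for `x86_64` is an assumption.

```python
import platform


def litert_export_supported(system=None, machine=None):
    """Return True if the host matches the documented export platforms.

    Defaults to the current host; pass values explicitly to check another target.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin":  # macOS
        return True
    # Assumption: some tools report x86_64 as "amd64"
    return system == "Linux" and machine in {"x86_64", "amd64"}


if not litert_export_supported():
    print("LiteRT export is not supported on this host; export on Linux x86_64 or macOS.")
```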

Usage

All Ultralytics YOLO models support export out of the box. The LiteRT format supports the Export, Predict, and Validate modes, so you can export a model, then load it to run inference or validate its accuracy locally.

!!! example "Export"

=== "Python"

    ```python
    from ultralytics import YOLO

    # Load a YOLO26 model
    model = YOLO("yolo26n.pt")

    # Export the model to LiteRT format
    model.export(format="litert")  # creates 'yolo26n.tflite'
    ```

=== "CLI"

    ```bash
    # Export a YOLO26n PyTorch model to LiteRT format
    yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'
    ```

!!! example "Quantized export"

=== "Python"

    ```python
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")

    # Dynamic INT8: int8 weights, FP32 activations - no calibration data needed
    model.export(format="litert", quantize="w8a32")  # creates 'yolo26n_w8a32.tflite'

    # Static INT8: int8 weights + int8 activations - needs calibration data
    model.export(format="litert", quantize=8, data="coco8.yaml")  # creates 'yolo26n_int8.tflite'

    # Static w8a16: int8 weights + int16 activations (higher accuracy) - needs calibration data
    model.export(format="litert", quantize="w8a16", data="coco8.yaml")  # creates 'yolo26n_w8a16.tflite'
    ```

=== "CLI"

    ```bash
    # Dynamic INT8 (no calibration data needed)
    yolo export model=yolo26n.pt format=litert quantize=w8a32

    # Static INT8 (needs calibration data)
    yolo export model=yolo26n.pt format=litert quantize=8 data=coco8.yaml

    # Static w8a16: int8 weights + int16 activations (needs calibration data)
    yolo export model=yolo26n.pt format=litert quantize=w8a16 data=coco8.yaml
    ```
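
When automating several exports, it helps to know in advance which file each mode produces and whether it needs calibration data. The helper below is a hypothetical sketch that restates the naming and calibration rules from the comments in the examples above; treating `quantize=32` the same as an unset value for naming purposes is an assumption.

```python
from pathlib import Path

# quantize value -> (file name suffix, needs calibration data)
QUANT_MODES = {
    None: ("", False),  # FP32
    32: ("", False),  # FP32 (assumed to produce the same name as unset)
    "w8a32": ("_w8a32", False),  # dynamic INT8
    8: ("_int8", True),  # static INT8
    "w8a16": ("_w8a16", True),  # static, int16 activations
}


def describe_export(weights, quantize=None):
    """Return (expected .tflite file name, needs_calibration) for an export."""
    if quantize not in QUANT_MODES:
        raise ValueError(f"unsupported quantize value: {quantize!r}")
    suffix, needs_calibration = QUANT_MODES[quantize]
    return f"{Path(weights).stem}{suffix}.tflite", needs_calibration


for mode in (None, "w8a32", 8, "w8a16"):
    name, needs_data = describe_export("yolo26n.pt", mode)
    print(f"quantize={mode!r}: {name} (calibration data: {'yes' if needs_data else 'no'})")
```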

!!! example "Predict"

=== "Python"

    ```python
    from ultralytics import YOLO

    # Load the exported LiteRT model
    model = YOLO("yolo26n.tflite")

    # Run inference
    results = model("https://ultralytics.com/images/bus.jpg")
    ```

=== "CLI"

    ```bash
    # Run inference with the exported LiteRT model
    yolo predict model=yolo26n.tflite source='https://ultralytics.com/images/bus.jpg'
    ```
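
The prediction call returns a list of result objects. As a rough sketch of how you might read them for a detection model, the helper below turns one result into plain Python tuples using the `boxes` and `names` attributes of Ultralytics results. The function names are hypothetical, and `ultralytics` is imported lazily so the summarizing helper itself has no dependencies.

```python
def summarize_detections(result):
    """Return (class_name, confidence, [x1, y1, x2, y2]) tuples for one result."""
    boxes = result.boxes
    rows = zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist())
    return [
        (result.names[int(cls_id)], round(conf, 3), [round(v, 1) for v in xyxy])
        for cls_id, conf, xyxy in rows
    ]


def predict_and_summarize(model_path="yolo26n.tflite", source="https://ultralytics.com/images/bus.jpg"):
    """Run the exported model on one source and summarize every result."""
    from ultralytics import YOLO  # imported here so the helper above stays dependency-free

    results = YOLO(model_path)(source)
    return [summarize_detections(r) for r in results]
```

Calling `predict_and_summarize()` after the export step should yield one list of detections per input image. Other tasks, such as classification, expose their outputs through different result attributes, so this helper applies to detection-style results only.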

!!! example "Validate"

=== "Python"

    ```python
    from ultralytics import YOLO

    # Load the exported LiteRT model
    model = YOLO("yolo26n.tflite")

    # Validate accuracy on the COCO8 dataset
    metrics = model.val(data="coco8.yaml")
    ```

=== "CLI"

    ```bash
    # Validate the exported LiteRT model
    yolo val model=yolo26n.tflite data=coco8.yaml
    ```
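
Validation is most useful for checking how much accuracy a quantized export gives up relative to the FP32 export. The sketch below assumes a detection model, whose validation metrics expose mAP50-95 as `metrics.box.map`; the helper names are hypothetical, and the file names follow the export examples in this guide.

```python
def relative_drop(baseline, candidate):
    """Relative metric drop in percent (positive means the candidate is worse)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


def validate_models(model_paths, data="coco8.yaml"):
    """Validate each exported model and return {path: mAP50-95}."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    return {path: YOLO(path).val(data=data).box.map for path in model_paths}


def compare_to_fp32(fp32_path="yolo26n.tflite", int8_path="yolo26n_int8.tflite"):
    """Print the mAP50-95 of both exports and the relative drop of the INT8 one."""
    scores = validate_models([fp32_path, int8_path])
    drop = relative_drop(scores[fp32_path], scores[int8_path])
    print(f"FP32: {scores[fp32_path]:.3f}  INT8: {scores[int8_path]:.3f}  drop: {drop:.1f}%")
```

COCO8 contains only a handful of images, so treat numbers from it as a smoke test and validate on a representative dataset before drawing conclusions.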

Export Arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `format` | `str` | `'litert'` | Target format for the exported model, defining compatibility with various deployment environments. |
| `imgsz` | `int` or `tuple` | `640` | Desired image size for the model input. Can be an integer for square images or a tuple `(height, width)` for specific dimensions. |
| `quantize` | `int` or `str` | `None` | Quantization precision: `8` (static INT8, int8 weights + int8 activations; needs calibration `data`/`fraction`), `'w8a16'` (static, int8 weights + int16 activations; needs calibration `data`/`fraction`), `'w8a32'` (dynamic INT8, int8 weights + FP32 activations; no calibration needed), or `32`/unset (FP32). FP16 is not exported separately (see note below). Replaces the deprecated `half`/`int8` flags. |
| `batch` | `int` | `1` | Specifies export model batch inference size or the max number of images the exported model will process concurrently in predict mode. |
| `data` | `str` | `'coco8.yaml'` | Dataset YAML used for INT8 calibration. If omitted with `quantize=8`, Ultralytics selects the default calibration dataset for the model task. |
| `device` | `str` | `None` | Specifies the device for exporting. LiteRT export runs on CPU (`device=cpu`). |
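
As a rough illustration of how these arguments fit together, the sketch below builds a keyword dictionary for `model.export(...)` and rejects values that fall outside the ranges listed in the table. The helper name and its validation rules are hypothetical conveniences, not part of the Ultralytics API; `data` is only passed when you supply one, so Ultralytics can otherwise choose its default.

```python
ALLOWED_QUANTIZE = {None, 32, 8, "w8a16", "w8a32"}


def build_export_kwargs(imgsz=640, quantize=None, batch=1, data=None):
    """Validate LiteRT export arguments and return kwargs for model.export()."""
    if isinstance(imgsz, int):
        size_ok = imgsz > 0
    else:  # expect a (height, width) pair
        size_ok = (
            isinstance(imgsz, (tuple, list))
            and len(imgsz) == 2
            and all(isinstance(v, int) and v > 0 for v in imgsz)
        )
    if not size_ok:
        raise ValueError("imgsz must be a positive int or a (height, width) pair")
    if quantize not in ALLOWED_QUANTIZE:
        raise ValueError(f"quantize must be one of {sorted(map(str, ALLOWED_QUANTIZE))}")
    if not isinstance(batch, int) or batch < 1:
        raise ValueError("batch must be an integer >= 1")

    kwargs = {"format": "litert", "imgsz": imgsz, "batch": batch, "device": "cpu"}
    if quantize is not None:
        kwargs["quantize"] = quantize
    if data is not None:
        kwargs["data"] = data
    return kwargs


print(build_export_kwargs(imgsz=(480, 640), quantize=8, data="coco8.yaml"))
```

The returned dictionary can then be unpacked into the export call, for example `model.export(**build_export_kwargs(quantize="w8a32"))`.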

!!! note "FP16 precision"

Unlike the legacy `tflite` export, LiteRT does not require a separate FP16 export. An FP32 `.tflite` model runs in **half precision at runtime** when using a GPU delegate (WebGPU, OpenCL, Metal) — this is the official LiteRT approach to FP16 inference.
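
To get a feel for what half precision means numerically, the sketch below rounds a few Python floats to IEEE 754 half precision using the standard library. It only illustrates the number format; how a particular GPU delegate applies FP16 internally is outside the scope of this example, and the sample values are arbitrary.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Arbitrary sample values: a small score, an exact power of two, and two larger magnitudes
for value in (0.1, 0.5, 123.456, 2049.0):
    half = to_fp16(value)
    print(f"{value} -> {half} (absolute error {abs(value - half):.6f})")
```

Values that are exact in binary, such as 0.5, survive unchanged, while others are rounded to roughly three significant decimal digits. This is usually acceptable for inference, but it is worth validating the model on your target device.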

For more details about the export process, visit the Ultralytics documentation page on exporting.

Deploying Exported YOLO LiteRT Models

After exporting your Ultralytics YOLO model to LiteRT, you can deploy it across platforms. The quickest way to verify it locally is the YOLO("yolo26n.tflite") method shown above. For deployment in other environments, see the following resources:

Mobile & Embedded

  • Android: A quick-start guide for integrating LiteRT into Android applications.
  • iOS: A guide for integrating and deploying LiteRT models in iOS applications.
  • Embedded Linux & Raspberry Pi: Run LiteRT models on single-board computers, optionally accelerated with a Coral Edge TPU.
  • Microcontrollers: Deploy on MCUs with only a few kilobytes of memory — the core runtime fits in roughly 16 KB on an Arm Cortex-M3.

Browser & Node.js (LiteRT.js)

  • LiteRT.js overview: Run the same .tflite model directly in the browser with WebGPU/WASM acceleration, eliminating server-side computation and keeping data on the user's device.
  • End-to-End Examples: Practical examples and tutorials for implementing LiteRT across mobile, edge, and web.

Summary

In this guide, we covered how to export Ultralytics YOLO models to the LiteRT format. By consolidating mobile/edge (formerly TFLite) and browser (formerly TF.js) deployment into a single .tflite model, LiteRT makes your YOLO models faster, smaller, and portable across virtually every on-device target.

For further details, visit the LiteRT official documentation.

Also, if you're curious about other Ultralytics YOLO integrations, check out our integration guide page for plenty of helpful resources.

FAQ

How do I export a YOLO model to LiteRT format?

Use the Ultralytics library to export a YOLO model to LiteRT (.tflite). First, install the package:

```bash
pip install ultralytics
```

Then export your model:

```python
from ultralytics import YOLO

# Load a YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to LiteRT format
model.export(format="litert")  # creates 'yolo26n.tflite'
```

For CLI users:

```bash
yolo export model=yolo26n.pt format=litert # creates 'yolo26n.tflite'
```

For more details, visit the Ultralytics export guide.

What is the difference between LiteRT, TFLite, and TF.js?

LiteRT is the new name for TensorFlow Lite — same .tflite model format, same runtime lineage, rebranded by Google. In Ultralytics, the single litert export format now covers both use cases that previously required two separate formats:

  • The old tflite format → mobile, embedded, and edge deployment.
  • The old tfjs format → browser and Node.js deployment, now handled by LiteRT.js running the same .tflite file.

If you have an existing .tflite file, you can load it directly with YOLO("model.tflite") and it will run through the LiteRT backend.

Can I run YOLO LiteRT models on a Raspberry Pi?

Yes. Export your model to LiteRT format first, then run the exported .tflite file on the Raspberry Pi; the export step is what can improve inference speed on the device. For further optimization, consider a Coral Edge TPU. For detailed steps, refer to our Raspberry Pi deployment guide.

Can I run YOLO models in the browser with LiteRT?

Yes. LiteRT.js runs the same exported .tflite model directly in a web browser or Node.js application, with WebGPU/WASM acceleration. This replaces the previous TensorFlow.js workflow — there is no separate browser export, just deploy your LiteRT model with the LiteRT.js runtime.

Does LiteRT support FP16 (half-precision) inference?

Yes — at runtime. An FP32 LiteRT model automatically runs in FP16 when executed on a GPU delegate (WebGPU, OpenCL, or Metal), which is the official LiteRT approach. You therefore don't need a dedicated FP16 export; for further compression, use INT8 quantization with quantize=8.

How do I troubleshoot common issues during LiteRT export?

If you encounter errors while exporting YOLO models to LiteRT, common solutions include:

  • Check platform: LiteRT export is supported on Linux x86_64 and macOS. Verify your environment matches.
  • Check package compatibility: Ensure you're using a compatible version of Ultralytics. Refer to our installation guide.
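  • Quantization issues: The static modes (quantize=8 and quantize="w8a16") need calibration data, so make sure the dataset YAML passed in the data parameter is specified correctly. Dynamic INT8 (quantize="w8a32") needs no calibration data.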

For additional troubleshooting tips, visit our Common Issues guide.