docs/en/guides/model-deployment-options.md
YOLO26 supports more than 20 deployment options, each tuned for a different runtime, hardware target, or platform — from PyTorch and ONNX to TensorRT, OpenVINO, CoreML, and dedicated edge-NPU formats. Picking the right one balances inference speed, hardware constraints, and ease of integration. This guide compares every option so you can choose the best fit for your application, then move on to model deployment best practices to deploy it reliably.
<p align="center"> <iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/QkCsj2SvZc4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen> </iframe><strong>Watch:</strong> How to Choose the Best Ultralytics YOLO26 Deployment Format for Your Project | TensorRT | OpenVINO 🚀
</p>Deployment is the stage in the computer vision project workflow where a trained model starts doing real work, so the format you export to has a direct impact on speed, cost, and portability.
When it's time to deploy your YOLO26 model, selecting a suitable export format is very important. As outlined in the Ultralytics YOLO26 export documentation, the model.export() function converts your trained model into a variety of formats tailored to diverse environments and performance requirements.
The ideal format depends on your model's intended operational context and hardware.
!!! tip "Skip the manual export"
For managed deployment without manual export, [Ultralytics Platform](https://platform.ultralytics.com) provides ready-to-use [inference endpoints](../platform/deploy/endpoints.md) with auto-scaling across 43 global regions.
Here is a short description of each format and when to reach for it. For the full export walkthrough, see the export documentation; for the side-by-side criteria, jump to the comparison table.
.pt): The native training and inference format, offering maximum flexibility and CUDA GPU acceleration — ideal for research and prototyping with no export step required.torchscript): Serializes the model for a Python-free C++ runtime, suited to production systems where Python is unavailable.onnx): A framework-agnostic interchange format with broad cross-platform and hardware support through ONNX Runtime.openvino): Intel's toolkit for optimized inference on Intel CPUs, integrated GPUs, and NPUs, common in IoT and edge computing.engine): NVIDIA's high-performance runtime delivering top-tier GPU inference with FP16 and INT8 optimization.coreml): Apple's on-device format for iOS, macOS, watchOS, and tvOS, using the Apple Neural Engine.saved_model): TensorFlow's standard format for scalable server-side serving with TensorFlow Serving.pb): A frozen static-graph TensorFlow format for environments that need a fixed computation graph.tflite): A lightweight TensorFlow runtime for on-device inference on mobile and embedded hardware.edgetpu): Compiles TF Lite models for Google Coral Edge TPU accelerators.tfjs): Runs models directly in the browser with no backend, accelerated through WebGL.paddle): Baidu's deep learning framework, popular in China, with broad hardware support.mnn): A lightweight, high-performance inference engine optimized for mobile and embedded ARM and x86-64 systems.ncnn): A high-performance, lightweight inference framework tuned for mobile ARM devices.imx): Exports for Sony's IMX500 intelligent vision sensor with on-chip processing, such as the Raspberry Pi AI Camera.rknn): Targets Rockchip NPUs on embedded boards with FP16 and INT8 quantization.executorch): PyTorch's native on-device runtime for mobile (iOS and Android) and embedded systems via XNNPACK.axelera): Compiles for Axelera's Metis AIPU (up to 856 TOPS) over PCIe or M.2 for high-throughput edge inference.deepx): Targets DEEPX NPU hardware with INT8 quantization for embedded edge inference.qnn): On-device inference on Snapdragon Hexagon NPU, Adreno GPU, and CPU through the Qualcomm AI stack.For an additional edge target, the Hailo integration compiles YOLO detection models to Hailo HEF. It is not a direct model.export() target: detection models are exported to ONNX first, then compiled to HEF with the external Hailo Dataflow Compiler for Hailo-8, Hailo-8L, and Hailo-15 accelerators.
The following table summarizes the deployment options for YOLO26 models across the criteria that usually drive the choice. For an in-depth look at each format, see the export formats documentation.
| Deployment Option | Performance Benchmarks | Compatibility and Integration | Community Support and Ecosystem | Case Studies | Maintenance and Updates | Security Considerations | Hardware Acceleration |
|---|---|---|---|---|---|---|---|
| PyTorch | Good flexibility; may trade off raw performance | Excellent with Python libraries | Extensive resources and community | Research and prototypes | Regular, active development | Dependent on deployment environment | CUDA support for GPU acceleration |
| TorchScript | Better for production than PyTorch | Smooth transition from PyTorch to C++ | Specialized but narrower than PyTorch | Industry where Python is a bottleneck | Consistent updates with PyTorch | Improved security without full Python | Inherits CUDA support from PyTorch |
| ONNX | Variable depending on runtime | High across different frameworks | Broad ecosystem, supported by many orgs | Flexibility across ML frameworks | Regular updates for new operations | Ensure secure conversion and deployment practices | Various hardware optimizations |
| OpenVINO | Optimized for Intel hardware | Best within Intel ecosystem | Solid in computer vision domain | IoT and edge with Intel hardware | Regular updates for Intel hardware | Robust features for sensitive applications | Tailored for Intel hardware |
| TensorRT | Top-tier on NVIDIA GPUs | Best for NVIDIA hardware | Strong network through NVIDIA | Real-time video and image inference | Frequent updates for new GPUs | Emphasis on security | Designed for NVIDIA GPUs |
| CoreML | Optimized for on-device Apple hardware | Exclusive to Apple ecosystem | Strong Apple and developer support | On-device ML on Apple products | Regular Apple updates | Focus on privacy and security | Apple neural engine and GPU |
| TF SavedModel | Scalable in server environments | Wide compatibility in TensorFlow ecosystem | Large support due to TensorFlow popularity | Serving models at scale | Regular updates by Google and community | Robust features for enterprise | Various hardware accelerations |
| TF GraphDef | Stable for static computation graphs | Integrates well with TensorFlow infrastructure | Resources for optimizing static graphs | Scenarios requiring static graphs | Updates alongside TensorFlow core | Established TensorFlow security practices | TensorFlow acceleration options |
| TF Lite | Speed and efficiency on mobile/embedded | Wide range of device support | Robust community, Google backed | Mobile applications with minimal footprint | Latest features for mobile | Secure environment on end-user devices | GPU and DSP among others |
| TF Edge TPU | Optimized for Google's Edge TPU hardware | Exclusive to Edge TPU devices | Growing with Google and third-party resources | IoT devices requiring real-time processing | Improvements for new Edge TPU hardware | Google's robust IoT security | Custom-designed for Google Coral |
| TF.js | Reasonable in-browser performance | High with web technologies | Web and Node.js developers support | Interactive web applications | TensorFlow team and community contributions | Web platform security model | Enhanced with WebGL and other APIs |
| PaddlePaddle | Competitive, easy to use and scalable | Baidu ecosystem, wide application support | Rapidly growing, especially in China | Chinese market and language processing | Focus on Chinese AI applications | Emphasizes data privacy and security | Including Baidu's Kunlun chips |
| MNN | High-performance for mobile devices | Mobile and embedded ARM systems and X86-64 CPU | Mobile/embedded ML community | Mobile systems efficiency | High performance maintenance on mobile devices | On-device security advantages | ARM CPUs and GPUs optimizations |
| NCNN | Optimized for mobile ARM-based devices | Mobile and embedded ARM systems | Niche but active mobile/embedded ML community | Android and ARM systems efficiency | High performance maintenance on ARM | On-device security advantages | ARM CPUs and GPUs optimizations |
| Sony IMX500 | On-sensor inference at very low power | Sony IMX500 sensor, Raspberry Pi AI Camera | Sony AITRIOS ecosystem | On-camera edge AI | Sony SDK and MCT toolchain updates | Data stays on the sensor | Sony IMX500 on-chip accelerator |
| Rockchip RKNN | Optimized for Rockchip NPUs | Rockchip SoC boards (e.g. RK3588) | Rockchip developer community | Embedded SBC and edge devices | Rockchip RKNN-Toolkit updates | On-device local inference | Rockchip NPU |
| ExecuTorch | Efficient on-device PyTorch runtime | iOS, Android, embedded via XNNPACK | Backed by the PyTorch project | Mobile and embedded apps | Maintained alongside PyTorch | On-device inference keeps data local | XNNPACK and mobile CPU/GPU backends |
| Axelera AI | Very high throughput (up to 856 TOPS) | Metis AIPU over PCIe or M.2 | Axelera Voyager SDK | High-throughput edge inference | Axelera SDK updates | On-premises edge inference | Axelera Metis AIPU |
| DEEPX | INT8-optimized NPU inference | DEEPX NPU hardware | DEEPX developer tools (dx_com, dx_engine) | Embedded edge inference | DEEPX SDK and runtime updates | On-device local inference | DEEPX NPU |
| Qualcomm QNN | Fast on-device Snapdragon inference | Snapdragon Hexagon NPU, Adreno GPU, CPU | Qualcomm AI Hub ecosystem | Mobile and edge Snapdragon devices | Qualcomm AI stack (QAIRT) updates | On-device inference keeps data local | Snapdragon Hexagon NPU |
This comparison gives you a high-level overview. For deployment, weigh the specific requirements and constraints of your project against each option, and consult the linked integration guide for the format you choose.
YOLO26's wide range of export formats lets you tailor a model to almost any environment, from a cloud GPU server to an on-sensor edge camera. Once you have picked a format, follow model deployment best practices for optimization, troubleshooting, and security, and lean on the Ultralytics community when you hit a snag.
Ultralytics YOLO26 supports various deployment formats, each designed for specific environments and hardware platforms. Key formats include:
Each format has unique advantages. For a detailed walkthrough, see our export process documentation.
To enhance inference speed on Intel CPUs, you can deploy your YOLO26 model using Intel's OpenVINO toolkit. OpenVINO offers significant performance boosts by optimizing models to leverage Intel hardware efficiently.
model.export() function.For more insights, check out our blog post.
Yes, YOLO26 models can be deployed on mobile devices using TensorFlow Lite (TF Lite) for both Android and iOS platforms. TF Lite is designed for mobile and embedded devices, providing efficient on-device inference.
!!! example
=== "Python"
```python
# Export command for TFLite format
model.export(format="tflite")
```
=== "CLI"
```bash
# CLI command for TFLite export
yolo export model=yolo26n.pt format=tflite
```
For more details on deploying models to mobile, refer to our TF Lite integration guide.
When choosing a deployment format for YOLO26, consider the following factors:
For a comparative analysis, refer to our export formats documentation.
To deploy YOLO26 models in a web application, you can use TensorFlow.js (TF.js), which allows for running machine learning models directly in the browser. This approach eliminates the need for backend infrastructure and provides real-time performance.
For step-by-step instructions, refer to our guide on TensorFlow.js integration.