docs/en/guides/deepstream-nvidia-jetson.md
<strong>Watch:</strong> How to use Ultralytics YOLO26 models with NVIDIA DeepStream on Jetson Orin NX 🚀
This comprehensive guide provides a detailed walkthrough for deploying Ultralytics YOLO26 on NVIDIA Jetson devices using the DeepStream SDK and TensorRT. Here, we use TensorRT to maximize inference performance on the Jetson platform.
!!! note
    This guide has been tested with [NVIDIA Jetson Orin Nano Super Developer Kit](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit) running the latest stable JetPack release of [JP6.1](https://developer.nvidia.com/embedded/jetpack-sdk-61), [Seeed Studio reComputer J4012](https://www.seeedstudio.com/reComputer-J4012-p-5586.html) which is based on NVIDIA Jetson Orin NX 16GB running JetPack release of [JP5.1.3](https://developer.nvidia.com/embedded/jetpack-sdk-513), and [Seeed Studio reComputer J1020 v2](https://www.seeedstudio.com/reComputer-J1020-v2-p-5498.html) which is based on NVIDIA Jetson Nano 4GB running JetPack release of [JP4.6.4](https://developer.nvidia.com/jetpack-sdk-464). It is expected to work across the entire NVIDIA Jetson hardware lineup, including the latest and legacy devices.
NVIDIA's DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It's ideal for vision AI developers, software partners, startups, and OEMs building IVA (Intelligent Video Analytics) apps and services. You can now create stream-processing pipelines that incorporate neural networks and other complex processing tasks like tracking, video encoding/decoding, and video rendering. These pipelines enable real-time analytics on video, image, and sensor data. DeepStream's multi-platform support gives you a faster, easier way to develop vision AI applications and services on-premise, at the edge, and in the cloud.
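As a rough illustration of what such a stream-processing pipeline looks like at the GStreamer level, the sketch below chains decoding, batching, inference, and on-screen rendering. The element names (`nvstreammux`, `nvinfer`, `nvdsosd`, etc.) are standard DeepStream plugins, but the file path and config filename here are placeholders, not part of this guide's setup:

```bash
# Illustrative single-stream DeepStream pipeline (placeholder paths):
# decode a file, batch it, run nvinfer with a detector config,
# draw the detections, and render to screen.
gst-launch-1.0 filesrc location=sample.mp4 ! decodebin ! m.sink_0 \
    nvstreammux name=m batch-size=1 width=1920 height=1080 ! \
    nvinfer config-file-path=config_infer_primary_yolo26.txt ! \
    nvvideoconvert ! nvdsosd ! nveglglessink
```

In practice, `deepstream-app` builds an equivalent pipeline for you from the `deepstream_app_config.txt` file used throughout this guide.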
Before you start following this guide:
!!! tip
    In this guide we have used the Debian package method of installing the DeepStream SDK on the Jetson device. You can also visit the [DeepStream SDK on Jetson (Archived)](https://developer.nvidia.com/embedded/deepstream-on-jetson-downloads-archived) page to access legacy versions of DeepStream.
Here we are using the [marcoslucianops/DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo) GitHub repository, which includes NVIDIA DeepStream SDK support for YOLO models. We appreciate marcoslucianops for his contributions!
Install Ultralytics with the necessary dependencies:

```bash
cd ~
pip install -U pip
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install -e ".[export]" onnxslim
```
Clone the DeepStream-Yolo repository:

```bash
cd ~
git clone https://github.com/marcoslucianops/DeepStream-Yolo
```
Copy the `export_yolo26.py` file from the `DeepStream-Yolo/utils` directory to the `ultralytics` folder:

```bash
cp ~/DeepStream-Yolo/utils/export_yolo26.py ~/ultralytics
cd ultralytics
```
Download an Ultralytics YOLO26 detection model (`.pt`) of your choice from the YOLO26 releases. Here we use `yolo26s.pt`:

```bash
wget https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26s.pt
```
!!! note
    You can also use a [custom-trained YOLO26 model](https://docs.ultralytics.com/modes/train/).
Convert the model to ONNX:

```bash
python3 export_yolo26.py -w yolo26s.pt
```
!!! note "Pass the below arguments to the above command"
    For DeepStream 5.1, remove the `--dynamic` arg and use `opset` 12 or lower. The default `opset` is 17.

    ```bash
    --opset 12
    ```

    To change the inference size (default: 640):

    ```bash
    -s SIZE
    --size SIZE
    -s HEIGHT WIDTH
    --size HEIGHT WIDTH
    ```

    Example for 1280:

    ```bash
    -s 1280
    ```

    or

    ```bash
    -s 1280 1280
    ```

    To simplify the ONNX model (DeepStream >= 6.0):

    ```bash
    --simplify
    ```

    To use dynamic batch-size (DeepStream >= 6.1):

    ```bash
    --dynamic
    ```

    To use static batch-size (example for batch-size = 4):

    ```bash
    --batch 4
    ```
Copy the generated `.onnx` model file and the `labels.txt` file to the `DeepStream-Yolo` folder:

```bash
cp yolo26s.pt.onnx labels.txt ~/DeepStream-Yolo
cd ~/DeepStream-Yolo
```
Set the CUDA version according to the JetPack version installed.

For JetPack 4.6.4:

```bash
export CUDA_VER=10.2
```

For JetPack 5.1.3:

```bash
export CUDA_VER=11.4
```

For JetPack 6.1:

```bash
export CUDA_VER=12.6
```
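If you prefer not to hard-code the version, you can derive it from the installed CUDA toolkit. This is a hypothetical helper, not part of DeepStream-Yolo; it assumes `nvcc` is on your PATH and parses its banner line (e.g. `Cuda compilation tools, release 12.6, V12.6.68`):

```bash
# Extract the "major.minor" release number from the nvcc version banner.
cuda_ver_from_nvcc() {
    sed -n 's/^.*release \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# On a Jetson you could then set the variable automatically:
# export CUDA_VER=$(nvcc --version | cuda_ver_from_nvcc)
```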
Compile the library:

```bash
make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
```
Edit the `config_infer_primary_yolo26.txt` file according to your model (for YOLO26s with 80 classes):

```
[property]
...
onnx-file=yolo26s.pt.onnx
...
num-detected-classes=80
...
```
Edit the `deepstream_app_config` file:

```
...
[primary-gie]
...
config-file=config_infer_primary_yolo26.txt
```
You can also change the video source in the `deepstream_app_config` file. Here, a default video file is loaded:

```
...
[source0]
...
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
```
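Note that local files must be given as `file://` URIs with absolute paths. A minimal sketch of converting a local path into that form (the function name is illustrative, not a DeepStream API):

```python
from urllib.parse import quote


def to_uri(path: str) -> str:
    """Turn an absolute POSIX path into a file:// URI for [source0] uri=..."""
    assert path.startswith("/"), "DeepStream file URIs need absolute paths"
    return "file://" + quote(path)


print(to_uri("/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4"))
```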
Run the inference:

```bash
deepstream-app -c deepstream_app_config.txt
```
!!! note
    It will take a long time to generate the TensorRT engine file before inference starts, so please be patient.
!!! tip
    If you want to convert the model to FP16 precision, simply set `model-engine-file=model_b1_gpu0_fp16.engine` and `network-mode=2` inside `config_infer_primary_yolo26.txt`.
If you want to use INT8 precision for inference, you need to follow the steps below:
!!! note
    Currently, INT8 does not work with TensorRT 10.x. This section of the guide has been tested with TensorRT 8.x, which is expected to work.
Set the OPENCV environment variable:

```bash
export OPENCV=1
```
Compile the library:

```bash
make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
```
For the COCO dataset, download [val2017](http://images.cocodataset.org/zips/val2017.zip), extract it, and move it to the `DeepStream-Yolo` folder.
Make a new directory for calibration images:

```bash
mkdir calibration
```
Run the following to select 1000 random images from the COCO dataset to run the calibration:

```bash
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do
    cp ${jpg} calibration/
done
```
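An equivalent of the shell loop above, sketched in Python in case you are scripting the calibration setup (the function name is illustrative, not part of DeepStream-Yolo):

```python
import random
import shutil
from pathlib import Path


def select_calibration_images(src: str, dst: str, n: int = 1000) -> list:
    """Copy up to n randomly chosen .jpg files from src into dst."""
    images = sorted(Path(src).glob("*.jpg"))
    chosen = random.sample(images, min(n, len(images)))
    Path(dst).mkdir(parents=True, exist_ok=True)
    for img in chosen:
        shutil.copy(img, dst)
    return chosen


# Usage on the Jetson, mirroring the shell loop:
# select_calibration_images("val2017", "calibration", 1000)
```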
!!! note
    NVIDIA recommends at least 500 images to get good [accuracy](https://www.ultralytics.com/glossary/accuracy). In this example, 1000 images are chosen to get better accuracy (more images = more accuracy). You can adjust the count via **head -1000**; for example, for 2000 images, use **head -2000**. This process can take a long time.
Create the `calibration.txt` file with all selected images:

```bash
realpath calibration/*jpg > calibration.txt
```
Set the environment variables:

```bash
export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
```
!!! note
    Higher `INT8_CALIB_BATCH_SIZE` values result in better accuracy and faster calibration. Set it according to your GPU memory.
Update the `config_infer_primary_yolo26.txt` file.

From:

```
...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...
```

To:

```
...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
```
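The FP32-to-INT8 switch touches exactly three keys: the engine filename, the (previously commented) calibration table, and `network-mode`. A hypothetical helper sketching that edit as a plain line rewrite, since DeepStream has no official API for patching its config files:

```python
def patch_config_for_int8(text: str) -> str:
    """Rewrite config_infer_primary_yolo26.txt contents for INT8 inference.

    Illustrative only: swaps the engine file, uncomments the calib table,
    and sets network-mode=1 (INT8 in DeepStream's nvinfer convention).
    """
    out = []
    for line in text.splitlines():
        if line.startswith("model-engine-file="):
            line = "model-engine-file=model_b1_gpu0_int8.engine"
        elif line.startswith("#int8-calib-file="):
            line = line.lstrip("#")  # uncomment int8-calib-file=calib.table
        elif line.startswith("network-mode="):
            line = "network-mode=1"
        out.append(line)
    return "\n".join(out)
```

You could apply it with `Path("config_infer_primary_yolo26.txt").write_text(patch_config_for_int8(Path(...).read_text()))`.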
Run the inference:

```bash
deepstream-app -c deepstream_app_config.txt
```
<strong>Watch:</strong> How to Run Multi-Stream Inference with Ultralytics YOLO26 using NVIDIA DeepStream on Jetson Orin 🚀
To set up multiple streams under a single DeepStream application, make the following changes to the `deepstream_app_config.txt` file:
Change the rows and columns to build a grid display according to the number of streams you want to have. For example, for 4 streams, we can add 2 rows and 2 columns:

```
[tiled-display]
rows=2
columns=2
```
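The only constraint is that `rows * columns` covers the number of streams; a near-square grid is the usual choice. A small sketch of picking the grid automatically (the helper is illustrative, not a DeepStream utility):

```python
import math


def tiler_grid(num_streams: int) -> tuple:
    """Pick (rows, columns) for [tiled-display] so rows * columns >= num_streams."""
    columns = math.ceil(math.sqrt(num_streams))
    rows = math.ceil(num_streams / columns)
    return rows, columns


print(tiler_grid(4))  # 4 streams fit a 2x2 grid
```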
Set `num-sources=4` and add the `uri` entries for all four streams:

```
[source0]
enable=1
type=3
uri=path/to/video1.mp4
uri=path/to/video2.mp4
uri=path/to/video3.mp4
uri=path/to/video4.mp4
num-sources=4
```
Run the inference:

```bash
deepstream-app -c deepstream_app_config.txt
```
The following benchmarks summarize how YOLO26 models perform at different TensorRT precision levels with an input size of 640x640 on the NVIDIA Jetson Orin NX 16GB.
!!! tip "Performance"
=== "YOLO11n"
| Format | Status | Inference time (ms/im) |
|-----------------|--------|------------------------|
| TensorRT (FP32) | ✅ | 8.64 |
| TensorRT (FP16) | ✅ | 5.27 |
| TensorRT (INT8) | ✅ | 4.54 |
=== "YOLO11s"
| Format | Status | Inference time (ms/im) |
|-----------------|--------|------------------------|
| TensorRT (FP32) | ✅ | 14.53 |
| TensorRT (FP16) | ✅ | 7.91 |
| TensorRT (INT8) | ✅ | 6.05 |
=== "YOLO11m"
| Format | Status | Inference time (ms/im) |
|-----------------|--------|------------------------|
| TensorRT (FP32) | ✅ | 32.05 |
| TensorRT (FP16) | ✅ | 15.55 |
| TensorRT (INT8) | ✅ | 10.43 |
=== "YOLO11l"
| Format | Status | Inference time (ms/im) |
|-----------------|--------|------------------------|
| TensorRT (FP32) | ✅ | 39.68 |
| TensorRT (FP16) | ✅ | 19.88 |
| TensorRT (INT8) | ✅ | 13.64 |
=== "YOLO11x"
| Format | Status | Inference time (ms/im) |
|-----------------|--------|------------------------|
| TensorRT (FP32) | ✅ | 80.65 |
| TensorRT (FP16) | ✅ | 39.06 |
| TensorRT (INT8) | ✅ | 22.83 |
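As a quick sanity check on these tables, the relative speedups can be computed directly from the reported latencies. The sketch below uses the small-model row; the `speedup` helper is illustrative, not part of any benchmark tooling:

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup factor of an optimized precision relative to the FP32 baseline."""
    return round(baseline_ms / optimized_ms, 2)


# ms/im figures for the "s" model from the benchmark table above
fp32, fp16, int8 = 14.53, 7.91, 6.05
print(f"FP16 speedup over FP32: {speedup(fp32, fp16)}x")
print(f"INT8 speedup over FP32: {speedup(fp32, int8)}x")
```

Across the model sizes, FP16 roughly halves latency versus FP32, and INT8 improves on FP16 further, with the gap widening for larger models.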
This guide was initially created by our friends at Seeed Studio, Lakshantha and Elaine.
To set up Ultralytics YOLO26 on an NVIDIA Jetson device, you first need to install the DeepStream SDK compatible with your JetPack version. Follow the step-by-step guide in our Quick Start Guide to configure your NVIDIA Jetson for YOLO26 deployment.
Using TensorRT with YOLO26 optimizes the model for inference, significantly reducing latency and improving throughput on NVIDIA Jetson devices. TensorRT provides high-performance, low-latency deep learning inference through layer fusion, precision calibration, and kernel auto-tuning. This leads to faster and more efficient execution, particularly useful for real-time applications like video analytics and autonomous machines.
Yes, the guide for deploying Ultralytics YOLO26 with the DeepStream SDK and TensorRT is compatible across the entire NVIDIA Jetson lineup. This includes devices like the Jetson Orin NX 16GB with JetPack 5.1.3 and the Jetson Nano 4GB with JetPack 4.6.4. Refer to the section DeepStream Configuration for YOLO26 for detailed steps.
To convert a YOLO26 model to ONNX format for deployment with DeepStream, use the `utils/export_yolo26.py` script from the [DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo) repository.
Here's an example command:

```bash
python3 utils/export_yolo26.py -w yolo26s.pt --opset 12 --simplify
```
For more details on model conversion, check out our model export section.
The performance of YOLO26 models on the NVIDIA Jetson Orin NX 16GB varies based on TensorRT precision levels. For example, YOLO26s models achieve:

- **FP32**: 14.53 ms/im
- **FP16**: 7.91 ms/im
- **INT8**: 6.05 ms/im
These benchmarks underscore the efficiency and capability of using TensorRT-optimized YOLO26 models on NVIDIA Jetson hardware. For further details, see our Benchmark Results section.