Ultralytics YOLO ONNX Runtime Inference in C++

examples/cpp/ONNXRuntime/README.md

A single C++ application that runs every Ultralytics YOLO task and model generation with ONNX Runtime and OpenCV. Point it at any exported .onnx model — the program reads the task, class names, and input size from the model metadata and picks the right post-processing automatically.

✨ Features

  • All tasks: detect, segment, pose, OBB, classify, and YOLO26 semantic segmentation.
  • All generations: YOLOv8, YOLO11, and YOLO26. The grid output of YOLOv8/11 and the end-to-end (NMS-free) output of YOLO26 are detected automatically from the tensor shape — no flags needed.
  • Zero configuration: task, class names, and imgsz come from the model metadata that Ultralytics bakes into every export. No coco.yaml or hard-coded class lists.
  • FP32 and FP16: half-precision (FP16) models are detected automatically from the input type and run with no extra flags.
  • Simple CLI: choose the model, source image, and thresholds at runtime — no recompiling.
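The shape-based layout detection mentioned above can be illustrated with a small sketch. It is a hypothetical heuristic, not necessarily the rule the example applies: it assumes a grid head emits `[batch, channels, anchors]` (for instance `[1, 84, 8400]` for an 80-class detector) and an end-to-end head emits `[batch, max_detections, values]` (for instance `[1, 300, 6]`). The names `OutputLayout` and `detectLayout` are made up for illustration.

```cpp
#include <cstdint>
#include <vector>

enum class OutputLayout { Grid, EndToEnd, Unknown };

// Hypothetical heuristic (an assumption, not the example's confirmed logic):
// a grid output has far more anchors than channels, so dim 1 < dim 2;
// an end-to-end output has a few values per detection row, so dim 1 >= dim 2.
OutputLayout detectLayout(const std::vector<std::int64_t>& shape) {
    // Only 3-D outputs with known (positive) dimensions are classified here.
    if (shape.size() != 3 || shape[1] <= 0 || shape[2] <= 0) {
        return OutputLayout::Unknown;
    }
    return shape[1] < shape[2] ? OutputLayout::Grid : OutputLayout::EndToEnd;
}

// Usage:
//   detectLayout({1, 84, 8400})  -> OutputLayout::Grid
//   detectLayout({1, 300, 6})    -> OutputLayout::EndToEnd
```

A 2-D classification output or a shape with dynamic (-1) dimensions falls through to `Unknown` in this sketch, so a real implementation would need extra cases for those.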

📋 Dependencies

| Dependency | Version | Notes |
| --- | --- | --- |
| ONNX Runtime | >=1.14 | Download the pre-built binaries (CPU or GPU). |
| OpenCV | >=4.0 | Image I/O, drawing, and NMS. |
| C++ Compiler | C++17 | For `<filesystem>`. |
| CMake | >=3.5 | Build system. |
| CUDA | optional | Only for the ONNX Runtime CUDA execution provider (--cuda). |

📦 Exporting a Model

Export any model and task to ONNX with the Ultralytics export mode. opset=12 is recommended for broad compatibility.

```bash
yolo export model=yolo26n.pt format=onnx opset=12      # detect   (end2end)
yolo export model=yolo26n-seg.pt format=onnx opset=12  # segment
yolo export model=yolo26n-pose.pt format=onnx opset=12 # pose
yolo export model=yolo26n-obb.pt format=onnx opset=12  # obb
yolo export model=yolo26n-cls.pt format=onnx opset=12  # classify
yolo export model=yolo26n-sem.pt format=onnx opset=12  # semantic
```

YOLOv8 and YOLO11 grid models work too — the output layout is detected automatically.

See the Export documentation for more options.

To run a half-precision model, export with half=True on a GPU (on CPU, half=True is ignored and the export stays FP32). The example detects the FP16 input type and runs it automatically:

```bash
yolo export model=yolo26n.pt format=onnx half=True device=0
```

If you only have a CPU, convert an exported FP32 ONNX to FP16 with ONNX Runtime's converter (it handles the Resize op correctly):

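```python
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

# Load the FP32 export produced by `yolo export ... format=onnx`.
model = onnx.load("yolo26n.onnx")

# keep_io_types=False also converts the model inputs and outputs to FP16,
# so the example sees an FP16 input type and switches precision automatically.
onnx.save(convert_float_to_float16(model, keep_io_types=False), "yolo26n_fp16.onnx")
```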

🛠️ Build

```bash
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics/examples/cpp/ONNXRuntime
mkdir build && cd build

# Point ONNXRUNTIME_ROOT at the extracted ONNX Runtime. Use -DUSE_CUDA=OFF for a CPU-only build.
cmake .. -DONNXRUNTIME_ROOT=/path/to/onnxruntime -DUSE_CUDA=OFF
cmake --build . --config Release
```

The shared helpers in ../common are header-only and added to the include path automatically.

🚀 Usage

```bash
# If the ONNX Runtime libraries are not installed system-wide, add them to the loader path:
export LD_LIBRARY_PATH=/path/to/onnxruntime/lib:$LD_LIBRARY_PATH

# Defaults: --model yolo26n.onnx --source bus.jpg --conf 0.25 --iou 0.45 --out result.jpg
./yolo_onnxruntime --model yolo26n.onnx --source bus.jpg
./yolo_onnxruntime --model yolo26n-seg.onnx --source bus.jpg --out seg.jpg
./yolo_onnxruntime --model yolo26n-pose.onnx --source bus.jpg --show
./yolo_onnxruntime --model yolo26n-sem.onnx --source street.jpg
```

| Argument | Default | Description |
| --- | --- | --- |
| --model | yolo26n.onnx | Path to the exported .onnx model (any task/generation). |
| --source | bus.jpg | Input image. |
| --conf | 0.25 | Confidence threshold. |
| --iou | 0.45 | NMS IoU threshold (grid models only; end2end models skip NMS). |
| --out | result.jpg | Output image path. |
| --cuda | off | Use the CUDA execution provider (requires a GPU ONNX Runtime build compiled with -DUSE_CUDA=ON). |
| --show | off | Also open a display window. |
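
To make the argument handling concrete, here is a minimal sketch of how these flags and defaults could be parsed with the standard library alone. The `Options` struct and `parseArgs` function are hypothetical names chosen for illustration. Only the flag names and default values come from the list above, and the real program may structure this differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Defaults mirror the documented ones; the struct itself is illustrative.
struct Options {
    std::string model = "yolo26n.onnx";
    std::string source = "bus.jpg";
    std::string out = "result.jpg";
    float conf = 0.25f;
    float iou = 0.45f;
    bool cuda = false;
    bool show = false;
};

// Parses the tokens that follow the program name (argv[1] onwards).
Options parseArgs(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        // Boolean switches take no value.
        if (flag == "--cuda") { opt.cuda = true; continue; }
        if (flag == "--show") { opt.show = true; continue; }
        // Every other flag consumes the next token as its value.
        if (i + 1 >= args.size()) {
            throw std::invalid_argument("missing value for " + flag);
        }
        const std::string& value = args[++i];
        if (flag == "--model") opt.model = value;
        else if (flag == "--source") opt.source = value;
        else if (flag == "--out") opt.out = value;
        else if (flag == "--conf") opt.conf = std::stof(value);
        else if (flag == "--iou") opt.iou = std::stof(value);
        else throw std::invalid_argument("unknown argument " + flag);
    }
    return opt;
}
```

From `main`, this would be called as `parseArgs(std::vector<std::string>(argv + 1, argv + argc))`. Any flag that is not passed keeps its default.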

The annotated result is always written to --out and the detections are printed to the console. The task is shown at startup, e.g. Model: yolo26n.onnx | task: detect | classes: 80.
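
For grid models, the --conf and --iou thresholds decide which candidate boxes survive before anything is drawn or printed. The dependency list indicates that the example relies on OpenCV for NMS, so the following is only a standard-library sketch of class-agnostic greedy NMS to show what the two thresholds do. The `Box` struct and function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned box in pixel coordinates with a confidence score (illustrative type).
struct Box { float x1, y1, x2, y2, score; };

// Intersection over union of two boxes; 0 when they do not overlap.
float boxIou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns the indices of the kept boxes, highest score first.
std::vector<std::size_t> nms(const std::vector<Box>& boxes, float confThr, float iouThr) {
    // 1. Drop candidates below the confidence threshold.
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (boxes[i].score >= confThr) order.push_back(i);
    }
    // 2. Visit the remaining candidates from most to least confident.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return boxes[a].score > boxes[b].score;
    });
    // 3. Keep a box only if it does not overlap an already kept box too much.
    std::vector<std::size_t> keep;
    for (std::size_t idx : order) {
        bool suppressed = false;
        for (std::size_t kept : keep) {
            if (boxIou(boxes[idx], boxes[kept]) > iouThr) { suppressed = true; break; }
        }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;
}
```

End-to-end models skip this step because their output is already a final list of detections, which is why --iou has no effect on them.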

🏷️ Class Names & Task

The task type, class names, and input imgsz are read directly from the model metadata that Ultralytics bakes into every export, so the same binary handles a COCO detector, a 1000-class ImageNet classifier, and a 19-class Cityscapes semantic model with no changes. If a model lacks the names metadata, the example falls back to the 80 COCO names from ../common/coco_names.hpp.
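
As a rough illustration of reading class names from metadata, the sketch below parses a names string into an id-to-name map. It assumes the value is stored as a Python-dict-style string such as `{0: 'person', 1: 'bicycle'}`; that format, and the `parseNames` helper itself, are assumptions for illustration rather than a description of the example's actual parser. Escape sequences inside names are not handled.

```cpp
#include <map>
#include <regex>
#include <string>

// Parses "{0: 'person', 1: 'bicycle'}" into {0 -> "person", 1 -> "bicycle"}.
// Assumption: names are single- or double-quoted and contain no escaped quotes.
std::map<int, std::string> parseNames(const std::string& meta) {
    // Matches  <int>: '<name>'  or  <int>: "<name>"
    static const std::regex entry(R"re((\d+)\s*:\s*(?:'([^']*)'|"([^"]*)"))re");
    std::map<int, std::string> names;
    for (std::sregex_iterator it(meta.begin(), meta.end(), entry), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Group 2 holds a single-quoted name, group 3 a double-quoted one.
        names[std::stoi(m[1].str())] = m[2].matched ? m[2].str() : m[3].str();
    }
    return names;
}
```

An empty result from such a parser would be the natural trigger for the COCO fallback described above.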

🤝 Contributing

Contributions are welcome! If you find an issue or have a suggestion for improvement, open an issue or submit a pull request on the main Ultralytics repository.