
Ultralytics YOLO Triton Inference in C++




A C++ gRPC client that runs every Ultralytics YOLO task and model generation against a model served by the NVIDIA Triton Inference Server. The client reads the input/output layout from the model metadata, infers the task from the output shapes, and shares its post-processing with the other C++ examples — so the same binary handles detection, segmentation, pose, OBB, classification, and YOLO26 semantic segmentation.

✨ Features

  • All tasks: detect, segment, pose, OBB, classify, and YOLO26 semantic segmentation.
  • All generations: YOLOv8, YOLO11, and YOLO26. The grid output of YOLOv8/11 and the end-to-end (NMS-free) output of YOLO26 are detected automatically from the tensor shape.
  • FP16 and FP32: the input and output datatypes are read from the model metadata, so half-precision (FP16) and full-precision models both work with no flags.
  • Seamless Triton integration: communicates with the server over gRPC for efficient, scalable model serving.
  • Simple CLI: choose the server URL, model name, source image, and thresholds at runtime — no recompiling.
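Here is a rough illustration of how the two output families can be told apart by shape alone. Grid models emit a channels-first tensor such as [1, 84, 8400] (4 box values plus 80 class scores for each of 8400 anchors at 640 px). YOLO26 end-to-end models emit one row per detection, for example [1, 300, 6]. The sketch below is a hypothetical heuristic, not the client's actual code. `OutputLayout` and `inferLayout` are illustrative names, and the sketch assumes those two shapes and compares the last two dimensions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical layout tags; the real client may represent this differently.
enum class OutputLayout { Grid, EndToEnd };

// Heuristic sketch for a rank-3 output tensor [batch, a, b]:
//   Grid (YOLOv8/YOLO11): [1, 4 + nc (+ extras), anchors], anchors >> channels.
//   End-to-end (YOLO26):  [1, max_det, 6 (+ extras)],      rows >> columns.
OutputLayout inferLayout(const std::vector<std::int64_t>& shape) {
    if (shape.size() != 3) {
        throw std::invalid_argument("expected a rank-3 output tensor");
    }
    // Channels-first grid outputs have far fewer channels than anchors.
    return shape[1] < shape[2] ? OutputLayout::Grid : OutputLayout::EndToEnd;
}
```

With very small input sizes and many classes, the anchor count could drop below the channel count. A comparison like this is therefore only a sketch, and the real client may rely on additional checks.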

📋 Dependencies

Ensure you have the following dependencies installed before proceeding:

| Dependency | Version | Description |
| --- | --- | --- |
| Triton Inference Server | 22.06+ | Running with a deployed YOLO model |
| Triton Client libraries | 2.23+ | Required for communication with Triton Server |
| C++ compiler | C++17+ | For compiling the C++ client application |
| OpenCV library | 3.4+ | For image processing and visualization |
| CMake | 3.5+ | For building the project |

For more information on Triton, see the NVIDIA Triton Inference Server documentation and explore model deployment options with Ultralytics.

📦 Deploying a Model

Export any model and task, then add it to a Triton model repository. ONNX is a convenient serving format and keeps the output0 (and output1 for segmentation) tensor names this client expects.

```bash
yolo export model=yolo26n.pt format=onnx opset=12              # detect   (end2end)
yolo export model=yolo26n-seg.pt format=onnx opset=12          # segment
yolo export model=yolo26n-pose.pt format=onnx opset=12         # pose
yolo export model=yolo26n-obb.pt format=onnx opset=12          # obb
yolo export model=yolo26n-cls.pt format=onnx opset=12          # classify
yolo export model=yolo26n-sem.pt format=onnx opset=12          # semantic
yolo export model=yolo11n.pt format=onnx opset=12 dynamic=True # YOLOv8/YOLO11 (grid) work too
```

Add half=True device=0 to export an FP16 model on a GPU; the client reads the input/output datatype from the metadata and handles FP16 or FP32 automatically.
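The Triton client exposes each output tensor as a raw byte buffer, and the model metadata reports the tensor's datatype (for example FP16 or FP32). FP16 results therefore have to be widened to float before post-processing. The sketch below assumes the buffer and the host are both little-endian. The helper names `halfToFloat` and `decodeOutput` are hypothetical and are not part of the Triton client API. The conversion handles IEEE 754 half-precision values by rebiasing the exponent from 15 to 127.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one IEEE 754 binary16 value (given as raw bits) to float.
float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits = 0;
    if (exponent == 0) {
        // Zero or subnormal: value = mantissa * 2^-24.
        const float magnitude = std::ldexp(static_cast<float>(mantissa), -24);
        return sign ? -magnitude : magnitude;
    } else if (exponent == 0x1F) {
        bits = sign | 0x7F800000u | (mantissa << 13);  // Inf or NaN
    } else {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);  // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Decode a raw output buffer into floats according to its datatype string.
// Assumes the buffer uses the host's (little-endian) byte order.
std::vector<float> decodeOutput(const std::vector<std::uint8_t>& raw, const std::string& datatype) {
    std::vector<float> out;
    if (datatype == "FP32") {
        out.resize(raw.size() / sizeof(float));
        if (!out.empty()) std::memcpy(out.data(), raw.data(), out.size() * sizeof(float));
    } else if (datatype == "FP16") {
        out.reserve(raw.size() / 2);
        for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
            std::uint16_t h;
            std::memcpy(&h, raw.data() + i, sizeof h);
            out.push_back(halfToFloat(h));
        }
    } else {
        throw std::invalid_argument("unsupported datatype: " + datatype);
    }
    return out;
}
```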

Place the exported model under <repository>/<model_name>/1/model.onnx. Triton's ONNX backend auto-completes the configuration, so a config.pbtxt is optional. A minimal repository looks like:

```text
models/
└── yolo26n/
    └── 1/
        └── model.onnx
```

Then start Triton pointing at the repository:

```bash
tritonserver --model-repository=/models
```

See the Ultralytics Triton guide for a full walkthrough.

🛠️ Building the Project

1. Install the Triton Client libraries:

   ```bash
   wget https://github.com/triton-inference-server/server/releases/download/v2.23.0/v2.23.0_ubuntu2004.clients.tar.gz
   mkdir tritonclient
   tar -xvf v2.23.0_ubuntu2004.clients.tar.gz -C tritonclient
   rm -f v2.23.0_ubuntu2004.clients.tar.gz
   ```

2. Clone the Ultralytics repository:

   ```bash
   git clone https://github.com/ultralytics/ultralytics.git
   cd ultralytics/examples/cpp/Triton
   ```

3. Configure and build the project using CMake:

   ```bash
   mkdir build
   cd build
   cmake .. -DTRITON_CLIENT_DIR=/path/to/tritonclient
   make
   ```

The shared helpers in ../common are header-only and added to the include path automatically.

🚀 Usage

Start your Triton server with a deployed YOLO model, then run the client. Pass the model's name in the repository (its directory name) as --model.

```bash
# Defaults: --url localhost:8001 --model yolo26n --source bus.jpg --conf 0.25 --iou 0.45 --out result.jpg
./yolo_triton --model yolo26n --source bus.jpg                     # detect   (auto)
./yolo_triton --model yolo26n-seg --source bus.jpg --out seg.jpg   # segment  (auto)
./yolo_triton --model yolo26n-pose --source bus.jpg --out pose.jpg # pose     (auto, end2end)
./yolo_triton --model yolo26n-obb --source boats.jpg --out obb.jpg # obb      (auto, end2end)
./yolo_triton --model yolo26n-cls --source bus.jpg --out cls.jpg   # classify (auto)
./yolo_triton --model yolo26n-sem --source bus.jpg --out sem.jpg   # semantic (auto)
./yolo_triton --model yolo11n-pose --source bus.jpg --task pose    # legacy grid pose: needs --task
./yolo_triton --url 192.168.1.10:8001 --model yolo26n --source street.jpg --show
```

> [!NOTE]
> Triton exposes no task or class-name metadata, so the task is inferred from the output shapes. With YOLO26 (end-to-end) models, every task, including pose and OBB, is detected automatically. Only the legacy grid YOLOv8/11 pose [1, 56, 8400] and OBB [1, 20, 8400] outputs are ambiguous. A detection model with 52 or 16 classes would produce exactly the same shapes, and Triton does not expose the class count, so for those models pass --task pose or --task obb. Class names fall back to COCO, so a non-COCO model (1000-class classify, DOTA OBB) prints class indices rather than names.
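The shape-based inference described in the note can be sketched as follows. This is a hypothetical helper, not the client's code. It assumes YOLO26 end-to-end rows of box (4) + score + class followed by task-specific extras: one angle for OBB, 17 keypoints × (x, y, visibility) for pose, and a second mask-prototype output for segmentation. Semantic segmentation is left out because its output layout is not described here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Hypothetical sketch of shape-based task inference (not the client's code).
// Assumes YOLO26 end-to-end rows of [box(4), score, class, extras...] and
// 17-keypoint pose models; semantic segmentation is not handled.
std::string inferTask(const std::vector<Shape>& outputs,
                      const std::string& overrideTask = "auto") {
    if (overrideTask != "auto") return overrideTask;  // --task always wins
    if (outputs.empty()) return "unknown";
    if (outputs.size() == 2) return "segment";        // detections + mask prototypes
    const Shape& out0 = outputs[0];
    if (out0.size() == 2) return "classify";          // [1, num_classes]
    if (out0.size() != 3) return "unknown";
    const bool endToEnd = out0[1] > out0[2];          // [1, max_det, columns]
    if (!endToEnd) return "detect";                   // grid pose/OBB need --task
    switch (out0[2]) {
        case 6: return "detect";   // box + score + class
        case 7: return "obb";      // ... + rotation angle
        case 57: return "pose";    // ... + 17 keypoints * (x, y, visibility)
        default: return "unknown";
    }
}
```

Grid outputs always fall through to detect. This is why the note asks for --task pose or --task obb with YOLOv8/11 pose and OBB models.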

| Argument | Default | Description |
| --- | --- | --- |
| --url | localhost:8001 | Triton server gRPC endpoint. |
| --model | yolo26n | Model name as deployed in the Triton repository. |
| --version | latest | Model version (empty selects the latest). |
| --source | bus.jpg | Input image. |
| --conf | 0.25 | Confidence threshold. |
| --iou | 0.45 | NMS IoU threshold (grid models only; end2end models skip NMS). |
| --imgsz | 640 | Input size used when the model shape is dynamic. |
| --task | auto | Override the inferred task (detect, segment, pose, ...). |
| --out | result.jpg | Output image path. |
| --show | off | Also open a display window. |
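The flags in the table map naturally onto a small options struct. The parser below is a hypothetical sketch, and the example's real argument handling may differ. `Options` and `parseArgs` are illustrative names. The parser applies the same defaults, treats --show as a boolean switch, and expects every other flag to be followed by a value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct mirroring the table's flags and defaults.
struct Options {
    std::string url = "localhost:8001";
    std::string model = "yolo26n";
    std::string version;  // empty selects the latest version
    std::string source = "bus.jpg";
    float conf = 0.25f;
    float iou = 0.45f;
    int imgsz = 640;
    std::string task = "auto";
    std::string out = "result.jpg";
    bool show = false;
};

// Parse "--flag value" pairs; --show takes no value.
Options parseArgs(const std::vector<std::string>& args) {
    Options o;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (flag == "--show") { o.show = true; continue; }
        if (i + 1 >= args.size()) throw std::invalid_argument("missing value for " + flag);
        const std::string& value = args[++i];
        if (flag == "--url") o.url = value;
        else if (flag == "--model") o.model = value;
        else if (flag == "--version") o.version = value;
        else if (flag == "--source") o.source = value;
        else if (flag == "--conf") o.conf = std::stof(value);
        else if (flag == "--iou") o.iou = std::stof(value);
        else if (flag == "--imgsz") o.imgsz = std::stoi(value);
        else if (flag == "--task") o.task = value;
        else if (flag == "--out") o.out = value;
        else throw std::invalid_argument("unknown flag: " + flag);
    }
    return o;
}
```

In a main function it could be called as parseArgs(std::vector<std::string>(argv + 1, argv + argc)).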

The annotated result is always written to --out and the detections are printed to the console.
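For grid (YOLOv8/YOLO11) models, detections pass through confidence filtering (--conf) and NMS (--iou) before they are printed. End-to-end YOLO26 outputs skip the NMS step. The sketch below assumes the batch dimension has been dropped, leaving a row-major [4 + nc, N] array whose first four rows are cx, cy, w, h. It applies greedy per-class NMS. The names `Detection`, `decodeGrid`, `iou`, and `nms` are hypothetical. The boxes stay in model-input coordinates; mapping them back to the original image is not shown.

```cpp
#include <algorithm>
#include <vector>

struct Detection { float x1, y1, x2, y2, score; int cls; };

// Decode a channels-first grid tensor [4 + nc, n] (batch dimension dropped).
// Keeps anchors whose best class score reaches the confidence threshold.
std::vector<Detection> decodeGrid(const std::vector<float>& data, int nc, int n, float conf) {
    std::vector<Detection> dets;
    for (int a = 0; a < n; ++a) {
        int best = 0;
        float bestScore = 0.0f;
        for (int c = 0; c < nc; ++c) {
            const float s = data[(4 + c) * n + a];
            if (s > bestScore) { bestScore = s; best = c; }
        }
        if (bestScore < conf) continue;
        const float cx = data[0 * n + a], cy = data[1 * n + a];
        const float w = data[2 * n + a], h = data[3 * n + a];
        dets.push_back({cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, bestScore, best});
    }
    return dets;
}

// Intersection over union of two corner-format boxes.
float iou(const Detection& a, const Detection& b) {
    const float ix = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float iy = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = ix * iy;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy per-class NMS: keep the highest-scoring box, drop same-class overlaps.
std::vector<Detection> nms(std::vector<Detection> dets, float iouThr) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (k.cls == d.cls && iou(k, d) > iouThr) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```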

🏷️ Class Names & Task

Triton exposes no class names, so the example falls back to the 80 COCO names from ../common/coco_names.hpp. The task is inferred from the output shapes; pass --task to override it (required for grid pose/obb, as noted above).

🤝 Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the main Ultralytics repository.

This example was originally contributed by: