examples/cpp/Triton/README.md
A C++ gRPC client that runs every Ultralytics YOLO task and model generation against a model served by the NVIDIA Triton Inference Server. The client reads the input/output layout from the model metadata, infers the task from the output shapes, and shares its post-processing with the other C++ examples — so the same binary handles detection, segmentation, pose, OBB, classification, and YOLO26 semantic segmentation.
Ensure you have the following dependencies installed before proceeding:
| Dependency | Version | Description |
|---|---|---|
| Triton Inference Server | 22.06+ | Running with a deployed YOLO model |
| Triton Client libraries | 2.23+ | Required for communication with Triton Server |
| C++ compiler | C++17+ | For compiling the C++ client application |
| OpenCV library | 3.4+ | For image processing and visualization |
| CMake | 3.5+ | For building the project |
For more information on Triton, see the NVIDIA Triton Inference Server documentation and explore model deployment options with Ultralytics.
Export any model and task, then add it to a Triton model repository. ONNX is a convenient serving format and keeps the output0 (and output1 for segmentation) tensor names this client expects.
yolo export model=yolo26n.pt format=onnx opset=12 # detect (end2end)
yolo export model=yolo26n-seg.pt format=onnx opset=12 # segment
yolo export model=yolo26n-pose.pt format=onnx opset=12 # pose
yolo export model=yolo26n-obb.pt format=onnx opset=12 # obb
yolo export model=yolo26n-cls.pt format=onnx opset=12 # classify
yolo export model=yolo26n-sem.pt format=onnx opset=12 # semantic
yolo export model=yolo11n.pt format=onnx opset=12 dynamic=True # YOLOv8/YOLO11 (grid) work too
Add half=True device=0 to export an FP16 model on a GPU; the client reads the input/output datatype from the metadata and handles FP16 or FP32 automatically.
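To illustrate what handling both precisions involves, here is a minimal sketch of turning a raw output buffer into floats based on the datatype string reported in the metadata. The helper names `half_to_float` and `to_floats` are hypothetical and not taken from the client source, and the sketch assumes a little-endian host.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Decode one IEEE 754 binary16 value (1 sign, 5 exponent, 10 mantissa bits).
float half_to_float(std::uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25).
        value = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return negative ? -value : value;
}

// Hypothetical helper: convert a raw tensor buffer to floats.
// `datatype` is the metadata datatype string, "FP16" or "FP32".
// Assumption: the host is little-endian, like the raw tensor bytes.
std::vector<float> to_floats(const std::vector<std::uint8_t>& raw,
                             const std::string& datatype) {
    std::vector<float> out;
    if (datatype == "FP32") {
        out.resize(raw.size() / sizeof(float));
        if (!out.empty()) {
            std::memcpy(out.data(), raw.data(), out.size() * sizeof(float));
        }
    } else if (datatype == "FP16") {
        out.reserve(raw.size() / 2);
        for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
            std::uint16_t h;
            std::memcpy(&h, raw.data() + i, sizeof h);
            out.push_back(half_to_float(h));
        }
    } else {
        throw std::invalid_argument("unsupported datatype: " + datatype);
    }
    return out;
}
```

Converting to FP32 once, right after inference, lets the same post-processing code run unchanged for both export precisions.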
Place the exported model under <repository>/<model_name>/1/model.onnx. Triton's ONNX backend auto-completes the configuration, so a config.pbtxt is optional. A minimal repository looks like:
models/
└── yolo26n/
└── 1/
└── model.onnx
Then start Triton pointing at the repository:
tritonserver --model-repository=/models
See the Ultralytics Triton guide for a full walkthrough.
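A quick way to check the layout before starting the server is to build the expected path and test that the file exists. The sketch below uses `std::filesystem` from C++17. The helpers `onnx_model_path` and `is_deployed` are hypothetical names introduced for illustration and are not part of this example's code.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: <repository>/<model_name>/<version>/model.onnx
fs::path onnx_model_path(const fs::path& repository,
                         const std::string& model_name,
                         int version = 1) {
    return repository / model_name / std::to_string(version) / "model.onnx";
}

// True when the model file for that name and version exists on disk.
bool is_deployed(const fs::path& repository,
                 const std::string& model_name,
                 int version = 1) {
    return fs::is_regular_file(onnx_model_path(repository, model_name, version));
}
```

For the repository shown above, `onnx_model_path("models", "yolo26n")` yields `models/yolo26n/1/model.onnx`.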
Install the Triton Client libraries:
wget https://github.com/triton-inference-server/server/releases/download/v2.23.0/v2.23.0_ubuntu2004.clients.tar.gz
mkdir tritonclient
tar -xvf v2.23.0_ubuntu2004.clients.tar.gz -C tritonclient
rm -f v2.23.0_ubuntu2004.clients.tar.gz
Clone the Ultralytics repository:
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics/examples/cpp/Triton
Configure and build the project using CMake:
mkdir build
cd build
cmake .. -DTRITON_CLIENT_DIR=/path/to/tritonclient
make
The shared helpers in ../common are header-only and added to the include path automatically.
Start your Triton server with a deployed YOLO model, then run the client. Use the model name as deployed in the repository as --model.
# Defaults: --url localhost:8001 --model yolo26n --source bus.jpg --conf 0.25 --iou 0.45 --out result.jpg
./yolo_triton --model yolo26n --source bus.jpg # detect (auto)
./yolo_triton --model yolo26n-seg --source bus.jpg --out seg.jpg # segment (auto)
./yolo_triton --model yolo26n-pose --source bus.jpg --out pose.jpg # pose (auto, end2end)
./yolo_triton --model yolo26n-obb --source boats.jpg --out obb.jpg # obb (auto, end2end)
./yolo_triton --model yolo26n-cls --source bus.jpg --out cls.jpg # classify (auto)
./yolo_triton --model yolo26n-sem --source bus.jpg --out sem.jpg # semantic (auto)
./yolo_triton --model yolo11n-pose --source bus.jpg --task pose # legacy grid pose: needs --task
./yolo_triton --url 192.168.1.10:8001 --model yolo26n --source street.jpg --show
[!NOTE] Triton exposes no task or class-name metadata, so the task is inferred from the output shapes. With YOLO26 (end-to-end) models, every task, including pose and OBB, is detected automatically. Only the legacy grid YOLOv8/11 pose [1, 56, 8400] and obb [1, 20, 8400] outputs are ambiguous with detection (they differ only by the class count, which Triton does not expose), so for those pass --task pose or --task obb. Class names fall back to COCO, so a non-COCO model (1000-class classify, DOTA obb) prints class indices rather than names.
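The ambiguity described in the note can be sketched as a small shape-based classifier. This is a simplified illustration under stated assumptions, not the client's actual logic: `guess_task` and `Shape` are hypothetical names, only the classify, segment, and grid cases are covered, and the end-to-end pose, OBB, and semantic layouts are deliberately left out because their exact shapes are not given here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Hypothetical helper: guess the task from output tensor shapes alone.
// `override_task` plays the role of --task; "auto" means "infer".
std::string guess_task(const std::vector<Shape>& outputs,
                       const std::string& override_task = "auto") {
    if (override_task != "auto") return override_task;
    if (outputs.empty()) return "unknown";

    // A rank-2 output such as [1, 1000] is a vector of class scores.
    if (outputs[0].size() == 2) return "classify";

    // A second rank-4 output (output1) is assumed to hold mask prototypes.
    if (outputs.size() >= 2 && outputs[1].size() == 4) return "segment";

    // A single grid output is ambiguous. The channel count alone cannot
    // separate these readings:
    //   56 = 4 box + 1 class + 17 * 3 keypoints (COCO pose)
    //   56 = 4 box + 52 classes                 (detection)
    //   20 = 4 box + 15 classes + 1 angle       (DOTA obb)
    //   20 = 4 box + 16 classes                 (detection)
    // Without a class count the sketch falls back to detection.
    return "detect";
}
```

With this sketch, `[1, 56, 8400]` resolves to `detect` unless the caller passes `"pose"`, which is why the override is needed for legacy grid models.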
| Argument | Default | Description |
|---|---|---|
| --url | localhost:8001 | Triton server gRPC endpoint. |
| --model | yolo26n | Model name as deployed in the Triton repository. |
| --version | latest | Model version (empty selects the latest). |
| --source | bus.jpg | Input image. |
| --conf | 0.25 | Confidence threshold. |
| --iou | 0.45 | NMS IoU threshold (grid models only; end2end models skip NMS). |
| --imgsz | 640 | Input size used when the model shape is dynamic. |
| --task | auto | Override the inferred task (detect, segment, pose, ...). |
| --out | result.jpg | Output image path. |
| --show | off | Also open a display window. |
The annotated result is always written to --out and the detections are printed to the console.
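To show how the --conf and --iou thresholds interact for grid models, here is a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It is class-agnostic for brevity, and the `Box` struct and the `iou` and `nms` functions are hypothetical names rather than the shared helpers in ../common.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned box in pixel coordinates with a confidence score.
struct Box {
    float x1, y1, x2, y2, score;
};

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: drop boxes below `conf`, then keep the highest-scoring box
// and suppress any later box that overlaps a kept one by more than `iou_thr`.
// Returns indices into `boxes`, ordered by descending score.
std::vector<std::size_t> nms(const std::vector<Box>& boxes,
                             float conf = 0.25f, float iou_thr = 0.45f) {
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (boxes[i].score >= conf) order.push_back(i);
    }
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return boxes[a].score > boxes[b].score;
    });

    std::vector<std::size_t> keep;
    for (std::size_t idx : order) {
        bool suppressed = false;
        for (std::size_t k : keep) {
            if (iou(boxes[idx], boxes[k]) > iou_thr) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;
}
```

End-to-end models emit a final set of detections already, so only the confidence filter would apply to them.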
Triton exposes no class names, so the example falls back to the 80 COCO names from ../common/coco_names.hpp. The task is inferred from the output shapes; pass --task to override it (required for grid pose/obb, as noted above).
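The fallback can be pictured with the sketch below. How the client decides between names and indices is not stated here, so the rule used (only trust the name table when its size matches the model's class count) is an assumption, and `class_label` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return a readable label for a predicted class.
// Assumption: the name table is only used when its size matches the
// model's class count; otherwise the numeric index is printed.
std::string class_label(int class_id, std::size_t num_classes,
                        const std::vector<std::string>& names) {
    const bool table_matches = (num_classes == names.size());
    if (table_matches && class_id >= 0 &&
        static_cast<std::size_t>(class_id) < names.size()) {
        return names[static_cast<std::size_t>(class_id)];
    }
    return std::to_string(class_id);
}
```

Under this rule a 1000-class classifier paired with an 80-entry table prints `2` for class 2, while an 80-class detector prints the matching name.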
Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the main Ultralytics repository.
This example was originally contributed by: