
# Predict Args

docs/macros/predict-args.md

| Argument | Type | Default | Description |
| -------- | ---- | ------- | ----------- |
| `source` | `str` or `int` or `None` | `None` | Specifies the data source for inference. Can be an image path, video file, directory, URL, or device ID for live feeds. If omitted, a warning is logged and the model falls back to the built-in demo assets (`ultralytics/assets`, or a demo URL for OBB). Supports a wide range of formats and sources, enabling flexible application across different types of input. |
| `conf` | `float` | `0.25` | Sets the minimum confidence threshold for detections. Objects detected with confidence below this threshold are disregarded. Adjusting this value can help reduce false positives. |
| `iou` | `float` | `0.7` | Intersection over Union (IoU) threshold for Non-Maximum Suppression (NMS). Lower values result in fewer detections by eliminating overlapping boxes, useful for reducing duplicates. |
| `imgsz` | `int` or `tuple` | `640` | Defines the image size for inference. Can be a single integer (e.g. `640`) for square resizing or a `(height, width)` tuple. Proper sizing can improve detection accuracy and processing speed. |
| `rect` | `bool` | `True` | If enabled, minimally pads the shorter side of the image until it is divisible by the stride, improving inference speed. If disabled, pads the image to a square during inference. |
| `half` | `bool` | `False` | Enables half-precision (FP16) inference, which can speed up model inference on supported GPUs with minimal impact on accuracy. |
| `device` | `str` | `None` | Specifies the device for inference (e.g. `cpu`, `cuda:0`, `0`, `npu`, or `npu:0`). Allows users to select between CPU, a specific GPU, Huawei Ascend NPU, or other compute devices for model execution. |
| `batch` | `int` | `1` | Specifies the batch size for inference (only applies when the source is a directory, video file, or `.txt` file). A larger batch size can provide higher throughput, shortening the total time required for inference. |
| `max_det` | `int` | `300` | Maximum number of detections allowed per image. Limits the total number of objects the model can detect in a single inference, preventing excessive outputs in dense scenes. |
| `vid_stride` | `int` | `1` | Frame stride for video inputs. Allows skipping frames in videos to speed up processing at the cost of temporal resolution. A value of `1` processes every frame; higher values skip frames. |
| `stream_buffer` | `bool` | `False` | Determines whether to queue incoming frames for video streams. If `False`, old frames are dropped to accommodate new frames (optimized for real-time applications). If `True`, new frames are queued in a buffer, ensuring no frames are skipped, but this adds latency if the inference FPS is lower than the stream FPS. |
| `visualize` | `bool` | `False` | Activates visualization of model features during inference, providing insight into what the model is "seeing". Useful for debugging and model interpretation. |
| `augment` | `bool` | `False` | Enables test-time augmentation (TTA) for predictions, potentially improving detection robustness at the cost of inference speed. |
| `agnostic_nms` | `bool` | `False` | Enables class-agnostic Non-Maximum Suppression (NMS), which merges overlapping boxes of different classes. Useful in multi-class detection scenarios where class overlap is common. For end-to-end models (YOLO26, YOLOv10), this only prevents the same detection from appearing with multiple class labels (IoU=1.0 duplicates) and does not perform IoU-threshold-based suppression between distinct boxes. |
| `classes` | `list[int]` | `None` | Filters predictions to a set of class IDs. Only detections belonging to the specified classes are returned. Useful for focusing on relevant objects in multi-class detection tasks. |
| `retina_masks` | `bool` | `False` | Returns high-resolution segmentation masks. If enabled, the returned masks (`masks.data`) match the original image size; if disabled, they have the image size used during inference. |
| `embed` | `list[int]` | `None` | Specifies the layers from which to extract feature vectors or embeddings. Useful for downstream tasks like clustering or similarity search. |
| `project` | `str` | `None` | Name of the project directory where prediction outputs are saved if `save` is enabled. |
| `name` | `str` | `None` | Name of the prediction run. Used for creating a subdirectory within the project folder, where prediction outputs are stored if `save` is enabled. |
| `stream` | `bool` | `False` | Enables memory-efficient processing for long videos or numerous images by returning a generator of `Results` objects instead of loading all frames into memory at once. |
| `verbose` | `bool` | `True` | Controls whether to display detailed inference logs in the terminal, providing real-time feedback on the prediction process. |
| `compile` | `bool` or `str` | `False` | Enables PyTorch 2.x `torch.compile` graph compilation with `backend='inductor'`. Accepts `True` (equivalent to `"default"`), `False` (disabled), or a string mode such as `"default"`, `"reduce-overhead"`, or `"max-autotune-no-cudagraphs"`. Falls back to eager execution with a warning if unsupported. |
| `end2end` | `bool` | `None` | Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you run prediction using the traditional NMS pipeline, additionally allowing use of the `iou` argument. See the End-to-End Detection guide for details. |
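For intuition, the documented semantics of `conf` filtering and `vid_stride` frame skipping behave roughly like the following plain-Python sketch. This is an illustrative toy, not the Ultralytics implementation; `filter_by_conf` and `strided_frames` are hypothetical helper names.

```python
def filter_by_conf(detections, conf=0.25):
    """Keep only detections at or above the confidence threshold,
    mirroring how detections below `conf` are disregarded."""
    return [d for d in detections if d["conf"] >= conf]


def strided_frames(num_frames, vid_stride=1):
    """Indices of the video frames actually processed for a given
    `vid_stride`: 1 processes every frame, higher values skip frames."""
    return list(range(0, num_frames, vid_stride))


dets = [{"cls": 0, "conf": 0.9}, {"cls": 1, "conf": 0.1}]
print(filter_by_conf(dets, conf=0.25))   # the 0.1 detection is dropped
print(strided_frames(10, vid_stride=3))  # [0, 3, 6, 9]
```

In actual use these options are simply passed as keyword arguments to prediction, e.g. `model.predict(source="bus.jpg", conf=0.4, vid_stride=3)` (with `bus.jpg` standing in for your own input).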