docs/macros/predict-args.md
| Argument | Type | Default | Description |
|---|---|---|---|
| `source` | `str` or `int` or `None` | `None` | Specifies the data source for inference. Can be an image path, video file, directory, URL, or device ID for live feeds. If omitted, a warning is logged and the model falls back to the built-in demo assets (`ultralytics/assets`, or a demo URL for OBB). |
| `conf` | `float` | `0.25` | Sets the minimum confidence threshold for detections. Objects detected with confidence below this threshold will be disregarded. Adjusting this value can help reduce false positives. |
| `iou` | `float` | `0.7` | Intersection Over Union (IoU) threshold for Non-Maximum Suppression (NMS). Lower values result in fewer detections by eliminating overlapping boxes, useful for reducing duplicates. |
| `imgsz` | `int` or `tuple` | `640` | Defines the image size for inference. Can be a single integer `640` for square resizing or a (height, width) tuple. Proper sizing can improve detection accuracy and processing speed. |
| `rect` | `bool` | `True` | If enabled, minimally pads the shorter side of the image until it's divisible by stride to improve inference speed. If disabled, pads the image to a square during inference. |
| `half` | `bool` | `False` | Enables half-precision (FP16) inference, which can speed up model inference on supported GPUs with minimal impact on accuracy. |
| `device` | `str` | `None` | Specifies the device for inference (e.g., `cpu`, `cuda:0`, `0`, `npu` or `npu:0`). Allows users to select between CPU, a specific GPU, Huawei Ascend NPU, or other compute devices for model execution. |
| `batch` | `int` | `1` | Specifies the batch size for inference (only works when the source is a directory, video file, or `.txt` file). A larger batch size can provide higher throughput, shortening the total amount of time required for inference. |
| `max_det` | `int` | `300` | Maximum number of detections allowed per image. Limits the total number of objects the model can detect in a single inference, preventing excessive outputs in dense scenes. |
| `vid_stride` | `int` | `1` | Frame stride for video inputs. Allows skipping frames in videos to speed up processing at the cost of temporal resolution. A value of 1 processes every frame; higher values skip frames. |
| `stream_buffer` | `bool` | `False` | Determines whether to queue incoming frames for video streams. If `False`, old frames get dropped to accommodate new frames (optimized for real-time applications). If `True`, queues new frames in a buffer, ensuring no frames get skipped, but will cause latency if inference FPS is lower than stream FPS. |
| `visualize` | `bool` | `False` | Activates visualization of model features during inference, providing insights into what the model is "seeing". Useful for debugging and model interpretation. |
| `augment` | `bool` | `False` | Enables test-time augmentation (TTA) for predictions, potentially improving detection robustness at the cost of inference speed. |
| `agnostic_nms` | `bool` | `False` | Enables class-agnostic Non-Maximum Suppression (NMS), which suppresses overlapping boxes regardless of their class. Useful in multi-class detection scenarios where class overlap is common. For end-to-end models (YOLO26, YOLOv10), this only prevents the same detection from appearing with multiple class labels (IoU=1.0 duplicates) and does not perform IoU-threshold-based suppression between distinct boxes. |
| `classes` | `list[int]` | `None` | Filters predictions to a set of class IDs. Only detections belonging to the specified classes will be returned. Useful for focusing on relevant objects in multi-class detection tasks. |
| `retina_masks` | `bool` | `False` | Returns high-resolution segmentation masks. The returned masks (`masks.data`) will match the original image size if enabled. If disabled, they have the image size used during inference. |
| `embed` | `list[int]` | `None` | Specifies the layers from which to extract feature vectors or embeddings. Useful for downstream tasks like clustering or similarity search. |
| `project` | `str` | `None` | Name of the project directory where prediction outputs are saved if `save` is enabled. |
| `name` | `str` | `None` | Name of the prediction run. Used for creating a subdirectory within the project folder, where prediction outputs are stored if `save` is enabled. |
| `stream` | `bool` | `False` | Enables memory-efficient processing for long videos or numerous images by returning a generator of `Results` objects instead of loading all frames into memory at once. |
| `verbose` | `bool` | `True` | Controls whether to display detailed inference logs in the terminal, providing real-time feedback on the prediction process. |
| `compile` | `bool` or `str` | `False` | Enables PyTorch 2.x `torch.compile` graph compilation with `backend='inductor'`. Accepts `True` → `"default"`, `False` → disables, or a string mode such as `"default"`, `"reduce-overhead"`, `"max-autotune-no-cudagraphs"`. Falls back to eager with a warning if unsupported. |
| `end2end` | `bool` | `None` | Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you run prediction using the traditional NMS pipeline, additionally allowing you to make use of the `iou` argument. See the End-to-End Detection guide for details. |
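
To make the interaction between `conf`, `iou`, `max_det`, `classes`, and `agnostic_nms` concrete, the following is a minimal pure-Python sketch of a greedy NMS pass. It is an illustration only, not the library's implementation: `Det`, `box_iou`, and `postprocess` are hypothetical names introduced here, and the exact comparison operators and ordering used by the real pipeline may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    """One hypothetical detection: xyxy box corners, confidence score, class ID."""

    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    cls: int


def box_iou(a: Det, b: Det) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets, conf=0.25, iou=0.7, max_det=300, classes=None, agnostic_nms=False):
    """Illustrative greedy NMS (an assumption-laden sketch, not the real pipeline)."""
    # conf / classes: drop low-confidence detections and unwanted classes.
    candidates = [
        d for d in dets
        if d.score >= conf and (classes is None or d.cls in classes)
    ]
    # Visit the remaining candidates from most to least confident.
    candidates.sort(key=lambda d: d.score, reverse=True)

    kept = []
    for d in candidates:
        # max_det: stop as soon as the cap is reached.
        if len(kept) >= max_det:
            break
        # iou / agnostic_nms: suppress d if it overlaps an already-kept box
        # by more than the threshold. Without agnostic_nms, only boxes of
        # the same class compete with each other.
        suppressed = any(
            (agnostic_nms or k.cls == d.cls) and box_iou(k, d) > iou
            for k in kept
        )
        if not suppressed:
            kept.append(d)
    return kept
```

In this sketch, lowering `iou` removes more overlapping boxes, and enabling `agnostic_nms` additionally lets a box of one class suppress an overlapping box of another class.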
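
The effect of `rect` on the inference shape can be sketched with a little arithmetic. The helper below is hypothetical (`letterbox_shape` is not a library function) and rests on several assumptions not stated in the table: an integer `imgsz` that is a multiple of the stride, a model stride of 32, and an image scaled so that its longer side equals `imgsz` before padding.

```python
def letterbox_shape(h, w, imgsz=640, stride=32, rect=True):
    """Return the padded (height, width) for an h x w image.

    Hypothetical sketch: stride=32 and longest-side scaling are assumptions.
    """
    # Scale so the longer side fits imgsz while keeping the aspect ratio.
    r = min(imgsz / h, imgsz / w)
    new_h, new_w = round(h * r), round(w * r)
    if rect:
        # Minimal padding: only enough to reach the next multiple of stride.
        pad_h = (imgsz - new_h) % stride
        pad_w = (imgsz - new_w) % stride
    else:
        # Square padding: fill both sides up to imgsz.
        pad_h, pad_w = imgsz - new_h, imgsz - new_w
    return new_h + pad_h, new_w + pad_w


print(letterbox_shape(720, 1280))              # (384, 640)
print(letterbox_shape(720, 1280, rect=False))  # (640, 640)
```

Under these assumptions a 1280×720 frame is processed at 384×640 with `rect` enabled, roughly 60% of the pixels of the 640×640 square, which is where the speed benefit comes from.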