Predict Args - Ultralytics

Argument	Type	Default	Description
`source`	`str` or `int` or `None`	`None`	Specifies the data source for inference: an image path, video file, directory, URL, or device ID for live feeds. If omitted, a warning is logged and the model falls back to the built-in demo assets (`ultralytics/assets`, or a demo URL for OBB models). A wide range of image and video formats is supported.
`conf`	`float`	`0.25`	Sets the minimum confidence threshold for detections. Detections with confidence below this threshold are discarded. Raising the value reduces false positives; lowering it keeps more low-confidence detections, including more false positives.
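`iou`	`float`	`0.7`	Intersection Over Union (IoU) threshold for Non-Maximum Suppression (NMS). A box is suppressed when its overlap with a higher-scoring box exceeds this threshold, so lower values remove more overlapping boxes and return fewer detections, which helps eliminate duplicates. Not used by NMS-free end-to-end models unless `end2end=False`.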
`imgsz`	`int` or `tuple`	`640`	Defines the image size for inference: either a single integer such as `640` for a square target size, or a `(height, width)` tuple. Larger sizes can improve accuracy on small objects, while smaller sizes run faster.
`rect`	`bool`	`True`	If enabled, pads the shorter side of the resized image only until it is divisible by the model stride, avoiding unnecessary computation and speeding up inference. If disabled, pads the image to a full square during inference.
`half`	`bool`	`False`	Enables half-precision (FP16) inference, which can speed up model inference on supported GPUs with minimal impact on accuracy.
`device`	`str`	`None`	Specifies the device for inference (e.g., `cpu`, `cuda:0`, `0`, `npu`, or `npu:0`). Lets you choose among the CPU, a specific GPU, a Huawei Ascend NPU, or other compute devices for model execution.
`batch`	`int`	`1`	Specifies the batch size for inference (only applies when the source is a directory, video file, or `.txt` file). Larger batches can increase throughput and shorten total inference time, at the cost of higher memory use.
`max_det`	`int`	`300`	Maximum number of detections kept per image. Once the limit is reached, remaining lower-scoring detections are discarded, which prevents excessive output in dense scenes.
`vid_stride`	`int`	`1`	Frame stride for video inputs. Skips frames to speed up processing at the cost of temporal resolution: a value of 1 processes every frame, and a value of N processes every N-th frame.
`stream_buffer`	`bool`	`False`	Determines whether to queue incoming frames for video streams. If `False`, older frames are dropped in favor of new ones (optimized for real-time applications). If `True`, new frames are queued in a buffer so none are skipped, but latency builds up if inference FPS is lower than stream FPS.
`visualize`	`bool`	`False`	Activates visualization of model features during inference, providing insights into what the model is "seeing". Useful for debugging and model interpretation.
`augment`	`bool`	`False`	Enables test-time augmentation (TTA) for predictions, potentially improving detection robustness at the cost of inference speed.
`agnostic_nms`	`bool`	`False`	Enables class-agnostic Non-Maximum Suppression (NMS), in which overlapping boxes suppress each other regardless of class, so a box can be removed by a higher-scoring box of a different class. Useful in multi-class detection scenarios where class overlap is common. For end-to-end models (YOLO26, YOLOv10), this only prevents the same detection from appearing with multiple class labels (IoU=1.0 duplicates) and does not perform IoU-threshold-based suppression between distinct boxes.
`classes`	`list[int]`	`None`	Filters predictions to a set of class IDs. Only detections belonging to the specified classes will be returned. Useful for focusing on relevant objects in multi-class detection tasks.
`retina_masks`	`bool`	`False`	Returns high-resolution segmentation masks. If enabled, the returned masks (`masks.data`) match the original image size. If disabled, they match the image size used during inference.
`embed`	`list[int]`	`None`	Specifies the layers from which to extract feature vectors or embeddings. Useful for downstream tasks like clustering or similarity search.
`project`	`str`	`None`	Name of the project directory where prediction outputs are saved if `save` is enabled.
`name`	`str`	`None`	Name of the prediction run. Used for creating a subdirectory within the project folder, where prediction outputs are stored if `save` is enabled.
`stream`	`bool`	`False`	Enables memory-efficient processing of long videos or many images by returning a generator of Results objects instead of accumulating all results in memory at once.
`verbose`	`bool`	`True`	Controls whether to display detailed inference logs in the terminal, providing real-time feedback on the prediction process.
`compile`	`bool` or `str`	`False`	Enables PyTorch 2.x `torch.compile` graph compilation with `backend='inductor'`. `True` selects the `"default"` mode, `False` disables compilation, and a string selects a specific mode such as `"default"`, `"reduce-overhead"`, or `"max-autotune-no-cudagraphs"`. Falls back to eager execution with a warning if compilation is unsupported.
`end2end`	`bool`	`None`	Overrides the end-to-end (NMS-free) inference mode of YOLO models that support it (YOLO26, YOLOv10). Setting it to `False` runs prediction through the traditional NMS pipeline, which also makes the `iou` argument take effect. See the End-to-End Detection guide for details.
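The `conf`, `iou`, and `max_det` rows describe three successive filters on candidate boxes. The plain-Python greedy NMS below is a conceptual sketch of how they interact, not Ultralytics' implementation (which runs vectorized on tensors). It assumes a single class, boxes given as `(x1, y1, x2, y2)` tuples, and the hypothetical helper names `box_iou` and `nms`.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf=0.25, iou=0.7, max_det=300):
    """Hypothetical greedy NMS sketch: returns indices of kept boxes, best first."""
    # 1. `conf`: keep only candidates scoring above the confidence threshold.
    order = [i for i in range(len(boxes)) if scores[i] > conf]
    # 2. Visit the surviving candidates from highest to lowest score.
    order.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # 3. `iou`: drop a box that overlaps an already kept box by more than `iou`.
        if all(box_iou(boxes[i], boxes[k]) <= iou for k in keep):
            keep.append(i)
            # 4. `max_det`: stop once enough detections have been kept.
            if len(keep) == max_det:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.6, 0.1]
print(nms(boxes, scores))             # [0, 1, 2]: box 3 fails `conf`; boxes 0/1 overlap ~0.68 <= 0.7
print(nms(boxes, scores, iou=0.5))    # [0, 2]: the stricter `iou` suppresses box 1
print(nms(boxes, scores, max_det=1))  # [0]
```

Lowering `iou` from 0.7 to 0.5 removes box 1, matching the table's note that lower values return fewer detections.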
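The difference between `rect=True` and `rect=False` is easiest to see in the resulting input shape. The sketch below assumes letterbox-style resizing (scale to fit `imgsz` while keeping the aspect ratio, then pad) and a stride of 32. The function name `letterbox_shape` is hypothetical, and Ultralytics' own preprocessing may differ in details such as rounding.

```python
def letterbox_shape(h, w, imgsz=640, stride=32, rect=True):
    """Return ((resized_h, resized_w), (padded_h, padded_w)) for an h x w image.

    Hypothetical sketch; `imgsz` is an int (square target) or a (height, width) tuple.
    """
    new_h, new_w = (imgsz, imgsz) if isinstance(imgsz, int) else imgsz
    # Scale so the whole image fits inside new_h x new_w, keeping its aspect ratio.
    r = min(new_h / h, new_w / w)
    resized_h, resized_w = round(h * r), round(w * r)
    # Padding needed to fill the full target size.
    pad_h, pad_w = new_h - resized_h, new_w - resized_w
    if rect:
        # Minimal padding: only enough to reach the next multiple of `stride`.
        pad_h, pad_w = pad_h % stride, pad_w % stride
    return (resized_h, resized_w), (resized_h + pad_h, resized_w + pad_w)


# A 1080x1920 (height x width) video frame at imgsz=640:
print(letterbox_shape(1080, 1920))              # ((360, 640), (384, 640))
print(letterbox_shape(1080, 1920, rect=False))  # ((360, 640), (640, 640))
```

With `rect=True` the network sees a 384x640 input instead of 640x640, which is where the speed-up comes from.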
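FP16 keeps roughly three significant decimal digits. As a rough, model-independent illustration of the rounding involved with `half=True`, Python's standard `struct` module can round a value to IEEE 754 half precision (format `"e"`). This only shows the precision of individual numbers; it says nothing about how much a particular model's accuracy changes. The helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Small values such as confidence scores lose very little precision,
# while larger values such as pixel coordinates are rounded more coarsely.
for value in (0.1, 0.25, 0.7, 123.456):
    half = to_fp16(value)
    print(f"{value} -> {half} (abs error {abs(value - half):.2e})")
```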
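For video files, `vid_stride` decides which frames are read and `batch` decides how many of them go through the model together. The sketch below assumes the selected frames are `0, vid_stride, 2 * vid_stride, ...`; the exact starting offset used by Ultralytics' video loader may differ. The generator name `frame_batches` is hypothetical.

```python
def frame_batches(num_frames, vid_stride=1, batch=1):
    """Yield lists of frame indices: every `vid_stride`-th frame, grouped into batches."""
    selected = range(0, num_frames, vid_stride)
    for start in range(0, len(selected), batch):
        # Slicing a range gives another range; convert it for readability.
        yield list(selected[start:start + batch])


print(list(frame_batches(10, vid_stride=3, batch=2)))  # [[0, 3], [6, 9]]
print(list(frame_batches(5)))                          # [[0], [1], [2], [3], [4]]
```

A stride of 3 cuts the work to about a third of the frames, and a batch of 2 halves the number of model calls for the frames that remain.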
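The trade-off behind `stream_buffer` can be simulated with a queue: a camera delivers one frame per tick, and the model finishes one inference every few ticks. This is a simplified sketch, not Ultralytics' stream loader: the function name `simulate_stream` is hypothetical, and the buffer here is unbounded, whereas the real loader's buffering details differ.

```python
from collections import deque


def simulate_stream(num_frames, frames_per_inference, stream_buffer):
    """Simulate a stream that is faster than the model.

    Returns (frames the model processed, frames still waiting in the queue).
    """
    # stream_buffer=False: keep only the newest frame; older ones are dropped.
    # stream_buffer=True: queue every frame (unbounded in this sketch).
    queue = deque() if stream_buffer else deque(maxlen=1)
    processed = []
    for tick in range(num_frames):
        queue.append(tick)  # a new frame arrives from the stream
        if (tick + 1) % frames_per_inference == 0 and queue:
            processed.append(queue.popleft())  # the model takes the next frame
    return processed, list(queue)


print(simulate_stream(9, 3, stream_buffer=False))  # ([2, 5, 8], [])
print(simulate_stream(9, 3, stream_buffer=True))   # ([0, 1, 2], [3, 4, 5, 6, 7, 8])
```

Without buffering, the model always sees the latest frame but skips the others. With buffering, no frame is skipped, but by tick 8 the model is still working on frame 2 and the backlog keeps growing, which is the latency the table warns about.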
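The `classes` and `agnostic_nms` rows can be illustrated by extending greedy NMS to labeled detections. In the class-aware case only boxes of the same class compete; in the agnostic case every kept box can suppress any other. This is a conceptual sketch with the hypothetical names `box_iou` and `filter_and_nms`, with detections given as `(box, score, class_id)` tuples. The identical box labeled with two classes mirrors the IoU=1.0 duplicate case described for end-to-end models, although those models do not run this threshold-based loop.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(dets, iou=0.7, classes=None, agnostic=False):
    """Hypothetical sketch: filter by class IDs, then run greedy NMS, best first."""
    if classes is not None:
        # `classes`: keep only detections whose class ID was requested.
        dets = [d for d in dets if d[2] in classes]
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        # Class-aware NMS compares only same-class boxes; agnostic NMS compares all.
        rivals = [k for k in kept if agnostic or k[2] == cls]
        if all(box_iou(box, k[0]) <= iou for k in rivals):
            kept.append(det)
    return kept


dets = [((0, 0, 10, 10), 0.9, 0), ((0, 0, 10, 10), 0.6, 1), ((50, 50, 60, 60), 0.8, 2)]
print([d[2] for d in filter_and_nms(dets)])                 # [0, 2, 1]
print([d[2] for d in filter_and_nms(dets, agnostic=True)])  # [0, 2]
print([d[2] for d in filter_and_nms(dets, classes=[1])])    # [1]
```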
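The memory benefit of `stream=True` comes from Python generators: results are produced one at a time as the caller iterates, instead of being collected in a list first. In Ultralytics this corresponds to iterating over `model.predict(..., stream=True)`. The sketch below avoids the library entirely; `fake_inference`, `predict_list`, and `predict_stream` are hypothetical stand-ins that only show the list-versus-generator difference.

```python
def fake_inference(frame):
    """Hypothetical stand-in for running the model on one frame."""
    return {"frame": frame, "num_boxes": frame % 3}


def predict_list(frames):
    # Like stream=False: every result is built and held in memory before returning.
    return [fake_inference(f) for f in frames]


def predict_stream(frames):
    # Like stream=True: each result is produced only when the caller asks for it.
    for f in frames:
        yield fake_inference(f)


# The generator handles an effectively endless source without building a huge list.
for result in predict_stream(range(10**9)):
    print(result)
    if result["frame"] >= 2:
        break
```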