Export Args - Ultralytics

Argument	Type	Default	Description
`format`	`str`	`'torchscript'`	Target format for the exported model, such as `'onnx'`, `'torchscript'`, `'engine'` (TensorRT), or others. Each format enables compatibility with different deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images (e.g., `640` for 640×640) or a tuple `(height, width)` for specific dimensions.
`keras`	`bool`	`False`	Enables export to Keras format for TensorFlow SavedModel, providing compatibility with TensorFlow serving and APIs.
`optimize`	`bool`	`False`	Applies optimization for mobile devices when exporting to TorchScript, potentially reducing model size and improving inference performance. Not compatible with NCNN format or CUDA devices. For DEEPX, enables a higher compiler optimization level, which reduces inference latency at the cost of longer compilation time.
`quantize`	`int` or `str`	`None`	Quantization precision: `16` (FP16; reduces model size and can speed up inference on supported hardware) or `8` (INT8 post-training quantization (PTQ); compresses the model further with minimal accuracy loss, primarily for edge devices, and requires calibration via `data`/`fraction`). `32` or unset keeps FP32. Export formats that support mixed weight/activation precision also accept the `'w8a8'`/`'w16a16'`/`'w8a16'` notation. Replaces the deprecated `half`/`int8` flags (`half=True` → `16`, `int8=True` → `8`), which are still accepted with a deprecation warning. Only precisions supported by the target format are allowed (see below).
`dynamic`	`bool`	`False`	Allows dynamic input sizes for TorchScript, ONNX, OpenVINO, TensorRT, and CoreML exports, enhancing flexibility in handling varying image dimensions.
`simplify`	`bool`	`True`	Simplifies the model graph for ONNX exports with `onnxslim`, potentially improving performance and compatibility with inference engines.
`opset`	`int`	`None`	Specifies the ONNX opset version for compatibility with different ONNX parsers and runtimes. If not set, uses the latest supported version.
`workspace`	`float` or `None`	`None`	Sets the maximum workspace size in GiB for TensorRT optimizations, balancing memory usage and performance. Use `None` to let TensorRT allocate automatically, up to the device maximum.
`nms`	`bool`	`False`	Adds Non-Maximum Suppression (NMS) to the exported model when supported (see Export Formats), improving detection post-processing efficiency. Not available for end2end models.
`batch`	`int`	`1`	Specifies the batch inference size of the exported model, or the maximum number of images the exported model processes concurrently in `predict` mode. For Edge TPU exports, this is automatically set to 1.
`device`	`str`	`None`	Specifies the device for exporting: GPU (`device=0`), CPU (`device=cpu`), MPS for Apple silicon (`device=mps`), Huawei Ascend NPU (`device=npu` or `device=npu:0`), or DLA for NVIDIA Jetson (`device=dla:0` or `device=dla:1`). TensorRT exports automatically use the GPU. TensorRT 11.0 does not support DLA.
`data`	`str`	`None`	Path to the dataset configuration file, essential for INT8 quantization calibration. If not specified with INT8 enabled, Ultralytics selects a task-specific calibration dataset where required, or falls back to the default dataset for the model task.
`fraction`	`float`	`1.0`	Specifies the fraction of the dataset to use for INT8 quantization calibration. Allows for calibrating on a subset of the full dataset, useful for experiments or when resources are limited. If not specified with INT8 enabled, the full dataset will be used.
`end2end`	`bool`	`None`	Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you export these models to be compatible with the traditional NMS-based postprocessing pipeline. See the End-to-End Detection guide for details.
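
The table describes two conventions that are easy to get wrong when assembling export arguments: `imgsz` accepts either an integer or a `(height, width)` tuple, and the deprecated `half`/`int8` flags map onto `quantize`. The sketch below illustrates both in plain Python. The helpers `normalize_imgsz`, `resolve_quantize`, and `build_export_args` are hypothetical names introduced for illustration only and are not part of the Ultralytics API. The precedence rules (an explicit `quantize` wins over the legacy flags, and `int8` wins over `half`) are assumptions, not documented behavior.

```python
from typing import Dict, Optional, Tuple, Union

ImgSize = Union[int, Tuple[int, int]]
Precision = Optional[Union[int, str]]


def normalize_imgsz(imgsz: ImgSize) -> Tuple[int, int]:
    """Return (height, width); a single integer means a square input."""
    if isinstance(imgsz, int):
        return (imgsz, imgsz)
    height, width = imgsz
    return (int(height), int(width))


def resolve_quantize(
    quantize: Precision = None, half: bool = False, int8: bool = False
) -> Precision:
    """Map the deprecated flags onto `quantize` as the table describes.

    half=True -> 16 and int8=True -> 8 come from the table.
    The precedence between conflicting inputs is an assumption of this sketch.
    """
    if quantize is not None:
        return quantize  # assumption: an explicit value wins over legacy flags
    if int8:
        return 8  # assumption: int8 takes priority if both flags are set
    if half:
        return 16
    return None  # unset means FP32


def build_export_args(
    format: str = "torchscript",
    imgsz: ImgSize = 640,
    quantize: Precision = None,
    half: bool = False,
    int8: bool = False,
) -> Dict[str, object]:
    """Assemble a keyword dictionary using the argument names from the table."""
    args: Dict[str, object] = {"format": format, "imgsz": normalize_imgsz(imgsz)}
    precision = resolve_quantize(quantize, half, int8)
    if precision is not None:
        args["quantize"] = precision  # omit the key entirely when FP32 is wanted
    return args
```

A dictionary built this way could then be unpacked into an export call such as `model.export(**args)` on a `YOLO` model, assuming an installed Ultralytics version that accepts the `quantize` argument described above.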