
Export Args

docs/macros/export-args.md

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `format` | `str` | `'torchscript'` | Target format for the exported model, such as `'onnx'`, `'torchscript'`, `'engine'` (TensorRT), or others. Each format targets a different deployment environment. |
| `imgsz` | `int` or `tuple` | `640` | Desired image size for the model input. Use an integer for square images (e.g., `640` for 640×640) or a tuple `(height, width)` for specific dimensions. |
| `keras` | `bool` | `False` | Enables export to Keras format for TensorFlow SavedModel, providing compatibility with TensorFlow Serving and APIs. |
| `optimize` | `bool` | `False` | Applies mobile optimization when exporting to TorchScript, potentially reducing model size and improving inference performance. Not compatible with NCNN format or CUDA devices. For DEEPX, enables a higher compiler optimization level, which reduces inference latency at the cost of longer compilation time. |
| `quantize` | `int` or `str` | `None` | Quantization precision: `16` (FP16; reduces model size and can speed up inference on supported hardware) or `8` (INT8 post-training quantization; compresses the model further with minimal accuracy loss, primarily for edge devices, and relies on calibration data, see `data` and `fraction`). `32` or unset means FP32. Export formats that support mixed weight/activation precision also accept the `'w8a8'`, `'w16a16'`, and `'w8a16'` notation. Replaces the deprecated `half`/`int8` flags (`half=True` → `16`, `int8=True` → `8`; both are still accepted with a deprecation warning). Only precisions supported by the target format are allowed (see below). |
| `dynamic` | `bool` | `False` | Allows dynamic input sizes for TorchScript, ONNX, OpenVINO, TensorRT, and CoreML exports, so the exported model can handle varying image dimensions. |
| `simplify` | `bool` | `True` | Simplifies the model graph for ONNX exports with `onnxslim`, potentially improving performance and compatibility with inference engines. |
| `opset` | `int` | `None` | Specifies the ONNX opset version for compatibility with different ONNX parsers and runtimes. If not set, the latest supported version is used. |
| `workspace` | `float` or `None` | `None` | Sets the maximum workspace size in GiB for TensorRT optimizations, balancing memory usage and performance. Use `None` to let TensorRT allocate automatically, up to the device maximum. |
| `nms` | `bool` | `False` | Adds Non-Maximum Suppression (NMS) to the exported model when supported (see Export Formats), improving detection post-processing efficiency. Not available for end-to-end (`end2end`) models. |
| `batch` | `int` | `1` | Specifies the batch size of the exported model, i.e. the maximum number of images it processes concurrently in `predict` mode. For Edge TPU exports, this is automatically set to `1`. |
| `device` | `str` | `None` | Specifies the device for exporting: GPU (`device=0`), CPU (`device=cpu`), MPS for Apple silicon (`device=mps`), Huawei Ascend NPU (`device=npu` or `device=npu:0`), or DLA for NVIDIA Jetson (`device=dla:0` or `device=dla:1`). TensorRT exports automatically use the GPU; note that TensorRT 11.0 does not support DLA. |
| `data` | `str` | `None` | Path to the dataset configuration file, essential for INT8 quantization calibration. If not specified when INT8 is enabled, Ultralytics selects a task-specific calibration dataset where required, or falls back to the default dataset for the model task. |
| `fraction` | `float` | `1.0` | Fraction of the dataset to use for INT8 quantization calibration. Allows calibrating on a subset of the full dataset, which is useful for experiments or when resources are limited. If not specified when INT8 is enabled, the full dataset is used. |
| `end2end` | `bool` | `None` | Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` exports these models for compatibility with the traditional NMS-based post-processing pipeline. See the End-to-End Detection guide for details. |
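The `imgsz` argument accepts either a single integer or a `(height, width)` pair. The sketch below shows that normalization in isolation. `normalize_imgsz` is a hypothetical helper written for illustration, not an Ultralytics function, and it does not model any further adjustment the exporter may apply (such as rounding to a stride multiple).

```python
def normalize_imgsz(imgsz):
    """Hypothetical helper: turn an imgsz value into a (height, width) tuple.

    An int means a square input; a 2-element sequence is read as (height, width).
    """
    if isinstance(imgsz, int):
        return (imgsz, imgsz)  # e.g. 640 -> 640x640
    height, width = imgsz
    return (int(height), int(width))


print(normalize_imgsz(640))         # (640, 640)
print(normalize_imgsz((480, 640)))  # (480, 640)
```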
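The `quantize` row describes how the deprecated `half` and `int8` flags map onto precision values (`half=True` → `16`, `int8=True` → `8`, unset → FP32). A minimal sketch of that mapping follows, assuming a hypothetical `resolve_quantize` helper; it is not the Ultralytics implementation. The precedence rules (an explicit `quantize` wins, and `int8` wins over `half`) are assumptions of this sketch, and the per-format allow-list is not modeled.

```python
import warnings

# Precision values named in the export-args table; the per-format
# allow-list mentioned there is NOT modeled in this sketch.
KNOWN_PRECISIONS = {32, 16, 8, "w8a8", "w16a16", "w8a16"}


def resolve_quantize(quantize=None, half=False, int8=False):
    """Hypothetical helper: fold the deprecated half/int8 flags into `quantize`.

    Assumptions (not taken from Ultralytics): an explicit `quantize` wins over
    the legacy flags, and int8=True wins over half=True if both are set.
    """
    if half or int8:
        warnings.warn(
            "half/int8 are deprecated; use quantize=16 or quantize=8 instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if quantize is None:
            quantize = 8 if int8 else 16
    if quantize is None:
        return 32  # unset means full FP32 precision
    if quantize not in KNOWN_PRECISIONS:
        raise ValueError(f"unsupported quantize value: {quantize!r}")
    return quantize


print(resolve_quantize())           # 32
print(resolve_quantize(half=True))  # 16 (also emits a DeprecationWarning)
print(resolve_quantize("w8a16"))    # w8a16
```

Emitting a `DeprecationWarning` rather than raising keeps old call sites working while signalling the migration, which matches the "still accepted with a deprecation warning" behavior the table describes.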
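The `workspace` argument is given in GiB, while TensorRT memory-pool limits are expressed in bytes. The conversion is a multiplication by 2^30 (1 GiB = 1,073,741,824 bytes). `workspace_bytes` is a hypothetical helper used only to show the arithmetic; `None` is passed through to stand for "let TensorRT decide".

```python
def workspace_bytes(workspace_gib):
    """Hypothetical helper: convert a workspace size in GiB to bytes.

    None is returned unchanged, standing for TensorRT's automatic allocation.
    """
    if workspace_gib is None:
        return None
    if workspace_gib <= 0:
        raise ValueError("workspace must be positive or None")
    return int(workspace_gib * (1 << 30))  # 1 GiB = 2**30 bytes


print(workspace_bytes(4))    # 4294967296
print(workspace_bytes(0.5))  # 536870912
```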
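To illustrate what the `nms` option embeds in the exported model, here is a plain greedy, class-agnostic Non-Maximum Suppression sketch: keep the highest-scoring box, discard remaining boxes whose IoU with it exceeds a threshold, and repeat. This is the textbook algorithm, not the exact post-processing graph Ultralytics exports, which may differ (for example, by handling classes or batches separately). The box format `(x1, y1, x2, y2)` and the `0.5` threshold are assumptions of the sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Boxes 0 and 1 overlap with an IoU of about 0.68, so the lower-scoring box 1 is suppressed, while box 2 does not overlap and survives.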
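The `fraction` argument limits INT8 calibration to part of the dataset. The sketch below shows the size arithmetic on a list of image paths, using a hypothetical `calibration_subset` helper. Taking the first N images (rather than a random sample) and keeping at least one image are assumptions of this sketch, not documented Ultralytics behavior.

```python
import math


def calibration_subset(image_paths, fraction=1.0):
    """Hypothetical helper: keep the first `fraction` of the calibration images.

    At least one image is kept so calibration always has some data.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, math.floor(len(image_paths) * fraction))
    return image_paths[:count]


paths = [f"img_{i}.jpg" for i in range(100)]
print(len(calibration_subset(paths, 0.25)))  # 25
print(len(calibration_subset(paths)))        # 100
```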