
YOLOv9

The PyTorch implementation is WongKinYiu/yolov9.

Contributors

<a href="https://github.com/WuxinrongY">WuxinrongY</a>

Progress

  • YOLOv9-t
  • YOLOv9-t-convert(gelan)
  • YOLOv9-s
  • YOLOv9-s-convert(gelan)
  • YOLOv9-m
  • YOLOv9-m-convert(gelan)
  • YOLOv9-c
  • YOLOv9-c-convert(gelan)
  • YOLOv9-e
  • YOLOv9-e-convert(gelan)

Requirements

  • TensorRT 8.0+
  • OpenCV 3.4.0+

Speed Test

The speed test was done on a desktop with an R7-5700G CPU and an RTX 4060 Ti GPU. The input size is 640x640. The FP32, FP16 and INT8 models were tested. The reported time covers inference only, excluding pre-processing and post-processing, and is the average over 1000 inference runs.
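The measurement methodology above (average pure-inference latency over many runs, warmup excluded) can be sketched as follows. This is an illustrative helper, not code from this repo; `infer` is a placeholder for one engine execution:

```python
import time

def average_inference_ms(infer, runs=1000, warmup=50):
    """Average latency of `infer()` in ms over `runs` calls.

    Warmup iterations are excluded, and only the call itself is timed,
    mirroring the benchmark description above (no pre/post-processing).
    `infer` is a hypothetical callable wrapping one engine execution.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) / runs * 1000.0
```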

| frame | Model | FP32 | FP16 | INT8 |
| --- | --- | --- | --- | --- |
| tensorrt | YOLOv5-n | - | 0.58ms | - |
| tensorrt | YOLOv5-s | - | 0.90ms | - |
| tensorrt | YOLOv5-m | - | 1.9ms | - |
| tensorrt | YOLOv5-l | - | 2.8ms | - |
| tensorrt | YOLOv5-x | - | 5.1ms | - |
| tensorrt | YOLOv9-t-convert | - | 1.37ms | - |
| tensorrt | YOLOv9-s | - | 1.78ms | - |
| tensorrt | YOLOv9-s-convert | - | 1.78ms | - |
| tensorrt | YOLOv9-m | - | 3.1ms | - |
| tensorrt | YOLOv9-m-convert | - | 2.8ms | - |
| tensorrt | YOLOv9-c | 13.5ms | 4.6ms | 3.0ms |
| tensorrt | YOLOv9-e | 8.3ms | 3.2ms | 2.15ms |
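From the YOLOv9-c row of the table, the quantization speedups work out as follows (simple arithmetic on the measured latencies, shown for illustration):

```python
# YOLOv9-c latencies in ms, taken from the speed test table above.
fp32_ms, fp16_ms, int8_ms = 13.5, 4.6, 3.0

fp16_speedup = fp32_ms / fp16_ms  # ~2.9x over FP32
int8_speedup = fp32_ms / int8_ms  # 4.5x over FP32
print(f"FP16: {fp16_speedup:.1f}x, INT8: {int8_speedup:.1f}x")
```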

The GELAN results will be updated later.

YOLOv9-e is faster than YOLOv9-c in TensorRT because YOLOv9-e requires fewer layers of inference.

YOLOv9-c:

```
[[31, 34, 37, 16, 19, 22], 1, DualDDetect, [nc]]  # [A3, A4, A5, P3, P4, P5]
```

YOLOv9-e:

```
[[35, 32, 29, 42, 45, 48], 1, DualDDetect, [nc]]
```

In DualDDetect, A3, A4, A5, P3, P4 and P5 are outputs of the backbone. Only the first three inputs are used to produce the final result.

So YOLOv9-c requires inference through 37 layers, while YOLOv9-e only requires 35.
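The layer counts above follow directly from the DualDDetect `from` indices: since only the first three branches feed the final result, the deepest layer they reference bounds the required inference depth. A small sketch of that reasoning:

```python
# DualDDetect `from` indices, copied from the model configs quoted above.
YOLOV9_C_HEAD = [31, 34, 37, 16, 19, 22]  # [A3, A4, A5, P3, P4, P5]
YOLOV9_E_HEAD = [35, 32, 29, 42, 45, 48]

def layers_needed(head_from):
    """Deepest layer index referenced by the first three detect branches.

    Only the first three inputs of DualDDetect are used for the final
    result, so inference only needs to run up to the max of those indices.
    """
    return max(head_from[:3])

print(layers_needed(YOLOV9_C_HEAD))  # 37 layers for YOLOv9-c
print(layers_needed(YOLOV9_E_HEAD))  # 35 layers for YOLOv9-e
```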

How to Run (yolov9 as example)

1. Generate .wts from PyTorch with .pt, or download .wts from model zoo

```
// download https://github.com/WongKinYiu/yolov9
cp {tensorrtx}/yolov9/gen_wts.py {yolov9}/yolov9
cd {yolov9}/yolov9
python gen_wts.py
// a file 'yolov9.wts' will be generated
```

2. Build tensorrtx/yolov9 and run

```
cd {tensorrtx}/yolov9/
// update kNumClass in config.h if your model is trained on a custom dataset
mkdir build
cd build
cp {yolov9}/yolov9/yolov9.wts {tensorrtx}/yolov9/build
cmake ..
make
sudo ./yolov9 -s [.wts] [.engine] [c/e]   // serialize model to plan file
sudo ./yolov9 -d [.engine] [image folder] // deserialize and run inference; the images in [image folder] will be processed
// for example, yolov9-c
sudo ./yolov9 -s yolov9-c.wts yolov9-c.engine c
sudo ./yolov9 -d yolov9-c.engine ../images
```

3. Check the generated images, e.g. _zidane.jpg and _bus.jpg.

4. Optional: load and run the TensorRT model in Python

```
// install python-tensorrt, pycuda, etc.
// ensure the yolov9.engine and libmyplugins.so have been built
python yolov9_trt.py
```

INT8 Quantization

1. Prepare calibration images; you can randomly select about 1000 images from your train set. For COCO, you can also download my calibration images coco_calib from GoogleDrive or BaiduPan (pwd: a9wh).

2. Unzip it in yolov9/build.

3. Set the macro USE_INT8 in config.h and change the path of the calibration images in config.h, such as 'gCalibTablePath="./coco_calib/";'.

4. Serialize the model and test.
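Step 1 above (randomly selecting ~1000 calibration images from a train set) can be sketched with a small helper. This script is not part of the repo; the function name and defaults are illustrative:

```python
import random
from pathlib import Path

def sample_calib_images(train_dir, n=1000, seed=0):
    """Randomly pick up to n images from a training set for INT8 calibration.

    `train_dir` is scanned recursively for common image extensions; the
    result is deterministic for a given seed so the selection is repeatable.
    """
    images = sorted(
        p for p in Path(train_dir).rglob("*")
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    random.seed(seed)
    return random.sample(images, min(n, len(images)))
```

The sampled files would then be copied into the calibration folder referenced by config.h (e.g. ./coco_calib/).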


More Information

See the README on the home page.