# Swin Transformer Quantization Toolkit

This folder contains the guide for the Swin Transformer Quantization Toolkit.

## Model Zoo

### Regular ImageNet-1K trained models

| name | resolution | acc@1 | acc@5 | #params | FLOPs | model |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | 224x224 | 81.2 | 95.5 | 28M | 4.5G | github/baidu |
| Swin-S | 224x224 | 83.2 | 96.2 | 50M | 8.7G | github/baidu |
| Swin-B | 224x224 | 83.5 | 96.5 | 88M | 15.4G | github/baidu |
| Swin-B | 384x384 | 84.5 | 97.0 | 88M | 47.1G | github/baidu |

### ImageNet-22K pre-trained models

| name | resolution | acc@1 | acc@5 | #params | FLOPs | 22K model | 1K model |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-B | 224x224 | 85.2 | 97.5 | 88M | 15.4G | github/baidu | github/baidu |
| Swin-B | 384x384 | 86.4 | 98.0 | 88M | 47.1G | github/baidu | github/baidu |
| Swin-L | 224x224 | 86.3 | 97.9 | 197M | 34.5G | github/baidu | github/baidu |
| Swin-L | 384x384 | 87.3 | 98.2 | 197M | 103.9G | github/baidu | github/baidu |

Note: the access code for Baidu is `swin`.

## Usage

### Environment setup

- Initialize the submodule:

  ```bash
  git submodule update --init
  ```

- Run the container (a sample `docker run` command follows this list).

  You can choose the PyTorch version you want. Here we list one possible image:

  - `nvcr.io/nvidia/pytorch:22.09-py3`, which contains PyTorch 1.13.0 and Python 3.8.

- Install additional dependencies (not included in the container):

  ```bash
  pip install timm==0.4.12
  pip install termcolor==1.1.0
  ```
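As a minimal sketch of launching the container (the mount path and working directory below are assumptions; adjust them to your setup):

```bash
# Start the NGC PyTorch container with GPU access and mount the current
# repository at /workspace (mount path is illustrative, not prescribed here).
docker run --gpus all -it --rm \
  -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/pytorch:22.09-py3 bash
```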

### Data preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/. We support the following two ways to load the data:

- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like this:

  ```bash
  $ tree data
  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
  ```
- To avoid the slow reads caused by massive numbers of small files, we also support a zipped ImageNet format, which consists of four files (a sketch for generating them follows after this list):

  - train.zip, val.zip: the zipped folders for the train and validation splits.
  - train_map.txt, val_map.txt: the relative path of each image inside the corresponding zip file and its ground-truth label, separated by a tab.

  Make sure the data folder looks like this:

  ```bash
  $ tree data
  data
  └── ImageNet-Zip
      ├── train_map.txt
      ├── train.zip
      ├── val_map.txt
      └── val.zip

  $ head -n 5 data/ImageNet-Zip/val_map.txt
  ILSVRC2012_val_00000001.JPEG	65
  ILSVRC2012_val_00000002.JPEG	970
  ILSVRC2012_val_00000003.JPEG	230
  ILSVRC2012_val_00000004.JPEG	809
  ILSVRC2012_val_00000005.JPEG	516

  $ head -n 5 data/ImageNet-Zip/train_map.txt
  n01440764/n01440764_10026.JPEG	0
  n01440764/n01440764_10027.JPEG	0
  n01440764/n01440764_10029.JPEG	0
  n01440764/n01440764_10040.JPEG	0
  n01440764/n01440764_10042.JPEG	0
  ```
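As a rough sketch of how such files could be produced from the folder layout above (assigning class indices in sorted synset-folder order is an assumption; make sure it matches whatever class mapping your config expects):

```bash
# Hypothetical helper: build train.zip and train_map.txt from imagenet/train.
# Class indices are assigned in sorted folder order -- an assumption, not a
# guarantee that it matches the mapping main.py expects.
cd imagenet/train
zip -qr ../../train.zip .            # zip the whole split, keeping relative paths
idx=0
for cls in $(ls -d */ | sort); do    # one synset folder per class
  for img in "$cls"*.JPEG; do
    printf '%s\t%d\n' "$img" "$idx"  # "<relative path>\t<label>"
  done
  idx=$((idx + 1))
done > ../../train_map.txt
```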

### Calibration

To calibrate and then evaluate a calibrated Swin Transformer on ImageNet val, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> \
  --master_port 12345 main.py \
  --calib \
  --cfg <config-file> \
  --resume <checkpoint> \
  --data-path <imagenet-path> \
  --num-calib-batch <batch-number> \
  --calib-batchsz <batch-size> \
  --int8-mode <mode> \
  --calib-output-path <output-path>
```

For example, to calibrate Swin-T with a single GPU, run the following (calibration supports only a single GPU). See calib.sh for reference:

```bash
python -m torch.distributed.launch --nproc_per_node 1 \
  --master_port 12345 main.py \
  --calib \
  --cfg SwinTransformer/configs/swin_tiny_patch4_window7_224.yaml \
  --resume swin_tiny_patch4_window7_224.pth \
  --data-path <imagenet-path> \
  --num-calib-batch 10 \
  --calib-batchsz 8 \
  --int8-mode 1 \
  --calib-output-path calib-checkpoint
```

### Difference between `--int8-mode 1` and `--int8-mode 2`

| name | resolution | Original Accuracy | PTQ (mode=1) | QAT (mode=1) |
| :--- | :---: | :---: | :---: | :---: |
| Swin-T | 224x224 | 81.18% | 80.75% (-0.43%) | 81.00% (-0.18%) |
| Swin-S | 224x224 | 83.21% | 82.90% (-0.31%) | 83.00% (-0.21%) |
| Swin-B | 224x224 | 83.42% | 83.10% (-0.32%) | 83.42% (-0.00%) |
| Swin-B | 384x384 | 84.47% | 84.05% (-0.42%) | 84.16% (-0.31%) |
| Swin-L | 224x224 | 86.25% | 83.53% (-2.72%) | 86.12% (-0.13%) |
| Swin-L | 384x384 | 87.25% | 83.10% (-4.15%) | 87.11% (-0.14%) |

For Swin-T/S/B, setting `--int8-mode 1` suffices to get negligible accuracy loss for both PTQ and QAT. For Swin-L, however, `--int8-mode 1` cannot reach a satisfactory PTQ accuracy. This is because `--int8-mode 1` quantizes all GEMM outputs (INT32) to INT8, and to recover PTQ accuracy some of these output quantizations have to be disabled: `--int8-mode 2` disables quantization of the fc2 and PatchMerge outputs. The results are as follows:

| name | resolution | Original Accuracy | PTQ (mode=1) | PTQ (mode=2) |
| :--- | :---: | :---: | :---: | :---: |
| Swin-L | 224x224 | 86.25% | 83.53% (-2.72%) | 85.93% (-0.32%) |
| Swin-L | 384x384 | 87.25% | 83.10% (-4.15%) | 86.92% (-0.33%) |
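For instance, Swin-L could be calibrated for PTQ in mode 2 by changing only the mode, config, and checkpoint in the calibration command above (a sketch: the Swin-L config and checkpoint file names below are assumptions; use the files you actually downloaded from the Model Zoo):

```bash
# Hypothetical Swin-L PTQ calibration in mode 2; the config and checkpoint
# names are illustrative and should match your downloaded files.
python -m torch.distributed.launch --nproc_per_node 1 \
  --master_port 12345 main.py \
  --calib \
  --cfg SwinTransformer/configs/swin_large_patch4_window7_224.yaml \
  --resume swin_large_patch4_window7_224_22kto1k.pth \
  --data-path <imagenet-path> \
  --num-calib-batch 10 \
  --calib-batchsz 8 \
  --int8-mode 2 \
  --calib-output-path calib-checkpoint
```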

### Evaluating a Calibrated Model

To evaluate a pre-calibrated Swin Transformer on ImageNet val, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> \
  --master_port 12345 main.py \
  --eval \
  --cfg <config-file> \
  --resume <calibrated-checkpoint> \
  --data-path <imagenet-path> \
  --int8-mode <mode> \
  --batch-size <batch-size>
```

For example, to evaluate Swin-T with a single GPU, run the following. See run.sh for reference:

```bash
python -m torch.distributed.launch --nproc_per_node 1 \
  --master_port 12345 main.py \
  --eval \
  --cfg SwinTransformer/configs/swin_tiny_patch4_window7_224.yaml \
  --resume ./calib-checkpoint/swin_tiny_patch4_window7_224_calib.pth \
  --data-path <imagenet-path> \
  --int8-mode 1 \
  --batch-size 128
```

### Quantization Aware Training (QAT)

To run QAT on a calibrated Swin Transformer checkpoint:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> \
    --master_port 12345 main.py \
    --train \
    --cfg <config-file> \
    --resume <calibrated-checkpoint> \
    --data-path <imagenet-path> \
    --quant-mode <mode> \
    --teacher <uncalibrated-checkpoint> \
    --output <qat-output-path> \
    --distill \
    --int8-mode <mode> \
    --batch-size <batch-size> \
    --num-epochs <num-of-epochs> \
    --qat-lr <learning-rate-of-QAT>
```

For example, to run QAT on Swin-T with 4 GPUs on a single node for 5 epochs, run the following. See qat.sh for reference:

```bash
python -m torch.distributed.launch --nproc_per_node 4 \
    --master_port 12345 main.py \
    --train \
    --cfg SwinTransformer/configs/swin_tiny_patch4_window7_224.yaml \
    --resume ./calib-checkpoint/swin_tiny_patch4_window7_224_calib.pth \
    --data-path /data/datasets/ILSVRC2012 \
    --quant-mode ft2 \
    --teacher swin_tiny_patch4_window7_224.pth \
    --output qat-output \
    --distill \
    --int8-mode 1 \
    --batch-size 128 \
    --num-epochs 5 \
    --qat-lr 1e-5
```
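Once QAT finishes, the resulting checkpoint can be evaluated with the same `--eval` flow shown above (a sketch: the exact checkpoint filename written under `qat-output` depends on how main.py names its outputs, so the placeholder below is an assumption):

```bash
# Evaluate the QAT checkpoint; replace <qat-checkpoint> with the file
# main.py actually wrote under qat-output (the filename is an assumption here).
python -m torch.distributed.launch --nproc_per_node 1 \
  --master_port 12345 main.py \
  --eval \
  --cfg SwinTransformer/configs/swin_tiny_patch4_window7_224.yaml \
  --resume qat-output/<qat-checkpoint>.pth \
  --data-path <imagenet-path> \
  --int8-mode 1 \
  --batch-size 128
```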