MODELHUB

The access code for the Baidu links is `swin`.

ImageNet-1K and ImageNet-22K Pretrained Swin-V1 Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
|------|----------|------------|-------|-------|---------|-------|-----|-----------|----------|
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 95.5 | 28M | 4.5G | 755 | - | github/baidu/config/log |
| Swin-S | ImageNet-1K | 224x224 | 83.2 | 96.2 | 50M | 8.7G | 437 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 96.5 | 88M | 15.4G | 278 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 384x384 | 84.5 | 97.0 | 88M | 47.1G | 85 | - | github/baidu/config |
| Swin-T | ImageNet-22K | 224x224 | 80.9 | 96.0 | 28M | 4.5G | 755 | github/baidu/config | github/baidu/config |
| Swin-S | ImageNet-22K | 224x224 | 83.2 | 97.0 | 50M | 8.7G | 437 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 224x224 | 85.2 | 97.5 | 88M | 15.4G | 278 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 384x384 | 86.4 | 98.0 | 88M | 47.1G | 85 | github/baidu | github/baidu/config |
| Swin-L | ImageNet-22K | 224x224 | 86.3 | 97.9 | 197M | 34.5G | 141 | github/baidu/config | github/baidu/config |
| Swin-L | ImageNet-22K | 384x384 | 87.3 | 98.2 | 197M | 103.9G | 42 | github/baidu | github/baidu/config |
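The accuracy/compute trade-off in the table above can be queried programmatically. The sketch below is purely illustrative (standard-library Python, a few rows copied from the table; the helper name `best_under_budget` is hypothetical): it picks the highest-acc@1 checkpoint that fits a FLOPs budget.

```python
# Illustrative subset of (name, pretrain, FLOPs in G, acc@1) rows
# copied from the Swin-V1 table above.
ROWS = [
    ("Swin-T", "ImageNet-1K", 4.5, 81.2),
    ("Swin-S", "ImageNet-1K", 8.7, 83.2),
    ("Swin-B", "ImageNet-22K", 15.4, 85.2),
    ("Swin-L", "ImageNet-22K", 34.5, 86.3),
]

def best_under_budget(rows, max_gflops):
    """Return the highest-acc@1 entry whose FLOPs fit the budget."""
    fitting = [r for r in rows if r[2] <= max_gflops]
    return max(fitting, key=lambda r: r[3]) if fitting else None

print(best_under_budget(ROWS, 10.0))  # -> ('Swin-S', 'ImageNet-1K', 8.7, 83.2)
```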

ImageNet-1K and ImageNet-22K Pretrained Swin-V2 Models

| name | pretrain | resolution | window | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
|------|----------|------------|--------|-------|-------|---------|-------|-----|-----------|----------|
| SwinV2-T | ImageNet-1K | 256x256 | 8x8 | 81.8 | 95.9 | 28M | 5.9G | 572 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 8x8 | 83.7 | 96.6 | 50M | 11.5G | 327 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 8x8 | 84.2 | 96.9 | 88M | 20.3G | 217 | - | github/baidu/config |
| SwinV2-T | ImageNet-1K | 256x256 | 16x16 | 82.8 | 96.2 | 28M | 6.6G | 437 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 16x16 | 84.1 | 96.8 | 50M | 12.6G | 257 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 16x16 | 84.6 | 97.0 | 88M | 21.8G | 174 | - | github/baidu/config |
| SwinV2-B<sup>*</sup> | ImageNet-22K | 256x256 | 16x16 | 86.2 | 97.9 | 88M | 21.8G | 174 | github/baidu/config | github/baidu/config |
| SwinV2-B<sup>*</sup> | ImageNet-22K | 384x384 | 24x24 | 87.1 | 98.2 | 88M | 54.7G | 57 | github/baidu/config | github/baidu/config |
| SwinV2-L<sup>*</sup> | ImageNet-22K | 256x256 | 16x16 | 86.9 | 98.0 | 197M | 47.5G | 95 | github/baidu/config | github/baidu/config |
| SwinV2-L<sup>*</sup> | ImageNet-22K | 384x384 | 24x24 | 87.6 | 98.3 | 197M | 115.4G | 33 | github/baidu/config | github/baidu/config |

Note:

- The SwinV2-B<sup>*</sup> and SwinV2-L<sup>*</sup> models at both the 256x256 and 384x384 input resolutions are fine-tuned from the same pre-trained model, which uses a smaller input resolution of 192x192.
- SwinV2-B<sup>*</sup> (384x384) achieves 78.08 acc@1 on ImageNet-1K-V2, while SwinV2-L<sup>*</sup> (384x384) achieves 78.31.

ImageNet-1K Pretrained Swin MLP Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
|------|----------|------------|-------|-------|---------|-------|-----|----------|
| Mixer-B/16 | ImageNet-1K | 224x224 | 76.4 | - | 59M | 12.7G | - | official repo |
| ResMLP-S24 | ImageNet-1K | 224x224 | 79.4 | - | 30M | 6.0G | 715 | timm |
| ResMLP-B24 | ImageNet-1K | 224x224 | 81.0 | - | 116M | 23.0G | 231 | timm |
| Swin-T/C24 | ImageNet-1K | 256x256 | 81.6 | 95.7 | 28M | 5.9G | 563 | github/baidu/config |
| SwinMLP-T/C24 | ImageNet-1K | 256x256 | 79.4 | 94.6 | 20M | 4.0G | 807 | github/baidu/config |
| SwinMLP-T/C12 | ImageNet-1K | 256x256 | 79.6 | 94.7 | 21M | 4.0G | 792 | github/baidu/config |
| SwinMLP-T/C6 | ImageNet-1K | 256x256 | 79.7 | 94.9 | 23M | 4.0G | 766 | github/baidu/config |
| SwinMLP-B | ImageNet-1K | 224x224 | 81.3 | 95.3 | 61M | 10.4G | 409 | github/baidu/config |

Note: C24 means each head has 24 channels.
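Fixing the per-head channel count determines the head count at each stage. A minimal sketch of that arithmetic, assuming the standard Swin-T stage widths of 96/192/384/768 (an assumption; the widths are not stated in this table, and `heads_per_stage` is a hypothetical helper):

```python
# Standard Swin-T embed dims per stage (assumed, not from the table above).
STAGE_DIMS = [96, 192, 384, 768]

def heads_per_stage(stage_dims, head_dim):
    """Derive the number of attention heads per stage when each head
    is fixed to `head_dim` channels (the "C24" naming convention)."""
    # Each stage width must split evenly into fixed-size heads.
    assert all(d % head_dim == 0 for d in stage_dims)
    return [d // head_dim for d in stage_dims]

print(heads_per_stage(STAGE_DIMS, 24))  # -> [4, 8, 16, 32]
```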

ImageNet-22K Pretrained Swin-MoE Models

| name | #experts | k | router | resolution | window | IN-22K acc@1 | IN-1K/ft acc@1 | IN-1K/5-shot acc@1 | 22K model |
|------|----------|---|--------|------------|--------|--------------|----------------|--------------------|-----------|
| Swin-MoE-S | 1 (dense) | - | - | 192x192 | 8x8 | 35.5 | 83.5 | 70.3 | github/baidu/config |
| Swin-MoE-S | 8 | 1 | Linear | 192x192 | 8x8 | 36.8 | 84.5 | 75.2 | github/baidu/config |
| Swin-MoE-S | 16 | 1 | Linear | 192x192 | 8x8 | 37.6 | 84.9 | 76.5 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.7 | 75.9 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Cosine | 192x192 | 8x8 | 37.2 | 84.3 | 75.2 | github/baidu/config |
| Swin-MoE-S | 64 | 1 | Linear | 192x192 | 8x8 | 37.8 | 84.7 | 75.7 | - |
| Swin-MoE-S | 128 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.5 | 75.4 | - |
| Swin-MoE-B | 1 (dense) | - | - | 192x192 | 8x8 | 37.3 | 85.1 | 75.9 | config |
| Swin-MoE-B | 8 | 1 | Linear | 192x192 | 8x8 | 38.1 | 85.3 | 77.2 | config |
| Swin-MoE-B | 16 | 1 | Linear | 192x192 | 8x8 | 38.7 | 85.5 | 78.2 | config |
| Swin-MoE-B | 32 | 1 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 77.9 | config |
| Swin-MoE-B | 32 | 1 | Cosine | 192x192 | 8x8 | 38.5 | 85.3 | 77.3 | config |
| Swin-MoE-B | 32 | 2 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 78.7 | - |
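In the table above, `k` is the number of experts each token is routed to per MoE layer. A minimal sketch of generic top-k gating (standard-library Python, a common MoE pattern, not the repository's implementation; `topk_gating` is a hypothetical helper):

```python
import math

def topk_gating(logits, k):
    """Pick the top-k experts for one token and renormalize their
    softmax weights so they sum to 1 (a common top-k MoE gate)."""
    # Softmax over all expert logits (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts, renormalized.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}

# With k=1 a token uses exactly one expert; with k=2 it blends two.
print(topk_gating([0.1, 2.0, -1.0, 0.5], k=1))  # expert 1 with weight 1.0
```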

SimMIM Pretrained Swin-V2 Models

Please note that starting July 2024, all SimMIM-pretrained Swin-V2 models will be stored in the Hugging Face repository; refer to it for more details.

- Model size only includes the backbone weights and excludes weights in the decoders/classification heads.
- Batch size for all models is set to 2048.
- Validation loss is calculated on the ImageNet-1K validation set.
- Fine-tuned acc@1 refers to the top-1 accuracy on the ImageNet-1K validation set after fine-tuning.
| name | model size | pre-train dataset | pre-train iterations | validation loss | fine-tuned acc@1 | pre-trained model | fine-tuned model |
|------|------------|-------------------|----------------------|-----------------|------------------|-------------------|------------------|
| SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | huggingface | huggingface |
| SwinV2-giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | huggingface | huggingface |
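Since validation loss is computed on the ImageNet-1K validation set, it is a natural criterion for choosing among pre-training checkpoints of the same model and dataset. A purely illustrative sketch (standard-library Python, rows copied from the SwinV2-Base / full ImageNet-1K entries above; `pick_by_val_loss` is a hypothetical helper):

```python
# (pre-train iterations, validation loss, fine-tuned acc@1) rows copied
# from the SwinV2-Base / ImageNet-1K entries in the SimMIM table above.
BASE_IN1K = [
    (125_000, 0.4680, 84.13),
    (250_000, 0.4626, 84.65),
    (500_000, 0.4588, 85.04),
]

def pick_by_val_loss(rows):
    """Select the pre-training checkpoint with the lowest validation loss."""
    return min(rows, key=lambda r: r[1])

iters, loss, acc = pick_by_val_loss(BASE_IN1K)
print(iters, acc)  # -> 500000 85.04
```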

SimMIM Pretrained Swin-V1 Models

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
|------|------------------|----------------------|----------------------|-------|-------------------|------------------|
| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
| Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |