⚠️ Disclaimer: Checkpoints are based on training with publicly available datasets. Some datasets contain limitations, including non-commercial use limitations. Please review the terms and conditions made available by third parties before using the datasets provided. Checkpoints are licensed under Apache 2.0.
⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or distributed by Google. Such datasets are made available by third parties. Please review the terms and conditions made available by the third parties before using the data.
The TF-Vision modeling library for computer vision provides a collection of baselines and checkpoints for image classification, object detection, and segmentation.
| Backbones |
|---|
| DilatedResNet |
| EfficientNet |
| MobileDet |
| MobileNet |
| ResNet |
| ResNet3D |
| RevNet |
| SpineNet |
| SpineNetMobile |
| VisionTransformer |
| Decoders |
|---|
| ASPP |
| FPN |
| NASFPN |
| Heads |
|---|
| DetectionHead |
| MaskHead |
| MaskScoring |
| RPNHead |
| RetinaNetHead |
| SegmentationHead |
| Model | Resolution | Epochs | Top-1 | Top-5 | Download |
|---|---|---|---|---|---|
| ResNet-50 | 224x224 | 90 | 76.1 | 92.9 | config |
| ResNet-50 | 224x224 | 200 | 77.1 | 93.5 | config \| ckpt |
| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | config \| ckpt |
| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | config \| ckpt |
We support state-of-the-art ResNet-RS image classification models with features:
| Model | Resolution | Params (M) | Top-1 | Top-5 | Download |
|---|---|---|---|---|---|
| ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 | config \| ckpt |
| ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 | config \| ckpt |
| ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 | config \| ckpt |
| ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 | config \| ckpt |
| ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 | config \| ckpt |
| ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 | config \| ckpt |
| ResNet-RS-200 | 256x256 | 93.4 | 83.5 | 96.6 | config \| ckpt |
| ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 | config \| ckpt |
| ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | config \| ckpt |
| ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | config \| ckpt |
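
A common way to read the ResNet-RS table is to ask which variant gives the best Top-1 accuracy within a parameter budget. The sketch below is only an illustration: the rows are copied from the table, while the `best_under_budget` helper and the 90M budget are hypothetical and not part of TF-Vision.

```python
# Rows copied from the ResNet-RS table:
# (model, input resolution, params in millions, Top-1 accuracy).
RESNET_RS = [
    ("ResNet-RS-50", 160, 35.7, 79.1),
    ("ResNet-RS-101", 160, 63.7, 80.2),
    ("ResNet-RS-101", 192, 63.7, 81.3),
    ("ResNet-RS-152", 192, 86.8, 81.9),
    ("ResNet-RS-152", 224, 86.8, 82.5),
    ("ResNet-RS-152", 256, 86.8, 83.1),
    ("ResNet-RS-200", 256, 93.4, 83.5),
    ("ResNet-RS-270", 256, 130.1, 83.6),
    ("ResNet-RS-350", 256, 164.3, 83.7),
    ("ResNet-RS-350", 320, 164.3, 84.2),
]


def best_under_budget(rows, max_params_m):
    """Return the row with the highest Top-1 whose parameter count fits the budget.

    Hypothetical helper for illustration only; returns None if no row fits.
    """
    candidates = [row for row in rows if row[2] <= max_params_m]
    return max(candidates, key=lambda row: row[3]) if candidates else None


if __name__ == "__main__":
    # Hypothetical budget of 90M parameters.
    print(best_under_budget(RESNET_RS, max_params_m=90.0))
```

With a 90M budget this selects ResNet-RS-152 at 256x256. Parameter count alone ignores the extra compute of larger input resolutions, so treat the result as a starting point rather than a recommendation.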
We support ViT and DeiT implementations. The following ViT models were trained under the DeiT settings:
| Model | Resolution | Top-1 | Top-5 | Download |
|---|---|---|---|---|
| ViT-ti16 | 224x224 | 73.4 | 91.9 | ckpt |
| ViT-s16 | 224x224 | 79.4 | 94.7 | ckpt |
| ViT-b16 | 224x224 | 81.8 | 95.8 | ckpt |
| ViT-l16 | 224x224 | 82.2 | 95.8 | ckpt |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
|---|---|---|---|---|---|---|
| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config |
| R50-FPN | 640x640 | 72 | 97.0 | 34.0 | 36.8 | config \| ckpt |
training features including:
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
|---|---|---|---|---|---|---|
| SpineNet-49 | 640x640 | 500 | 85.4 | 28.5 | 44.2 | config \| ckpt |
| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | config \| ckpt |
| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.0 | config \| ckpt |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
|---|---|---|---|---|---|---|
| MobileNetv2 | 256x256 | 600 | - | 2.27 | 23.5 | config |
| Mobile SpineNet-49 | 384x384 | 600 | 1.0 | 2.32 | 28.1 | config \| ckpt |
| Variant | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
|---|---|---|---|---|---|---|
| YOLOv7 | 640x640 | 300 | 53.16 | 44.57 | 50.5 | config \| ckpt |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download |
|---|---|---|---|---|---|---|---|
| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | config |
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | config |
| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | config |
| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | config |
| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
|---|---|---|---|---|---|---|
| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | config |
| SpineNet-96 | 1024x1024 | 500 | 70.8 | 50.9 | 43.8 | config |
| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | config |
| Model | Backbone | Resolution | Steps | mIoU | Download |
|---|---|---|---|---|---|
| DeepLabV3 | Dilated ResNet-101 | 512x512 | 30k | 78.7 | |
| DeepLabV3+ | Dilated ResNet-101 | 512x512 | 30k | 79.2 | ckpt |
| Model | Backbone | Resolution | Steps | mIoU | Download |
|---|---|---|---|---|---|
| DeepLabV3+ | Dilated ResNet-101 | 1024x2048 | 90k | 78.79 | |
We provide models for video classification with backbones:
Training and evaluation details (SlowFast and ResNet):
| Model | Input (frame x stride) | Top-1 | Top-5 | Download |
|---|---|---|---|---|
| SlowOnly | 8 x 8 | 74.1 | 91.4 | config |
| SlowOnly | 16 x 4 | 75.6 | 92.1 | config |
| R3D-50 | 32 x 2 | 77.0 | 93.0 | config |
| R3D-RS-50 | 32 x 2 | 78.2 | 93.7 | config |
| R3D-RS-101 | 32 x 2 | 79.5 | 94.2 | - |
| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | - |
| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | - |
| R3D-RS-200 | 48 x 2 | 81.0 | - | - |
| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | - |
| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | - |
| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | - |
| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | - |
| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | - |
| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | - |
| Model | Input (frame x stride) | Top-1 | Top-5 | Download |
|---|---|---|---|---|
| SlowOnly | 8 x 8 | 77.3 | 93.6 | config |
| R3D-50 | 32 x 2 | 79.5 | 94.8 | config |
| R3D-RS-200 | 32 x 2 | 83.1 | - | - |
| R3D-RS-200 | 48 x 2 | 83.8 | - | - |
| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | config |
| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | config |
| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | config |
| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | config |
| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | config |
| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | config |
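
The `Input (frame x stride)` column in the video tables can be read as the number of sampled frames and the sampling stride between them. Assuming that reading (the tables do not spell it out), the sketch below estimates how many source frames a single clip spans. `clip_span` and `parse_input_spec` are hypothetical helper names, not TF-Vision APIs.

```python
from typing import Tuple


def parse_input_spec(spec: str) -> Tuple[int, int]:
    """Parse a table cell such as '32 x 2' into (num_frames, stride)."""
    frames, stride = (int(part) for part in spec.split("x"))
    return frames, stride


def clip_span(num_frames: int, stride: int) -> int:
    """Source frames covered from the first to the last sampled frame, inclusive.

    Assumes frames are sampled at a fixed stride with no padding or looping.
    """
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must be positive")
    return (num_frames - 1) * stride + 1


if __name__ == "__main__":
    for cell in ["8 x 8", "16 x 4", "32 x 2", "50 x 5"]:
        frames, stride = parse_input_spec(cell)
        print(cell, "->", clip_span(frames, stride), "source frames")
```

Under this reading, `8 x 8` and `32 x 2` cover a similar temporal window (57 and 63 source frames), even though the latter samples four times as many frames.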
Please read through the references in examples/starter.