This file provides Mask R-CNN baseline results and models trained with Group Normalization:
@article{GroupNorm2018,
title={Group Normalization},
author={Yuxin Wu and Kaiming He},
journal={arXiv:1803.08494},
year={2018}
}
Note: This code uses the GroupNorm op implemented in CUDA, which is included in the Caffe2 repo. At the time of writing, Caffe2 is being merged into PyTorch, and the GroupNorm op is located here. Make sure your Caffe2 installation is up to date.
These models are trained in Caffe2 on the standard ImageNet-1k dataset, using GroupNorm with 32 groups (G=32).
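For readers unfamiliar with the operation, the following is a minimal pure-Python sketch of what Group Normalization computes for a single sample. It is illustrative only and is not the Caffe2 CUDA op used by this project. The function name `group_norm`, the nested-list input layout, and the `eps` value are assumptions made for this example, and the learnable per-channel scale and shift are omitted.

```python
import math


def group_norm(x, num_groups=32, eps=1e-5):
    """Normalize one sample with Group Normalization (illustrative sketch).

    x is a list of C channels, each a flat list of H*W activations.
    Channels are split into num_groups contiguous groups, and every group
    is normalized with its own mean and variance, so the statistics do not
    depend on the batch size. eps is an assumed stabilizing constant.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    channels_per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        group = x[g * channels_per_group:(g + 1) * channels_per_group]
        # Statistics are computed over all channels and positions in the group.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for channel in group:
            out.append([(v - mean) * inv_std for v in channel])
    return out


# Example: 64 channels with 4 activations each, G=32 gives 2 channels per group.
x = [[float(c * 4 + i) for i in range(4)] for c in range(64)]
y = group_norm(x, num_groups=32)
```

After the call, each group of two channels in `y` has approximately zero mean and unit variance, independent of how many images are in the batch.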
GN makes it possible to train Mask R-CNN from scratch, without ImageNet pre-training, despite the small batch size.
<table><tbody> <!-- START E2E MASK RCNN GN SCRATCH TABLE --> <!-- TABLE HEADER --> <!-- Info: we use wrap text in <sup><sub></sub><sup> to make is small --> <th valign="bottom"><sup><sub> case </sub></sup></th> <th valign="bottom"><sup><sub>type</sub></sup></th> <th valign="bottom"><sup><sub>lr schd</sub></sup></th> <th valign="bottom"><sup><sub>im/ gpu</sub></sup></th> <th valign="bottom"><sup><sub>train mem (GB)</sub></sup></th> <th valign="bottom"><sup><sub>train time (s/iter)</sub></sup></th> <th valign="bottom"><sup><sub>train time total (hr)</sub></sup></th> <th valign="bottom"><sup><sub>inference time (s/im)</sub></sup></th> <th valign="bottom"><sup><sub>box AP</sub></sup></th> <th valign="bottom"><sup><sub>mask AP</sub></sup></th> <th valign="bottom"><sup><sub>model id</sub></sup></th> <!-- TABLE BODY --> <tr> <td align="left"><sup><sub>R-50-FPN, GN, scratch</sub></sup></td> <td align="left"><sup><sub>Mask R-CNN</sub></sup></td> <td align="left"><sup><sub>3x</sub></sup></td> <td align="right"><sup><sub>2</sub></sup></td> <td align="right"><sup><sub>10.8</sub></sup></td> <td align="right"><sup><sub>1.087</sub></sup></td> <td align="right"><sup><sub>81.5</sub></sup></td> <td align="right"><sup><sub>0.140 + 0.019</sub></sup></td> <td align="right"><sup><sub>39.5</sub></sup></td> <td align="right"><sup><sub>35.2</sub></sup></td> <td align="right"><sup><sub>56421872</sub></sup></td> </tr> <tr> <td align="left"><sup><sub>R-101-FPN, GN, scratch</sub></sup></td> <td align="left"><sup><sub>Mask R-CNN</sub></sup></td> <td align="left"><sup><sub>3x</sub></sup></td> <td align="right"><sup><sub>2</sub></sup></td> <td align="right"><sup><sub>12.7</sub></sup></td> <td align="right"><sup><sub>1.243</sub></sup></td> <td align="right"><sup><sub>93.2</sub></sup></td> <td align="right"><sup><sub>0.177 + 0.019</sub></sup></td> <td align="right"><sup><sub>41.0</sub></sup></td> <td align="right"><sup><sub>36.4</sub></sup></td> <td align="right"><sup><sub>56421911</sub></sup></td> 
</tr> <!-- END E2E MASK RCNN GN SCRATCH TABLE --> </tbody></table>
Notes:
The scratch results above use `freeze_at=0`. See this commit about the related issue.
<table><tbody> <!-- START E2E MASK RCNN GN SCRATCH TABLE --> <!-- TABLE HEADER --> <!-- Info: we use wrap text in <sup><sub></sub><sup> to make is small --> <!-- TABLE BODY --> <tr> <td align="left"><sup><sub><s>R-50-FPN, GN, scratch</s></sub></sup></td> <td align="left"><sup><sub><s>Mask R-CNN</s></sub></sup></td> <td align="left"><sup><sub><s>3x</s></sub></sup></td> <td align="right"><sup><sub><s>2</s></sub></sup></td> <td align="right"><sup><sub><s>10.5</s></sub></sup></td> <td align="right"><sup><sub><s>0.990</s></sub></sup></td> <td align="right"><sup><sub><s>74.3</s></sub></sup></td> <td align="right"><sup><sub><s>0.146 + 0.020</s></sub></sup></td> <td align="right"><sup><sub><s>36.2</s></sub></sup></td> <td align="right"><sup><sub><s>32.5</s></sub></sup></td> <td align="right"><sup><sub><s>49025460</s></sub></sup></td> </tr> <tr> <td align="left"><sup><sub><s>R-101-FPN, GN, scratch</s></sub></sup></td> <td align="left"><sup><sub><s>Mask R-CNN</s></sub></sup></td> <td align="left"><sup><sub><s>3x</s></sub></sup></td> <td align="right"><sup><sub><s>2</s></sub></sup></td> <td align="right"><sup><sub><s>12.4</s></sub></sup></td> <td align="right"><sup><sub><s>1.124</s></sub></sup></td> <td align="right"><sup><sub><s>84.3</s></sub></sup></td> <td align="right"><sup><sub><s>0.180 + 0.019</s></sub></sup></td> <td align="right"><sup><sub><s>37.5</s></sub></sup></td> <td align="right"><sup><sub><s>33.3</s></sub></sup></td> <td align="right"><sup><sub><s>49024951</s></sub></sup></td> </tr> <!-- END E2E MASK RCNN GN SCRATCH TABLE --> </tbody></table>
Notes:
The struck-through results above used `freeze_at=2`. This means the conv1 and res2 layers were never updated and kept their random initial weights when training from scratch. See this commit about the related issue.
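The effect of the `freeze_at` setting can be illustrated with a small sketch. The helper `frozen_stages` and the `STAGES` list are hypothetical names introduced for this example and are not part of the Detectron API. The sketch assumes that every backbone stage up to and including the given level is excluded from training, which matches the two cases described in these notes (0 and 2).

```python
# Hypothetical stage names for a ResNet backbone, ordered from input to output.
STAGES = ["conv1", "res2", "res3", "res4", "res5"]


def frozen_stages(freeze_at):
    """Return the backbone stages that receive no gradient updates.

    Assumption for this sketch: stages 1..freeze_at are frozen, so
    freeze_at=0 trains the whole backbone.
    """
    if not 0 <= freeze_at <= len(STAGES):
        raise ValueError("freeze_at must be between 0 and %d" % len(STAGES))
    return STAGES[:freeze_at]


print(frozen_stages(0))  # [] : every layer is trained, as needed from scratch
print(frozen_stages(2))  # ['conv1', 'res2'] : these stay at their initial weights
```

With pre-trained weights, freezing conv1 and res2 preserves useful low-level features. With random initialization, the same setting leaves those layers untrained, which is consistent with the lower AP in the struck-through rows.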