Examples/Image/Classification/ResNet/Python/README.md
The CIFAR-10 dataset is not included in the CNTK distribution, but it can be downloaded and converted by following the instructions in DataSets/CIFAR-10. We recommend keeping the downloaded data in that folder, because the scripts in this folder assume that location by default.
This example code applies ResNet to the CIFAR-10 dataset. We strictly follow the ResNet paper for the network architecture. That is, the network has a first layer of 3x3 convolutions, followed by 6n layers of 3x3 convolutions on feature maps of size {32, 16, 8}, with 2n layers for each feature map size. Counting the final fully-connected layer, this gives 6n+2 layers in total: n=3 for ResNet20 and n=18 for ResNet110. The network ends with global average pooling, a 10-way fully-connected layer, and softmax. Batch normalization is applied everywhere except the last fully-connected layer.
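The layer arithmetic above can be checked with a small sketch. The function name `resnet_cifar_layout` is hypothetical and is not part of the training script; it only counts layers according to the description in this section.

```python
def resnet_cifar_layout(n):
    """Return (total weighted layers, per-stage plan) for the CIFAR-10 ResNet
    described above. Illustrative helper only, not taken from the script."""
    feature_map_sizes = [32, 16, 8]
    # 2n layers of 3x3 convolutions for each feature map size
    stages = [(size, 2 * n) for size in feature_map_sizes]
    # first 3x3 convolution layer + 6n stage layers
    conv_layers = 1 + sum(layers for _, layers in stages)
    # plus the final 10-way fully-connected layer
    total_layers = conv_layers + 1
    return total_layers, stages


for name, n in [("resnet20", 3), ("resnet110", 18)]:
    total, stages = resnet_cifar_layout(n)
    print(name, total, stages)
# resnet20 20 [(32, 6), (16, 6), (8, 6)]
# resnet110 110 [(32, 36), (16, 36), (8, 36)]
```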
We use a fixed crop ratio of 0.8 and scale the cropped image to 32x32 pixels for training. Since all training images are pre-padded to 40x40 pixels, each crop is already 32x32, so the net effect is a translation transform with no scaling. Run the example from the current folder using:
python TrainResNet_CIFAR10.py -n resnet20
python TrainResNet_CIFAR10.py -n resnet110
for ResNet20 and ResNet110, respectively. The ResNet20 network achieves an error rate of about 8.23%, and the ResNet110 network achieves an error rate of about 6.24%.
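To illustrate why the 0.8 crop ratio on 40x40 padded images amounts to translation only, here is a minimal sketch in plain Python. The helper names `random_crop_window` and `crop` are hypothetical, and the way the actual reader picks crop positions is an assumption; only the sizes come from the text above.

```python
import random


def random_crop_window(padded_size=40, crop_ratio=0.8, rng=random):
    """Pick a random square crop window and return (top, left, side)."""
    side = round(padded_size * crop_ratio)  # 40 * 0.8 = 32
    max_offset = padded_size - side         # up to 8 pixels of shift per axis
    top = rng.randint(0, max_offset)
    left = rng.randint(0, max_offset)
    return top, left, side


def crop(image, top, left, side):
    """Crop a 2-D image given as a list of rows."""
    return [row[left:left + side] for row in image[top:top + side]]


# A dummy 40x40 "image" whose pixels record their own coordinates.
image = [[(r, c) for c in range(40)] for r in range(40)]
top, left, side = random_crop_window()
patch = crop(image, top, left, side)
print(side, len(patch), len(patch[0]))  # 32 32 32
```

Because the crop is already 32x32, resizing it to the 32x32 network input changes nothing; only the crop position varies between samples.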
TrainResNet_CIFAR10_Distributed.py is similar to TrainResNet_CIFAR10.py, but it adds support for distributed training via MPI. Details can be found here. Note that this example requires either a multi-GPU machine or an MPI hosts file to distribute training across multiple machines.
Simple aggregation, ResNet20, on a 2-GPU machine:
mpiexec -n 2 python TrainResNet_CIFAR10_Distributed.py -n resnet20 -q 32
Quantized 1-bit aggregation, training on 50000 samples before distributed training starts, ResNet20, on a 2-GPU machine:
mpiexec -n 2 python TrainResNet_CIFAR10_Distributed.py -n resnet20 -q 1 -a 50000
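As background for the `-q 1` setting, the following is a minimal sketch of the general idea behind 1-bit gradient quantization with error feedback. It is not CNTK's implementation: the function names, the choice of the per-sign mean as reconstruction value, and the list-based data layout are all assumptions made for illustration.

```python
def one_bit_quantize(gradient, residual):
    """Quantize (gradient + carried-over residual) to one bit per value.

    Returns (bits, (neg_value, pos_value), new_residual).
    Illustrative only; not the CNTK implementation.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    bits = [1 if v >= 0 else 0 for v in corrected]
    pos = [v for v in corrected if v >= 0]
    neg = [v for v in corrected if v < 0]
    # Assumption: reconstruct each group by its mean value.
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0
    reconstructed = [pos_value if b else neg_value for b in bits]
    # The quantization error is kept locally and added to the next gradient.
    new_residual = [v - q for v, q in zip(corrected, reconstructed)]
    return bits, (neg_value, pos_value), new_residual


def dequantize(bits, levels):
    """Rebuild approximate gradient values from bits and the two levels."""
    neg_value, pos_value = levels
    return [pos_value if b else neg_value for b in bits]


gradient = [0.5, -0.25, 1.5, -0.75]
residual = [0.0] * len(gradient)
bits, levels, residual = one_bit_quantize(gradient, residual)
print(bits)                      # [1, 0, 1, 0]
print(dequantize(bits, levels))  # [1.0, -0.5, 1.0, -0.5]
print(residual)                  # [-0.5, 0.25, 0.5, -0.25]
```

Only the bits and the two reconstruction levels would need to be exchanged between workers, which is what reduces communication compared with sending 32-bit values.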
To run with maximum parallelization, with the minibatch size scaled according to the number of workers, for 3 epochs:
mpiexec -n 2 python TrainResNet_CIFAR10_Distributed.py -s True -e 3
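The effect of scaling the minibatch size with the number of workers can be sketched as follows. The base minibatch size of 128 and the helper `scaled_minibatch_size` are hypothetical values chosen for illustration; how the script applies the scaling internally is an assumption. The 50000 figure is the size of the CIFAR-10 training set.

```python
import math


def scaled_minibatch_size(base_minibatch_size, num_workers, scale_up=True):
    """Return the global minibatch size, optionally multiplied by worker count."""
    return base_minibatch_size * num_workers if scale_up else base_minibatch_size


base = 128          # hypothetical base minibatch size
workers = 2         # matches `mpiexec -n 2`
epoch_size = 50000  # CIFAR-10 training images
epochs = 3          # matches `-e 3`

global_mb = scaled_minibatch_size(base, workers)
per_worker = global_mb // workers
updates = math.ceil(epoch_size / global_mb) * epochs
print(global_mb, per_worker, updates)  # 256 128 588
```

With scaling enabled, each worker still processes the base number of samples per step, while the number of parameter updates per epoch drops in proportion to the worker count.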
TrainResNet_ImageNet_Distributed.py is a Python implementation of the ResNet-V2 model, which was originally described in Deep Residual Learning for Image Recognition.
This script supports distributed training. To run it in a distributed manner, please check here. For example, the command for distributed training on a single Windows machine with multiple GPUs is:
mpiexec -n <#workers> python TrainResNet_ImageNet_Distributed.py
Simple aggregation, ResNet50, on an 8-GPU machine:
mpiexec -n 8 python TrainResNet_ImageNet_Distributed.py -n resnet50
In our experiments, we achieve 23.65% top-1 error with ResNet50, 21.61% top-1 error with ResNet101, and 20.93% top-1 error with ResNet152.
For more parameter definitions, use the -h option to see the help text:
python TrainResNet_ImageNet_Distributed.py -h