examples/tutorial/new_api/cifar_vit/README.md
This example provides a training script for training ViT on the CIFAR10 dataset from scratch.
The training script accepts the following arguments:

- `-p`, `--plugin`: Plugin to use. Choices: `torch_ddp`, `torch_ddp_fp16`, `low_level_zero`. Defaults to `torch_ddp`.
- `-r`, `--resume`: Resume from checkpoint file path. Defaults to `-1`, which means not resuming.
- `-c`, `--checkpoint`: The folder to save checkpoints. Defaults to `./checkpoint`.
- `-i`, `--interval`: Epoch interval to save checkpoints. Defaults to `5`. If set to `0`, no checkpoint will be saved.
- `--target_acc`: Target accuracy. An exception is raised if it is not reached. Defaults to `None`.

Install the requirements:

```bash
pip install -r requirements.txt
```
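A minimal `argparse` sketch of how these flags could be declared is shown below. The option names, choices, and defaults follow the list above; the parser itself is illustrative and may differ from the actual declarations in `train.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags.
    parser = argparse.ArgumentParser(description="Train ViT on CIFAR10 from scratch")
    parser.add_argument("-p", "--plugin", default="torch_ddp",
                        choices=["torch_ddp", "torch_ddp_fp16", "low_level_zero"],
                        help="Plugin to use")
    parser.add_argument("-r", "--resume", type=int, default=-1,
                        help="Resume from checkpoint; -1 means not resuming")
    parser.add_argument("-c", "--checkpoint", default="./checkpoint",
                        help="Folder to save checkpoints")
    parser.add_argument("-i", "--interval", type=int, default=5,
                        help="Epoch interval to save checkpoints; 0 disables saving")
    parser.add_argument("--target_acc", type=float, default=None,
                        help="Target accuracy; an exception is raised if not reached")
    return parser


# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
```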
```bash
# train with torch DDP with fp32
colossalai run --nproc_per_node 4 train.py -c ./ckpt-fp32

# train with torch DDP with mixed precision training
colossalai run --nproc_per_node 4 train.py -c ./ckpt-fp16 -p torch_ddp_fp16

# train with low level zero
colossalai run --nproc_per_node 4 train.py -c ./ckpt-low_level_zero -p low_level_zero
```
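The checkpointing and target-accuracy behavior controlled by `-i`/`--interval` and `--target_acc` can be sketched in plain Python. The helper names below are hypothetical; the actual logic in `train.py` may differ:

```python
from typing import Optional


def should_save_checkpoint(epoch: int, interval: int) -> bool:
    # Save every `interval` epochs; an interval of 0 disables checkpointing.
    if interval <= 0:
        return False
    return (epoch + 1) % interval == 0


def check_target_accuracy(accuracy: float, target_acc: Optional[float]) -> None:
    # Mirror the documented --target_acc behavior: raise if not reached.
    # Defaults to None, in which case no check is performed.
    if target_acc is not None and accuracy < target_acc:
        raise RuntimeError(
            f"accuracy {accuracy:.4f} did not reach target {target_acc:.4f}"
        )
```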
The expected accuracy is as follows:
| Model | Single-GPU Baseline FP32 | Booster DDP with FP32 | Booster DDP with FP16 | Booster Low Level Zero |
|---|---|---|---|---|
| ViT | 83.00% | 84.03% | 84.00% | 84.43% |