This module includes test scripts for benchmarking PyTorch data loading performance across various file system implementations, including the Alluxio POSIX API.
For a single-node run, Alluxio FUSE must be mounted at `/mnt/alluxio-fuse/` in this benchmarking node.

Launch the Docker image `luqqiu/alluxioloadagent:latest`, which is built from the Dockerfile included in this module:

```bash
docker run -it --rm --name loadtest -e NVIDIA_VISIBLE_DEVICES= -v `pwd`:/v/ -v /mnt:/mnt:rshared -w /v luqqiu/alluxioloadagent:latest bash
```
Prepare `inputdata.csv` with one file path per line, WITHOUT the common Alluxio path prefix.
The common path prefix is passed to `load.py` separately (the `-p` argument in the command that follows):

```bash
./run-test.sh 2 load.py --workers 2 --file_name_list inputdata.csv --number_of_files 10000 \
    -p /mnt/alluxio-fuse/data/
```
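The relationship between the file list and the common prefix can be illustrated with a small sketch. It is not part of this module and says nothing about how `load.py` is actually implemented; the helper names `write_file_list` and `resolve_paths` are hypothetical, and the sketch assumes the list holds one relative path per line.

```python
import csv
import os


def write_file_list(root, out_csv):
    """Write every file under `root` to `out_csv`, one relative path per line.

    Hypothetical helper: `root` plays the role of the common prefix
    (for example the directory passed with -p), which is stripped here.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Store only the part of the path below the common prefix.
            rel_paths.append(os.path.relpath(full, root))
    rel_paths.sort()  # deterministic order across runs
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for rel in rel_paths:
            writer.writerow([rel])
    return rel_paths


def resolve_paths(prefix, list_csv):
    """Join the common prefix back onto each relative path in the list."""
    with open(list_csv, newline="") as f:
        return [os.path.join(prefix, row[0]) for row in csv.reader(f) if row]
```

For example, `write_file_list("/mnt/alluxio-fuse/data/", "inputdata.csv")` would produce a list suitable for the command above, assuming the dataset lives under that directory.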
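For a multi-node run, Alluxio FUSE must be mounted at `/mnt/alluxio-fuse/` in each benchmarking node, and `inputdata.csv` must be prepared as for the single-node run. Then run the following on the first node:

```bash
export MASTER_ADDR=${NODE_ONE_HOSTNAME} \
&& export MASTER_PORT=${NODE_ONE_PORT} \
&& export WORLD_SIZE=2 \
&& export RANK=0 \
&& ./run-test.sh 2 load.py --workers 2 --file_name_list inputdata.csv --number_of_files 10000 \
    -p /mnt/alluxio-fuse/data/
```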
Change `RANK=0` to `RANK=1` and run the same command on the other node.
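The four exported variables follow PyTorch's `env://` rendezvous convention, in which every process reads the same `MASTER_ADDR`, `MASTER_PORT`, and `WORLD_SIZE` but a distinct `RANK`. The following is a minimal sketch of how a script might read and sanity-check them before starting; `read_dist_env` is a hypothetical helper, not something this module provides, and whether `load.py` performs such checks is an assumption.

```python
import os


def read_dist_env(environ=None):
    """Read and validate the rendezvous settings used by env:// initialization."""
    env = os.environ if environ is None else environ
    required = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")
    missing = [key for key in required if key not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    world_size = int(env["WORLD_SIZE"])
    rank = int(env["RANK"])
    # Each node needs a unique rank in the range 0 .. WORLD_SIZE - 1.
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK={rank} must be in [0, {world_size})")
    return {
        "master_addr": env["MASTER_ADDR"],
        "master_port": int(env["MASTER_PORT"]),
        "world_size": world_size,
        "rank": rank,
    }

# With PyTorch installed, the same variables are consumed by:
#   import torch.distributed as dist
#   dist.init_process_group(backend="gloo", init_method="env://")
```

With `WORLD_SIZE=2`, the only valid ranks are 0 and 1, which is why the second node uses `RANK=1`.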
Arena can be used to run the benchmark across multiple nodes:

```bash
arena --loglevel info submit pytorch --name=test-job --gpus=0 --workers=2 --cpu 4 --memory 32G \
--image=luqqiu/alluxioloadagent:latest --selector alluxio-master=false --data-dir=/mnt/ \
--sync-mode=git --sync-source=https://github.com/Alluxio/alluxio.git \
"export MASTER_ADDR=test-job-master-0 && export MASTER_PORT=12425 \
&& /root/code/alluxio/integration/tools/benchmark/pytorch/run-test.sh 2 /root/code/alluxio/integration/tools/benchmark/pytorch/load.py \
--workers 2 --file_name_list inputdata.csv --number_of_files 10000 \
-p /mnt/alluxio-fuse/data"
```
Please refer to the Distributed PyTorch Training Guide for more information about launching a PyTorch script on multiple nodes.
Get the names of the benchmarking pods:

```bash
kubectl get pods
```

The benchmark result of each node is shown in the logs of the corresponding Kubernetes pod:

```bash
kubectl logs test-job-master-0
kubectl logs test-job-worker-0
```
Special thanks to Kevin Cai and Zifan Ni for contributing these benchmark scripts.