
ZeRO stage 1 with reduced communication

docs/_posts/2020-03-17-reduce-scatter.md

  • Gradients are now reduced with a partition-aware approach (reduce-scatter) instead of the global collective (all-reduce) used by the initial implementation
  • Total communication volume drops from 1.5x to 1x that of standard data parallelism
  • Up to 2x reduction in communication time compared to all-reduce
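
One way to read the 1.5x -> 1x figure is sketched below. It is back-of-the-envelope arithmetic, not DeepSpeed code, and it rests on assumptions the post does not state: the partition-aware approach is a reduce-scatter (as the post's filename suggests), collectives follow the usual ring accounting in which a reduce-scatter and an all-gather each move about one model's worth of data per rank and an all-reduce moves about two, and each step ends with an all-gather of the updated parameter partitions. The names `relative_volume`, `initial`, and `updated` are made up for illustration.

```python
from fractions import Fraction

# Per-rank communication volume in units of the model size, for a large
# number of ranks. ASSUMPTION: standard ring-collective accounting.
REDUCE_SCATTER = Fraction(1)
ALL_GATHER = Fraction(1)
ALL_REDUCE = REDUCE_SCATTER + ALL_GATHER  # a ring all-reduce runs both phases


def relative_volume(gradient_collective):
    """Volume of one training step relative to plain data parallelism.

    ASSUMPTION: after the optimizer step, every rank all-gathers the
    updated parameter partitions owned by the other ranks.
    """
    baseline = ALL_REDUCE  # plain data parallelism all-reduces the gradients
    step = gradient_collective + ALL_GATHER
    return step / baseline


initial = relative_volume(ALL_REDUCE)      # global all-reduce of gradients
updated = relative_volume(REDUCE_SCATTER)  # partition-aware reduce-scatter
print(initial, updated)
```

Under these assumptions the gradient collective itself moves half as much data as an all-reduce, which is consistent with the "up to 2x" reduction in communication time quoted above.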

Further updates coming soon!