<!--- Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved. NVIDIA CORPORATION and its licensors retain all intellectual property and proprietary rights in and to this software, related documentation and any modifications thereto. Any use, reproduction, disclosure or distribution of this software and related documentation without an express license agreement from NVIDIA CORPORATION is strictly prohibited. -->

Custom Pipeline Model Parallel Layout

This is an experimental feature and may change.

`--pipeline-model-parallel-layout` is a flexible API for defining the pipeline-parallel partitioning, which is essential for achieving balanced partitioning of an imbalanced model. For example, to partition DeepSeek-V3 (61 decoder layers + 1 MTP layer) with PP16 and VPP2, pass the following arguments:

```bash
--pipeline-model-parallel-size 16
--pipeline-model-parallel-layout "Et*3|(tt|)*29,m|L"
```
This produces the following assignment of layers to pipeline (PP) and virtual pipeline (VPP) ranks:

| PP rank \ VPP rank | 0 | 1 |
|---|---|---|
| 0 | embedding + 3 × decoder | 2 × decoder |
| 1~13 | 2 × decoder | 2 × decoder |
| 14 | 2 × decoder | mtp |
| 15 | 2 × decoder | loss |

In the layout string, stages are separated by `|`. Repeated stages or layers can be expressed with multiplication, e.g., `t*3` for three decoder layers in one stage, or `(tt|)*29` for 29 identical two-layer stages. Commas are purely cosmetic and may be inserted for readability. Symbol choices:

  • E = embedding layer
  • t = transformer decoder layer
  • m = MTP layer
  • L = loss calculation layer

Note that it is legal to have empty stages, e.g., E||t|L (the second stage is empty).
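To make the grammar concrete, here is a minimal sketch of how such a layout string can be expanded into a list of per-virtual-stage layer strings. This is an illustration of the syntax described above, not Megatron-LM's actual parser; the function name `expand_layout` is invented for this example.

```python
import re

def expand_layout(layout: str) -> list[str]:
    """Expand a pipeline layout string into one symbol string per
    virtual stage. Illustrative sketch, not the Megatron-LM parser."""
    s = layout.replace(",", "")  # commas are purely cosmetic
    # Expand grouped repetitions such as "(tt|)*29" first.
    s = re.sub(r"\(([^()]*)\)\*(\d+)",
               lambda m: m.group(1) * int(m.group(2)), s)
    # Then expand single-symbol repetitions such as "t*3".
    s = re.sub(r"(\w)\*(\d+)",
               lambda m: m.group(1) * int(m.group(2)), s)
    # Stages are separated by '|'; empty stages are legal.
    return s.split("|")

stages = expand_layout("Et*3|(tt|)*29,m|L")
print(len(stages))                          # 32 virtual stages = PP16 x VPP2
print(stages[0])                            # Ettt (embedding + 3 decoders)
print(sum(st.count("t") for st in stages))  # 61 decoder layers in total
```

Running this on the DeepSeek-V3 layout above yields 32 virtual stages (16 PP ranks × 2 VPP ranks) and 61 `t` symbols, matching the 61 decoder layers in the table; `expand_layout("E||t|L")` returns a list containing an empty string for the empty second stage.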