docs/get-started/install.md
Install uv, a fast Python package installer:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```
## Option A: Install from PyPI

Install the latest stable release from PyPI:

```shell
uv pip install megatron-core
```

To include optional training dependencies (Weights & Biases, SentencePiece, HF Transformers):

```shell
uv pip install "megatron-core[training]"
```
For all extras, including Transformer Engine:

```shell
uv pip install --group build
uv pip install --no-build-isolation "megatron-core[training,dev]"
```
`--no-build-isolation` requires build dependencies to be pre-installed in the environment. `torch` is needed because several `[dev]` packages (`mamba-ssm`, `nv-grouped-gemm`, `transformer-engine`) import it at build time to compile CUDA kernels. Expect this step to take **20+ minutes** depending on your hardware. If you prefer pre-built binaries, the [NGC Container](#option-c-ngc-container) ships with these pre-compiled.
Building from source can consume a large amount of memory. By default the build runs one compiler job per CPU core, which may cause out-of-memory failures on machines with many cores. To limit parallel compilation jobs, set the `MAX_JOBS` environment variable before installing (e.g. `MAX_JOBS=4`).
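If you are unsure what value to use, you can derive a conservative `MAX_JOBS` from the machine's core count before starting the build. The halving heuristic below is our own rule of thumb for RAM-limited hosts, not an official Megatron Core recommendation:

```shell
# Assumption: one compile job per ~2 cores is a safe default when RAM is tight.
CORES=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
export MAX_JOBS=$(( CORES / 2 < 1 ? 1 : CORES / 2 ))
echo "building with MAX_JOBS=$MAX_JOBS"
```

Because `MAX_JOBS` is exported, it applies to the `uv pip install` commands you run afterwards in the same shell.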
For a lighter set of development dependencies without Transformer Engine and ModelOpt, use `[lts]` instead of `[dev]`: `uv pip install --no-build-isolation "megatron-core[training,lts]"`. The `[lts]` and `[dev]` extras are mutually exclusive.
To clone the repository for examples:

```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
```
## Option B: Install from Source

For development, or to run the latest unreleased code:

```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .
```
To install with all development dependencies (includes Transformer Engine; requires pre-installed build dependencies):

```shell
uv pip install --group build
uv pip install --no-build-isolation -e ".[training,dev]"
```
If the build runs out of memory, limit parallel compilation jobs with `MAX_JOBS=4 uv pip install --no-build-isolation -e ".[training,dev]"`.
## Option C: NGC Container

For a pre-configured environment with all dependencies pre-installed (PyTorch, CUDA, cuDNN, NCCL, Transformer Engine), use the PyTorch NGC Container.

We recommend using the previous month's NGC container rather than the latest one to ensure compatibility with the current Megatron Core release and testing matrix.
```shell
docker run --gpus all -it --rm \
  -v /path/to/dataset:/workspace/dataset \
  -v /path/to/checkpoints:/workspace/checkpoints \
  -e PIP_CONSTRAINT= \
  nvcr.io/nvidia/pytorch:26.01-py3
```
The NGC PyTorch container constrains the Python environment globally via `PIP_CONSTRAINT`. The `-e PIP_CONSTRAINT=` flag above unsets this so that Megatron Core and its dependencies install correctly.
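Inside the container, you can confirm the constraint is actually cleared before installing anything. This check is our own sketch, not part of the official instructions; an empty or unset `PIP_CONSTRAINT` means no global constraint file will be applied:

```shell
# Empty (or unset) PIP_CONSTRAINT means pip/uv resolve dependencies normally.
if [ -z "${PIP_CONSTRAINT:-}" ]; then
  echo "PIP_CONSTRAINT is cleared; safe to install"
else
  echo "PIP_CONSTRAINT is still set to: $PIP_CONSTRAINT"
fi
```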
Then install Megatron Core inside the container (`torch` is already available in the NGC image):

```shell
pip install uv
uv pip install --no-build-isolation "megatron-core[training,dev]"
```
You are now ready to run training. See Your First Training Run for next steps.
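As a quick sanity check (assuming one of the installs above succeeded), you can confirm the installed package version and whether a GPU is visible; the exact version printed depends on your release, and the CUDA check requires a GPU-enabled environment:

```shell
# Report the installed megatron-core version via standard package metadata.
python -c "import importlib.metadata as m; print(m.version('megatron-core'))"
# True if PyTorch can see at least one CUDA device.
python -c "import torch; print(torch.cuda.is_available())"
```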