docs/get-started/install.md
Install uv, a fast Python package installer:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```
## Option A: Install from PyPI

Install the latest stable release from PyPI:

```shell
uv pip install megatron-core
```

To include optional training dependencies (Weights & Biases, SentencePiece, HF Transformers):

```shell
uv pip install "megatron-core[training]"
```
For all extras, including Transformer Engine:

```shell
uv pip install --group build
uv pip install --no-build-isolation "megatron-core[training,dev]"
```
`--no-build-isolation` requires build dependencies to be pre-installed in the environment. `torch` is needed because several `[dev]` packages (`mamba-ssm`, `nv-grouped-gemm`, `transformer-engine`) import it at build time to compile CUDA kernels. Expect this step to take **20+ minutes** depending on your hardware. If you prefer pre-built binaries, the [NGC Container](#option-c-ngc-container) ships with these pre-compiled.
Building from source can consume a large amount of memory. By default the build runs one compiler job per CPU core, which may cause out-of-memory failures on machines with many cores. To limit parallel compilation jobs, set the `MAX_JOBS` environment variable before installing (e.g. `MAX_JOBS=4`).
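If you are unsure what value to use, you can derive a conservative `MAX_JOBS` from the machine's core count before starting the build. The halving heuristic below is our own rule of thumb for RAM-limited hosts, not an official Megatron Core recommendation:

```shell
# Assumption: one compile job per ~2 cores is a safe default when RAM is tight.
CORES=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
export MAX_JOBS=$(( CORES / 2 < 1 ? 1 : CORES / 2 ))
echo "building with MAX_JOBS=$MAX_JOBS"
```

Because `MAX_JOBS` is exported, it applies to the `uv pip install` commands you run afterwards in the same shell.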
For a lighter set of development dependencies without Transformer Engine and ModelOpt, use `[lts]` instead of `[dev]`: `uv pip install --no-build-isolation "megatron-core[training,lts]"`. The `[lts]` and `[dev]` extras are mutually exclusive.
To clone the repository for examples:

```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
```
## Option B: Install from Source

For development, or to run the latest unreleased code:

```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .
```
To install with all development dependencies (includes Transformer Engine; requires pre-installed build dependencies):

```shell
uv pip install --group build
uv pip install --no-build-isolation -e ".[training,dev]"
```
If the build runs out of memory, limit parallel compilation jobs with `MAX_JOBS=4 uv pip install --no-build-isolation -e ".[training,dev]"`.
## Option C: NGC Container

For a pre-configured environment with all dependencies pre-installed (PyTorch, CUDA, cuDNN, NCCL, Transformer Engine), use the PyTorch NGC Container.

We recommend using the previous month's NGC container rather than the latest one to ensure compatibility with the current Megatron Core release and testing matrix.
```shell
docker run --gpus all -it --rm \
  -v /path/to/dataset:/workspace/dataset \
  -v /path/to/checkpoints:/workspace/checkpoints \
  -e PIP_CONSTRAINT= \
  nvcr.io/nvidia/pytorch:26.01-py3
```
The NGC PyTorch container constrains the Python environment globally via `PIP_CONSTRAINT`. The `-e PIP_CONSTRAINT=` flag above unsets this so that Megatron Core and its dependencies install correctly.
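Inside the container, you can confirm the constraint is actually cleared before installing anything. This check is our own sketch, not part of the official instructions; an empty or unset `PIP_CONSTRAINT` means no global constraint file will be applied:

```shell
# Empty (or unset) PIP_CONSTRAINT means pip/uv resolve dependencies normally.
if [ -z "${PIP_CONSTRAINT:-}" ]; then
  echo "PIP_CONSTRAINT is cleared; safe to install"
else
  echo "PIP_CONSTRAINT is still set to: $PIP_CONSTRAINT"
fi
```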
Then install Megatron Core inside the container (`torch` is already available in the NGC image):

```shell
pip install uv
uv pip install --no-build-isolation "megatron-core[training,dev]"
```
You are now ready to run training. See Your First Training Run for next steps.
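As a quick sanity check (assuming one of the installs above succeeded), you can confirm the installed package version and whether a GPU is visible; the exact version printed depends on your release, and the CUDA check requires a GPU-enabled environment:

```shell
# Report the installed megatron-core version via standard package metadata.
python -c "import importlib.metadata as m; print(m.version('megatron-core'))"
# True if PyTorch can see at least one CUDA device.
python -c "import torch; print(torch.cuda.is_available())"
```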