Megatron Core

Production-ready library for building custom training frameworks

⚡ Quick Start

```bash
# Install Megatron Core
uv pip install megatron-core

# Distributed training example (2 GPUs, mock data)
torchrun --nproc_per_node=2 examples/run_simple_mcore_train_loop.py
```
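
`torchrun --nproc_per_node=2` launches one worker process per GPU; the example script trains a small model on mock data, so no dataset download is required.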

What is Megatron Core?

Megatron Core is an open-source, PyTorch-based library of GPU-optimized techniques and cutting-edge system-level optimizations. It abstracts these into composable, modular APIs, giving developers and model researchers full flexibility to train custom transformers at scale on NVIDIA accelerated computing infrastructure.
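
To make "composable, modular APIs" concrete, here is a minimal sketch of building a small GPT model from Megatron Core building blocks, loosely following `examples/run_simple_mcore_train_loop.py`. The module paths and argument names (`TransformerConfig`, `GPTModel`, `get_gpt_layer_local_spec`) reflect recent releases and should be treated as assumptions that may differ in your version:

```python
import os
import torch
from megatron.core import parallel_state
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.models.gpt.gpt_model import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

# torchrun sets RANK / WORLD_SIZE / LOCAL_RANK / MASTER_ADDR for each worker.
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))
torch.distributed.init_process_group(backend="nccl")

# Create Megatron's model-parallel process groups (no parallelism here: 1 x 1).
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)

# A deliberately tiny transformer configuration; real runs use far larger values.
config = TransformerConfig(
    num_layers=2,
    hidden_size=128,
    num_attention_heads=4,
    use_cpu_initialization=True,
)

# Compose the GPU-optimized building blocks into a GPT model.
model = GPTModel(
    config=config,
    transformer_layer_spec=get_gpt_layer_local_spec(),
    vocab_size=1024,
    max_sequence_length=512,
).cuda()
```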

🚀 Key Components

GPU-Optimized Building Blocks

  • Transformer Components: Attention mechanisms, MLP layers, embeddings
  • Memory Management: Activation recomputation
  • FP8 Precision: Optimized for NVIDIA Hopper, Ada, and Blackwell GPUs (see the configuration sketch after this list)
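
Both of these knobs are typically exposed through the transformer configuration. The sketch below shows the general shape; the field names (`recompute_granularity`, `fp8`, `bf16`) are assumptions drawn from recent Megatron Core releases and may vary:

```python
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=24,
    hidden_size=2048,
    num_attention_heads=16,
    bf16=True,
    # Activation recomputation: re-run cheap forward ops during the backward pass
    # instead of keeping every activation resident in GPU memory.
    recompute_granularity="selective",
    # FP8 execution on Hopper / Ada / Blackwell; "hybrid" typically means
    # e4m3 for forward activations and e5m2 for gradients.
    fp8="hybrid",
)
```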

Parallelism Strategies

  • Tensor Parallelism (TP): Splits each layer's weights across GPUs; the activation memory footprint can be further reduced with sequence parallelism
  • Pipeline Parallelism (PP): Splits the model depth-wise into stages and pipelines microbatches through them to improve utilization
  • Context Parallelism (CP): Splits long sequences across GPUs (see the documentation)
  • Expert Parallelism (EP): Splits the experts of an MoE model across multiple GPUs (a combined setup sketch follows this list)
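
These strategies compose when the parallel process groups are created via `megatron.core.parallel_state`. The keyword names below (`context_parallel_size`, `expert_model_parallel_size`) reflect recent releases and should be treated as assumptions that may differ in your version:

```python
import torch
from megatron.core import parallel_state

# torchrun provides RANK / WORLD_SIZE / MASTER_ADDR, so a plain NCCL init works.
torch.distributed.init_process_group(backend="nccl")

# The world size must be divisible by TP * PP * CP; the leftover GPUs
# automatically form the data-parallel dimension.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,    # TP: split each layer across 2 GPUs
    pipeline_model_parallel_size=2,  # PP: split the layer stack into 2 stages
    context_parallel_size=1,         # CP: shard long sequences (1 = disabled)
    expert_model_parallel_size=1,    # EP: shard MoE experts (1 = disabled)
)

print("data-parallel size:", parallel_state.get_data_parallel_world_size())
```

Sequence parallelism rides on top of TP and is usually toggled with a separate flag on the transformer config (commonly named `sequence_parallel`); treat that name as an assumption as well.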

🔗 Examples & Documentation


For complete installation instructions, performance benchmarks, and ecosystem information, see the main README.