# Nanotron
Nanotron is a distributed training framework with tensor, pipeline, and data parallelism (3D parallelism). It is designed for large-scale training workloads across hundreds of GPUs.
Convert any Transformers model to an optimized Nanotron model implementation for pretraining with the `convert_hf_to_nanotron.py` script.
```bash
torchrun --nproc_per_node=1 examples/llama/convert_hf_to_nanotron.py \
    --checkpoint_path=meta-llama/Llama-2-7b-hf \
    --save_path=./llama-7b-nanotron
```
The conversion script loads a Transformers model, such as [`LlamaForCausalLM`], with the [`~LlamaForCausalLM.from_pretrained`] function. This reads the `config.json` file from the checkpoint directory and creates a [`LlamaConfig`]. Nanotron maps the [`LlamaConfig`] to its own config format and creates a Nanotron model. Nanotron also relies on [`AutoTokenizer`] for turning text into token ids during preprocessing and generation.
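The snippet below is a minimal sketch of the Transformers side of that flow, not the conversion script itself; the Nanotron config mapping happens inside `convert_hf_to_nanotron.py`, and the checkpoint name here is only a placeholder.

```python
from transformers import AutoTokenizer, LlamaForCausalLM

# from_pretrained reads config.json from the checkpoint and builds a
# LlamaConfig before loading the model weights.
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
print(type(model.config))  # transformers.models.llama.configuration_llama.LlamaConfig

# The same tokenizer Nanotron relies on to turn text into token ids.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
```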