<!--Copyright 2026 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. -->

torchtitan

torchtitan is PyTorch's distributed training framework for large language models. It supports 4D parallelism: Fully Sharded Data Parallel (FSDP), tensor parallelism, pipeline parallelism, and context parallelism. torchtitan is fully compatible with torch.compile, enabling kernel fusion and graph optimizations that significantly reduce memory overhead and speed up training.
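
To make the "4D" term concrete: each parallelism dimension has a degree, and the product of the degrees has to match the number of GPUs in the job. The sketch below is plain Python arithmetic, not torchtitan's API. The helper name `fsdp_degree` and its parameters are hypothetical, and it assumes no replicated data parallelism on top of FSDP.

```python
def fsdp_degree(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the FSDP degree left over once the other dimensions are fixed.

    Hypothetical helper for illustration only; not part of torchtitan.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("world_size and all parallel degrees must be >= 1")
    other = tp * pp * cp
    if world_size % other != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={other}"
        )
    return world_size // other


# 64 GPUs with 8-way tensor and 2-way pipeline parallelism leave 4-way FSDP.
print(fsdp_degree(64, tp=8, pp=2))
```

A configuration whose degrees do not divide the GPU count evenly cannot be laid out, which is why the sketch raises an error instead of rounding.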

> [!NOTE]
> Only dense models are supported at the moment.

Use a Transformers model directly in torchtitan's distributed training infrastructure.

```py
import torch
from torchtitan.config.job_config import JobConfig
from torchtitan.experiments.transformers_modeling_backend.job_config import (
    HFTransformers,
)
from torchtitan.experiments.transformers_modeling_backend.model.args import (
    TitanDenseModelArgs,
    HFTransformerModelArgs,
)
from torchtitan.experiments.transformers_modeling_backend.model.model import (
    HFTransformerModel,
)

job_config = JobConfig()

# Select the Transformers model to train.
job_config.hf_transformers = HFTransformers(model="Qwen/Qwen2.5-7B")

# Copy the values from the model's Transformers config into torchtitan-style args.
titan_args = TitanDenseModelArgs()
model_args = HFTransformerModelArgs(titan_dense_args=titan_args).update_from_config(
    job_config
)

# Wrap the Transformers model so torchtitan can treat it as one of its own.
model = HFTransformerModel(model_args)
```

Transformers integration

  1. [AutoConfig.from_pretrained] loads the config for a given model. The config values are copied into torchtitan-style args in HFTransformerModelArgs.
  2. torchtitan's HFTransformerModel wrapper reads the architectures field in the config, then instantiates and loads the corresponding model class, like [LlamaForCausalLM].
  3. The forward path uses native Transformers components while relying on torchtitan's parallelization and optimization methods. torchtitan treats the Transformers model as a torchtitan model, so the model code doesn't need to be rewritten.
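
The first two steps above can be sketched in plain Python. This is a toy model of the flow, not torchtitan or Transformers code: `ToyArgs`, `ToyLlamaForCausalLM`, `MODEL_REGISTRY`, `args_from_config`, and `build_model` are hypothetical stand-ins, the torchtitan-style field names `dim` and `n_layers` are assumptions, and the config is a hand-written dict rather than one loaded with `AutoConfig.from_pretrained`. The third step (the forward path under torchtitan's parallelization) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class ToyArgs:
    """Hypothetical torchtitan-style args; field names are assumptions."""
    dim: int
    n_layers: int
    architecture: str


class ToyLlamaForCausalLM:
    """Hypothetical stand-in for a Transformers model class."""

    def __init__(self, args: ToyArgs):
        self.args = args


# Hypothetical registry mapping architecture names to model classes.
MODEL_REGISTRY = {"LlamaForCausalLM": ToyLlamaForCausalLM}


def args_from_config(config: dict) -> ToyArgs:
    # Step 1: copy config values into torchtitan-style args.
    return ToyArgs(
        dim=config["hidden_size"],
        n_layers=config["num_hidden_layers"],
        architecture=config["architectures"][0],
    )


def build_model(args: ToyArgs):
    # Step 2: look up the class named by the config and instantiate it.
    model_cls = MODEL_REGISTRY.get(args.architecture)
    if model_cls is None:
        raise ValueError(f"unsupported architecture: {args.architecture}")
    return model_cls(args)


# Hand-written config dict standing in for a loaded Transformers config.
config = {
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 4096,
    "num_hidden_layers": 32,
}
model = build_model(args_from_config(config))
print(type(model).__name__, model.args.dim, model.args.n_layers)
```

Dispatching on the architecture name is what lets one wrapper serve many model families: supporting another dense model only requires that its class can be resolved from the config, not a new wrapper.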

Resources