Accelerate

Accelerate provides a unified interface for distributed training backends like FSDP or DeepSpeed. It detects your environment (number of GPUs, distributed backend, mixed precision, etc.) and automatically configures training, whether you're on a single GPU or 8 GPUs with FSDP.

Accelerate wraps the model in the appropriate distributed wrapper, moves it to the correct device, and prepares the optimizer to work with the wrapped model. During training, [Trainer] calls [~accelerate.Accelerator.backward] instead of calling backward on the loss directly, so Accelerate can handle gradient scaling for mixed precision. [Trainer] calls the appropriate Accelerate APIs and delegates all distributed mechanics to Accelerate.

Configure Accelerate for [Trainer] with either an Accelerate config file or [TrainingArguments].

Accelerate config file

Run the accelerate config command and answer the questions about your hardware and training setup. This creates a default_config.yaml file in the Accelerate cache directory (~/.cache/huggingface/accelerate/ by default). The example below is for FSDP.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2
  fsdp_reshard_after_forward: true
  fsdp_cpu_offload: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_cpu_ram_efficient_loading: true
  fsdp_activation_checkpointing: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
mixed_precision: bf16
num_machines: 1
num_processes: 4
```

Run accelerate launch with a [Trainer]-based script, and Accelerate reads the config file to set up training. You don't need to set the fsdp_config or deepspeed arguments in [TrainingArguments] because the Accelerate config file covers the same settings.

```cli
accelerate launch train.py
```

The accelerator_config argument of [TrainingArguments] accepts Accelerate settings that don't have dedicated top-level arguments. For example, set non_blocking=True together with [~TrainingArguments.dataloader_pin_memory] to overlap host-to-device data transfer with compute, which can improve GPU throughput.

from transformers import TrainingArguments

TrainingArguments(
    ...,
    dataloader_pin_memory=True,
    accelerator_config={
        "non_blocking": True,
    },
)
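
For intuition, the two settings correspond roughly to the following plain PyTorch pattern. This is a sketch of the underlying mechanism rather than the code [Trainer] or Accelerate executes; the toy dataset is invented for the example, and on a machine without CUDA both options are effectively no-ops.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Toy dataset, purely illustrative.
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))

# Pinned (page-locked) host memory is what allows asynchronous copies to the GPU.
loader = DataLoader(dataset, batch_size=8, pin_memory=use_cuda)

moved_batches = 0
for inputs, labels in loader:
    # non_blocking=True lets the copy overlap with other work when the
    # source tensor is pinned; on CPU it has no effect.
    inputs = inputs.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    moved_batches += 1
```

Without pinned memory, a non-blocking copy generally falls back to a synchronous one, which is why the two options are set together.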

TrainingArguments

Pass a backend-specific config to [TrainingArguments]. The [~Trainer.create_accelerator_and_postprocess] method reads the settings and configures training.

<hfoptions id="backend"> <hfoption id="FSDP">

Pass a JSON config file or dict to [~TrainingArguments.fsdp_config]. See FSDP for a full guide and config reference.

from transformers import TrainingArguments

TrainingArguments(
    ...,
    fsdp=True,
    fsdp_config="path/to/fsdp.json",
)
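
As a rough illustration of the "JSON file or dict" choice, the snippet below builds a small settings dict and writes it to fsdp.json. The key names are an assumption: they mirror the YAML example earlier on this page without the fsdp_ prefix, and should be checked against the FSDP config reference before use.

```python
import json
import tempfile
from pathlib import Path

# Assumed keys: verify them against the FSDP config reference.
fsdp_settings = {
    "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
    "activation_checkpointing": False,
    "cpu_ram_efficient_loading": True,
}

# Option 1: write the settings to a JSON file and pass its path as fsdp_config.
config_path = Path(tempfile.mkdtemp()) / "fsdp.json"
config_path.write_text(json.dumps(fsdp_settings, indent=2))

# Option 2: pass the dict itself as fsdp_config; both forms carry the same content.
reloaded = json.loads(config_path.read_text())
```

A file is easier to share across scripts and launch commands, while a dict keeps everything in one Python file.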

</hfoption> <hfoption id="DeepSpeed">

Pass a JSON config file or dict to [~TrainingArguments.deepspeed]. See DeepSpeed for a full guide and config reference.

from transformers import TrainingArguments

TrainingArguments(
    ...,
    deepspeed="path/to/ds_config.json",
)
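
A small DeepSpeed config might look like the sketch below, written from Python for convenience. The ZeRO stage and the choice of fields are illustrative, not a recommendation. The "auto" values rely on the Transformers DeepSpeed integration filling them in from [TrainingArguments].

```python
import json
import tempfile
from pathlib import Path

# Illustrative ZeRO stage 2 config; "auto" defers to TrainingArguments values.
ds_settings = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

# Write the file whose path would be passed as the deepspeed argument.
ds_config_path = Path(tempfile.mkdtemp()) / "ds_config.json"
ds_config_path.write_text(json.dumps(ds_settings, indent=2))

ds_reloaded = json.loads(ds_config_path.read_text())
```

Using "auto" avoids keeping batch size and precision in sync by hand between the DeepSpeed file and [TrainingArguments].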

</hfoption> <hfoption id="DDP">

DDP is configured directly through [TrainingArguments] fields. See DDP for details.

from transformers import TrainingArguments

TrainingArguments(
    ...,
    ddp_backend="nccl",
    ddp_find_unused_parameters=False,
    ddp_bucket_cap_mb=25,
    ddp_timeout=1800,
)

</hfoption> </hfoptions>

Next steps

See DDP for data-parallel training when your model fits on one GPU.
See FSDP for sharding parameters, gradients, and optimizer states across GPUs.
See DeepSpeed for ZeRO optimization and offloading.