docs/source/en/accelerate.md
Accelerate provides a unified interface for distributed training backends like FSDP or DeepSpeed. It detects your environment (number of GPUs, distributed backend, mixed precision, etc.) and configures training to match, whether you're on a single GPU or on 8 GPUs with FSDP.
Accelerate wraps the model in the appropriate distributed wrapper, moves it to the correct device, and wraps the optimizer so it works with that setup. During training, Accelerate uses its own [~accelerate.Accelerator.backward] method to handle gradient scaling for mixed precision. [Trainer] calls the appropriate Accelerate APIs and delegates all distributed mechanics to Accelerate.
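The steps described above can be sketched as a hand-written loop. This is a minimal illustration, not [Trainer]'s actual implementation: `train_one_epoch` is a hypothetical helper, `accelerator` is assumed to be an `accelerate.Accelerator` instance, and `model` is assumed to return an output with a `.loss` attribute.

```python
def train_one_epoch(accelerator, model, optimizer, dataloader):
    """Run one pass over `dataloader` and return the number of optimizer steps.

    Hypothetical helper for illustration. `accelerator` is assumed to be an
    `accelerate.Accelerator`; `model` is assumed to return an object with a
    `.loss` attribute when the batch includes labels.
    """
    # prepare() wraps the model for the detected backend, places it on the
    # right device, and returns wrapped versions of the optimizer and dataloader.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    steps = 0
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(**batch).loss
        # Call accelerator.backward(loss) instead of loss.backward() so that
        # gradient scaling for mixed precision is applied when it is needed.
        accelerator.backward(loss)
        optimizer.step()
        steps += 1
    return steps
```

With real objects, `accelerator` would typically be created with `Accelerator()` before the model and optimizer are built. [Trainer] takes care of the equivalent calls for you, so a loop like this is only needed when you train without it.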
Configure Accelerate for [Trainer] with either an Accelerate config file or [TrainingArguments].
Run the `accelerate config` command and answer the questions about your hardware and training setup. This writes a `default_config.yaml` file to the Accelerate cache directory (by default `~/.cache/huggingface/accelerate`). The example below is for FSDP.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2
  fsdp_reshard_after_forward: true
  fsdp_cpu_offload: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_cpu_ram_efficient_loading: true
  fsdp_activation_checkpointing: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
mixed_precision: bf16
num_machines: 1
num_processes: 4
Run `accelerate launch` with a [Trainer]-based script, and Accelerate reads the config file to set up training. The `fsdp_config` and `deepspeed` arguments of [TrainingArguments] are unnecessary in this case because the Accelerate config file covers the same settings.
accelerate launch train.py
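If you keep several config files, `accelerate launch` also accepts a `--config_file` flag that points at a specific one instead of the default in the cache. The sketch below assembles that command in Python; `build_launch_command` and the file name `fsdp_config.yaml` are hypothetical names used only for illustration.

```python
import shlex


def build_launch_command(script, config_file=None, script_args=()):
    """Return the argv list for `accelerate launch` (hypothetical helper)."""
    cmd = ["accelerate", "launch"]
    if config_file is not None:
        # Without --config_file, Accelerate falls back to the default config.
        cmd += ["--config_file", str(config_file)]
    cmd.append(script)
    # Anything after the script name is forwarded to the training script.
    cmd.extend(script_args)
    return cmd


cmd = build_launch_command("train.py", config_file="fsdp_config.yaml")
print(shlex.join(cmd))  # accelerate launch --config_file fsdp_config.yaml train.py
```

To actually start training from Python, the list can be handed to `subprocess.run(cmd, check=True)`.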
In [TrainingArguments], the `accelerator_config` argument accepts Accelerate settings that don't have dedicated top-level arguments. For example, set `non_blocking=True` together with [~TrainingArguments.dataloader_pin_memory] to overlap host-to-device data transfer with compute for higher GPU throughput.
from transformers import TrainingArguments

TrainingArguments(
    ...,
    dataloader_pin_memory=True,
    accelerator_config={
        "non_blocking": True,
    },
)
Pass a backend-specific config to [TrainingArguments]. The [~Trainer.create_accelerator_and_postprocess] method reads the settings and configures training.
Pass a JSON config file or dict to [~TrainingArguments.fsdp_config]. See FSDP for a full guide and config reference.
from transformers import TrainingArguments

TrainingArguments(
    ...,
    fsdp=True,
    fsdp_config="path/to/fsdp.json",
)
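As a rough idea of what the FSDP config might hold, the sketch below builds it as a dict and prints the equivalent JSON text. The key names are an assumption: they are modeled on the YAML fields shown earlier with the `fsdp_` prefix dropped, and the values are illustrative. Confirm both against the FSDP guide for your installed version.

```python
import json

# Assumed key names and illustrative values; check the FSDP guide before use.
fsdp_config = {
    # Wrap each decoder layer as its own FSDP unit.
    "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
    "cpu_ram_efficient_loading": True,
    "activation_checkpointing": False,
}

# The same settings as the text of a JSON file such as "path/to/fsdp.json".
fsdp_json = json.dumps(fsdp_config, indent=2)
print(fsdp_json)
```

Either form works as the `fsdp_config` value: pass the dict itself, or save the JSON text to a file and pass its path.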
Pass a JSON config file or dict to [~TrainingArguments.deepspeed]. See DeepSpeed for a full guide and config reference.
from transformers import TrainingArguments

TrainingArguments(
    ...,
    deepspeed="path/to/ds_config.json",
)
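A minimal sketch of producing such a file is shown below, assuming ZeRO stage 2 and bf16. The `"auto"` values are placeholders that the Transformers DeepSpeed integration fills in from [TrainingArguments]; `write_ds_config` is a hypothetical helper, and the selection of keys is illustrative rather than a complete config.

```python
import json
from pathlib import Path

# Illustrative DeepSpeed settings; "auto" defers the value to TrainingArguments.
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}


def write_ds_config(path="ds_config.json"):
    """Write `ds_config` as JSON and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(ds_config, indent=2))
    return path
```

The returned path can be passed as the `deepspeed` argument. Since the argument also accepts a dict, `ds_config` can be passed directly when a file on disk isn't needed.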
DDP is configured directly through [TrainingArguments] fields. See DDP for details.
from transformers import TrainingArguments

TrainingArguments(
    ...,
    ddp_backend="nccl",
    ddp_find_unused_parameters=False,
    ddp_bucket_cap_mb=25,
    ddp_timeout=1800,
)