# Attention Backends
This document describes the attention backends available in SGLang Diffusion (`sglang.multimodal_gen`) and how to select them.

Attention backends are defined by `AttentionBackendEnum` (`sglang.multimodal_gen.runtime.platforms.interface.AttentionBackendEnum`) and selected via the CLI flag `--attention-backend`. Backend selection is performed by the shared attention layers (e.g. `LocalAttention` / `USPAttention` / `UlyssesAttention` in `sglang.multimodal_gen.runtime.layers.attention.layer`) and therefore applies to any model component that uses these layers (e.g. the diffusion transformer / DiT and the encoders).
When using the diffusers backend, `--attention-backend` is passed through to diffusers' `set_attention_backend` (e.g., `flash`, `_flash_3_hub`, `sage`, `xformers`, `native`).
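As a minimal sketch (assuming your deployment runs on the diffusers backend), selecting one of the names diffusers recognizes uses the same flag:

```bash
# Forwarded verbatim to diffusers' set_attention_backend
# when the diffusers backend is in use
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend sage
```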
For SGLang-native pipelines, the CLI accepts the lowercase names of `AttentionBackendEnum`. The table below lists the backends referenced in this document (the full set is defined by `AttentionBackendEnum`); `fa3`/`fa4` are accepted as aliases for `fa`.

| CLI name | Backend |
| --- | --- |
| `fa` (aliases: `fa3`, `fa4`) | FlashAttention |
| `torch_sdpa` | PyTorch `scaled_dot_product_attention` |
| `sliding_tile_attn` | Sliding Tile Attention |
| `video_sparse_attn` | Video Sparse Attention |
| `vmoba_attn` | V-MoBA |
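For example, requesting FlashAttention through one of its aliases selects the same backend as `fa`:

```bash
# fa3 resolves to the same FlashAttention backend as fa
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend fa3
```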
The selection order in `runtime/layers/attention/selector.py` is:

1. `global_force_attn_backend(...)` / `global_force_attn_backend_context_manager(...)`
2. `--attention-backend` (`ServerArgs.attention_backend`)

Some backends require additional configuration. You can pass these parameters via `--attention-backend-config`. This argument accepts either of the following formats (see the example after the backend list below):
'{"sparsity": 0.5}')."sparsity=0.5,enable_x=true").Sliding Tile Attention (sliding_tile_attn)
Video Sparse Attention (video_sparse_attn)
V-MoBA (vmoba_attn)
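For instance, the Sliding Tile Attention setting shown in `key=value` form at the end of this page can equivalently be passed as a JSON string:

```bash
# Same mask-strategy setting as the key=value example below, as JSON
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend sliding_tile_attn \
--attention-backend-config '{"mask_strategy_file_path": "/abs/path/to/mask_strategy.json"}'
```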
Example: FlashAttention:

```bash
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend fa
```
Example: PyTorch SDPA:

```bash
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend torch_sdpa
```
Example: Sliding Tile Attention with backend configuration:

```bash
# Pass the mask strategy file path via config
sglang generate \
--model-path <MODEL_PATH_OR_ID> \
--prompt "..." \
--attention-backend sliding_tile_attn \
--attention-backend-config "mask_strategy_file_path=/abs/path/to/mask_strategy.json"
```
If you are unsure which backend to choose, use `--attention-backend torch_sdpa` or `fa`, depending on what is available in your environment; `torch_sdpa` requires nothing beyond PyTorch, so it is the safest fallback.