docs/diffusion/compatibility_matrix.md
The table below lists every supported model and the optimizations available for each.
The symbols used have the following meanings: ✅ = full compatibility, ❌ = no compatibility, ⭕ = does not apply to this model.

The Hugging Face Model ID can be passed directly to `from_pretrained()` methods, and SGLang Diffusion will use the optimal default parameters when initializing and generating videos.

| Model Name | Hugging Face Model ID | Resolutions | TeaCache | Sliding Tile Attn | Sage Attn | Video Sparse Attention (VSA) | Sparse Linear Attention (SLA) | Sage Sparse Linear Attention (SageSLA) | Sparse Video Gen 2 (SVG2) |
|---|---|---|---|---|---|---|---|---|---|
| FastWan2.1 T2V 1.3B | FastVideo/FastWan2.1-T2V-1.3B-Diffusers | 480p | ⭕ | ⭕ | ⭕ | ✅ | ❌ | ❌ | ❌ |
| FastWan2.2 TI2V 5B Full Attn | FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers | 720p | ⭕ | ⭕ | ⭕ | ✅ | ❌ | ❌ | ❌ |
| Wan2.2 TI2V 5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | 720p | ⭕ | ⭕ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| Wan2.2 T2V A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers | 480p, 720p | ❌ | ❌ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| Wan2.2 I2V A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | 480p, 720p | ❌ | ❌ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| HunyuanVideo | hunyuanvideo-community/HunyuanVideo | 720×1280, 544×960 | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| FastHunyuan | FastVideo/FastHunyuan-diffusers | 720×1280, 544×960 | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| Wan2.1 T2V 1.3B | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| Wan2.1 T2V 14B | Wan-AI/Wan2.1-T2V-14B-Diffusers | 480p, 720p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| Wan2.1 I2V 480P | Wan-AI/Wan2.1-I2V-14B-480P-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| Wan2.1 I2V 720P | Wan-AI/Wan2.1-I2V-14B-720P-Diffusers | 720p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| TurboWan2.1 T2V 1.3B | IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ |
| TurboWan2.1 T2V 14B | IPostYellow/TurboWan2.1-T2V-14B-Diffusers | 480p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ |
| TurboWan2.1 T2V 14B 720P | IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers | 720p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ |
| TurboWan2.2 I2V A14B | IPostYellow/TurboWan2.2-I2V-A14B-Diffusers | 720p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ |
| Wan2.1 Fun 1.3B InP | weizhou03/Wan2.1-Fun-1.3B-InP-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ |
| Helios Base | BestWishYsh/Helios-Base | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Helios Mid | BestWishYsh/Helios-Mid | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Helios Distilled | BestWishYsh/Helios-Distilled | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LTX-2 (one/two-stage/TI2V) | Lightricks/LTX-2 | 768×512, 1536×1024 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LTX-2.3 (one/two-stage/TI2V/HQ) | Lightricks/LTX-2.3 | 768×512, 1536×1024, 1920×1088 (HQ default) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
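
For scripting, a few rows of the matrix can be mirrored as a small lookup table. The sketch below is illustrative only: it copies three rows from the table above, and `COLUMNS`, `MATRIX`, and `optimizations_with_status` are hypothetical names invented for this example, not part of SGLang Diffusion. The short column keys are likewise made up.

```python
# Illustrative only: three rows copied from the table above.
# Column order follows the table header; the short keys are invented for this sketch.
COLUMNS = ("teacache", "sta", "sage_attn", "vsa", "sla", "sage_sla", "svg2")

MATRIX = {
    "FastVideo/FastWan2.1-T2V-1.3B-Diffusers": "⭕ ⭕ ⭕ ✅ ❌ ❌ ❌".split(),
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers": "✅ ✅ ✅ ⭕ ❌ ❌ ✅".split(),
    "IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers": "✅ ❌ ❌ ❌ ✅ ✅ ⭕".split(),
}


def optimizations_with_status(model_id: str, status: str = "✅") -> list[str]:
    """Return the column keys whose cell for `model_id` equals `status`."""
    row = MATRIX[model_id]
    return [name for name, cell in zip(COLUMNS, row) if cell == status]


print(optimizations_with_status("Wan-AI/Wan2.1-T2V-1.3B-Diffusers"))
```

Keeping the symbols as data rather than converting them to booleans preserves the three-way distinction the table makes.
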

Note:

- SpargeAttn can be installed with `pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation`.
- LTX-2 one-stage: `--pipeline-class-name LTX2Pipeline`
- LTX-2 two-stage: `--pipeline-class-name LTX2TwoStagePipeline`
- LTX-2 two-stage HQ: `--pipeline-class-name LTX2TwoStageHQPipeline` (HQ defaults to 1920×1088; you can still override `--width`/`--height`)
- TI2V (`--image-path`) is available on one-stage and two-stage pipelines (including HQ).
- Two-stage pipelines use `--spatial-upsampler-path` and `--distilled-lora-path`.
- The Resolutions column uses output video width×height semantics, matching `sglang generate --width ... --height ...`.
- `--ltx2-two-stage-device-mode {original,snapshot,resident}`:
  - `snapshot` is the default and recommended mode.
  - `resident` usually provides the best latency/throughput but uses much more VRAM.
  - `original` keeps official two-stage semantics without the premerged stage-2 transformer path.
  - Timings by mode: `original` 154.67s, `snapshot` 114.05s, `resident` 75.71s; peak VRAM trend is `original` < `snapshot` < `resident`.

| Model Name | Hugging Face Model ID |
|---|---|
| FLUX.1-dev | black-forest-labs/FLUX.1-dev |
| FLUX.2-dev | black-forest-labs/FLUX.2-dev |
| FLUX.2-dev-NVFP4 | black-forest-labs/FLUX.2-dev-NVFP4 |
| FLUX.2-Klein-4B | black-forest-labs/FLUX.2-klein-4B |
| FLUX.2-Klein-9B | black-forest-labs/FLUX.2-klein-9B |
| Z-Image | Tongyi-MAI/Z-Image |
| Z-Image-Turbo | Tongyi-MAI/Z-Image-Turbo |
| GLM-Image | zai-org/GLM-Image |
| Qwen Image | Qwen/Qwen-Image |
| Qwen Image 2512 | Qwen/Qwen-Image-2512 |
| Qwen Image Edit | Qwen/Qwen-Image-Edit |
| Qwen Image Edit 2509 | Qwen/Qwen-Image-Edit-2509 |
| Qwen Image Edit 2511 | Qwen/Qwen-Image-Edit-2511 |
| Qwen Image Layered | Qwen/Qwen-Image-Layered |
| SD3 Medium | stabilityai/stable-diffusion-3-medium-diffusers |
| SD3.5 Medium | stabilityai/stable-diffusion-3.5-medium-diffusers |
| SD3.5 Large | stabilityai/stable-diffusion-3.5-large-diffusers |
| Hunyuan3D-2 | tencent/Hunyuan3D-2 |
| SANA 1.5 1.6B | Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers |
| SANA 1.5 4.8B | Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers |
| SANA 1600M 1024px | Efficient-Large-Model/Sana_1600M_1024px_diffusers |
| SANA 600M 1024px | Efficient-Large-Model/Sana_600M_1024px_diffusers |
| SANA 1600M 512px | Efficient-Large-Model/Sana_1600M_512px_diffusers |
| SANA 600M 512px | Efficient-Large-Model/Sana_600M_512px_diffusers |
| FireRed-Image-Edit 1.0 | FireRedTeam/FireRed-Image-Edit-1.0 |
| FireRed-Image-Edit 1.1 | FireRedTeam/FireRed-Image-Edit-1.1 |
| ERNIE-Image | baidu/ERNIE-Image |
| ERNIE-Image-Turbo | baidu/ERNIE-Image-Turbo |

SGLang Diffusion supports overriding individual pipeline components with `--<component>-path`. The value can be either a Hugging Face repo ID or a local component directory.

The same overrides can also be provided in config files through `component_paths.<component>`.

CLI:

    sglang generate \
      --model-path black-forest-labs/FLUX.2-dev \
      --vae-path black-forest-labs/FLUX.2-small-decoder \
      --transformer-path /models/flux2/transformer

Config file:

    model_path: black-forest-labs/FLUX.2-dev
    component_paths:
      vae: black-forest-labs/FLUX.2-small-decoder
      transformer: /models/flux2/transformer
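
The CLI and config forms carry the same information, so one can be derived from the other. The sketch below assumes that a flag is formed by replacing underscores in the component key with hyphens and appending `-path`, which is consistent with the keys and flags listed in this document (for example `text_encoder_2` and `--text-encoder-2-path`). `overrides_to_cli_args` is a hypothetical helper, not an SGLang Diffusion API.

```python
import shlex


def overrides_to_cli_args(model_path: str, component_paths: dict[str, str]) -> list[str]:
    """Build an `sglang generate` argument list from a component_paths mapping."""
    args = ["sglang", "generate", "--model-path", model_path]
    for component, path in component_paths.items():
        # Assumed naming rule: `text_encoder_2` -> `--text-encoder-2-path`.
        flag = "--" + component.replace("_", "-") + "-path"
        args += [flag, path]
    return args


args = overrides_to_cli_args(
    "black-forest-labs/FLUX.2-dev",
    {
        "vae": "black-forest-labs/FLUX.2-small-decoder",
        "transformer": "/models/flux2/transformer",
    },
)
# shlex.join quotes arguments where needed, so the result can be pasted into a shell.
print(shlex.join(args))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.
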

Use the component name from the pipeline's `model_index.json` or the native pipeline's registered module name:

| Component Type | Supported Keys | Notes |
|---|---|---|
| VAE | vae, video_vae, audio_vae | vae is the common image-generation override |
| Transformer / DiT | transformer, video_dit, audio_dit | transformer is the standard override for the main denoiser |
| Text / Preprocess | text_encoder, text_encoder_2, tokenizer, processor, image_processor | Replacement encoders often need matching preprocessing assets |
| Auxiliary | scheduler, spatial_upsampler, vocoder, connectors, dual_tower_bridge, image_encoder, vision_language_encoder | Only valid for pipelines that expose these components |
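
Because some keys are only valid for pipelines that expose the matching component, it can help to check a set of overrides against the pipeline's component list before launching. The sketch below assumes a Diffusers-style `model_index.json` in which metadata keys start with an underscore and every other key names a component. `component_names` and `unknown_override_keys` are hypothetical helpers, and the sample index (including its class names and the `/models/vocoder` path) is invented for illustration.

```python
import json

# Invented sample; a real model_index.json ships with the model repo.
SAMPLE_MODEL_INDEX = """
{
  "_class_name": "ExamplePipeline",
  "scheduler": ["diffusers", "ExampleScheduler"],
  "text_encoder": ["transformers", "ExampleTextEncoder"],
  "tokenizer": ["transformers", "ExampleTokenizer"],
  "transformer": ["diffusers", "ExampleTransformer"],
  "vae": ["diffusers", "ExampleVAE"]
}
"""


def component_names(model_index_text: str) -> set[str]:
    """Return component keys, skipping metadata keys that start with '_'."""
    index = json.loads(model_index_text)
    return {key for key in index if not key.startswith("_")}


def unknown_override_keys(overrides: dict[str, str], model_index_text: str) -> list[str]:
    """List override keys that the pipeline does not expose, in sorted order."""
    return sorted(set(overrides) - component_names(model_index_text))


overrides = {
    "vae": "black-forest-labs/FLUX.2-small-decoder",
    "vocoder": "/models/vocoder",  # hypothetical path
}
print(unknown_override_keys(overrides, SAMPLE_MODEL_INDEX))
```

This only covers the `model_index.json` case; keys that come from a native pipeline's registered module names would need a different source of truth.
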

The table below lists concrete Hugging Face component repos that are already used in SGLang Diffusion docs or tests. It is not an exhaustive catalog of all compatible component repos.

| Base Model | Override Key | Example Repo | Notes |
|---|---|---|---|
| black-forest-labs/FLUX.2-dev | vae | black-forest-labs/FLUX.2-small-decoder | Decoder-only FLUX.2 VAE override |
| black-forest-labs/FLUX.2-dev | vae | fal/FLUX.2-Tiny-AutoEncoder | Existing tested custom VAE path |

- `--vae-path` is the common image-generation override.
- `--video-vae-path` and `--audio-vae-path` are only relevant for pipelines with separate video or audio VAEs.
- `--transformer-path` is the standard override for the main denoising transformer.
- Quantized transformer weights are passed with `--transformer-path` or `--transformer-weights-path`; see quantization.md.
- `--video-dit-path` and `--audio-dit-path` are only for pipelines that split denoisers by modality.
- `--text-encoder-path` and `--text-encoder-2-path` override the primary and secondary text encoders.
- `--tokenizer-path`, `--processor-path`, and `--image-processor-path` are useful when the replacement encoder requires matching preprocessing assets.
- `--scheduler-path` is only relevant when the pipeline exposes a scheduler component.
- `--spatial-upsampler-path` is mainly for two-stage pipelines such as `LTX2TwoStagePipeline`.
- `--vocoder-path`, `--connectors-path`, `--dual-tower-bridge-path`, `--image-encoder-path`, and `--vision-language-encoder-path` are only valid for pipelines that expose those components.
- Component names must match the pipeline's `model_index.json` or the native pipeline's registered module name.

This section lists example LoRAs that have been explicitly tested and verified with each base model in the SGLang Diffusion pipeline.

Important: LoRAs that are not listed here are not necessarily incompatible. In practice, most standard LoRAs are expected to work, especially those following common Diffusers or SD-style conventions. The entries below simply reflect configurations that have been manually validated by the SGLang team.

| Base Model | Supported LoRAs |
|---|---|
| Wan2.2 | lightx2v/Wan2.2-Distill-Loras, Cseti/wan2.2-14B-Arcane_Jinx-lora-v1 |
| Wan2.1 | lightx2v/Wan2.1-Distill-Loras |
| Z-Image-Turbo | tarn59/pixel_art_style_lora_z_image_turbo, wcde/Z-Image-Turbo-DeJPEG-Lora |
| Qwen-Image | lightx2v/Qwen-Image-Lightning, flymy-ai/qwen-image-realism-lora, prithivMLmods/Qwen-Image-HeadshotX, starsfriday/Qwen-Image-EVA-LoRA |
| Qwen-Image-Edit | ostris/qwen_image_edit_inpainting, lightx2v/Qwen-Image-Edit-2511-Lightning |
| Flux | dvyio/flux-lora-simple-illustration, XLabs-AI/flux-furry-lora, XLabs-AI/flux-RealismLora |
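
Since unlisted LoRAs are not necessarily incompatible, a launcher script might flag them as unverified rather than reject them. The sketch below hard-codes a few entries from the table above; `VERIFIED_LORAS` and `lora_status` are hypothetical names, and `someuser/some-other-lora` is a made-up repo ID used only to show the fallback.

```python
# A few entries copied from the table above (not the full list).
VERIFIED_LORAS = {
    "Wan2.2": {
        "lightx2v/Wan2.2-Distill-Loras",
        "Cseti/wan2.2-14B-Arcane_Jinx-lora-v1",
    },
    "Wan2.1": {"lightx2v/Wan2.1-Distill-Loras"},
    "Flux": {
        "dvyio/flux-lora-simple-illustration",
        "XLabs-AI/flux-furry-lora",
        "XLabs-AI/flux-RealismLora",
    },
}


def lora_status(base_model: str, lora_id: str) -> str:
    """Return 'verified' for listed pairs, otherwise 'unverified' (it may still work)."""
    if lora_id in VERIFIED_LORAS.get(base_model, set()):
        return "verified"
    return "unverified"


print(lora_status("Wan2.1", "lightx2v/Wan2.1-Distill-Loras"))
print(lora_status("Wan2.1", "someuser/some-other-lora"))  # made-up repo ID
```
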