docs/diffusion/performance/cache/index.md
SGLang provides two complementary caching strategies for Diffusion Transformer (DiT) models. Both reduce denoising cost by skipping redundant computation, but they operate at different levels:
| Strategy | Scope | Mechanism | Best For |
|---|---|---|---|
| Cache-DiT | Block-level | Skips individual transformer blocks dynamically | Advanced tuning, higher speedup |
| TeaCache | Timestep-level | Skips entire denoising steps based on L1 similarity | Simple, built-in acceleration |
Cache-DiT provides block-level caching with advanced strategies such as DBCache and TaylorSeer, and can achieve up to a 1.69x speedup.
See cache_dit.md for detailed configuration.
```bash
SGLANG_CACHE_DIT_ENABLED=true \
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains"
```
TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely.
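As an illustration of the timestep-level rule, the sketch below accumulates the relative L1 distance between consecutive step inputs and skips the transformer call while the accumulated change stays under a threshold. The function, the module-level state, and the `THRESHOLD` value are hypothetical stand-ins for exposition, not the built-in implementation.

```python
import torch

# Illustrative module-level state for the skip decision.
accumulated_distance = 0.0
prev_inp = None
cached_residual = None
THRESHOLD = 0.1  # hypothetical; real thresholds are tuned per model

def denoise_step(transformer, inp: torch.Tensor) -> torch.Tensor:
    """Sketch of a TeaCache-style step: skip while steps stay similar."""
    global accumulated_distance, prev_inp, cached_residual
    if prev_inp is not None:
        # Relative L1 distance between consecutive step inputs.
        rel_l1 = (inp - prev_inp).abs().mean() / prev_inp.abs().mean()
        accumulated_distance += rel_l1.item()
    if cached_residual is not None and accumulated_distance < THRESHOLD:
        # Steps are similar enough: reuse the cached residual.
        out = inp + cached_residual
    else:
        # Change is too large: run the full transformer and reset.
        out = transformer(inp)
        cached_residual = out - inp
        accumulated_distance = 0.0
    prev_inp = inp
    return out
```

Once the accumulated distance crosses the threshold, the full forward pass runs and the accumulator resets, which bounds how much error the skipped steps can introduce.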
See teacache.md for detailed documentation.
For Flux and Qwen models, TeaCache is automatically disabled when CFG is enabled.
```{toctree}
:maxdepth: 1

cache_dit
teacache
```