# Performance

This section covers the main performance levers for SGLang Diffusion: attention backends, cache-based acceleration, and profiling.

## Overview

| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |

Cache-DiT provides block-level caching for diffusers pipelines and is the better choice when you want to tune aggressively for higher speedups.
TeaCache provides timestep-level caching and is built into SGLang's native model families.
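To make "timestep-level caching" concrete, here is a minimal sketch of a TeaCache-style skip rule in plain Python. It is not SGLang's implementation: the class `TimestepCache`, the helper `rel_l1`, and the `threshold` parameter are hypothetical, and the polynomial rescaling that TeaCache applies to the raw distance is omitted. The controller accumulates the relative L1 change of the timestep-modulated input across denoising steps; while the running total stays below the threshold, it skips the model and reuses the residual cached from the last full forward pass.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1(curr: Vector, prev: Vector) -> float:
    """Relative L1 distance ||curr - prev||_1 / ||prev||_1."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev)
    return num / den if den else float("inf")


class TimestepCache:
    """Hypothetical TeaCache-style controller (simplified, no rescaling).

    Decides once per denoising step whether to run the full model or to
    reuse the residual (output - input) from the last full run.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.accumulated = 0.0                      # drift since last full run
        self.prev_modulated: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None

    def step(self, x: Vector, modulated: Vector,
             model: Callable[[Vector], Vector]) -> Vector:
        # Accumulate how much the modulated input changed since the previous step.
        if self.prev_modulated is not None:
            self.accumulated += rel_l1(modulated, self.prev_modulated)
        self.prev_modulated = modulated

        # Small accumulated drift: skip the model and reuse the cached residual.
        if self.cached_residual is not None and self.accumulated < self.threshold:
            return [a + r for a, r in zip(x, self.cached_residual)]

        # Otherwise run the model, refresh the cache, and reset the drift counter.
        out = model(x)
        self.cached_residual = [o - a for o, a in zip(out, x)]
        self.accumulated = 0.0
        return out
```

A larger `threshold` skips more steps and trades quality for speed. Roughly speaking, a block-level method such as Cache-DiT's DBCache makes a similar reuse decision at a finer granularity, inside the transformer's block stack rather than for the whole forward pass.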

```{toctree}
:maxdepth: 1

attention_backends
cache/index
profiling
```

For Ring SP benchmark details, see: