docs/diffusion/performance/cache/index.md
SGLang provides two complementary caching strategies for Diffusion Transformer (DiT) models. Both reduce denoising cost by skipping redundant computation, but they operate at different levels:
| Strategy | Scope | Mechanism | Best For |
|---|---|---|---|
| Cache-DiT | Block-level | Skips individual transformer blocks dynamically | Advanced tuning, higher speedup |
| TeaCache | Timestep-level | Skips entire denoising steps based on L1 similarity | Simple, built-in acceleration |
Cache-DiT provides block-level caching with advanced strategies such as DBCache and TaylorSeer, and can achieve up to a 1.69x speedup.
See cache_dit.md for detailed configuration.
```bash
SGLANG_CACHE_DIT_ENABLED=true \
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains"
```
TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely.
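As an illustration of the timestep-level rule, the sketch below accumulates the relative L1 distance between consecutive step inputs and skips the transformer call while the accumulated change stays under a threshold. The function, the module-level state, and the `THRESHOLD` value are hypothetical stand-ins for exposition, not the built-in implementation.

```python
import torch

# Illustrative module-level state for the skip decision.
accumulated_distance = 0.0
prev_inp = None
cached_residual = None
THRESHOLD = 0.1  # hypothetical; real thresholds are tuned per model

def denoise_step(transformer, inp: torch.Tensor) -> torch.Tensor:
    """Sketch of a TeaCache-style step: skip while steps stay similar."""
    global accumulated_distance, prev_inp, cached_residual
    if prev_inp is not None:
        # Relative L1 distance between consecutive step inputs.
        rel_l1 = (inp - prev_inp).abs().mean() / prev_inp.abs().mean()
        accumulated_distance += rel_l1.item()
    if cached_residual is not None and accumulated_distance < THRESHOLD:
        # Steps are similar enough: reuse the cached residual.
        out = inp + cached_residual
    else:
        # Change is too large: run the full transformer and reset.
        out = transformer(inp)
        cached_residual = out - inp
        accumulated_distance = 0.0
    prev_inp = inp
    return out
```

Once the accumulated distance crosses the threshold, the full forward pass runs and the accumulator resets, which bounds how much error the skipped steps can introduce.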
See teacache.md for detailed documentation.
For Flux and Qwen models, TeaCache is automatically disabled when CFG is enabled.
```{toctree}
:maxdepth: 1

cache_dit
teacache
```