Dynamic batching is an opt-in SGLang-Diffusion serving mode that merges compatible queued requests into one native pipeline batch. It is separate from LLM continuous batching and tokenizer batching.
Use it for concurrent T2I or T2V traffic with the same model and sampling shape. Keep singleton serving for latency-sensitive or highly mixed traffic.
Dynamic batching is disabled by default (--batching-max-size defaults to 1).
sglang serve \
  --model-path black-forest-labs/FLUX.1-dev \
  --port 30010 \
  --batching-mode dynamic \
  --batching-max-size 8 \
  --batching-delay-ms 5 \
  --enable-batching-metrics
For request formats, see the OpenAI-Compatible API.
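The way --batching-max-size and --batching-delay-ms interact in the command above can be pictured as a collect-until-full-or-timeout loop. The sketch below is a generic illustration of that idea, not SGLang's scheduler: `collect_batch` and the `"max_size"` stop label are hypothetical names, and the compatibility check between requests is deliberately left out.

```python
import queue
import time


def collect_batch(pending, max_size=8, delay_ms=5.0):
    """Gather up to max_size queued items, waiting at most delay_ms after the first.

    Hypothetical helper for illustration only; it treats every queued
    request as compatible, which a real scheduler would not do.
    """
    first = pending.get()  # block until at least one request is queued
    batch = [first]
    deadline = time.monotonic() + delay_ms / 1000.0
    stop_reason = "max_size"  # assumed label; the real log values may differ

    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            stop_reason = "delay"
            break
        try:
            # Wait only for the time left in the batching window.
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            stop_reason = "delay"
            break

    return batch, stop_reason


if __name__ == "__main__":
    q = queue.Queue()
    for request_id in range(3):
        q.put(request_id)
    print(collect_batch(q, max_size=2, delay_ms=5))
    print(collect_batch(q, max_size=8, delay_ms=5))
```

With this shape, a larger delay trades first-request latency for a better chance of filling the batch, which is why singleton serving remains the safer choice for latency-sensitive traffic.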
Use --batching-config /path/to/batching_config.json to load JSON rules when a model or resolution needs a lower cap than --batching-max-size:
{
  "schema_version": 1,
  "rules": [
    {
      "model_contains": "Qwen-Image",
      "resolution": "1024x1024",
      "max_batch_size": 1
    }
  ]
}
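As a rough mental model of how such a rule might lower the cap, the sketch below takes the smaller of the user-supplied maximum and any matching rule. The matching semantics are assumptions, not confirmed behavior: `effective_cap` is a hypothetical helper, `model_contains` is treated as a substring test on the model path, and a rule without a `resolution` key is treated as applying to every resolution.

```python
import json

CONFIG_TEXT = """
{
  "schema_version": 1,
  "rules": [
    {
      "model_contains": "Qwen-Image",
      "resolution": "1024x1024",
      "max_batch_size": 1
    }
  ]
}
"""


def effective_cap(config, model_path, resolution, user_max):
    """Return the batch-size cap after applying matching rules.

    Hypothetical illustration; the real rule evaluation may differ.
    """
    cap = user_max
    for rule in config.get("rules", []):
        # Assumption: substring match against the model path.
        if rule["model_contains"] not in model_path:
            continue
        # Assumption: a rule without "resolution" matches any resolution.
        if rule.get("resolution", resolution) != resolution:
            continue
        # Rules can only lower the cap, never raise it above user_max.
        cap = min(cap, rule["max_batch_size"])
    return cap


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    print(effective_cap(config, "Qwen/Qwen-Image", "1024x1024", user_max=8))
    print(effective_cap(config, "Qwen/Qwen-Image", "512x512", user_max=8))
```

Under these assumptions, the rule above pins 1024x1024 Qwen-Image requests to singleton execution while leaving other resolutions at the --batching-max-size value.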
An initial implementation of dynamic batching for T2I and T2V models can be found in #18764. The current compatibility grid is below and will be updated as more coverage is added. See Supported Models for full model IDs.
✅ means supported, ❌ means not currently supported, ? means untested, and - means not applicable.
Use --enable-batching-metrics to inspect realized batches:
Dynamic batch dispatch: size=2/8, user_max=8, queue_wait=5.12ms, stop_reason=delay
Dynamic batch dispatch: size=1/8, user_max=8, queue_wait=0.04ms, stop_reason=config_cap:1
Dynamic batch stats (last 5 dispatches): avg_size=2.80, merged_rate=60.0%, full_rate=20.0%, utilization=35.0%, wait_avg=3.21ms, wait_p95=5.12ms, top_rejects=none
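For offline analysis, the per-dispatch lines can be parsed with a regular expression. This is a minimal sketch that assumes the log format matches the sample lines shown here exactly; `parse_dispatch` and `summarize` are hypothetical helpers, the second number in `size=2/8` is stored as `limit` without claiming what it means, and `merged_rate` is assumed to be the fraction of dispatches with more than one request.

```python
import re

DISPATCH_RE = re.compile(
    r"Dynamic batch dispatch: size=(?P<size>\d+)/(?P<limit>\d+), "
    r"user_max=(?P<user_max>\d+), "
    r"queue_wait=(?P<queue_wait_ms>\d+(?:\.\d+)?)ms, "
    r"stop_reason=(?P<stop_reason>\S+)"
)


def parse_dispatch(line):
    """Parse one dispatch log line into a dict, or return None if it does not match."""
    match = DISPATCH_RE.search(line)
    if match is None:
        return None
    return {
        "size": int(match["size"]),
        "limit": int(match["limit"]),
        "user_max": int(match["user_max"]),
        "queue_wait_ms": float(match["queue_wait_ms"]),
        "stop_reason": match["stop_reason"],
    }


def summarize(lines):
    """Aggregate dispatch lines; the metric definitions here are assumptions."""
    records = [r for r in map(parse_dispatch, lines) if r is not None]
    if not records:
        return None
    count = len(records)
    return {
        "dispatches": count,
        "avg_size": sum(r["size"] for r in records) / count,
        # Assumed definition: share of dispatches that merged 2+ requests.
        "merged_rate": sum(r["size"] > 1 for r in records) / count,
    }


if __name__ == "__main__":
    sample = [
        "Dynamic batch dispatch: size=2/8, user_max=8, queue_wait=5.12ms, stop_reason=delay",
        "Dynamic batch dispatch: size=1/8, user_max=8, queue_wait=0.04ms, stop_reason=config_cap:1",
    ]
    print(summarize(sample))
```

Lines that do not match, such as the periodic stats summary, are skipped rather than raising, so the helper can be pointed at a full server log.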