DiffusionGemma is a uniform-state (renoising) block-diffusion language model from Google. An encoder builds the causal context, and a decoder denoises a fixed-length, bidirectional canvas of `canvas_length` tokens. The `Gemma4Renoise` sampler runs `max_denoising_steps` reverse steps over the canvas, feeds the previous step's logits back as self-conditioning, and emits the greedy argmax of the processed logits.
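The reverse loop described above can be sketched in plain Python to make the data flow concrete. This is a toy illustration under stated assumptions, not the SGLang or checkpoint implementation: `reverse_denoise` and `toy_denoiser` are hypothetical names, the decoder is replaced by a stand-in that ignores the canvas, the canvas starts from uniformly random tokens (our reading of "uniform-state"), and the re-noising of rejected positions is left out.

```python
import random
from typing import Callable, List, Optional

Logits = List[List[float]]  # [canvas position][vocab id]


def argmax(row: List[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def reverse_denoise(
    denoiser: Callable[[List[int], Optional[Logits]], Logits],
    canvas_length: int,
    vocab_size: int,
    max_denoising_steps: int,
    seed: int = 0,
) -> List[int]:
    """Toy reverse loop: self-conditioning on previous logits plus greedy argmax readout."""
    rng = random.Random(seed)
    # Assumed uniform-state noise: every position starts as a uniformly random token.
    canvas = [rng.randrange(vocab_size) for _ in range(canvas_length)]
    prev_logits: Optional[Logits] = None
    for _ in range(max_denoising_steps):
        # The decoder sees the current canvas plus the previous step's logits.
        logits = denoiser(canvas, prev_logits)
        # Greedy readout: take the argmax token at every canvas position.
        canvas = [argmax(row) for row in logits]
        prev_logits = logits  # fed back as self-conditioning on the next step
    return canvas


TARGET = [3, 1, 4, 1]
VOCAB = 6


def toy_denoiser(canvas: List[int], prev_logits: Optional[Logits]) -> Logits:
    """Hypothetical decoder stand-in: ignores the canvas and favours TARGET."""
    logits = [[1.0 if v == t else 0.0 for v in range(VOCAB)] for t in TARGET]
    if prev_logits is not None:
        # Small self-conditioning term mixed in from the previous step.
        logits = [[a + 0.5 * b for a, b in zip(r, p)] for r, p in zip(logits, prev_logits)]
    return logits


result = reverse_denoise(toy_denoiser, canvas_length=4, vocab_size=VOCAB, max_denoising_steps=3)
print(result)  # [3, 1, 4, 1]
```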
Key Features:
Available Models:
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}> <colgroup> <col style={{width: "40.0%"}} /> <col style={{width: "30.0%"}} /> <col style={{width: "30.0%"}} /> </colgroup> <thead> <tr style={{borderBottom: "2px solid #d55816"}}> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Model</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Architecture</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Parameters</th> </tr> </thead> <tbody> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>[google/diffusiongemma-26B-A4B-it](https://huggingface.co/google/diffusiongemma-26B-A4B-it)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>MoE, uniform-state diffusion (text + image)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>25.2B total / 3.8B active</td> </tr> </tbody> </table>

Architecture Specifications:
| Spec | Value |
|---|---|
| Total Parameters | 25.2B |
| Active Parameters | 3.8B |
| Layers | 30 |
| Sliding Window | 1024 tokens |
| Context Length | Up to 256K tokens |
| Canvas Length | 256 |
| Vocabulary Size | 262K |
| Experts | 8 active / 128 total + 1 shared |
| Supported Modalities | Text, Image |
| Vision Encoder | ~550M parameters |
License:
Refer to the model card for license details.
Please refer to the official SGLang installation guide for installation instructions.
The checkpoint ships its own modeling code, so `--trust-remote-code` is required when serving.
With `--dllm-algorithm Gemma4Renoise`, the required runtime settings are applied automatically: the Triton attention backend, eager mode, and unchunked prefill. They are needed because the full-attention `head_dim` is 512 and the canvas uses bidirectional attention. A default launch therefore works:
```shell
sglang serve \
  --model-path google/diffusiongemma-26B-A4B-it \
  --dllm-algorithm Gemma4Renoise \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000
```
dLLM-Specific Parameters:
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}> <thead> <tr style={{borderBottom: "2px solid #d55816"}}> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Parameter</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Description</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Recommended Value</th> </tr> </thead> <tbody> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`--dllm-algorithm`</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Diffusion decoding algorithm</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>`Gemma4Renoise`</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`--trust-remote-code`</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Required to load the checkpoint's modeling code</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>Always enabled</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>`--dllm-algorithm-config`</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Optional YAML file that overrides the renoise schedule</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>Checkpoint defaults</td> </tr> </tbody> </table>

The attention backend, eager mode, and unchunked prefill are selected automatically for `Gemma4Renoise`, so they do not need to be passed on the command line.
Sampling is governed by the renoise schedule. Request-level `logprobs`, penalties, `logit_bias`, and grammar-constrained / structured output (`json_schema`, `regex`, `ebnf`, `structural_tag`) are not supported; requests that set them are rejected with HTTP 400. The core sampling controls (`temperature`, `top_k`, `top_p`) are accepted but have no effect. Streaming is block-level: each chunk carries one fully denoised canvas.
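Because unsupported fields cause a 400 rather than being ignored, a client that reuses request templates written for autoregressive models may want to filter them first. The sketch below is a hypothetical client-side helper, not part of SGLang; the exact field spellings it checks are an assumption based on the paragraph above. It works on the flattened JSON body, which is where SGLang-specific fields such as `regex` end up when sent through the OpenAI client's `extra_body`.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field list distilled from the prose above; the spellings the server
# actually checks are an assumption, not taken from SGLang's source.
REJECTED_FIELDS = frozenset({
    "logprobs", "top_logprobs",                                     # request-level logprobs
    "frequency_penalty", "presence_penalty", "repetition_penalty",  # penalties
    "logit_bias",
    "response_format", "json_schema", "regex", "ebnf", "structural_tag",  # structured output
})
# Accepted but without effect: the renoise schedule governs sampling.
NO_EFFECT_FIELDS = frozenset({"temperature", "top_k", "top_p"})


def sanitize_request(body: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str], List[str]]:
    """Return (body without rejected fields, dropped field names, no-effect field names)."""
    clean = {k: v for k, v in body.items() if k not in REJECTED_FIELDS}
    dropped = sorted(set(body) & REJECTED_FIELDS)
    no_effect = sorted(set(body) & NO_EFFECT_FIELDS)
    return clean, dropped, no_effect


body = {
    "model": "google/diffusiongemma-26B-A4B-it",
    "messages": [{"role": "user", "content": "Hi"}],
    "max_tokens": 256,
    "logit_bias": {"42": -100},
    "temperature": 0.7,
}
clean, dropped, no_effect = sanitize_request(body)
print(dropped)    # ['logit_bias']
print(no_effect)  # ['temperature']
```

No-effect fields are kept in the body (they are harmless) but reported, so callers can warn that they will not change the output.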
Gemma4Renoise Config (defaults follow the checkpoint's generation_config.json):
```yaml
# Number of reverse denoising steps per canvas.
max_denoising_steps: 48
# Optional. Makes the renoise sampling reproducible (also shared across TP ranks).
seed: 1234
sampler_config:
  # Entropy budget. Accept the lowest-entropy canvas positions within this bound each step (the rest are re-noised).
  entropy_bound: 0.1
  # Linear temperature schedule applied over the denoising steps.
  temperature_schedule:
    t_min: 0.4
    t_max: 0.8
# Stop early once the canvas is stable and confident.
stopping_config:
  confidence_threshold: 0.005
  stability_threshold: 1
```
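The comments in the config leave the exact rules open, so the following is one plausible reading of the three knobs, sketched in plain Python. Every rule here is an assumption, not taken from the SGLang or checkpoint code: the temperature moves linearly from `t_max` on the first step to `t_min` on the last; per-position entropy serves as the confidence measure; `entropy_bound` is a budget spent on positions in ascending-entropy order; and stopping requires at least `stability_threshold` unchanged steps plus every position's entropy at or below `confidence_threshold`. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def temperature_at(step: int, num_steps: int, t_min: float = 0.4, t_max: float = 0.8) -> float:
    """Assumed linear schedule: t_max on the first step, t_min on the last."""
    if num_steps <= 1:
        return t_min
    frac = step / (num_steps - 1)
    return t_max - (t_max - t_min) * frac


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accepted_positions(entropies: Sequence[float], entropy_bound: float = 0.1) -> List[int]:
    """Accept lowest-entropy positions while their summed entropy fits the budget.

    Positions not returned would be re-noised before the next step.
    """
    accepted, spent = [], 0.0
    for pos in sorted(range(len(entropies)), key=entropies.__getitem__):
        if spent + entropies[pos] > entropy_bound:
            break
        accepted.append(pos)
        spent += entropies[pos]
    return sorted(accepted)


def should_stop(entropies: Sequence[float], stable_steps: int,
                confidence_threshold: float = 0.005, stability_threshold: int = 1) -> bool:
    """Assumed rule: canvas unchanged long enough and every position confident."""
    return stable_steps >= stability_threshold and max(entropies) <= confidence_threshold


print([round(temperature_at(s, 48), 3) for s in (0, 47)])  # [0.8, 0.4]
print(accepted_positions([0.02, 0.5, 0.03, 0.06]))        # [0, 2]
```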
Start the server with the command from Section 3.1.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="google/diffusiongemma-26B-A4B-it",
    messages=[
        {"role": "user", "content": "What are the key differences between TCP and UDP?"}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```
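The model also accepts images. The sketch below assumes, without having verified it for this checkpoint, that the server accepts the standard OpenAI `image_url` content part with a base64 data URL, as SGLang does for its other vision-language models. `image_messages` and `ask_about_image` are hypothetical helper names; only building the message needs the standard library, and the `openai` import is deferred until a request is actually sent.

```python
import base64
from typing import Any, Dict, List


def image_messages(prompt: str, image_bytes: bytes, mime: str = "image/png") -> List[Dict[str, Any]]:
    """Build OpenAI-style chat messages carrying one inline base64 image and a text prompt."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": prompt},
        ],
    }]


def ask_about_image(prompt: str, image_bytes: bytes,
                    base_url: str = "http://localhost:30000/v1") -> str:
    """Send the image question to a running server and return the reply text."""
    from openai import OpenAI  # deferred so building messages needs only the stdlib

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model="google/diffusiongemma-26B-A4B-it",
        messages=image_messages(prompt, image_bytes),
        max_tokens=512,
    )
    return response.choices[0].message.content


msgs = image_messages("What does this chart show?", b"abc")
print(msgs[0]["content"][0]["image_url"]["url"])  # data:image/png;base64,YWJj
```

With the server running, `ask_about_image("What does this chart show?", open("chart.png", "rb").read())` would return the answer text; `chart.png` is a placeholder path.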
With `stream=True`, each chunk carries one fully denoised canvas rather than individual tokens.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="google/diffusiongemma-26B-A4B-it",
    messages=[
        {"role": "user", "content": "Write a Python function to compute the Fibonacci sequence."}
    ],
    max_tokens=2048,
    stream=True,
)

# Each chunk holds the text of one denoised canvas block.
for chunk in response:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
print()
```
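To see the block-level granularity directly, the chunks can be collected instead of printed. The helper below is a hypothetical sketch; it only relies on the `choices[0].delta.content` shape used in the streaming example, and it is exercised here with `SimpleNamespace` stand-ins so it runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collect_blocks(stream: Iterable[Any]) -> List[str]:
    """Collect the text of every non-empty streamed chunk (one canvas per chunk here)."""
    blocks = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            blocks.append(chunk.choices[0].delta.content)
    return blocks


def fake_chunk(text):
    """Stand-in for an OpenAI stream chunk, for testing without a server."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("def fib(n):"), fake_chunk(None), fake_chunk("\n    ..."),
               SimpleNamespace(choices=[])]
blocks = collect_blocks(fake_stream)
print(len(blocks))  # 2
```

Passing the real `response` from the streaming example instead of `fake_stream` shows how many canvas blocks a reply took; each entry should correspond to one canvas of at most 256 tokens (the canvas length in the spec table).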
Not benchmarked for speed.
All text benchmarks use the full test split, and every item is scored (no items are excluded because of failed requests). Multiple-choice text benchmarks are scored by greedy generation followed by answer parsing; MATH answers are scored by boxed-answer extraction plus a sympy equivalence check. MMLU, ARC-Challenge, and MATH-500 scores are the mean of two independent server launches.
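Boxed-answer extraction has to respect nested braces (for example `\boxed{\frac{1}{2}}`), so a plain regex is not enough. The sketch below is a hypothetical illustration of that step, not the harness used for these numbers; `last_boxed` and `normalize` are made-up names, and the sympy equivalence check that follows extraction is omitted so the block needs only the standard library.

```python
from typing import Optional

BOXED = "\\boxed{"


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed expression, honouring nested braces."""
    start = text.rfind(BOXED)
    if start == -1:
        return None
    i = start + len(BOXED)
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    """Light string cleanup before comparison (a real harness would add sympy equivalence)."""
    return ans.replace(" ", "").replace("\\left", "").replace("\\right", "").rstrip(".")


print(last_boxed(r"so the answer is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
```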
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}> <colgroup> <col style={{width: "50.0%"}} /> <col style={{width: "50.0%"}} /> </colgroup> <thead> <tr style={{borderBottom: "2px solid #d55816"}}> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Benchmark</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Score</th> </tr> </thead> <tbody> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>GSM8K</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>95.4%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>ARC-Challenge</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>91.6%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>HumanEval</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>92.7% pass@1</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MMLU</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>76.2%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MMLU-Pro</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>73.7%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>GSM-Symbolic</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>92.2%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MATH-500</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>72.1%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>AIME-2026</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>10.0%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>HMMT-Feb-2025</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>10.0%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>GPQA-main</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>59.2%</td> </tr> </tbody> </table>

Multimodal benchmarks use the full standard split of each task (MMMU, MMMU-Pro, MMStar, and AI2D scored as multiple choice; MathVista on testmini; DocVQA by ANLS; ChartQA by relaxed accuracy):
<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}> <colgroup> <col style={{width: "50.0%"}} /> <col style={{width: "50.0%"}} /> </colgroup> <thead> <tr style={{borderBottom: "2px solid #d55816"}}> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Multimodal benchmark</th> <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Score</th> </tr> </thead> <tbody> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MMMU (val, MC)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>64.9%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MMMU-Pro (standard 10-opt, MC)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>57.3%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MathVista (testmini)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>68.4%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>DocVQA (val)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>85.9%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>ChartQA (test)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>61.7%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>AI2D (test)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>78.7%</td> </tr> <tr> <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}>MMStar (val)</td> <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>65.9%</td> </tr> </tbody> </table>
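For reference, the two less familiar metrics used for the multimodal table can be written down in a few lines. This follows the commonly used definitions (ANLS with threshold 0.5 for DocVQA, and 5% relative tolerance on numeric answers for ChartQA relaxed accuracy); it is a sketch, not the evaluation code behind these scores, and details such as the lower-casing and whitespace normalization are assumptions. A dataset score is the mean of the per-question values.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def anls(prediction: str, answers: Sequence[str], tau: float = 0.5) -> float:
    """Per-question ANLS: best (1 - normalized edit distance), zeroed when distance >= tau."""
    pred = " ".join(prediction.lower().split())
    best = 0.0
    for ans in answers:
        gt = " ".join(ans.lower().split())
        if not pred and not gt:
            return 1.0
        nl = levenshtein(pred, gt) / max(len(pred), len(gt))
        best = max(best, 1.0 - nl if nl < tau else 0.0)
    return best


def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """ChartQA-style relaxed accuracy: numbers within 5% relative error, else exact match."""
    try:
        p, t = float(prediction), float(target)
    except ValueError:
        return prediction.strip().lower() == target.strip().lower()
    if t == 0.0:
        return p == 0.0
    return abs(p - t) / abs(t) <= tolerance


print(anls("invoice", ["invoices"]))  # 0.875
print(relaxed_match("41.5", "42"))    # True
```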