docs/source/en/api/pipelines/llada2.md
LLaDA2 is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.
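To make the idea of confidence-based unmasking concrete, here is a toy, pure-Python sketch of block-wise refinement. It is not the LLaDA2 implementation: `toy_predict`, `refine_block`, and `generate` are hypothetical names, and the "model" is a deterministic stand-in that returns a fixed confidence per position, whereas a real model's confidences change as context fills in. The sketch only illustrates the loop structure described above: start fully masked, then within each block commit predictions whose confidence clears a threshold.

```python
MASK = None  # placeholder for a masked position in this toy example


def toy_predict(position):
    """Stand-in for a model: return a (token, confidence) pair for a position."""
    token = f"tok{position}"
    confidence = ((position * 37) % 100) / 100  # deterministic pseudo-confidence
    return token, confidence


def refine_block(block_positions, sequence, threshold, num_steps):
    """Unmask one block: commit confident predictions, at least one per step."""
    for _ in range(num_steps):
        masked = [p for p in block_positions if sequence[p] is MASK]
        if not masked:
            break
        predictions = {p: toy_predict(p) for p in masked}
        confident = [p for p in masked if predictions[p][1] >= threshold]
        if not confident:
            # Assumption of this sketch: always make progress by committing
            # the single most confident masked position.
            confident = [max(masked, key=lambda p: predictions[p][1])]
        for p in confident:
            sequence[p] = predictions[p][0]
    return sequence


def generate(gen_length, block_length, threshold, num_steps):
    """Start fully masked and refine the sequence one block at a time."""
    sequence = [MASK] * gen_length
    for start in range(0, gen_length, block_length):
        block = range(start, min(start + block_length, gen_length))
        refine_block(block, sequence, threshold, num_steps)
    return sequence


result = generate(gen_length=8, block_length=4, threshold=0.7, num_steps=4)
print(result)
```

In this toy version, a block is guaranteed to be fully unmasked only when `num_steps` is at least the block length, because each step commits at least one position.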
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from diffusers import BlockRefinementScheduler, LLaDA2Pipeline
model_id = "inclusionAI/LLaDA2.1-mini"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()
pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
output = pipe(
    prompt="Write a short poem about the ocean.",
    gen_length=256,
    block_length=32,
    num_inference_steps=32,
    threshold=0.7,
    editing_threshold=0.5,
    max_post_steps=16,
    temperature=0.0,
)
print(output.texts[0])
Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are
included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and
`transfer_index` (the mask-filling commit mask) are provided; return `{"block_x": ...}` from the callback to replace the
window.
def on_step_end(pipe, step, timestep, callback_kwargs):
    block_x = callback_kwargs["block_x"]
    # Inspect or modify `block_x` here.
    return {"block_x": block_x}
out = pipe(
    prompt="Write a short poem.",
    callback_on_step_end=on_step_end,
    callback_on_step_end_tensor_inputs=["block_x"],
)
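As a slightly fuller illustration of the callback contract, the sketch below builds a callback that records each step without changing generation. `make_step_recorder` and `history` are hypothetical names introduced here; only the callback signature and the `block_x` key come from the text above. The dry run at the end uses a plain list as a stand-in for the window tensor so the snippet runs without loading a model.

```python
def make_step_recorder():
    """Build a callback plus the list it appends to (hypothetical helper)."""
    history = []

    def on_step_end(pipe, step, timestep, callback_kwargs):
        block_x = callback_kwargs["block_x"]
        # Record the step index and the window's shape, if it has one.
        history.append((step, getattr(block_x, "shape", None)))
        # Returning the window unchanged leaves generation unaffected.
        return {"block_x": block_x}

    return on_step_end, history


on_step_end, history = make_step_recorder()

# Dry run with a stand-in window instead of a real pipeline call.
result = on_step_end(None, 0, None, {"block_x": [1, 2, 3]})
print(result, history)
```

To use it with a real pipeline, pass `callback_on_step_end=on_step_end` and `callback_on_step_end_tensor_inputs=["block_x"]`, as in the call above, and inspect `history` afterwards.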
LLaDA2.1 models support two modes:
| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
|---|---|---|---|
| Quality | 0.7 | 0.5 | 16 |
| Speed | 0.5 | `None` | 16 |
Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.
For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.
For all models, use `block_length=32`, `temperature=0.0`, and `num_inference_steps=32`.
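One way to keep these settings in one place is a small preset helper, sketched below. The names `COMMON`, `MODES`, `generation_kwargs`, and `editing_enabled` are hypothetical and not part of the library; the values are copied from the table and notes above, and `editing_enabled` simply restates the documented rule for when editing is turned off.

```python
# Values copied from the recommendations above; all names here are hypothetical.
COMMON = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}

MODES = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16},
    "speed": {"threshold": 0.5, "editing_threshold": None, "max_post_steps": 16},
}


def generation_kwargs(mode):
    """Return the keyword arguments for one of the two documented modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return {**COMMON, **MODES[mode]}


def editing_enabled(editing_threshold):
    """Restate the rule above: None, 0.0, or a negative value disables editing."""
    return editing_threshold is not None and editing_threshold > 0.0


quality = generation_kwargs("quality")
speed = generation_kwargs("speed")
print(quality)
print(speed)
```

The resulting dictionary can then be unpacked into a pipeline call, for example `pipe(prompt="Write a short poem.", **generation_kwargs("speed"))`.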
[[autodoc]] LLaDA2Pipeline
	- all
	- __call__
[[autodoc]] pipelines.LLaDA2PipelineOutput