HunyuanImage2.1

HunyuanImage-2.1 is a 17B text-to-image model that is capable of generating 2K (2048 x 2048) resolution images

HunyuanImage-2.1 comes in the following variants:

model type	model id
HunyuanImage-2.1	hunyuanvideo-community/HunyuanImage-2.1-Diffusers
HunyuanImage-2.1-Distilled	hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers
HunyuanImage-2.1-Refiner	hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers

[!TIP] Caching may also speed up inference by storing and reusing intermediate outputs.

HunyuanImage-2.1

HunyuanImage-2.1 applies Adaptive Projected Guidance (APG) combined with Classifier-Free Guidance (CFG) in the denoising loop. HunyuanImagePipeline has a guider component (read more about Guider) and does not take a guidance_scale parameter at runtime. To change guider-related parameters, e.g., guidance_scale, you can update the guider configuration instead.

python

import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", 
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

You can inspect the guider object:

>>> pipe.guider
AdaptiveProjectedMixGuidance {
  "_class_name": "AdaptiveProjectedMixGuidance",
  "_diffusers_version": "0.36.0.dev0",
  "adaptive_projected_guidance_momentum": -0.5,
  "adaptive_projected_guidance_rescale": 10.0,
  "adaptive_projected_guidance_scale": 10.0,
  "adaptive_projected_guidance_start_step": 5,
  "enabled": true,
  "eta": 0.0,
  "guidance_rescale": 0.0,
  "guidance_scale": 3.5,
  "start": 0.0,
  "stop": 1.0,
  "use_original_formulation": false
}

State:
  step: None
  num_inference_steps: None
  timestep: None
  count_prepared: 0
  enabled: True
  num_conditions: 2
  momentum_buffer: None
  is_apg_enabled: False
  is_cfg_enabled: True

To update the guider with a different configuration, use the new() method. For example, to generate an image with guidance_scale=5.0 while keeping all other default guidance parameters:

import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", 
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# Update the guider configuration
pipe.guider = pipe.guider.new(guidance_scale=5.0)

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

image = pipe(
    prompt=prompt, 
    num_inference_steps=50, 
    height=2048, 
    width=2048,
).images[0]
image.save("image.png")

HunyuanImage-2.1-Distilled

use distilled_guidance_scale with the guidance-distilled checkpoint,

import torch
from diffusers import HunyuanImagePipeline
pipe = HunyuanImagePipeline.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

out = pipe(
    prompt,
    num_inference_steps=8,
    distilled_guidance_scale=3.25,
    height=2048,
    width=2048,
    generator=generator,
).images[0]

HunyuanImagePipeline

[[autodoc]] HunyuanImagePipeline

all
call

HunyuanImageRefinerPipeline

[[autodoc]] HunyuanImageRefinerPipeline

all
call

HunyuanImagePipelineOutput

[[autodoc]] pipelines.hunyuan_image.pipeline_output.HunyuanImagePipelineOutput