<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

EasyAnimate

EasyAnimate by Alibaba PAI.

The description from its GitHub page: EasyAnimate is a pipeline based on the transformer architecture, designed for generating AI images and videos, and for training baseline models and Lora models for Diffusion Transformer. We support direct prediction from pre-trained EasyAnimate models, allowing for the generation of videos with various resolutions, approximately 6 seconds in length, at 8fps (EasyAnimateV5.1, 1 to 49 frames). Additionally, users can train their own baseline and Lora models for specific style transformations.

This pipeline was contributed by bubbliiiing. The original codebase can be found here. The original weights can be found under hf.co/alibaba-pai.

There are two official EasyAnimate checkpoints for text-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh` | torch.float16 |
| `alibaba-pai/EasyAnimateV5.1-12b-zh-InP` | torch.float16 |

There is one official EasyAnimate checkpoint available for image-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh-InP` | torch.float16 |

There are two official EasyAnimate checkpoints available for control-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| `alibaba-pai/EasyAnimateV5.1-12b-zh-Control` | torch.float16 |
| `alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera` | torch.float16 |
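
As a quick reference, the checkpoint lists above can be captured in a small lookup table. This is only an illustrative sketch: the checkpoint ids are copied from the tables, while the task keys and the `checkpoints_for` helper are hypothetical names that are not part of Diffusers. Video-to-video is left out as a separate key because the tables list it alongside both the text-to-video and image-to-video groups.

```python
# Checkpoint ids copied from the tables above.
# The task keys are hypothetical labels chosen for this sketch.
EASYANIMATE_CHECKPOINTS = {
    "text-to-video": [
        "alibaba-pai/EasyAnimateV5.1-12b-zh",
        "alibaba-pai/EasyAnimateV5.1-12b-zh-InP",
    ],
    "image-to-video": [
        "alibaba-pai/EasyAnimateV5.1-12b-zh-InP",
    ],
    "control-to-video": [
        "alibaba-pai/EasyAnimateV5.1-12b-zh-Control",
        "alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera",
    ],
}

# All listed checkpoints share the same recommended inference dtype.
RECOMMENDED_DTYPE = "torch.float16"


def checkpoints_for(task):
    """Return the checkpoint ids listed for a task (hypothetical helper)."""
    try:
        return EASYANIMATE_CHECKPOINTS[task]
    except KeyError:
        known = ", ".join(sorted(EASYANIMATE_CHECKPOINTS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


print(checkpoints_for("control-to-video"))
```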

For the EasyAnimateV5.1 series:

  • Text-to-video (T2V) and image-to-video (I2V) work at multiple resolutions. The width and height can each range from 256 to 1024.
  • Both T2V and I2V models support generating 1 to 49 frames and work best within this range. Exporting videos at 8 FPS is recommended.
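
The limits above can be checked before starting a long generation run. The sketch below is a minimal illustration, not part of Diffusers: `check_generation_args` and `clip_seconds` are hypothetical helpers, and they only encode the ranges stated above (the pipeline itself may enforce further constraints). It also shows why 49 frames at 8 FPS gives a clip of roughly 6 seconds.

```python
# Limits taken from the notes above for the EasyAnimateV5.1 series.
MIN_SIDE, MAX_SIDE = 256, 1024
MIN_FRAMES, MAX_FRAMES = 1, 49
RECOMMENDED_FPS = 8


def check_generation_args(height, width, num_frames):
    """Validate the arguments against the documented ranges (hypothetical helper)."""
    for name, side in (("height", height), ("width", width)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
    if not MIN_FRAMES <= num_frames <= MAX_FRAMES:
        raise ValueError(f"num_frames={num_frames} is outside [{MIN_FRAMES}, {MAX_FRAMES}]")
    return {"height": height, "width": width, "num_frames": num_frames}


def clip_seconds(num_frames, fps=RECOMMENDED_FPS):
    """Duration of the exported clip in seconds."""
    return num_frames / fps


args = check_generation_args(height=512, width=512, num_frames=49)
print(args, clip_seconds(args["num_frames"]))  # 49 / 8 = 6.125 seconds
```

The returned dictionary could then be unpacked into a pipeline call, for example `pipeline(prompt=prompt, **args)`, assuming a pipeline has been loaded as in the quantization example on this page.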

Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
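
To get a feel for the savings, the weight memory can be estimated with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation: it assumes the transformer has about 12 billion parameters (inferred from the `12b` in the checkpoint names, not confirmed here) and counts only the transformer weights, ignoring the text encoder, the VAE, activations, and any quantization overhead.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Assumption: roughly 12 billion parameters, inferred from the "12b" checkpoint name.
ASSUMED_NUM_PARAMS = 12e9


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(ASSUMED_NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions, 8-bit weights take half the memory of `torch.float16` weights, which is the main reason quantization helps on GPUs with limited memory.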

Refer to the Quantization overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [EasyAnimatePipeline] for inference with bitsandbytes.

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, EasyAnimateTransformer3DModel, EasyAnimatePipeline
from diffusers.utils import export_to_video

# Load only the transformer with 8-bit bitsandbytes quantization.
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = EasyAnimateTransformer3DModel.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Build the pipeline around the quantized transformer; the other components load in float16.
pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."
negative_prompt = "bad detailed"
# Generate 49 frames and export at the recommended 8 FPS.
video = pipeline(prompt=prompt, negative_prompt=negative_prompt, num_frames=49, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=8)
```

EasyAnimatePipeline

[[autodoc]] EasyAnimatePipeline

  • all
  • __call__

EasyAnimatePipelineOutput

[[autodoc]] pipelines.easyanimate.pipeline_output.EasyAnimatePipelineOutput