<!--Copyright 2025 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

Intel Gaudi

The Intel Gaudi AI accelerator family includes Intel Gaudi 1, Intel Gaudi 2, and Intel Gaudi 3. Each server is equipped with 8 devices, known as Habana Processing Units (HPUs), providing 128GB of memory on Gaudi 3, 96GB on Gaudi 2, and 32GB on the first-gen Gaudi. For more details on the underlying hardware architecture, check out the Gaudi Architecture overview.

Diffusers pipelines can take advantage of HPU acceleration, even if a pipeline hasn't been added to Optimum for Intel Gaudi yet, with the GPU Migration Toolkit.

Call .to("hpu") on your pipeline to move it to an HPU device as shown below for Flux:

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipeline.to("hpu")

image = pipeline("An image of a squirrel in Picasso style").images[0]
```
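If the same script may also run on machines without Gaudi hardware, a small fallback helper keeps it portable. This is a hypothetical sketch, not part of Diffusers: it assumes that the `habana_frameworks` package (which ships with the Intel Gaudi software stack and registers the "hpu" device with PyTorch) is only importable when that stack is installed.

```python
def pick_device() -> str:
    """Return "hpu" when the Intel Gaudi PyTorch bridge is available, else "cpu"."""
    try:
        # Assumption: habana_frameworks is only present on Gaudi systems;
        # importing it makes the "hpu" device usable from PyTorch.
        import habana_frameworks.torch.core  # noqa: F401
        return "hpu"
    except ImportError:
        return "cpu"


# Usage: move the pipeline to whichever device is available.
# pipeline.to(pick_device())
```

On a machine without the Gaudi stack installed, `pick_device()` simply returns "cpu", so the script still runs, just without HPU acceleration.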

> [!TIP]
> For Gaudi-optimized diffusion pipeline implementations, we recommend using Optimum for Intel Gaudi.