docs/source/en/conceptual/philosophy.md
🧨 Diffusers provides state-of-the-art pretrained diffusion models across multiple modalities. Its purpose is to serve as a modular toolbox for both inference and training.
We aim to build a library that stands the test of time and therefore take API design very seriously.
In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on PyTorch's Design Principles. Let's go over the most important ones:
## Usability over Performance

- Diffusers aims to be a lightweight package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc.). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.

## Simple over easy

As PyTorch states, *explicit is better than implicit* and *simple is better than complex*. This design philosophy is reflected in multiple parts of the library:
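Soft dependencies of this kind are typically handled with guarded availability checks, so core code paths never hard-require the optional package. A minimal sketch of the pattern (the helper below is illustrative, not Diffusers' actual utility function):

```python
import importlib.util

def is_available(package_name: str) -> bool:
    """Check whether an optional (soft) dependency is installed,
    without actually importing it."""
    return importlib.util.find_spec(package_name) is not None

# Core code runs either way; optional features are enabled only
# when the soft dependency is present.
if is_available("safetensors"):
    pass  # e.g. prefer the safetensors weight format
else:
    pass  # fall back to a default code path
```

Because the check never imports the package, it stays cheap and cannot fail at library import time.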
- We follow PyTorch's API with methods like `DiffusionPipeline.to` to let the user handle device management.

## Tweakable, contributor-friendly over abstraction

For large parts of the library, Diffusers adopts an important design principle of the Transformers library, which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as *Don't repeat yourself (DRY)*. In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers. Functions, long code blocks, and even classes can be copied across multiple files, which at first can look like a bad, sloppy design choice that makes the library unmaintainable. However, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:

- Machine learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms change rapidly, which makes it very difficult to define long-lasting code abstractions.
- Machine learning practitioners like to be able to quickly tweak existing code for ideation and research, and therefore prefer self-contained code over code that hides functionality behind many abstractions.
- Open-source libraries rely on community contributions, so the code must be easy to read and easy to change; the more abstract the code, the harder it is for new contributors to modify one part without fear of breaking another.
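Deliberate duplication only stays maintainable if the copies are kept in sync mechanically rather than through shared abstractions. A rough, self-contained sketch of how a `# Copied from`-style consistency check could work (this toy checker compares compiled code objects and is illustrative only, not the repository's actual tooling):

```python
def rescale_noise_a(noise, scale):
    # clamp to [-1, 1], then rescale
    return max(min(noise, 1.0), -1.0) * scale

# Copied from rescale_noise_a
def rescale_noise_b(noise, scale):
    # clamp to [-1, 1], then rescale
    return max(min(noise, 1.0), -1.0) * scale

# a copy that silently drifted: the clamp range changed in one place only
def rescale_noise_drifted(noise, scale):
    return max(min(noise, 2.0), -2.0) * scale

def copies_in_sync(fn_a, fn_b) -> bool:
    """A drifted copy compiles to different bytecode or constants,
    so a repository-wide check can flag it for re-syncing."""
    a, b = fn_a.__code__, fn_b.__code__
    return a.co_code == b.co_code and a.co_consts == b.co_consts
```

The point is that each file stays fully self-contained and readable, while an automated check, not an abstraction, guarantees the copies do not diverge.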
At Hugging Face, we call this design the single-file policy which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look at this blog post.
In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is that almost all diffusion pipelines, such as DDPM, Stable Diffusion, unCLIP (DALL·E 2), and Imagen, rely on the same diffusion model, the UNet.
Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗. We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it directly on GitHub.
## Design Philosophy in Details

Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: pipelines, models, and schedulers. Let's walk through more in-detail design decisions for each class.
### Pipelines

Pipelines are designed to be easy to use (therefore do not follow *Simple over easy* 100%), are not feature complete, and should loosely be seen as examples of how to use models and schedulers for inference.
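As a mental model, a pipeline is little more than a thin `__call__` that wires a model and a scheduler together into a denoising loop. The following is a toy sketch with stand-in components, not the real `DiffusionPipeline` API:

```python
class ToyScheduler:
    """Stand-in scheduler: exposes timesteps and a step() that
    nudges the sample toward being 'less noisy'."""
    def set_num_inference_steps(self, n):
        self.timesteps = list(range(n - 1, -1, -1))  # noisiest -> cleanest

    def step(self, model_output, sample):
        return sample - 0.1 * model_output

class ToyModel:
    """Stand-in denoising model: predicts the 'noise' in a sample."""
    def __call__(self, sample, t):
        return sample  # pretend the whole sample is noise

class ToyPipeline:
    """Components are plain attributes; inference is one __call__."""
    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, sample, num_inference_steps=4):
        self.scheduler.set_num_inference_steps(num_inference_steps)
        for t in self.scheduler.timesteps:
            noise_pred = self.model(sample, t)
            sample = self.scheduler.step(noise_pred, sample)
        return sample

pipe = ToyPipeline(ToyModel(), ToyScheduler())
result = pipe(1.0, num_inference_steps=4)
```

Real pipelines are exactly this shape with real networks and samplers plugged in, which is why they read well as self-contained inference examples.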
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in `src/diffusers/pipelines`; multiple pipeline files can be gathered in one pipeline folder, as is done for `src/diffusers/pipelines/stable-diffusion`. If pipelines share similar functionality, one can make use of the `# Copied from` mechanism.
- Pipelines all inherit from `DiffusionPipeline`.
- Every pipeline consists of different model and scheduler components that are documented in the `model_index.json` file, are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the `DiffusionPipeline.components` function.
- Every pipeline should be loadable via the `DiffusionPipeline.from_pretrained` function.
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.

### Models

Models are designed as configurable toolboxes that are natural extensions of PyTorch's `Module` class. They only partly follow the single-file policy.
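The "configurable toolbox" idea can be sketched as follows: a model's constructor arguments are recorded as its configuration, so the same class can be re-instantiated from a saved config. This is toy code, not the actual `ConfigMixin` implementation, and the `norm_type` argument is a made-up illustration:

```python
class ToyConfigurableModel:
    def __init__(self, hidden_size=32, num_layers=2, norm_type="layer_norm"):
        # Record constructor arguments as the model's configuration,
        # so the architecture can be rebuilt from a saved config dict.
        self.config = {
            "hidden_size": hidden_size,
            "num_layers": num_layers,
            # A string "...type" argument extends to future norm variants
            # more gracefully than a boolean flag like `use_group_norm`.
            "norm_type": norm_type,
        }

    @classmethod
    def from_config(cls, config):
        """Rebuild the same architecture from its configuration."""
        return cls(**config)

model = ToyConfigurableModel(hidden_size=64, norm_type="group_norm")
rebuilt = ToyConfigurableModel.from_config(model.config)
```

Because the config fully determines the architecture, checkpoints can be loaded into a freshly built instance of the same class.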
The following design principles are followed:
- Models correspond to a type of model architecture, e.g. the `UNet2DConditionModel` class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in `src/diffusers/models`, and every model architecture shall be defined in its own file, e.g. `unets/unet_2d_condition.py`, `transformers/transformer_2d.py`, etc.
- Models should make use of smaller model building blocks, such as `attention.py`, `resnet.py`, `embeddings.py`, etc. Note: this is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models should be designed to be easily extendable to future changes, e.g. it is usually better to add string `"...type"` arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.

### Schedulers

Schedulers are responsible for guiding the denoising process for inference as well as defining a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the single-file policy.
The following design principles are followed:
- All schedulers can be found in `src/diffusers/schedulers`.
- If schedulers share similar functionality, one can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the `ConfigMixin.from_config` method as explained in detail here.
- Every scheduler has to provide a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, i.e. before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (`x_t`) and returns the "previous", slightly more denoised sample (`x_t-1`).
- Given the complexity of diffusion schedulers, the `step` function does not expose all of the complexity and can be a bit of a "black box".
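Because every scheduler exposes the same small surface (`set_num_inference_steps`, `timesteps`, `step`), the same denoising loop can swap one scheduler for another. The sketch below uses toy stand-in classes and made-up update rules, not the real `SchedulerMixin` API:

```python
class ToyDDPMLikeScheduler:
    def set_num_inference_steps(self, n):
        # timesteps are iterated from noisiest to cleanest
        self.timesteps = list(range(n - 1, -1, -1))

    def step(self, model_output, sample):
        # x_t -> x_{t-1}: remove a small fraction of the predicted noise
        return sample - 0.1 * model_output

class ToyDDIMLikeScheduler:
    def set_num_inference_steps(self, n):
        self.timesteps = list(range(n - 1, -1, -1))

    def step(self, model_output, sample):
        # a different update rule hidden behind the same interface
        return sample - 0.2 * model_output

def denoise(scheduler, sample, num_inference_steps=5):
    # set_num_inference_steps(...) must run before any step(...) call
    scheduler.set_num_inference_steps(num_inference_steps)
    for t in scheduler.timesteps:
        model_output = sample  # stand-in for model(sample, t)
        sample = scheduler.step(model_output, sample)
    return sample

# the same loop works unchanged with either scheduler
a = denoise(ToyDDPMLikeScheduler(), 1.0)
b = denoise(ToyDDIMLikeScheduler(), 1.0)
```

The shared contract is what lets users trade sampling quality for speed by swapping schedulers without touching the pipeline code, while each `step` implementation stays a self-contained "black box".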