<!-- Copyright 2023-2025 Marigold Team, ETH Zürich. All rights reserved. Copyright 2024-2025 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

# Marigold Computer Vision

Marigold was proposed in Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, a CVPR 2024 Oral paper by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. The core idea is to repurpose the generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks. This approach was explored by fine-tuning Stable Diffusion for Monocular Depth Estimation, as demonstrated in the teaser above.

Marigold was later extended in the follow-up paper, Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis, authored by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. This work expanded Marigold to support new modalities such as Surface Normals and Intrinsic Image Decomposition (IID), introduced a training protocol for Latent Consistency Models (LCM), and demonstrated High-Resolution (HR) processing capability.

> [!TIP]
> The early Marigold models (v1-0 and earlier) were optimized for best results with at least 10 inference steps. LCM models were later developed to enable high-quality inference in just 1 to 4 steps. Marigold models v1-1 and later use the DDIM scheduler to achieve optimal results in as few as 1 to 4 steps.

## Available Pipelines

Each pipeline is tailored for a specific computer vision task, processing an input RGB image and generating a corresponding prediction. Currently, the following computer vision tasks are implemented:

| Pipeline | Recommended Model Checkpoints | Spaces (Interactive Apps) | Predicted Modalities |
|---|---|---|---|
| `MarigoldDepthPipeline` | prs-eth/marigold-depth-v1-1 | Depth Estimation | Depth, Disparity |
| `MarigoldNormalsPipeline` | prs-eth/marigold-normals-v1-1 | Surface Normals Estimation | Surface normals |
| `MarigoldIntrinsicsPipeline` | prs-eth/marigold-iid-appearance-v1-1, prs-eth/marigold-iid-lighting-v1-1 | Intrinsic Image Decomposition | Albedo, Materials, Lighting |
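Surface normals are predicted as unit-length 3D vectors with components in the range -1 to 1, and visualization helpers such as `visualize_normals` render such vectors as RGB images. A minimal pure-Python sketch of the widely used `(n + 1) / 2` rescaling convention (an illustration of the convention only, not the library's actual implementation; `normal_to_rgb` is a hypothetical helper name):

```python
def normal_to_rgb(n):
    """Map a unit-length normal with components in [-1, 1] to 8-bit RGB
    using the common (n + 1) / 2 rescaling convention."""
    return tuple(round((c + 1.0) / 2.0 * 255) for c in n)

# A normal pointing along +Z maps to the characteristic bluish color
# seen in normal-map visualizations.
rgb = normal_to_rgb((0.0, 0.0, 1.0))  # -> (128, 128, 255)
```

Per-component rescaling like this is why normal-map visualizations have their distinctive purple-blue appearance: the dominant +Z component saturates the blue channel.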

## Available Checkpoints

All original checkpoints are available under the PRS-ETH organization on Hugging Face. They are designed for use with diffusers pipelines and the original codebase, which can also be used to train new model checkpoints. The following is a summary of the recommended checkpoints, all of which produce reliable results with 1 to 4 steps.

| Checkpoint | Modality | Comment |
|---|---|---|
| prs-eth/marigold-depth-v1-1 | Depth | Affine-invariant depth prediction assigns each pixel a value between 0 (near plane) and 1 (far plane), with both planes determined by the model during inference. |
| prs-eth/marigold-normals-v1-1 | Normals | Surface normals predictions are unit-length 3D vectors in the screen-space camera frame, with values in the range from -1 to 1. |
| prs-eth/marigold-iid-appearance-v1-1 | Intrinsics | InteriorVerse decomposition comprises Albedo and two BRDF material properties: Roughness and Metallicity. |
| prs-eth/marigold-iid-lighting-v1-1 | Intrinsics | HyperSim decomposition of an image $I$ comprises Albedo $A$, Diffuse shading $S$, and Non-diffuse residual $R$: $I = A \cdot S + R$. |
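Because affine-invariant depth is defined only up to an unknown scale and shift, a prediction can be aligned to metric depth with a least-squares fit whenever a few metric measurements are available. A pure-Python sketch of that alignment (an illustration of the affine-invariant property, not part of the pipeline API; `align_affine_invariant_depth` is a hypothetical helper name):

```python
def align_affine_invariant_depth(pred, metric):
    """Fit scale s and shift t minimizing sum((s * p + t - z)^2) over
    paired affine-invariant predictions p and metric measurements z."""
    n = len(pred)
    sd = sum(pred)
    sz = sum(metric)
    sdd = sum(p * p for p in pred)
    sdz = sum(p * z for p, z in zip(pred, metric))
    s = (n * sdz - sd * sz) / (n * sdd - sd * sd)
    t = (sz - s * sd) / n
    return s, t

# Example: predictions in [0, 1] that are an affine transform of true depth.
true_depth = [1.0, 2.0, 3.0, 4.0]             # meters
pred = [(z - 1.0) / 3.0 for z in true_depth]  # normalized to [0, 1]
s, t = align_affine_invariant_depth(pred, true_depth)
# s ≈ 3.0, t ≈ 1.0, so metric depth ≈ s * prediction + t
```

In practice the sparse metric measurements could come from a LiDAR sensor or a calibration target; the closed-form fit above is the standard two-parameter least-squares solution.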

> [!TIP]
> Make sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines. To learn more about reducing the memory usage of this pipeline, refer to the Reduce memory usage section.

> [!WARNING]
> Marigold pipelines were designed and tested with the scheduler embedded in the model checkpoint. The optimal number of inference steps varies by scheduler, with no universal value that works best across all cases. To accommodate this, the `num_inference_steps` parameter in the pipeline's `__call__` method defaults to `None` (see the API reference). Unless set explicitly, it inherits the value from the `default_denoising_steps` field in the checkpoint configuration file (`model_index.json`). This ensures high-quality predictions when invoking the pipeline with only the `image` argument.
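The fallback described above can be paraphrased as follows (a simplified sketch of the documented behavior, not the pipeline's actual code; the value 4 for `default_denoising_steps` is only an example):

```python
def resolve_num_inference_steps(num_inference_steps, default_denoising_steps):
    """Mirror the documented fallback: an explicit argument wins,
    otherwise the checkpoint's default_denoising_steps is used."""
    if num_inference_steps is not None:
        return num_inference_steps
    return default_denoising_steps

# If the checkpoint's model_index.json sets default_denoising_steps to 4:
resolve_num_inference_steps(None, 4)  # -> 4 (checkpoint default)
resolve_num_inference_steps(10, 4)    # -> 10 (explicit override)
```

This is why calling a pipeline with only an image already produces high-quality predictions: the checkpoint ships a sensible step count for its own scheduler.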

See also Marigold usage examples.

## Marigold Depth Prediction API

[[autodoc]] MarigoldDepthPipeline
	- __call__

[[autodoc]] pipelines.marigold.pipeline_marigold_depth.MarigoldDepthOutput

[[autodoc]] pipelines.marigold.marigold_image_processing.MarigoldImageProcessor.visualize_depth

## Marigold Normals Estimation API

[[autodoc]] MarigoldNormalsPipeline
	- __call__

[[autodoc]] pipelines.marigold.pipeline_marigold_normals.MarigoldNormalsOutput

[[autodoc]] pipelines.marigold.marigold_image_processing.MarigoldImageProcessor.visualize_normals

## Marigold Intrinsic Image Decomposition API

[[autodoc]] MarigoldIntrinsicsPipeline
	- __call__

[[autodoc]] pipelines.marigold.pipeline_marigold_intrinsics.MarigoldIntrinsicsOutput

[[autodoc]] pipelines.marigold.marigold_image_processing.MarigoldImageProcessor.visualize_intrinsics