# Depth Anything
This model was released on 2024-01-19 and added to Hugging Face Transformers on 2024-01-25.
Depth Anything is designed to be a foundation model for monocular depth estimation (MDE). It is jointly trained on labeled and ~62M unlabeled images to enhance the dataset. It uses a pretrained DINOv2 model as an image encoder to inherit its existing rich semantic priors, and DPT as the decoder. A teacher model is trained on unlabeled images to create pseudo-labels. The student model is trained on a combination of the pseudo-labels and labeled images. To improve the student model's performance, strong perturbations are added to the unlabeled images to challenge the student model to learn more visual knowledge from the image.
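The teacher–student recipe above can be sketched in a few lines of plain Python. This is a hypothetical toy illustration, not the actual training code: `teacher` and `perturb` are stand-in functions, and real training would involve DINOv2/DPT models and image augmentations.

```python
def teacher(x):
    # Stand-in teacher model: pretend predicted depth is twice the input.
    return 2 * x

def perturb(x):
    # Stand-in strong perturbation (the paper uses e.g. color jitter and CutMix).
    return x + 0.5

labeled = [(1.0, 2.0), (2.0, 4.0)]  # (image, ground-truth depth) pairs
unlabeled = [3.0, 4.0]

# Step 1: the teacher pseudo-labels the *clean* unlabeled images;
# the student will see strongly perturbed versions of those images.
pseudo_labeled = [(perturb(x), teacher(x)) for x in unlabeled]

# Step 2: the student trains on the union of labeled and pseudo-labeled data.
training_set = labeled + pseudo_labeled
print(len(training_set))  # 4
```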
You can find all the original Depth Anything checkpoints under the Depth Anything collection.
> [!TIP]
> Click on the Depth Anything models in the right sidebar for more examples of how to apply Depth Anything to different vision tasks.
The example below demonstrates how to obtain a depth map with [Pipeline] or the [AutoModel] class.
```py
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf", device=0)
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```
```py
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf", device_map="auto")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = image_processor(images=image, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

# interpolate the raw prediction back to the original image size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# min-max normalize the depth map to [0, 255] for visualization
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().numpy() * 255
Image.fromarray(depth.astype("uint8"))
```
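The final normalization step scales the raw depth values to the `[0, 255]` range expected by an 8-bit grayscale image. A minimal sketch of that arithmetic with plain Python lists (no tensors involved):

```python
# Raw depth values as a plain list, standing in for the predicted depth tensor.
predicted_depth = [1.0, 2.0, 3.0, 5.0]

# Min-max normalize to [0, 1], then scale to 8-bit pixel values [0, 255].
lo, hi = min(predicted_depth), max(predicted_depth)
depth = [(d - lo) / (hi - lo) for d in predicted_depth]
pixels = [int(d * 255) for d in depth]
print(pixels)  # [0, 63, 127, 255]
```

Note that min-max normalization makes each depth map relative: the nearest and farthest points always map to 0 and 255, so pixel values are not comparable across images.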
## DepthAnythingConfig

[[autodoc]] DepthAnythingConfig
## DepthAnythingForDepthEstimation

[[autodoc]] DepthAnythingForDepthEstimation
    - forward