# Depth Anything
This model was released on 2024-01-19 and added to Hugging Face Transformers on 2024-01-25.
Depth Anything is designed to be a foundation model for monocular depth estimation (MDE). It is jointly trained on labeled and ~62M unlabeled images to enhance the dataset. It uses a pretrained DINOv2 model as an image encoder to inherit its existing rich semantic priors, and DPT as the decoder. A teacher model is trained on unlabeled images to create pseudo-labels. The student model is trained on a combination of the pseudo-labels and labeled images. To improve the student model's performance, strong perturbations are added to the unlabeled images to challenge the student model to learn more visual knowledge from the image.
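The teacher–student recipe above can be sketched in a few lines of plain Python. This is a hypothetical toy illustration, not the actual training code: `teacher` and `perturb` are stand-in functions, and real training would involve DINOv2/DPT models and image augmentations.

```python
def teacher(x):
    # Stand-in teacher model: pretend predicted depth is twice the input.
    return 2 * x

def perturb(x):
    # Stand-in strong perturbation (the paper uses e.g. color jitter and CutMix).
    return x + 0.5

labeled = [(1.0, 2.0), (2.0, 4.0)]  # (image, ground-truth depth) pairs
unlabeled = [3.0, 4.0]

# Step 1: the teacher pseudo-labels the *clean* unlabeled images;
# the student will see strongly perturbed versions of those images.
pseudo_labeled = [(perturb(x), teacher(x)) for x in unlabeled]

# Step 2: the student trains on the union of labeled and pseudo-labeled data.
training_set = labeled + pseudo_labeled
print(len(training_set))  # 4
```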
You can find all the original Depth Anything checkpoints under the Depth Anything collection.
> [!TIP]
> Click on the Depth Anything models in the right sidebar for more examples of how to apply Depth Anything to different vision tasks.
The example below demonstrates how to obtain a depth map with [Pipeline] or the [AutoModel] class.
```py
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf", device=0)
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```
```py
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf", device_map="auto")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = image_processor(images=image, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

# interpolate the raw prediction back to the original image size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# min-max normalize the depth map to [0, 255] for visualization
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().numpy() * 255
Image.fromarray(depth.astype("uint8"))
```
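The final normalization step scales the raw depth values to the `[0, 255]` range expected by an 8-bit grayscale image. A minimal sketch of that arithmetic with plain Python lists (no tensors involved):

```python
# Raw depth values as a plain list, standing in for the predicted depth tensor.
predicted_depth = [1.0, 2.0, 3.0, 5.0]

# Min-max normalize to [0, 1], then scale to 8-bit pixel values [0, 255].
lo, hi = min(predicted_depth), max(predicted_depth)
depth = [(d - lo) / (hi - lo) for d in predicted_depth]
pixels = [int(d * 255) for d in depth]
print(pixels)  # [0, 63, 127, 255]
```

Note that min-max normalization makes each depth map relative: the nearest and farthest points always map to 0 and 255, so pixel values are not comparable across images.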
## DepthAnythingConfig

[[autodoc]] DepthAnythingConfig
## DepthAnythingForDepthEstimation

[[autodoc]] DepthAnythingForDepthEstimation
    - forward