<!--[metadata] title = "Single image 3D reconstruction using MCC, SAM, and ZoeDepth" source = "https://github.com/rerun-io/MCC" tags = ["2D", "3D", "Segmentation", "Point cloud", "SAM", "Paper walkthrough"] thumbnail = "https://static.rerun.io/single-image-3D-reconstruction/c54498053d53148cfa43901f39a084c549df2b72/480w.png" thumbnail_dimensions = [480, 480] -->

This example project combines several popular computer vision methods and uses Rerun to visualize the results and how the pieces fit together.

The videos below give a visual walkthrough of the project.

By combining Meta AI's Segment Anything Model (SAM) with Multiview Compressive Coding (MCC), we can reconstruct a 3D object from a single image.

https://vimeo.com/865973817?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=10000:8133

The basic idea is to use SAM to create a generic object mask so we can exclude the background.

https://vimeo.com/865973836?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=10000:7941
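A minimal sketch of this masking step. The SAM call is shown in comments only (it needs a downloaded checkpoint; the path below is a placeholder), and a synthetic circular mask stands in for SAM's output so the background-exclusion logic can run on its own:

```python
import numpy as np

# In the actual pipeline the mask comes from SAM, roughly:
#   from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
#   masks = SamAutomaticMaskGenerator(sam).generate(image)  # image: HxWx3 uint8 RGB
#   object_mask = max(masks, key=lambda m: m["area"])["segmentation"]
# Here a synthetic circular mask plays the role of SAM's output.

h, w = 64, 64
image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)

yy, xx = np.mgrid[0:h, 0:w]
object_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 < 20**2  # HxW bool

# Exclude the background: keep object pixels, zero out everything else.
masked = np.where(object_mask[..., None], image, 0)
```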

The next step is to generate a depth image. Here we use the awesome ZoeDepth to get realistic depth from the color image.

https://vimeo.com/865973850?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=10000:7941

With depth, color, and an object mask, we have everything needed to create a colored point cloud of the object from a single view.

https://vimeo.com/865973862?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=10000:11688
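The lifting step is standard pinhole-camera back-projection. A sketch with made-up intrinsics (in the real pipeline the focal length and principal point come from the input image):

```python
import numpy as np

def backproject(depth, rgb, mask, fx, fy, cx, cy):
    """Lift masked pixels into a colored 3D point cloud (pinhole model).

    depth: HxW meters, rgb: HxWx3, mask: HxW bool.
    Returns (N, 3) points and (N, 3) colors.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel row/column coordinates
    z = depth[mask]
    x = (u[mask] - cx) * z / fx        # back-project along camera rays
    y = (v[mask] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)
    colors = rgb[mask]
    return points, colors

# Toy inputs: a flat scene at 2 m depth, mask selecting a single pixel.
depth = np.full((4, 4), 2.0)
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = True
pts, cols = backproject(depth, rgb, mask, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```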

MCC encodes the colored points and then creates a reconstruction by sweeping through the volume, querying the network for occupancy and color at each point.

https://vimeo.com/865973880?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=1:1
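The sweep amounts to evaluating a query function over a regular 3D grid and keeping the points the network considers occupied. In MCC the query is the trained decoder; here a hypothetical `sphere_query` (occupied inside a sphere, constant color) stands in for it so the sketch is runnable:

```python
import numpy as np

def sweep_volume(query_fn, resolution=16, bound=1.0, threshold=0.5):
    """Evaluate query_fn over a regular grid; keep occupied points.

    query_fn: (N, 3) points -> (occupancy (N,), color (N, 3)).
    In MCC this would be the decoder network; here it is a stand-in.
    """
    axis = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    points = grid.reshape(-1, 3)
    occupancy, color = query_fn(points)
    keep = occupancy > threshold
    return points[keep], color[keep]

def sphere_query(points, radius=0.5):
    """Hypothetical stand-in for MCC's decoder."""
    occ = (np.linalg.norm(points, axis=-1) < radius).astype(np.float32)
    color = np.tile(np.array([[1.0, 0.5, 0.0]]), (len(points), 1))
    return occ, color

pts, cols = sweep_volume(sphere_query)
```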

This is a great example of how many solutions are built these days: by stringing together several more targeted pre-trained models. The details of the three building blocks can be found in the respective papers: