examples/use_with/roboflow/embeddings.ipynb
With Roboflow Inference, you can calculate image embeddings using CLIP, a popular multimodal embedding model. You can then store these embeddings in Chroma for use in your application.
In this guide, we are going to discuss how to calculate image embeddings with Roboflow Inference, load them into Chroma, and run a semantic search query against them.
Roboflow Inference is a scalable server through which you can run fine-tuned object detection, segmentation, and classification models, as well as popular foundation models such as CLIP.
Inference handles all of the complexity associated with running vision models, from managing dependencies to maintaining your environment.
Inference is trusted by enterprises around the world to manage vision models, with the hosted version powering millions of API calls each month.
Inference runs in Docker and provides an HTTP interface through which you can retrieve predictions.
We will use Inference to calculate CLIP embeddings for our application.
There are two ways to use Inference: through the hosted Inference API, or by running your own server with Docker.
In this guide, we will use the hosted Inference API.
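If you prefer to self-host, Inference can be started locally with Docker. A sketch of the documented default invocation (the image tag below is the CPU build; check the Inference documentation for the variant that matches your hardware):

```shell
# Pull and run the CPU build of the Roboflow Inference server.
# The server listens on port 9001 by default.
docker run -it --rm -p 9001:9001 roboflow/roboflow-inference-server-cpu
```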
To load and save image embeddings into Chroma, we first need images to embed. In this guide, we are going to use the COCO 128 dataset, a collection of 128 images from the Microsoft COCO dataset. This dataset is available on Roboflow Universe, a community that has shared more than 250,000 public computer vision datasets.
To download the dataset, visit the COCO 128 web page, click “Download Dataset”, then click “Show download code” to get a download code:
Here is the download code for the COCO 128 dataset:
!pip install roboflow -q
API_KEY = ""
from roboflow import Roboflow
rf = Roboflow(api_key=API_KEY)
project = rf.workspace("team-roboflow").project("coco-128")
dataset = project.version(2).download("yolov8")
Above, replace the value associated with the API_KEY variable with your Roboflow API key. Learn how to retrieve your Roboflow API key.
Now that we have a dataset ready, we can create a vector database and start loading embeddings.
Install the Chroma Python client and supervision, which we will use to open images in this notebook, with the following command:
!pip install chromadb supervision -q
Then, run the code below to calculate CLIP vectors for images in your dataset:
import chromadb
import os
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import RoboflowEmbeddingFunction
import uuid
import cv2
import supervision as sv
SERVER_URL = "https://infer.roboflow.com"
ef = RoboflowEmbeddingFunction(API_KEY, api_url=SERVER_URL)
client = chromadb.PersistentClient(path="database")
data_loader = ImageLoader()
collection = client.create_collection(name="images_db2", embedding_function=ef, data_loader=data_loader, metadata={"hnsw:space": "cosine"})
IMAGE_DIR = dataset.location + "/train/images"
documents = [os.path.join(IMAGE_DIR, img) for img in os.listdir(IMAGE_DIR)]
uris = list(documents)
ids = [str(uuid.uuid4()) for _ in range(len(documents))]
collection.add(
uris=uris,
ids=ids,
metadatas=[{"file": file} for file in documents]
)
If you have downloaded custom images from a source other than the Roboflow snippet earlier in this notebook, replace IMAGE_DIR with the folder where your images are stored.
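Note that os.listdir() returns every file in the folder, not just images. If your image directory mixes in other files (labels, caches), you may want to filter by extension before building the uris list. A minimal stdlib-only sketch (the extension set and file names are illustrative):

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def list_images(folder):
    # Keep only files whose extension looks like an image, case-insensitively.
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if os.path.splitext(f)[1].lower() in IMAGE_EXTS
    )

# Demo with a throwaway directory standing in for the dataset folder.
tmp = tempfile.mkdtemp()
for name in ["dog.jpg", "cat.PNG", "labels.txt"]:
    open(os.path.join(tmp, name), "w").close()

print(list_images(tmp))  # only dog.jpg and cat.PNG survive the filter
```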
In this code snippet, we create a persistent Chroma client that stores its data in the database folder, with a collection called images_db2. Our collection will use cosine distance for embedding comparisons.
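For intuition, cosine similarity compares the angle between two vectors rather than their magnitudes; Chroma's "hnsw:space": "cosine" setting ranks results by cosine distance, which is 1 minus this similarity. A minimal sketch of the underlying computation:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal directions score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```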
We calculate CLIP embeddings for all images in the COCO128/train/images folder using Inference. We save the embeddings in Chroma using the collection.add() method.
We store the file paths associated with each image in the documents variable and attach each path as metadata; Chroma computes the embeddings itself by loading each image from its uris entry and passing it through the embedding function.
The SERVER_URL value is set to https://infer.roboflow.com, the hosted version of Roboflow Inference. If you are running Inference yourself, replace SERVER_URL with your server's URL (for example, http://localhost:9001). We use the RoboflowEmbeddingFunction, built in to Chroma, to interact with Inference.
Run the script above to calculate embeddings for a folder of images and save them in your database.
We now have a vector database that contains some embeddings. Great! Let’s move on to the fun part: running a search query on our database.
To run a search query, we need a text embedding of a query. For example, if we want to find baseball-related images in our collection of 128 images from the COCO dataset, we need to have a text embedding for the search phrase “baseball”.
To calculate a text embedding, we can use Inference through the embedding function we defined earlier:
query = "baseball"
results = collection.query(
n_results=3,
query_texts=query
)
top_result = results["metadatas"][0][0]["file"]
sv.plot_image(cv2.imread(top_result))
Our code returns the name of the image with the most similar embedding to the embedding of our text query.
The top result is an image of a child holding a baseball glove in a park. Chroma successfully returned an image that matched our prompt.
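As a final note on reading the output: collection.query() returns its fields as lists nested per query, where the outer list has one entry per query text and the inner list one entry per result, best match first. A sketch of unpacking the top matches, using a hand-built stand-in for the results dict (the file names and distances are illustrative):

```python
# Stand-in for the dict collection.query() returns: the outer list has one
# entry per query text, the inner list n_results entries, best match first.
results = {
    "ids": [["id-1", "id-2", "id-3"]],
    "distances": [[0.12, 0.34, 0.56]],
    "metadatas": [[{"file": "img_1.jpg"}, {"file": "img_2.jpg"}, {"file": "img_3.jpg"}]],
}

# Unpack results for the first (and only) query.
for meta, dist in zip(results["metadatas"][0], results["distances"][0]):
    print(f"{meta['file']}  (cosine distance: {dist})")
```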