examples/sentence_transformer/applications/image-search/README.md
SentenceTransformers provides models that embed images and text into the same vector space. This makes it possible to find similar images and to implement image search.
Ensure that you have `transformers` installed to use the image-text models, and use a recent PyTorch version (tested with PyTorch 1.7.0). Image-text models were added in SentenceTransformers version 1.0.0 and are still experimental.
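If you are setting up a fresh environment, an installation along these lines should suffice (package names as published on PyPI; pin versions to taste):

```bash
pip install sentence-transformers transformers Pillow
```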
SentenceTransformers provides a wrapper for the OpenAI CLIP model, which was trained on a wide variety of (image, text) pairs.
```python
from sentence_transformers import SentenceTransformer
from PIL import Image

# Load the CLIP model
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

# Encode an image
img_emb = model.encode(Image.open("two_dogs_in_snow.jpg"))

# Encode text descriptions
text_emb = model.encode(
    ["Two dogs in the snow", "A cat on a table", "A picture of London at night"]
)

# Compute cosine similarities between the image and each text
similarity_scores = model.similarity(img_emb, text_emb)
print(similarity_scores)
```
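Because images and texts live in the same vector space, you can also search a whole image collection with a text query: encode the collection once, then retrieve the nearest images for each query. A minimal sketch using `util.semantic_search` (the file names below are hypothetical placeholders):

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

# Encode the image collection once, up front (hypothetical file names)
image_paths = ["two_dogs_in_snow.jpg", "cat_on_table.jpg", "london_at_night.jpg"]
corpus_emb = model.encode([Image.open(p) for p in image_paths])

# Encode the text query and retrieve the best-matching images
query_emb = model.encode("Two dogs playing in the snow")
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
for hit in hits:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 4))
```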
You can use the CLIP model for:

- **Text-to-image search:** find the images that best match a free-text query
- **Image-to-image search:** find images similar to a given query image
- **Zero-shot image classification:** score an image against a set of candidate label descriptions, as sketched below
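Zero-shot classification, for instance, comes down to the same similarity computation as above: encode the image and one short description per candidate class, then pick the highest-scoring description. A sketch under that assumption (the labels are made up for illustration):

```python
from sentence_transformers import SentenceTransformer
from PIL import Image

model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

# One short description per candidate class (made up for illustration)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a city at night"]
label_emb = model.encode(labels)
img_emb = model.encode(Image.open("two_dogs_in_snow.jpg"))

# similarity() returns a 1 x len(labels) tensor of cosine similarities
scores = model.similarity(img_emb, label_emb)
print(labels[int(scores.argmax())])
```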