Image Search

examples/sentence_transformer/applications/image-search/README.md


SentenceTransformers provides models that embed images and text into the same vector space. This makes it possible to find similar images and to implement image search.

Installation

Ensure that you have transformers installed to use the image-text models, and use a recent PyTorch version (tested with PyTorch 1.7.0). Image-text models were added in SentenceTransformers version 1.0.0 and are still in an experimental phase.

Usage

SentenceTransformers provides a wrapper for the OpenAI CLIP Model, which was trained on a variety of (image, text)-pairs.

```python
from sentence_transformers import SentenceTransformer
from PIL import Image

# Load CLIP model
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

# Encode an image:
img_emb = model.encode(Image.open("two_dogs_in_snow.jpg"))

# Encode text descriptions
text_emb = model.encode(
    ["Two dogs in the snow", "A cat on a table", "A picture of London at night"]
)

# Compute similarities
similarity_scores = model.similarity(img_emb, text_emb)
print(similarity_scores)
```

You can use the CLIP model for:

  • Text-to-image, image-to-text, image-to-image, and text-to-text search
  • Fine-tuning on your own image and text data with the regular SentenceTransformers training code
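Because all of these search directions operate in the same embedding space, each one reduces to a nearest-neighbour lookup by cosine similarity over the encoded vectors. A minimal sketch of that retrieval step using NumPy (the 2-dimensional toy vectors below are illustrative placeholders, not real CLIP embeddings):

```python
import numpy as np

def cosine_search(query: np.ndarray, corpus: np.ndarray, top_k: int = 2):
    """Return (index, score) pairs for the top_k corpus rows most similar to query."""
    # Normalise so that dot products equal cosine similarities
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in top]

# Toy "embeddings": row 0 points in nearly the same direction as the query
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(cosine_search(query, corpus))
```

In practice you would replace the toy arrays with the outputs of `model.encode(...)`; for large image collections, an approximate nearest-neighbour index is typically used instead of the brute-force scan shown here.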

Examples