Image embedders allow embedding images into a high-dimensional feature vector representing the semantic meaning of an image, which can then be compared with the feature vector of other images to evaluate their semantic similarity.
As opposed to image search, the image embedder allows computing the similarity between images on-the-fly instead of searching through a predefined index built from a corpus of images.
Use the Task Library ImageEmbedder API to deploy your custom image embedder
into your mobile apps.
The ImageEmbedder API supports:

*   Input image processing, including rotation, resizing, and color space
    conversion.
*   Region of interest of the input image.
*   Built-in utility function to compute the cosine similarity between feature
    vectors.
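Cosine similarity measures the angle between two feature vectors: the dot product of the vectors divided by the product of their L2 norms. As a minimal NumPy sketch (independent of the Task Library's built-in utility, which you should prefer in practice):

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product scaled by the L2 norms of both vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 0.0])
print(cosine_similarity(a, a))  # identical vectors score (approximately) 1
print(cosine_similarity(a, b))  # partially overlapping vectors score lower
```

Because only the angle matters, the score is independent of the vectors' magnitudes, which is why L2-normalizing embeddings does not change their pairwise cosine similarity.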
The following models are guaranteed to be compatible with the ImageEmbedder
API.
*   Feature vector models from the Google Image Modules collection on
    TensorFlow Hub.
*   Custom models that meet the model compatibility requirements.
```c++
// Initialization
ImageEmbedderOptions options;
options.mutable_model_file_with_metadata()->set_file_name(model_path);
options.set_l2_normalize(true);
std::unique_ptr<ImageEmbedder> image_embedder = ImageEmbedder::CreateFromOptions(options).value();

// Create input frame_buffer_1 and frame_buffer_2 from your inputs `image_data1`, `image_data2`, `image_dimension1` and `image_dimension2`.
// See more information here: tensorflow_lite_support/cc/task/vision/utils/frame_buffer_common_utils.h
std::unique_ptr<FrameBuffer> frame_buffer_1 = CreateFromRgbRawBuffer(
    image_data1, image_dimension1);
std::unique_ptr<FrameBuffer> frame_buffer_2 = CreateFromRgbRawBuffer(
    image_data2, image_dimension2);

// Run inference on two images.
const EmbeddingResult result_1 = image_embedder->Embed(*frame_buffer_1).value();
const EmbeddingResult result_2 = image_embedder->Embed(*frame_buffer_2).value();

// Compute cosine similarity.
double similarity = ImageEmbedder::CosineSimilarity(
    result_1.embeddings(0).feature_vector(),
    result_2.embeddings(0).feature_vector());
```
See the source code for more options to configure `ImageEmbedder`.
You can install the TensorFlow Lite Support Pypi package using the following command:
```
pip install tflite-support
```
```python
from tflite_support.task import vision

# Initialization.
image_embedder = vision.ImageEmbedder.create_from_file(model_path)

# Run inference on two images.
image_1 = vision.TensorImage.create_from_file('/path/to/image1.jpg')
result_1 = image_embedder.embed(image_1)
image_2 = vision.TensorImage.create_from_file('/path/to/image2.jpg')
result_2 = image_embedder.embed(image_2)

# Compute cosine similarity.
feature_vector_1 = result_1.embeddings[0].feature_vector
feature_vector_2 = result_2.embeddings[0].feature_vector
similarity = image_embedder.cosine_similarity(feature_vector_1,
                                              feature_vector_2)
```
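The same pairwise score extends naturally to ranking a small set of candidate embeddings against a query. A hypothetical NumPy sketch (the vectors below are made-up stand-ins for real feature vectors extracted as arrays; this is not part of the Task Library API):

```python
import numpy as np

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted by cosine similarity to the query, best first."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q                    # cosine similarity of each candidate
    order = np.argsort(-scores)       # descending by score
    return order, scores[order]

query = np.array([0.9, 0.1, 0.0])
candidates = np.array([
    [1.0, 0.0, 0.0],  # close to the query
    [0.0, 1.0, 0.0],  # nearly orthogonal
    [0.8, 0.2, 0.0],  # also close
])
order, scores = rank_by_similarity(query, candidates)
print(order)  # indices of candidates, most similar first
```

For anything beyond a handful of candidates, prefer a prebuilt index as done by the image search task rather than brute-force comparison.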
See the source code for more options to configure `ImageEmbedder`.
Cosine similarity between normalized feature vectors returns a score between -1 and 1. Higher is better, i.e. a cosine similarity of 1 means the two vectors are identical.
Cosine similarity: 0.954312
Try out the simple CLI demo tool for ImageEmbedder with your own model and test data.
The ImageEmbedder API expects a TFLite model with optional, but strongly
recommended, TFLite Model Metadata.
The compatible image embedder models should meet the following requirements:
*   An input image tensor (`kTfLiteUInt8`/`kTfLiteFloat32`)
    *   image input of size `[batch x height x width x channels]`.
    *   `batch` is required to be 1.
    *   `channels` is required to be 3.
*   At least one output tensor (`kTfLiteUInt8`/`kTfLiteFloat32`)
    *   with `N` components corresponding to the `N` dimensions of the returned
        feature vector for this output layer.
    *   of shape `[1 x N]` or `[1 x 1 x 1 x N]`.