Local Embeddings with OpenVINO

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime supports various hardware devices including x86 and ARM CPUs, and Intel GPUs. It can help to boost deep learning performance in Computer Vision, Automatic Speech Recognition, Natural Language Processing and other common tasks.

Hugging Face embedding models are supported by OpenVINO through the OpenVINOEmbedding or OpenVINOGENAIEmbedding class, and OpenClip models through the OpenVINOClipEmbedding class.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

python

%pip install llama-index-embeddings-openvino

python

!pip install llama-index

Model Exporter

You can export a model to the OpenVINO IR format with the create_and_save_openvino_model function and then load it from a local folder.

python

from llama_index.embeddings.huggingface_openvino import OpenVINOEmbedding

OpenVINOEmbedding.create_and_save_openvino_model(
    "BAAI/bge-small-en-v1.5", "./bge_ov"
)
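Exporting can take a while, so it may be worth checking whether the folder already holds an exported model before running the export again. The sketch below is a minimal illustration, not part of the LlamaIndex API: `needs_export` is a hypothetical helper, and the file name `openvino_model.xml` is an assumption about how the exported IR graph is named in the output folder.

```python
from pathlib import Path

# Assumption: the export writes the IR graph as "openvino_model.xml"
# (next to a matching "openvino_model.bin" weights file).
IR_XML_NAME = "openvino_model.xml"


def needs_export(output_dir: str, xml_name: str = IR_XML_NAME) -> bool:
    """Return True when no IR graph file is found in output_dir."""
    return not (Path(output_dir) / xml_name).is_file()


if __name__ == "__main__":
    if needs_export("./bge_ov"):
        print("No IR found in ./bge_ov; run create_and_save_openvino_model first.")
    else:
        print("Reusing the existing IR in ./bge_ov.")
```

If the check reports a missing IR, run the export cell above; otherwise the folder can be passed straight to the loader.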

Model Loading

If you have an Intel GPU, you can specify device="gpu" to run inference on it.

python

ov_embed_model = OpenVINOEmbedding(model_id_or_path="./bge_ov", device="cpu")
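If you are not sure which devices are present, one option is to query the OpenVINO runtime and fall back to the CPU. The following is a rough sketch under a few assumptions: `pick_device` is a hypothetical helper (not part of LlamaIndex or OpenVINO), a recent `openvino` package exposing `Core().available_devices` is assumed, and a placeholder device list is used when the package is not installed.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device type found in `available`.

    OpenVINO may report several GPUs as "GPU.0", "GPU.1", so only the part
    before the dot is compared. Falls back to "CPU" when nothing matches.
    """
    families = {name.split(".")[0].upper() for name in available}
    for candidate in preferred:
        if candidate.upper() in families:
            return candidate
    return "CPU"


try:
    import openvino as ov  # optional; only needed to query the real devices

    available_devices = ov.Core().available_devices
except ImportError:
    available_devices = ["CPU"]  # placeholder when OpenVINO is not installed

print(pick_device(available_devices))
```

The returned string can then be passed as the device argument when constructing the embedding model.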

python

embeddings = ov_embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

Model Loading with OpenVINO GenAI

To avoid a PyTorch dependency at runtime, you can load your local embedding model with the OpenVINOGENAIEmbedding class.

python

%pip install llama-index-embeddings-openvino-genai

python

from llama_index.embeddings.openvino_genai import OpenVINOGENAIEmbedding

ov_embed_model = OpenVINOGENAIEmbedding(model_path="./bge_ov", device="CPU")

python

embeddings = ov_embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

OpenClip Model Exporter

The OpenVINOClipEmbedding class supports exporting and loading open_clip models with the OpenVINO runtime.

python

%pip install open_clip_torch

python

from llama_index.embeddings.huggingface_openvino import (
    OpenVINOClipEmbedding,
)

OpenVINOClipEmbedding.create_and_save_openvino_model(
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    "ViT-B-32-ov",
)

MultiModal Model Loading

If you have an Intel GPU, you can specify device="GPU" to run inference on it.

python

ov_clip_model = OpenVINOClipEmbedding(
    model_id_or_path="./ViT-B-32-ov", device="CPU"
)

Embed images and queries with OpenVINO

python

from PIL import Image
import requests
from numpy import dot
from numpy.linalg import norm

image_url = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcStMP8S3VbNCqOQd7QQQcbvC_FLa1HlftCiJw&s"
im = Image.open(requests.get(image_url, stream=True).raw)
print("Image:")
display(im)

im.save("logo.jpg")
image_embeddings = ov_clip_model.get_image_embedding("logo.jpg")
print("Image dim:", len(image_embeddings))
print("Image embed:", image_embeddings[:5])

text_embeddings = ov_clip_model.get_text_embedding(
    "Logo of a pink blue llama on dark background"
)
print("Text dim:", len(text_embeddings))
print("Text embed:", text_embeddings[:5])

cos_sim = dot(image_embeddings, text_embeddings) / (
    norm(image_embeddings) * norm(text_embeddings)
)
print("Cosine similarity:", cos_sim)
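The same cosine-similarity score can be used to rank several candidate captions against one image embedding. The sketch below is a dependency-free illustration: `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the short toy vectors stand in for real embeddings such as `image_embeddings` and `text_embeddings` above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Sort (label, vector) pairs by similarity to query_vec, best first."""
    scored = [
        (label, cosine_similarity(query_vec, vec)) for label, vec in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors stand in for real embeddings here.
image_vec = [0.9, 0.1, 0.0]
captions = [
    ("unrelated caption", [0.0, 0.2, 0.9]),
    ("matching caption", [0.8, 0.2, 0.1]),
]
for label, score in rank_by_similarity(image_vec, captions):
    print(f"{label}: {score:.3f}")
```

With real embeddings, each candidate vector would come from get_text_embedding and the query vector from get_image_embedding.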

For more information refer to: