Vertex AI Multimodal Embedding

If no credentials are specified, the model falls back to Application Default Credentials (ADC).

python

from llama_index.embeddings.vertex import VertexMultiModalEmbedding

# Replace the project ID and location with those of your own Google Cloud project.
embed_model = VertexMultiModalEmbedding(
    project="speedy-atom-413006", location="us-central1"
)

python

image_url = "https://upload.wikimedia.org/wikipedia/commons/4/43/Cute_dog.jpg"

Download this image and save it as data/test-image.jpg, the path that the embedding call below reads from.
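The download step can be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the helper name `download_image` and the User-Agent string are hypothetical choices made for illustration and are not part of the original notebook.

```python
import shutil
import urllib.request
from pathlib import Path


def download_image(url: str, dest: str) -> Path:
    """Download `url` to `dest`, creating parent directories as needed.

    Hypothetical helper for illustration; not part of llama_index.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    # Some hosts reject requests that lack a User-Agent header,
    # so an explicit (made-up) one is sent here.
    request = urllib.request.Request(
        url, headers={"User-Agent": "vertex-embedding-example/0.1"}
    )

    # Stream the response body straight to disk.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)

    return dest_path
```

Calling `download_image(image_url, "data/test-image.jpg")` should then produce the file that the embedding cell expects, assuming the host allows the request.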

python

# Import both names from the public IPython.display module.
from IPython.display import Image, display

display(Image(url=image_url, width=500))

python

result = embed_model.get_image_embedding("data/test-image.jpg")

python

result[:10]

python

text_result = embed_model.get_text_embedding(
    "a brown and white puppy laying in the grass with purple daisies in the background"
)

python

text_result_2 = embed_model.get_text_embedding("airplanes in the sky")

We expect the description that matches the image to yield a higher similarity score than the unrelated one.

python

embed_model.similarity(result, text_result)

python

embed_model.similarity(result, text_result_2)
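To make the comparison concrete, here is a minimal sketch of cosine similarity, a common measure for comparing embedding vectors. It assumes that embed_model.similarity uses cosine similarity by default, which the text above does not confirm; the function `cosine_similarity` below is a standalone illustration, not the library's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`.

    Illustrative helper; assumed (not confirmed) to match the default
    behavior of embed_model.similarity.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)
```

The score ranges from -1 to 1 and depends only on the direction of the vectors, not their magnitude, so under this assumption the puppy description should score closer to 1 against the image embedding than "airplanes in the sky" does.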