
VoyageAI Embeddings

The new VoyageAI embedding models natively support float, int8, binary, and ubinary embeddings. See the output_dtype description in the VoyageAI documentation for more details.

In this notebook, we demonstrate how to use VoyageAI embeddings with different models, input_types, and output_dtypes.

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-voyageai

python
!pip install llama-index

With the latest voyage-3 embeddings.

The default output_dtype is float.

python
from llama_index.embeddings.voyageai import VoyageEmbedding

# default settings: float output_dtype, no explicit input_type
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3",
)

embeddings = embed_model.get_text_embedding("Hello VoyageAI!")

print(len(embeddings))
print(embeddings[:5])
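
Voyage distinguishes query embeddings from document embeddings. Through LlamaIndex's standard embedding interface, get_query_embedding and get_text_embedding cover both cases; as a hedged note, the Voyage integration is expected to map these to the corresponding input_type on the API side. A minimal sketch:

python
# Sketch: embed a query vs. a document with the same model.
# get_query_embedding / get_text_embedding are the standard LlamaIndex
# embedding methods; the Voyage integration presumably maps them to
# input_type="query" / input_type="document" under the hood.
query_embedding = embed_model.get_query_embedding("What is VoyageAI?")
doc_embedding = embed_model.get_text_embedding("VoyageAI builds embedding models.")

print(len(query_embedding), len(doc_embedding))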

Let's try the int8 output_dtype with the voyage-3-large model.

python
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="int8",
    truncation=False,
)

embeddings = embed_model.get_text_embedding("Hello VoyageAI!")

print(len(embeddings))
print(embeddings[:5])
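
The intro also lists binary and ubinary output_dtype values, which this notebook doesn't otherwise demonstrate. A hedged sketch, assuming the parameter accepts "binary" (bit-packed signed int8) and "ubinary" (bit-packed uint8) as described in the VoyageAI docs:

python
# Sketch only: assumes output_dtype="binary" is accepted as described
# in the VoyageAI documentation (bit-packed embeddings).
binary_embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="binary",
    truncation=False,
)

binary_embeddings = binary_embed_model.get_text_embedding("Hello VoyageAI!")

# Each element packs 8 bits, so the vector should be about 1/8th
# the length of its float counterpart.
print(len(binary_embeddings))
print(binary_embeddings[:5])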

Check voyage-3-large embeddings in depth

We will experiment with the int8 output_dtype.

python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_source_node

from IPython.display import Markdown, display

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

python
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
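
As a quick sanity check (the essay loads as a single document by default):

python
# Quick check: confirm how many Document objects were loaded.
print(f"Loaded {len(documents)} document(s)")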

With int8 output_dtype

Build index

python
llm = OpenAI(
    model="gpt-4o-mini",
    api_key="<YOUR_OPENAI_API_KEY>",
)
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="int8",
)

index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=embed_model
)

Build retriever

python
search_query_retriever = index.as_retriever()

search_query_retrieved_nodes = search_query_retriever.retrieve(
    "What happened in the summer of 1995?"
)

python
for n in search_query_retrieved_nodes:
    display_source_node(n, source_length=2000)
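
The llm constructed above isn't used by the retriever itself. To put it to work, the same index can be wrapped in a query engine, which retrieves nodes and then synthesizes an answer. A minimal sketch:

python
# Sketch: end-to-end RAG over the same index, using the OpenAI LLM
# defined earlier to synthesize an answer from the retrieved nodes.
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What happened in the summer of 1995?")
print(response)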

Text-Image Embeddings

VoyageAI now supports a multimodal embedding model in which text and images share the same embedding space.

python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("./data/images/prometheus_paper_card.png")
plt.imshow(img)

python
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-multimodal-3",
    truncation=False,
)

Image Embeddings

python
embeddings = embed_model.get_image_embedding(
    "./data/images/prometheus_paper_card.png"
)

print(len(embeddings))
print(embeddings[:5])

Text Embeddings

python
embeddings = embed_model.get_text_embedding("prometheus evaluation model")

print(len(embeddings))
print(embeddings[:5])
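
Because text and image embeddings share one space, they can be compared directly. A small sketch using cosine similarity (assuming the default float output):

python
import numpy as np

# Cosine similarity between the image and text embeddings computed above.
image_emb = np.array(
    embed_model.get_image_embedding("./data/images/prometheus_paper_card.png")
)
text_emb = np.array(embed_model.get_text_embedding("prometheus evaluation model"))

similarity = np.dot(image_emb, text_emb) / (
    np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
)
print(f"Cosine similarity: {similarity:.4f}")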