
VoyageAI Embeddings

The new VoyageAI embedding models natively support float, int8, binary, and ubinary embeddings. See the output_dtype description in the VoyageAI documentation for more details.

In this notebook, we demonstrate how to use VoyageAI embeddings with different models, input_types, and output_dtypes.

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-voyageai

python
!pip install llama-index

With the latest voyage-3 embeddings.

The default output_dtype is float.

python
from llama_index.embeddings.voyageai import VoyageEmbedding

# default settings: float output_dtype, no explicit input_type
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3",
)

embeddings = embed_model.get_text_embedding("Hello VoyageAI!")

print(len(embeddings))
print(embeddings[:5])
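
Voyage distinguishes query embeddings from document embeddings. Through LlamaIndex's standard embedding interface, get_query_embedding and get_text_embedding cover both cases; as a hedged note, the Voyage integration is expected to map these to the corresponding input_type on the API side. A minimal sketch:

python
# Sketch: embed a query vs. a document with the same model.
# get_query_embedding / get_text_embedding are the standard LlamaIndex
# embedding methods; the Voyage integration presumably maps them to
# input_type="query" / input_type="document" under the hood.
query_embedding = embed_model.get_query_embedding("What is VoyageAI?")
doc_embedding = embed_model.get_text_embedding("VoyageAI builds embedding models.")

print(len(query_embedding), len(doc_embedding))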

Let's try the int8 output_dtype with the voyage-3-large model.

python
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="int8",
    truncation=False,
)

embeddings = embed_model.get_text_embedding("Hello VoyageAI!")

print(len(embeddings))
print(embeddings[:5])
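
The intro also lists binary and ubinary output_dtype values, which this notebook doesn't otherwise demonstrate. A hedged sketch, assuming the parameter accepts "binary" (bit-packed signed int8) and "ubinary" (bit-packed uint8) as described in the VoyageAI docs:

python
# Sketch only: assumes output_dtype="binary" is accepted as described
# in the VoyageAI documentation (bit-packed embeddings).
binary_embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="binary",
    truncation=False,
)

binary_embeddings = binary_embed_model.get_text_embedding("Hello VoyageAI!")

# Each element packs 8 bits, so the vector should be about 1/8th
# the length of its float counterpart.
print(len(binary_embeddings))
print(binary_embeddings[:5])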

Check voyage-3-large embeddings in depth

We will experiment with the int8 output_dtype.

python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_source_node

from IPython.display import Markdown, display

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

python
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
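
As a quick sanity check (the essay loads as a single document by default):

python
# Quick check: confirm how many Document objects were loaded.
print(f"Loaded {len(documents)} document(s)")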

With int8 output_dtype

Build index

python
llm = OpenAI(
    model="gpt-4o-mini",
    api_key="<YOUR_OPENAI_API_KEY>",
)
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-3-large",
    output_dtype="int8",
)

index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=embed_model
)

Build retriever

python
search_query_retriever = index.as_retriever()

search_query_retrieved_nodes = search_query_retriever.retrieve(
    "What happened in the summer of 1995?"
)

python
for n in search_query_retrieved_nodes:
    display_source_node(n, source_length=2000)
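
The llm constructed above isn't used by the retriever itself. To put it to work, the same index can be wrapped in a query engine, which retrieves nodes and then synthesizes an answer. A minimal sketch:

python
# Sketch: end-to-end RAG over the same index, using the OpenAI LLM
# defined earlier to synthesize an answer from the retrieved nodes.
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What happened in the summer of 1995?")
print(response)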

Text-Image Embeddings

VoyageAI now supports a multimodal embedding model in which text and images share the same embedding space.

python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("./data/images/prometheus_paper_card.png")
plt.imshow(img)

python
embed_model = VoyageEmbedding(
    voyage_api_key="<YOUR_VOYAGE_API_KEY>",
    model_name="voyage-multimodal-3",
    truncation=False,
)

Image Embeddings

python
embeddings = embed_model.get_image_embedding(
    "./data/images/prometheus_paper_card.png"
)

print(len(embeddings))
print(embeddings[:5])

Text Embeddings

python
embeddings = embed_model.get_text_embedding("prometheus evaluation model")

print(len(embeddings))
print(embeddings[:5])
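
Because text and image embeddings share one space, they can be compared directly. A small sketch using cosine similarity (assuming the default float output):

python
import numpy as np

# Cosine similarity between the image and text embeddings computed above.
image_emb = np.array(
    embed_model.get_image_embedding("./data/images/prometheus_paper_card.png")
)
text_emb = np.array(embed_model.get_text_embedding("prometheus evaluation model"))

similarity = np.dot(image_emb, text_emb) / (
    np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
)
print(f"Cosine similarity: {similarity:.4f}")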