docs/examples/embeddings/voyageai.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/embeddings/voyageai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
New VoyageAI embedding models natively support float, int8, binary, and ubinary embeddings. Please check the output_dtype description in the VoyageAI documentation for more details.
In this notebook, we will demonstrate using VoyageAI embeddings with different models, input types, and embedding types.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-voyageai
!pip install llama-index
voyage-3 embeddings
The default embedding_type is float.
from llama_index.embeddings.voyageai import VoyageEmbedding
# default configuration (float embeddings)
embed_model = VoyageEmbedding(
voyage_api_key="<YOUR_VOYAGE_API_KEY>",
model_name="voyage-3",
)
embeddings = embed_model.get_text_embedding("Hello VoyageAI!")
print(len(embeddings))
print(embeddings[:5])
int8 embedding_type with voyage-3-large model
embed_model = VoyageEmbedding(
voyage_api_key="<YOUR_VOYAGE_API_KEY>",
model_name="voyage-3-large",
output_dtype="int8",
truncation=False,
)
embeddings = embed_model.get_text_embedding("Hello VoyageAI!")
print(len(embeddings))
print(embeddings[:5])
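For the binary and ubinary dtypes, VoyageAI returns bit-packed values, so each returned integer encodes 8 embedding dimensions. A minimal sketch of unpacking a ubinary vector back into per-dimension 0/1 values with NumPy (the sample bytes below are made up, not real model output):

```python
import numpy as np


def unpack_ubinary(packed):
    """Expand bit-packed uint8 values into one 0/1 entry per dimension.

    Assumes the bit-packing convention of most-significant bit first,
    8 dimensions per returned byte.
    """
    return np.unpackbits(np.asarray(packed, dtype=np.uint8))


# Hypothetical 2-byte ubinary embedding -> 16 binary dimensions
packed = [0b10110001, 0b00001111]
bits = unpack_ubinary(packed)
print(len(bits))  # 16
```

This is why a 1024-dimensional ubinary embedding arrives as only 128 integers.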
voyage-3-large embeddings in depth
We will experiment with the int8 embedding type.
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_source_node
from IPython.display import Markdown, display
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
int8 embedding_type
llm = OpenAI(
model="gpt-4o-mini",  # any valid OpenAI chat model works here
api_key="<YOUR_OPENAI_API_KEY>",
)
embed_model = VoyageEmbedding(
voyage_api_key="<YOUR_VOYAGE_API_KEY>",
model_name="voyage-3-large",
output_dtype="int8",
)
index = VectorStoreIndex.from_documents(
documents=documents, embed_model=embed_model
)
search_query_retriever = index.as_retriever()
search_query_retrieved_nodes = search_query_retriever.retrieve(
"What happened in the summer of 1995?"
)
for n in search_query_retrieved_nodes:
display_source_node(n, source_length=2000)
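One caveat if you work with the raw int8 vectors directly (outside the index): NumPy keeps the int8 dtype in dot products, which silently wraps around on overflow, so vectors should be upcast before similarity math. A small sketch with made-up vectors:

```python
import numpy as np


def int8_dot(a, b):
    """Dot product of two int8 embedding vectors, upcast to avoid overflow."""
    a = np.asarray(a, dtype=np.int8).astype(np.int32)
    b = np.asarray(b, dtype=np.int8).astype(np.int32)
    return int(a @ b)


# Hypothetical slices of two int8 embeddings
a = [120, -50, 3]
b = [110, 40, -8]

naive = np.array(a, dtype=np.int8) @ np.array(b, dtype=np.int8)  # wraps around
safe = int8_dot(a, b)
print(naive, safe)
```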
VoyageAI now supports a multimodal embedding model, where text and images live in the same embedding space.
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open("./data/images/prometheus_paper_card.png")
plt.imshow(img)
embed_model = VoyageEmbedding(
voyage_api_key="<YOUR_VOYAGE_API_KEY>",
model_name="voyage-multimodal-3",
truncation=False,
)
embeddings = embed_model.get_image_embedding(
"./data/images/prometheus_paper_card.png"
)
print(len(embeddings))
print(embeddings[:5])
embeddings = embed_model.get_text_embedding("prometheus evaluation model")
print(len(embeddings))
print(embeddings[:5])
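Because text and images share one embedding space, cross-modal relevance can be scored with plain cosine similarity between the two vectors above. A minimal sketch, with short made-up vectors standing in for the real model outputs:

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical stand-ins for get_image_embedding(...) and
# get_text_embedding(...) outputs
image_vec = [0.12, -0.40, 0.33, 0.08]
text_vec = [0.10, -0.38, 0.30, 0.11]
print(round(cosine_similarity(image_vec, text_vec), 3))
```

A higher score means the image and the text are closer in the shared space.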