CohereAI Embeddings

docs/examples/embeddings/cohereai.ipynb


<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/embeddings/cohereai.ipynb" target="_parent">Open In Colab</a>

Cohere Embed is the first embedding model that natively supports float, int8, binary and ubinary embeddings.

  1. v3 models support all embedding types, while v2 models support only the float embedding type.
  2. With LlamaIndex, the default embedding_type is float. For v3 models, you can customize it via the embedding_type parameter.

In this notebook, we will demonstrate Cohere embeddings with different models, input_types, and embedding_types.

Refer to their main blog post for more details on Cohere int8 & binary Embeddings.
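To see why the compressed types are attractive, here is a back-of-the-envelope storage comparison per vector. This is illustrative arithmetic only (no API calls); it assumes the 1024-dimensional output of embed-english-v3.0, 32-bit floats for float, one byte per dimension for int8, and one bit per dimension for ubinary:

```python
# Approximate storage for one 1024-dimensional embedding per embedding_type.
# Illustrative arithmetic only -- no API calls involved.
DIM = 1024  # embed-english-v3.0 output dimension (assumption for this sketch)

bytes_per_vector = {
    "float": DIM * 4,     # 32-bit float per dimension
    "int8": DIM * 1,      # one signed byte per dimension
    "ubinary": DIM // 8,  # one bit per dimension, packed into bytes
}

for kind, size in bytes_per_vector.items():
    factor = bytes_per_vector["float"] / size
    print(f"{kind}: {size} bytes per vector ({factor:.0f}x vs float)")
```

At this dimensionality, int8 is 4x smaller than float and ubinary is 32x smaller, which is why they matter for large vector databases.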

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-cohere
%pip install llama-index-embeddings-cohere
python
!pip install llama-index
python
# Initialize with your API key
import os

cohere_api_key = "YOUR COHERE API KEY"
os.environ["COHERE_API_KEY"] = cohere_api_key

With the latest embed-english-v3.0 embeddings.

  • input_type="search_document": Use this for texts (documents) you want to store in your vector database

  • input_type="search_query": Use this for search queries to find the most relevant documents in your vector database

The default embedding_type is float.

python
from llama_index.embeddings.cohere import CohereEmbedding

# with input_type='search_query'
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
)

embeddings = embed_model.get_text_embedding("Hello CohereAI!")

print(len(embeddings))
print(embeddings[:5])
python
# with input_type = 'search_document'
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_document",
)

embeddings = embed_model.get_text_embedding("Hello CohereAI!")

print(len(embeddings))
print(embeddings[:5])
With int8 embedding_type
python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
    embedding_type="int8",
)

embeddings = embed_model.get_text_embedding("Hello CohereAI!")

print(len(embeddings))
print(embeddings[:5])
With binary embedding_type
python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
    embedding_type="binary",
)

embeddings = embed_model.get_text_embedding("Hello CohereAI!")

print(len(embeddings))
print(embeddings[:5])

With the older embed-english-v2.0 embeddings.

v2 models support only the float embedding_type.

python
embed_model = CohereEmbedding(
    api_key=cohere_api_key, model_name="embed-english-v2.0"
)

embeddings = embed_model.get_text_embedding("Hello CohereAI!")

print(len(embeddings))
print(embeddings[:5])

Now, with the latest embed-english-v3.0 embeddings, let's use:

  1. input_type="search_document" to build the index
  2. input_type="search_query" to retrieve relevant context.

We will experiment with int8 embedding_type.

python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from llama_index.llms.cohere import Cohere
from llama_index.core.response.notebook_utils import display_source_node

from IPython.display import Markdown, display

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

python
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

With int8 embedding_type

Build index with input_type = 'search_document'

python
llm = Cohere(model="command-nightly", api_key=cohere_api_key)
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_document",
    embedding_type="int8",
)

index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=embed_model
)

Build retriever with input_type = 'search_query'

python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
    embedding_type="int8",
)

# Pass the query-time embed model so retrieval uses input_type="search_query"
search_query_retriever = index.as_retriever(embed_model=embed_model)

search_query_retrieved_nodes = search_query_retriever.retrieve(
    "What happened in the summer of 1995?"
)
python
for n in search_query_retrieved_nodes:
    display_source_node(n, source_length=2000)

With float embedding_type

Build index with input_type = 'search_document'

python
llm = Cohere(model="command-nightly", api_key=cohere_api_key)
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_document",
    embedding_type="float",
)

index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=embed_model
)

Build retriever with input_type = 'search_query'

python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
    embedding_type="float",
)

# Pass the query-time embed model so retrieval uses input_type="search_query"
search_query_retriever = index.as_retriever(embed_model=embed_model)

search_query_retrieved_nodes = search_query_retriever.retrieve(
    "What happened in the summer of 1995?"
)
python
for n in search_query_retrieved_nodes:
    display_source_node(n, source_length=2000)

With binary embedding_type

Build index with input_type = 'search_document'

python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_document",
    embedding_type="binary",
)

index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=embed_model
)

Build retriever with input_type = 'search_query'

python
embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
    input_type="search_query",
    embedding_type="binary",
)

# Pass the query-time embed model so retrieval uses input_type="search_query"
search_query_retriever = index.as_retriever(embed_model=embed_model)

search_query_retrieved_nodes = search_query_retriever.retrieve(
    "What happened in the summer of 1995?"
)
python
for n in search_query_retrieved_nodes:
    display_source_node(n, source_length=2000)
The chunks retrieved with the binary embedding type clearly differ from those retrieved with float and int8. It would be worthwhile to run retrieval evaluation on your RAG pipeline across float/int8/binary/ubinary embeddings.
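One quick way to quantify that difference is to measure the overlap between the node IDs returned by two retrievers. A minimal sketch, where the node IDs are hypothetical placeholders rather than output from this notebook:

```python
def retrieval_overlap(ids_a, ids_b):
    """Jaccard overlap between two sets of retrieved node IDs."""
    a, b = set(ids_a), set(ids_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical top-3 results from a float retriever and a binary retriever.
float_ids = ["node-3", "node-7", "node-9"]
binary_ids = ["node-3", "node-9", "node-12"]

print(retrieval_overlap(float_ids, binary_ids))  # 2 shared of 4 total -> 0.5
```

In practice you would collect `node.node_id` from each retriever's results and compare; a low overlap is a signal that the compressed embedding type is changing what your pipeline retrieves.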

Text-Image Embeddings

Cohere now supports a multi-modal embedding model in which text and images share the same embedding space.

python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("../data/images/prometheus_paper_card.png")
plt.imshow(img)
python
from llama_index.embeddings.cohere import CohereEmbedding

embed_model = CohereEmbedding(
    api_key=cohere_api_key,
    model_name="embed-english-v3.0",
)
Image Embeddings
python
embeddings = embed_model.get_image_embedding(
    "../data/images/prometheus_paper_card.png"
)

print(len(embeddings))
print(embeddings[:5])
Text Embeddings
python
embeddings = embed_model.get_text_embedding("prometheus evaluation model")

print(len(embeddings))
print(embeddings[:5])
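Because text and image embeddings live in the same space, you can compare them directly with cosine similarity. A minimal sketch; the two short vectors below are stand-ins, and in practice you would pass the image and text embeddings obtained above:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Stand-in vectors for illustration; with the real embeddings you would call:
# score = cosine_similarity(image_embedding, text_embedding)
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.707
```

A higher score between an image embedding and a text embedding indicates the caption is a better match for the image, which is the basis for cross-modal retrieval.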