examples/use_with/jina/late_chunking.ipynb
Late chunking is a technique that leverages the model's long-context capabilities to generate contextual chunk embeddings. Include `late_chunking=True` in your request to enable it. When set to true, the Jina AI API concatenates all sentences in the `input` field and feeds them to the model as a single string. The model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list.
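Before wiring this into Chroma, it can help to see the raw request shape. Below is a minimal sketch of the request body, with field names following Jina's public `/v1/embeddings` endpoint; the commented `httpx` call is illustrative and not executed here:

```python
import json

# Sketch of a Jina embeddings request with late chunking enabled.
# Field names follow Jina's public /v1/embeddings API.
payload = {
    "model": "jina-embeddings-v3",
    "task": "text-matching",
    "late_chunking": True,  # concatenate inputs, embed once, then chunk
    "input": [
        "Berlin is the capital and largest city of Germany.",
        "The city has a rich history dating back centuries.",
    ],
}
# resp = httpx.post("https://api.jina.ai/v1/embeddings", json=payload,
#                   headers={"Authorization": f"Bearer {api_key}"})
print(json.dumps(payload, indent=2))
```

With `late_chunking=True`, the two input sentences are embedded as one long string, so the second sentence's embedding carries the context that "The city" refers to Berlin.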
# set up chromadb and jina API key
! pip install chromadb --quiet
! pip install httpx --quiet
! pip install pandas --quiet
import os
import chromadb
import getpass
import pandas as pd
from chromadb.utils.embedding_functions import JinaEmbeddingFunction
from chromadb.api.types import QueryResult
def convert_to_df(qr: QueryResult) -> pd.DataFrame:
    # flatten a QueryResult into a DataFrame: one row per query
    df = pd.DataFrame(qr["ids"], columns=["id"])
    df["document"] = qr["documents"]
    df["distance"] = qr["distances"]
    return df
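As a quick sanity check, the helper can be exercised on a hand-built `QueryResult`-shaped dict (the sample values below are made up; the helper is repeated so this cell stands alone):

```python
import pandas as pd

def convert_to_df(qr):
    # same helper as above, repeated so this cell stands alone
    df = pd.DataFrame(qr["ids"], columns=["id"])
    df["document"] = qr["documents"]
    df["distance"] = qr["distances"]
    return df

# QueryResult-shaped dict with made-up values: two queries, n_results=1
mock_qr = {
    "ids": [["1"], ["2"]],
    "documents": [["first match"], ["second match"]],
    "distances": [[0.12], [0.34]],
}
print(convert_to_df(mock_qr))
```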
os.environ["CHROMA_JINA_API_KEY"] = getpass.getpass("Jina API Key:")
Let's set up two collections to compare retrieval with late chunking on and off, using `jina-embeddings-v3`.
client = chromadb.EphemeralClient()
# create a collection with the Jina embedding function, late chunking enabled
late_chunking_collection = client.create_collection(
    name="late_chunking",
    configuration={
        "embedding_function": JinaEmbeddingFunction(
            model_name="jina-embeddings-v3",
            late_chunking=True,  # enable late chunking
            task="text-matching",
        )
    },
)
# create a collection with the Jina embedding function, late chunking disabled
normal_collection = client.create_collection(
    name="normal",
    configuration={
        "embedding_function": JinaEmbeddingFunction(
            model_name="jina-embeddings-v3",
            task="text-matching",
        )
    },
)
Late chunking works best with Chroma when a group of documents shares a common context. Here, every document is about Berlin, with later documents referring to "It" and "The city". Normally the model lacks the context to resolve these references to Berlin, but with late chunking each chunk's embedding is informed by the surrounding documents.
For best retrieval results, separate unrelated topics into separate `add` calls when using late chunking. For example, if one set of documents covers Berlin and the next covers computer operating systems, add them separately so one topic's context does not bleed into the other's embeddings.
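For instance, topic separation might be sketched like this (the document contents and id scheme below are hypothetical; the commented `add` call marks where each batch would go to the collection):

```python
# hypothetical sketch: one add() per topic, so late chunking only
# concatenates documents that share context
berlin_docs = [
    "Berlin is the capital and largest city of Germany.",
    "The city has a rich history dating back centuries.",
]
os_docs = [
    "The kernel schedules processes onto CPU cores.",
    "It also manages virtual memory for each process.",
]

for topic, docs in {"berlin": berlin_docs, "os": os_docs}.items():
    ids = [f"{topic}-{i + 1}" for i in range(len(docs))]
    # late_chunking_collection.add(ids=ids, documents=docs)  # one topic per call
    print(topic, ids)
```

Each call concatenates only one topic's documents before embedding, so "It" in the operating-systems batch resolves to the kernel rather than to Berlin.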
# set up documents
documents = [
    'Berlin is the capital and largest city of Germany.',
    'The city has a rich history dating back centuries.',
    'It was founded in the 13th century and has been a significant cultural and political center throughout European history.',
    'The metropolis experienced dramatic changes during the 20th century, including two world wars and a period of division.',
    'After reunification, it underwent extensive reconstruction and modernization efforts.',
    'Its population reached 3.85 million inhabitants in 2023, making it the most populous urban area in the country.',
    'This represents a significant increase from previous decades, driven largely by immigration and economic opportunities.',
    'The city is known for its vibrant cultural scene and historical significance.',
    'Many tourists visit its famous landmarks each year, contributing significantly to the local economy.',
    'The Brandenburg Gate stands as its most iconic symbol.'
]
ids = [str(i+1) for i in range(len(documents))]
# add documents to the normal collection
normal_collection.add(
    ids=ids,
    documents=documents,
)
# add documents to the late chunking collection
late_chunking_collection.add(
    ids=ids,
    documents=documents,
)
# let's query the normal collection and see the results
results = normal_collection.query(
    query_texts=["What is Berlin's population?", "When was Berlin founded?"],
    n_results=1,
)
print("Normal Collection Results:")
print(convert_to_df(results))
print("\n--------------------------------\n")
# let's query the late chunking collection and see the results
results = late_chunking_collection.query(
    query_texts=["What is Berlin's population?", "When was Berlin founded?"],
    n_results=1,
)
print("Late Chunking Collection Results:")
print(convert_to_df(results))