Semantic Retriever Benchmark

In this notebook, we compare the following retrieval strategies:

Google Semantic Retrieval
LlamaIndex Retrieval
Vectara Managed Retrieval
ColBERT-V2 end-to-end Retrieval

Installation

python

%pip install llama-index-llms-openai
%pip install llama-index-indices-managed-colbert
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-llms-gemini
%pip install llama-index-embeddings-gemini
%pip install llama-index-indices-managed-vectara
%pip install llama-index-vector-stores-google
%pip install llama-index-indices-managed-google
%pip install llama-index-response-synthesizers-google

python

%pip install llama-index
%pip install "google-ai-generativelanguage>=0.4,<=1.0"
%pip install torch sentence-transformers

Google Authentication Overview

The Google Semantic Retriever API lets you perform semantic search on your own data. Because the data is yours, it needs stricter access controls than API keys provide, so you authenticate with OAuth, either through a service account or through your user credentials. This quickstart uses a simplified authentication approach intended for a testing environment, and service accounts are typically the easier option to start with. For a production environment, learn about authentication and authorization before choosing the access credentials that are appropriate for your app.

Demo recording for authenticating using service accounts: Demo

Note: At this time, the Google Generative AI Semantic Retriever API is only available in certain regions.

Authentication (Option 1): OAuth using service accounts

Google Auth service accounts let an application authenticate to make authorized Google API calls. To authenticate with OAuth using a service account, follow the steps below:

Enable the Generative Language API: Documentation
Create the Service Account by following the documentation.

After creating the service account, generate a service account key.

Upload your service account key file by using the file icon on the left sidebar, then the upload icon. The code below expects the file to be named `service_account_key.json`.
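Before running the authentication cell, you may want to sanity-check the uploaded key. This is a minimal sketch, not part of any Google library: `check_service_account_key` and `check_key_file` are hypothetical helpers, and the field list reflects the fields a downloaded service-account JSON key normally contains rather than an official schema.

```python
import json
from pathlib import Path

# Fields normally present in a downloaded service-account key.
# Assumption: this is a practical subset, not the official key schema.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed key; an empty list means it looks OK."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(key))]
    if key.get("type") != "service_account":
        problems.append(f"unexpected type: {key.get('type')!r}")
    return problems


def check_key_file(path: str = "service_account_key.json") -> list:
    """Load the key file from disk (hypothetical default name) and validate it."""
    key_path = Path(path)
    if not key_path.exists():
        return [f"file not found: {path}"]
    return check_service_account_key(json.loads(key_path.read_text()))


print(check_key_file() or "key file looks OK")
```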

python

%pip install google-auth-oauthlib

python

from google.oauth2 import service_account
from llama_index.indices.managed.google import GoogleIndex
from llama_index.vector_stores.google import set_google_config

credentials = service_account.Credentials.from_service_account_file(
    "service_account_key.json",
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/generative-language.retriever",
    ],
)

set_google_config(auth_credentials=credentials)

Authentication (Option 2): OAuth using user credentials

Follow the OAuth Quickstart to set up OAuth using user credentials. Below is an overview of the required steps from the documentation.

Enable the Generative Language API: Documentation
Configure the OAuth consent screen: Documentation
Authorize credentials for a desktop application: Documentation

If you want to run this notebook in Colab, start by uploading your client_secret*.json file using the "File > Upload" option.
Rename the uploaded file to client_secret.json, or change the variable client_file_name in the code below.
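If you would rather not rename the file, a small sketch like the following can locate it. `find_client_secret` is a hypothetical helper that returns the name of the first `client_secret*.json` file in a directory; you could assign its result to `client_file_name` in the cell below.

```python
import glob
import os
from typing import Optional


def find_client_secret(directory: str = ".") -> Optional[str]:
    """Return the name of the first client_secret*.json file in `directory`, or None."""
    matches = sorted(glob.glob(os.path.join(directory, "client_secret*.json")))
    return os.path.basename(matches[0]) if matches else None


print(find_client_secret())
```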


python

# Replace TODO-your-project-name with the project used in the OAuth Quickstart
project_name = "TODO-your-project-name"  #  @param {type:"string"}
# Replace TODO-your-email@gmail.com with the email added as a test user in the OAuth Quickstart
email = "TODO-your-email@gmail.com"  #  @param {type:"string"}
# Replace client_secret.json with the client_secret_* file name you uploaded.
client_file_name = "client_secret.json"

# IMPORTANT: Follow the instructions from the output - you must copy the command
# to your terminal and copy the output after authentication back here.
!gcloud config set project $project_name
!gcloud config set account $email

# NOTE: The simplified project setup in this tutorial triggers a "Google hasn't verified this app." dialog.
# This is normal, click "Advanced" -> "Go to [app name] (unsafe)"
!gcloud auth application-default login --no-browser --client-id-file=$client_file_name --scopes="https://www.googleapis.com/auth/generative-language.retriever,https://www.googleapis.com/auth/cloud-platform"

This will provide you with a URL, which you should open in your local browser. Follow the instructions to complete the authentication and authorization.
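Once the login succeeds, gcloud saves Application Default Credentials to a well-known file, and checking that the file exists is a quick way to confirm the step worked. This sketch assumes Linux, macOS or Colab, where the default location is `~/.config/gcloud/application_default_credentials.json` unless the `CLOUDSDK_CONFIG` environment variable points to another configuration directory; `adc_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def adc_path() -> Path:
    """Expected location of Application Default Credentials on Linux, macOS and Colab.

    Assumption: if CLOUDSDK_CONFIG is set, it replaces ~/.config/gcloud as the
    gcloud configuration directory. Windows (%APPDATA%\\gcloud) is not handled.
    """
    config_dir = os.environ.get("CLOUDSDK_CONFIG") or str(Path.home() / ".config" / "gcloud")
    return Path(config_dir) / "application_default_credentials.json"


path = adc_path()
print(f"{path} exists: {path.exists()}")
```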

Download Paul Graham Data

python

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Ground truth for the query `"which program did this author attend?"`

Wiki Link: https://en.wikipedia.org/wiki/Paul_Graham_(programmer)

Answer from Wiki:

Graham and his family moved to Pittsburgh, Pennsylvania in 1968, where he later attended Gateway High School. Graham gained interest in science and mathematics from his father who was a nuclear physicist.[8]

Graham received a Bachelor of Arts with a major in philosophy from Cornell University in 1986.[9][10][11] He then received a Master of Science in 1988 and a Doctor of Philosophy in 1990, both in computer science from Harvard University.[9][12]

Graham has also studied painting at the Rhode Island School of Design and at the Accademia di Belle Arti in Florence.[9][12]
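To compare the strategies below on the same footing, the ground truth can be turned into a simple checklist. This is a minimal sketch under stated assumptions: `GROUND_TRUTH` and `score_answer` are hypothetical, matching is case-insensitive substring search, and the alias lists (for example `risd`, or `accademia` to tolerate spelling variants) are our own choice rather than anything defined by the notebook.

```python
# Institutions from the Wikipedia ground truth above. The alias lists are
# assumptions: an institution counts as "found" if any alias appears in the answer.
GROUND_TRUTH = {
    "Gateway High School": ["gateway high school"],
    "Cornell University": ["cornell"],
    "Harvard University": ["harvard"],
    "Rhode Island School of Design": ["rhode island school of design", "risd"],
    "Accademia di Belle Arti (Florence)": ["accademia"],
}


def score_answer(answer: str) -> dict:
    """Report which ground-truth institutions an answer mentions (case-insensitive)."""
    text = answer.lower()
    found = sorted(
        name for name, aliases in GROUND_TRUTH.items() if any(a in text for a in aliases)
    )
    missing = sorted(set(GROUND_TRUTH) - set(found))
    return {"found": found, "missing": missing, "recall": len(found) / len(GROUND_TRUTH)}
```

For example, calling `score_answer(response.response)` after any query cell reports which institutions an answer names and which it misses.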

Google Semantic Retrieval

python

import os

GOOGLE_API_KEY = ""  # add your GOOGLE API key here
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

python

from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.google import GoogleIndex

# Create a Google corpus.
google_index = GoogleIndex.create_corpus(display_name="My first corpus!")
print(f"Newly created corpus ID is {google_index.corpus_id}.")

# Ingestion.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
google_index.insert_documents(documents)

python

# Load the Google index from an existing corpus by its corpus_id.
# Skip this cell if you just ran the ingestion step above; otherwise fill in your corpus ID.
google_index = GoogleIndex.from_corpus(corpus_id="")

Google Semantic Retrieval: Using the default query engine

python

query_engine = google_index.as_query_engine()
response = query_engine.query("which program did this author attend?")
print(response)

Show the nodes from the response

python

from llama_index.core.response.notebook_utils import display_source_node

for r in response.source_nodes:
    display_source_node(r, source_length=1000)

Google Semantic Retrieval: Using `Verbose` Answer Style

python

from google.ai.generativelanguage import (
    GenerateAnswerRequest,
)

query_engine = google_index.as_query_engine(
    # Extra parameters specific to the Google query engine.
    temperature=0.3,
    answer_style=GenerateAnswerRequest.AnswerStyle.VERBOSE,
)

response = query_engine.query("Which program did this author attend?")
print(response)

python

from llama_index.core.response.notebook_utils import display_source_node

for r in response.source_nodes:
    display_source_node(r, source_length=1000)

Google Semantic Retrieval: Using `Abstractive` Answer Style

python

from google.ai.generativelanguage import (
    GenerateAnswerRequest,
)

query_engine = google_index.as_query_engine(
    # Extra parameters specific to the Google query engine.
    temperature=0.3,
    answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE,
)

response = query_engine.query("Which program did this author attend?")
print(response)

python

from llama_index.core.response.notebook_utils import display_source_node

for r in response.source_nodes:
    display_source_node(r, source_length=1000)

Google Semantic Retrieval: Using `Extractive` Answer Style

python

from google.ai.generativelanguage import (
    GenerateAnswerRequest,
)

query_engine = google_index.as_query_engine(
    # Extra parameters specific to the Google query engine.
    temperature=0.3,
    answer_style=GenerateAnswerRequest.AnswerStyle.EXTRACTIVE,
)

response = query_engine.query("Which program did this author attend?")
print(response)

python

from llama_index.core.response.notebook_utils import display_source_node

for r in response.source_nodes:
    display_source_node(r, source_length=1000)

Google Semantic Retrieval: Advanced Retrieval with LlamaIndex Reranking and Synthesizer

Gemini as Reranker LLM
Or using Sentence BERT cross encoder for Reranking
Adopt Abstractive Answer Style for Response

In the first reranking example, we use Gemini as the LLM that reranks the retrieved nodes.
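Conceptually, an LLM reranker shows the model the query together with numbered candidate passages, asks it which passages are relevant and how relevant they are, and keeps the best `top_n`. The sketch below calls no LLM. It assumes the model replies with lines such as `Doc: 2, Relevance: 8`, which we believe resembles the format requested by LlamaIndex's default choice-select prompt; `build_rerank_prompt` and `parse_and_select` are hypothetical helpers, and the real `LLMRerank` postprocessor handles prompting and parsing for you.

```python
import re

# Assumed reply format: one "Doc: <number>, Relevance: <score>" entry per line.
CHOICE_PATTERN = re.compile(r"Doc:\s*(\d+),\s*Relevance:\s*(\d+)")


def build_rerank_prompt(query: str, passages: list) -> str:
    """Number the candidate passages (1-based) and ask the LLM to pick the relevant ones."""
    docs = "\n\n".join(f"Document {i}:\n{p}" for i, p in enumerate(passages, start=1))
    return (
        f"{docs}\n\nQuestion: {query}\n"
        "List the relevant documents as 'Doc: <number>, Relevance: <1-10>', one per line."
    )


def parse_and_select(reply: str, passages: list, top_n: int) -> list:
    """Parse the LLM reply and return up to top_n (score, passage) pairs, best first."""
    scored = []
    for doc_num, score in CHOICE_PATTERN.findall(reply):
        index = int(doc_num) - 1
        if 0 <= index < len(passages):  # ignore document numbers the LLM made up
            scored.append((int(score), passages[index]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```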

python

from google.ai.generativelanguage import GenerateAnswerRequest
from llama_index.core.postprocessor import LLMRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.gemini import Gemini
from llama_index.response_synthesizers.google import GoogleTextSynthesizer


# Set up the query engine with an LLM (Gemini) as reranker.
response_synthesizer = GoogleTextSynthesizer.from_defaults(
    temperature=0.7, answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE
)

reranker = LLMRerank(
    top_n=5,
    llm=Gemini(api_key=GOOGLE_API_KEY),
)
retriever = google_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[reranker],
)

# Query for better result!
response = query_engine.query("Which program did this author attend?")

python

print(response.response)

In the second reranking example, we use a `SentenceTransformer` cross-encoder to rerank the retrieved nodes.

python

from llama_index.core.postprocessor import SentenceTransformerRerank

sbert_rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
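A cross-encoder scores each (query, passage) pair jointly instead of comparing precomputed embeddings, and the postprocessor then keeps the `top_n` highest-scoring nodes. So that the sketch runs offline, it swaps the neural model for a toy word-overlap scorer; `toy_pair_score` and `cross_encoder_rerank` are hypothetical. With `sentence-transformers` installed, a real model's scores would come from `CrossEncoder(model_name).predict(pairs)` instead.

```python
import re
from typing import Callable, List, Tuple


def toy_pair_score(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words that also appear in the passage."""
    q_words = set(re.findall(r"\w+", query.lower()))
    p_words = set(re.findall(r"\w+", passage.lower()))
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def cross_encoder_rerank(
    query: str,
    passages: List[str],
    top_n: int,
    score_pair: Callable[[str, str], float] = toy_pair_score,
) -> List[Tuple[float, str]]:
    """Score every (query, passage) pair, then keep the top_n best, highest first."""
    scored = [(score_pair(query, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Because the scorer sees query and passage together, a cross-encoder is usually more precise but much slower than embedding search, which is why it is applied only to the few nodes the retriever already returned.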

python

from google.ai.generativelanguage import GenerateAnswerRequest
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers.google import GoogleTextSynthesizer


# Set up the query engine with the SBERT cross-encoder as reranker.
response_synthesizer = GoogleTextSynthesizer.from_defaults(
    temperature=0.1, answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE
)

retriever = google_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[sbert_rerank],
)

# Query for better result!
response = query_engine.query("Which program did this author attend?")

python

print(response.response)

`Observation` for `Google Semantic Retrieval`

Google Semantic Retrieval supports different `AnswerStyle` values, and different styles can yield different retrieval and final synthesis results.
Without a reranker, the results are mostly only partially correct.
After applying either Gemini as an LLM reranker or SBERT as a cross-encoder reranker, the results are more comprehensive and accurate.

LlamaIndex Default Baseline with OpenAI embedding and GPT as LLM for Synthesizer

python

import os

OPENAI_API_KEY = "sk-"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

python

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import Settings
import qdrant_client

Settings.chunk_size = 256

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_retrieval_2")

vector_store = QdrantVectorStore(client=client, collection_name="collection")

# Pass the storage context so the embeddings are stored in Qdrant;
# without it, the index falls back to the default in-memory vector store.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
qdrant_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

python

query_engine = qdrant_index.as_query_engine()
response = query_engine.query("Which program did this author attend?")
print(response)

python

for r in response.source_nodes:
    display_source_node(r, source_length=1000)
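Under the hood, `similarity_top_k` retrieval embeds the query and returns the k chunks whose embeddings are most similar to it; the vector store performs this search for you. The sketch below shows the idea with cosine similarity over tiny hand-made vectors, which are hypothetical stand-ins for OpenAI embeddings; `cosine` and `top_k` are illustrative helpers.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: List[float], chunks: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```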

Rewrite the Query to include more entities related to `program`

python

query_engine = qdrant_index.as_query_engine()
response = query_engine.query(
    "Which universities or schools or programs did this author attend?"
)
print(response)
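The rewrite above is done by hand. As a reusable step, the same idea can be sketched as keyword expansion: replace a vague term with the related entity types you want the retriever to match. The expansion table and `expand_query` are hypothetical, chosen to reproduce the rewritten query above; an LLM-based query rewriter could fill the same role.

```python
import re

# Hypothetical expansion table: vague keyword -> related entity types.
EXPANSIONS = {"program": "universities or schools or programs"}


def expand_query(query: str, expansions: dict = EXPANSIONS) -> str:
    """Replace each whole-word keyword in the query with its expansion."""
    for keyword, replacement in expansions.items():
        pattern = rf"\b{re.escape(keyword)}\b"
        # A function replacement avoids backslash escapes being interpreted.
        query = re.sub(pattern, lambda _match, r=replacement: r, query)
    return query
```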

LlamaIndex Default Configuration with LLM Reranker and Tree Summarize for Response

python

from llama_index.core import get_response_synthesizer


reranker = LLMRerank(top_n=3)
retriever = qdrant_index.as_retriever(similarity_top_k=3)
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize",
    ),
    node_postprocessors=[reranker],
)

response = query_engine.query(
    "Which universities or schools or programs did this author attend?"
)

python

print(response.response)
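`tree_summarize` builds the answer bottom-up: it summarizes groups of retrieved chunks, then summarizes those summaries, until a single answer remains. The sketch below assumes a fixed group size and takes the summarizer as a function argument (here a toy one that joins and truncates text) so it runs without an LLM; `tree_summarize` and `join_summarize` are hypothetical, and our understanding is that the real response mode groups chunks by what fits in the model's context window rather than by a fixed count.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str], summarize: Callable[[List[str]], str], group_size: int = 2
) -> str:
    """Summarize groups of texts level by level until one summary is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        # Each pass replaces every group of up to group_size texts with one summary.
        level = [summarize(level[i : i + group_size]) for i in range(0, len(level), group_size)]
        if len(level) == 1:
            return level[0]


def join_summarize(texts: List[str]) -> str:
    """Toy summarizer: join the texts and keep the first 200 characters."""
    return " ".join(texts)[:200]


print(tree_summarize(["chunk one", "chunk two", "chunk three"], join_summarize))
```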

python

from llama_index.core import get_response_synthesizer


sbert_rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
retriever = qdrant_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize",
    ),
    node_postprocessors=[sbert_rerank],
)

response = query_engine.query(
    "Which universities or schools or programs did this author attend?"
)

python

print(response.response)

`Observation` for LlamaIndex default retrieval

The default LlamaIndex query engine yields only a partially correct answer.
With query rewriting, the results improve.
With reranking of the top-5 retrieved results, the answer becomes fully accurate for this query.

Vectara Managed Index and Retrieval

python

from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.vectara import VectaraIndex

python

vectara_customer_id = ""
vectara_corpus_id = ""
vectara_api_key = ""

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
vectara_index = VectaraIndex.from_documents(
    documents,
    vectara_customer_id=vectara_customer_id,
    vectara_corpus_id=vectara_corpus_id,
    vectara_api_key=vectara_api_key,
)

python

vectara_query_engine = vectara_index.as_query_engine(similarity_top_k=5)
response = vectara_query_engine.query("Which program did this author attend?")

print(response)

python

for r in response.source_nodes:
    display_source_node(r, source_length=1000)

`Observation` for Vectara

Vectara provides somewhat accurate results with citations, but it misses the Accademia di Belle Arti in Florence.

ColBERT-V2 Managed Index and Retrieval

python

!git -C ColBERT/ pull || git clone https://github.com/stanford-futuredata/ColBERT.git
import sys

sys.path.insert(0, "ColBERT/")

python

%pip install faiss-cpu torch

python

from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.colbert import ColbertIndex
from llama_index.llms.openai import OpenAI

python

import os

OPENAI_API_KEY = "sk-"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Build ColBERT-V2 end-to-end Index

python

from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = ColbertIndex.from_documents(
    documents=documents,
)
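ColBERT keeps one embedding per token rather than one per chunk, and scores a passage by late interaction: for each query token it takes the maximum similarity over the passage's token embeddings, then sums those maxima (MaxSim). The sketch below shows only that scoring rule on tiny hand-made, already-normalized vectors (hypothetical); `maxsim_score` and `rank` are illustrative helpers, and the real ColBERTv2 index additionally compresses token embeddings and prunes candidates before scoring.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the best-matching doc-token similarity.

    Assumes L2-normalized embeddings, so the dot product equals cosine similarity.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Order document names by MaxSim score, best first."""
    return sorted(docs, key=lambda name: maxsim_score(query_tokens, docs[name]), reverse=True)
```

Because every query token can match a different passage token, this scoring tends to reward passages that cover all parts of the question, which is one reason token-level retrieval can behave differently from single-vector search.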

Query the ColBERT-V2 index with question

python

query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("Which program did this author attend?")
print(response.response)

python

for node in response.source_nodes:
    print(node)

python

response = query_engine.query(
    "Which universities or schools or programs did this author attend?"
)
print(response.response)

python

for node in response.source_nodes:
    print(node)
