docs/examples/managed/manage_retrieval_benchmark.ipynb
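In this notebook, we compare several retrieval strategies: Google Semantic Retrieval, LlamaIndex default retrieval with a Qdrant vector store, Vectara managed retrieval, and ColBERT end-to-end retrieval.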
%pip install llama-index-llms-openai
%pip install llama-index-indices-managed-colbert
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-llms-gemini
%pip install llama-index-embeddings-gemini
%pip install llama-index-indices-managed-vectara
%pip install llama-index-vector-stores-google
%pip install llama-index-indices-managed-google
%pip install llama-index-response-synthesizers-google
%pip install llama-index
%pip install "google-ai-generativelanguage>=0.4,<=1.0"
%pip install torch sentence-transformers
The Google Semantic Retriever API lets you perform semantic search on your own data. Since it's your data, this needs stricter access controls than API keys. Authenticate with OAuth through service accounts or through your user credentials. This quickstart uses a simplified authentication approach for a testing environment, and service account setup is typically the easier way to start. For a production environment, learn about authentication and authorization before choosing the access credentials that are appropriate for your app.
Demo recording for authenticating using service accounts: Demo
Note: At this time, the Google Generative AI Semantic Retriever API is only available in certain regions.
Google Auth service accounts let an application authenticate to make authorized Google API calls. To authenticate using service accounts, follow the steps below:
Enable the Generative Language API: Documentation
Create the Service Account by following the documentation.
%pip install google-auth-oauthlib
from google.oauth2 import service_account
from llama_index.indices.managed.google import GoogleIndex
from llama_index.vector_stores.google import set_google_config
credentials = service_account.Credentials.from_service_account_file(
"service_account_key.json",
scopes=[
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/generative-language.retriever",
],
)
set_google_config(auth_credentials=credentials)
Please follow the OAuth Quickstart to set up OAuth using user credentials. Below is an overview of the required steps from the documentation.
Enable the Generative Language API: Documentation
Configure the OAuth consent screen: Documentation
Authorize credentials for a desktop application: Documentation
If you want to run this notebook in Colab, start by uploading your
client_secret*.json file using the "File > Upload" option.
Rename the uploaded file to client_secret.json or change the variable client_file_name in the code below.
# Replace TODO-your-project-name with the project used in the OAuth Quickstart
project_name = "TODO-your-project-name" # @param {type:"string"}
# Replace TODO-your-email@example.com with the email added as a test user in the OAuth Quickstart
email = "TODO-your-email@example.com" # @param {type:"string"}
# Replace client_secret.json with the client_secret_* file name you uploaded.
client_file_name = "client_secret.json"
# IMPORTANT: Follow the instructions from the output - you must copy the command
# to your terminal and copy the output after authentication back here.
!gcloud config set project $project_name
!gcloud config set account $email
# NOTE: The simplified project setup in this tutorial triggers a "Google hasn't verified this app." dialog.
# This is normal, click "Advanced" -> "Go to [app name] (unsafe)"
!gcloud auth application-default login --no-browser --client-id-file=$client_file_name --scopes="https://www.googleapis.com/auth/generative-language.retriever,https://www.googleapis.com/auth/cloud-platform"
This will provide you with a URL, which you should open in your local browser. Follow the instructions to complete the authentication and authorization.
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
"which program did this author attend?"
Wiki Link: https://en.wikipedia.org/wiki/Paul_Graham_(programmer)
Answer from Wiki:
Graham and his family moved to Pittsburgh, Pennsylvania in 1968, where he later attended Gateway High School. Graham gained interest in science and mathematics from his father who was a nuclear physicist.[8]
Graham received a Bachelor of Arts with a major in philosophy from Cornell University in 1986.[9][10][11] He then received a Master of Science in 1988 and a Doctor of Philosophy in 1990, both in computer science from Harvard University.[9][12]
Graham has also studied painting at the Rhode Island School of Design and at the Accademia di Belle Arti in Florence.[9][12]
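To compare the answers of the different retrievers against this reference passage, a small scoring helper can be handy. The sketch below is not part of the original notebook: the function name `school_coverage` and the choice of which four institutions count as expected are assumptions made for illustration, and plain substring matching is only a rough proxy for answer quality.

```python
# Expected institutions, taken from the Wikipedia passage above.
# (Which names to include is a judgment call made for this sketch.)
EXPECTED_SCHOOLS = (
    "Cornell University",
    "Harvard University",
    "Rhode Island School of Design",
    "Accademia di Belle Arti",
)


def school_coverage(answer, expected=EXPECTED_SCHOOLS):
    """Return (fraction of expected names found, list of missing names).

    Matching is a case-insensitive substring check, so paraphrases such as
    "RISD" are not recognized.
    """
    text = answer.lower()
    missing = [name for name in expected if name.lower() not in text]
    found = len(expected) - len(missing)
    return found / len(expected), missing


score, missing = school_coverage(
    "The author attended Cornell University and Harvard University."
)
print(f"coverage={score:.2f}, missing={missing}")
```

Passing `str(response)` from any of the query engines below to this helper gives a quick, if crude, way to see which institutions an answer leaves out.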
import os
GOOGLE_API_KEY = "" # add your GOOGLE API key here
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.google import GoogleIndex
# Create a Google corpus.
google_index = GoogleIndex.create_corpus(display_name="My first corpus!")
print(f"Newly created corpus ID is {google_index.corpus_id}.")
# Ingestion.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
google_index.insert_documents(documents)
# Load an existing Google corpus by its ID.
# Skip this step if you have just run the ingestion step above.
google_index = GoogleIndex.from_corpus(corpus_id="") # add your corpus ID here
query_engine = google_index.as_query_engine()
response = query_engine.query("which program did this author attend?")
print(response)
from llama_index.core.response.notebook_utils import display_source_node
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
Verbose Answer Style
from google.ai.generativelanguage import (
GenerateAnswerRequest,
)
query_engine = google_index.as_query_engine(
# Extra parameters specific to the Google query engine.
temperature=0.3,
answer_style=GenerateAnswerRequest.AnswerStyle.VERBOSE,
)
response = query_engine.query("Which program did this author attend?")
print(response)
from llama_index.core.response.notebook_utils import display_source_node
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
Abstractive Answer Style
from google.ai.generativelanguage import (
GenerateAnswerRequest,
)
query_engine = google_index.as_query_engine(
# Extra parameters specific to the Google query engine.
temperature=0.3,
answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE,
)
response = query_engine.query("Which program did this author attend?")
print(response)
from llama_index.core.response.notebook_utils import display_source_node
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
Extractive Answer Style
from google.ai.generativelanguage import (
GenerateAnswerRequest,
)
query_engine = google_index.as_query_engine(
# Extra parameters specific to the Google query engine.
temperature=0.3,
answer_style=GenerateAnswerRequest.AnswerStyle.EXTRACTIVE,
)
response = query_engine.query("Which program did this author attend?")
print(response)
from llama_index.core.response.notebook_utils import display_source_node
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
Gemini as Reranker LLM
Sentence BERT cross encoder for Reranking
Abstractive Answer Style for Response
For the 1st example of reranking, we tried using Gemini as LLM for reranking the retrieved nodes.
from llama_index.response_synthesizers.google import GoogleTextSynthesizer
from llama_index.vector_stores.google import GoogleVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.llms.gemini import Gemini
from llama_index.core.postprocessor import LLMRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.gemini import GeminiEmbedding
# Set up the query engine with an LLM (Gemini) as reranker.
response_synthesizer = GoogleTextSynthesizer.from_defaults(
temperature=0.7, answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE
)
reranker = LLMRerank(
top_n=5,
llm=Gemini(api_key=GOOGLE_API_KEY),
)
retriever = google_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
retriever=retriever,
response_synthesizer=response_synthesizer,
node_postprocessors=[reranker],
)
# Query for better result!
response = query_engine.query("Which program did this author attend?")
print(response.response)
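The retrieve-then-rerank pattern used here can be illustrated without any model: the retriever returns a candidate list, a second scorer re-scores each candidate against the query, and only the best `top_n` are passed to the response synthesizer. The sketch below is a toy illustration, not how `LLMRerank` or a cross-encoder works internally. The word-overlap scorer and the names `overlap_score` and `rerank` are assumptions standing in for the real relevance model.

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query words that appear in the passage.

    A stand-in for an LLM or cross-encoder relevance judgment.
    """
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, top_n, score_fn=overlap_score):
    """Re-score retrieved passages against the query and keep the best top_n."""
    ranked = sorted(passages, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]


# Hypothetical retrieved passages, in the order a retriever might return them.
retrieved = [
    "He wrote essays about startups.",
    "The author studied painting at a design school.",
    "Which school did the author attend? The author did attend Cornell.",
]
for passage in rerank("which school did the author attend", retrieved, top_n=2):
    print(passage)
```

Note that when `top_n` equals the retriever's `similarity_top_k`, as in the cell above, reranking only reorders the candidates and drops none of them.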
SentenceTransformer for cross-encoder reranking the retrieved nodes
from llama_index.core.postprocessor import SentenceTransformerRerank
sbert_rerank = SentenceTransformerRerank(
model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
from llama_index.response_synthesizers.google import GoogleTextSynthesizer
from llama_index.vector_stores.google import GoogleVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.llms.gemini import Gemini
from llama_index.core.postprocessor import LLMRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.gemini import GeminiEmbedding
# Set up the query engine with the SBERT cross-encoder as reranker.
response_synthesizer = GoogleTextSynthesizer.from_defaults(
temperature=0.1, answer_style=GenerateAnswerRequest.AnswerStyle.ABSTRACTIVE
)
retriever = google_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
retriever=retriever,
response_synthesizer=response_synthesizer,
node_postprocessors=[sbert_rerank],
)
# Query for better result!
response = query_engine.query("Which program did this author attend?")
print(response.response)
Observation for Google Semantic Retrieval
Google Semantic Retrieval supports different AnswerStyle values. Different styles can yield different retrieval and final synthesis results.
With Gemini as the LLM reranker or SBERT as the cross-encoder reranker, the results are more comprehensive and accurate.
import os
OPENAI_API_KEY = "sk-"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import Settings
import qdrant_client
Settings.chunk_size = 256
# documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_retrieval_2")
vector_store = QdrantVectorStore(client=client, collection_name="collection")
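# Build the storage context first so the index is actually stored in Qdrant.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
qdrant_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)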
query_engine = qdrant_index.as_query_engine()
response = query_engine.query("Which program did this author attend?")
print(response)
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
# Query rewrite: broaden "program" to universities, schools, and programs.
query_engine = qdrant_index.as_query_engine()
response = query_engine.query(
"Which universities or schools or programs did this author attend?"
)
print(response)
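The query rewrite in this step is done by hand: the narrow term "program" is replaced with a broader phrase so that more relevant chunks match. A minimal sketch of doing the same substitution programmatically follows. The helper `rewrite_query` and its `expansions` mapping are hypothetical and not part of LlamaIndex.

```python
import re


def rewrite_query(query, expansions):
    """Replace each whole-word term in the query with its broader phrasing."""
    for term, replacement in expansions.items():
        pattern = rf"\b{re.escape(term)}\b"
        # A function replacement avoids interpreting backslashes in the text.
        query = re.sub(pattern, lambda _match, r=replacement: r, query)
    return query


original = "Which program did this author attend?"
rewritten = rewrite_query(
    original, {"program": "universities or schools or programs"}
)
print(rewritten)
```

The word-boundary match keeps the helper from rewriting "programs" or "programmer", which matters once the mapping holds more than one entry.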
from llama_index.core import get_response_synthesizer
reranker = LLMRerank(top_n=3)
retriever = qdrant_index.as_retriever(similarity_top_k=3)
query_engine = RetrieverQueryEngine.from_args(
retriever=retriever,
response_synthesizer=get_response_synthesizer(
response_mode="tree_summarize",
),
node_postprocessors=[reranker],
)
response = query_engine.query(
"Which universities or schools or programs did this author attend?"
)
print(response.response)
from llama_index.core import get_response_synthesizer
sbert_rerank = SentenceTransformerRerank(
model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
retriever = qdrant_index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(
retriever=retriever,
response_synthesizer=get_response_synthesizer(
response_mode="tree_summarize",
),
node_postprocessors=[sbert_rerank],
)
response = query_engine.query(
"Which universities or schools or programs did this author attend?"
)
print(response.response)
Observation for LlamaIndex default retrieval
With query rewriting, the results get better.
With reranking over the top-5 retrieved results, the results become 100% accurate.
from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.vectara import VectaraIndex
vectara_customer_id = ""
vectara_corpus_id = ""
vectara_api_key = ""
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
vectara_index = VectaraIndex.from_documents(
documents,
vectara_customer_id=vectara_customer_id,
vectara_corpus_id=vectara_corpus_id,
vectara_api_key=vectara_api_key,
)
vectara_query_engine = vectara_index.as_query_engine(similarity_top_k=5)
response = vectara_query_engine.query("Which program did this author attend?")
print(response)
for r in response.source_nodes:
    display_source_node(r, source_length=1000)
Observation for Vectara
Accademia di Belle Arti in Florence.
!git -C ColBERT/ pull || git clone https://github.com/stanford-futuredata/ColBERT.git
import sys
sys.path.insert(0, "ColBERT/")
!pip install faiss-cpu torch
from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.colbert import ColbertIndex
from llama_index.llms.openai import OpenAI
import os
OPENAI_API_KEY = "sk-"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from llama_index.core import Settings
Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = ColbertIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("Which program did this author attend?")
print(response.response)
for node in response.source_nodes:
    print(node)
response = query_engine.query(
"Which universities or schools or programs did this author attend?"
)
print(response.response)
for node in response.source_nodes:
    print(node)
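Since the notebook runs the same question against several backends, a small harness that collects the answers side by side may make the comparison easier to read. The sketch below is an assumption-laden illustration: `compare_engines` is a hypothetical helper, and the stub callables stand in for the real engines. In the notebook itself, each engine would first need its own variable name, because `query_engine` is reassigned in several cells.

```python
def compare_engines(engines, query):
    """Run one query against every engine and collect the answers as strings.

    `engines` maps a label to any callable that takes a query string,
    for example the bound `.query` method of a query engine.
    """
    results = {}
    for name, ask in engines.items():
        try:
            results[name] = str(ask(query))
        except Exception as exc:  # keep going if one backend fails
            results[name] = f"ERROR: {exc}"
    return results


# Stand-in callables with made-up answers, used only to show the output shape.
stub_engines = {
    "stub_a": lambda q: "Cornell University and Harvard University",
    "stub_b": lambda q: "Harvard University",
}
answers = compare_engines(stub_engines, "Which program did this author attend?")
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

With real engines, the mapping might look like `{"vectara": vectara_query_engine.query, ...}`. Failures are recorded as strings rather than raised so that one misconfigured backend does not hide the other results.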