docs/examples/observability/UpTrainCallback.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/observability/UpTrainCallback.ipynb" target="_parent"></a>
UpTrain (github || website || docs) is an open-source platform to evaluate and improve GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, embedding use cases), performs root cause analysis on failure cases and gives insights on how to resolve them.
This notebook showcases how to use UpTrain Callback Handler to evaluate different components of your RAG pipelines.
The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using an RAG query engine. To measure it's accuracy, we use:
Re-ranking involves reordering nodes based on relevance to the query and choosing the top nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.
a. Same Number of Nodes
b. Different Number of Nodes:
These evaluations collectively ensure the robustness and effectiveness of the RAG query engine, SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline.
Install notebook dependencies.
%pip install llama-index-readers-web
%pip install llama-index-callbacks-uptrain
%pip install -q html2text llama-index pandas tqdm uptrain torch sentence-transformers
Import libraries.
from getpass import getpass
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.callbacks import CallbackManager
from llama_index.callbacks.uptrain.base import UpTrainCallbackHandler
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.postprocessor import SentenceTransformerRerank
import os
UpTrain provides you with:
You can choose between the following options for evaluating using UpTrain:
You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key. You can get yours here.
In order to view your evaluations in the UpTrain dashboard, you will need to set it up by running the following commands in your terminal:
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
This will start the UpTrain dashboard on your local machine. You can access it at http://localhost:3000/dashboard.
Parameters:
Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account here and get free trial credits. If you want more trial credits, book a call with the maintainers of UpTrain here.
The benefits of using the managed service are:
Once you perform the evaluations, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard
Parameters:
Note: The project_name will be the project name under which the evaluations performed will be shown in the UpTrain dashboard.
os.environ["OPENAI_API_KEY"] = getpass()
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
Load documents from Paul Graham's essay "What I Worked On".
documents = SimpleWebPageReader().load_data(
[
"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt"
]
)
Parse the document into nodes.
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
UpTrain callback handler will automatically capture the query, context and response once generated and will run the following three evaluations (Graded from 0 to 1) on the response:
index = VectorStoreIndex.from_documents(
documents,
)
query_engine = index.as_query_engine()
max_characters_per_line = 80
queries = [
"What did Paul Graham do growing up?",
"When and how did Paul Graham's mother die?",
"What, in Paul Graham's opinion, is the most distinctive thing about YC?",
"When and how did Paul Graham meet Jessica Livingston?",
"What is Bel, and when and where was it written?",
]
for query in queries:
response = query_engine.query(query)
The sub-question query engine is used to tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.
UpTrain callback handler will automatically capture the sub-question and the responses for each of them once generated and will run the following three evaluations (Graded from 0 to 1) on the response:
In addition to the above evaluations, the callback handler will also run the following evaluation:
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
documents=documents,
use_async=True,
).as_query_engine()
query_engine_tools = [
QueryEngineTool(
query_engine=vector_query_engine,
metadata=ToolMetadata(
name="documents",
description="Paul Graham essay on What I Worked On",
),
),
]
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
use_async=True,
)
response = query_engine.query(
"How was Paul Grahams life different before, during, and after YC?"
)
Re-ranking is the process of reordering the nodes based on their relevance to the query. There are multiple classes of re-ranking algorithms offered by Llamaindex. We have used LLMRerank for this example.
The re-ranker allows you to enter the number of top n nodes that will be returned after re-ranking. If this value remains the same as the original number of nodes, the re-ranker will only re-rank the nodes and not change the number of nodes. Otherwise, it will re-rank the nodes and return the top n nodes.
We will perform different evaluations based on the number of nodes returned after re-ranking.
If the number of nodes returned after re-ranking is the same as the original number of nodes, the following evaluation will be performed:
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=3, # number of nodes after reranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=3, # number of nodes before reranking
node_postprocessors=[rerank_postprocessor],
)
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
If the number of nodes returned after re-ranking is the lesser as the original number of nodes, the following evaluation will be performed:
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=2, # Number of nodes after re-ranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=5, # Number of nodes before re-ranking
node_postprocessors=[rerank_postprocessor],
)
# Use your advanced RAG
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
Here's a short video showcasing the dashboard and the insights: