docs/examples/citation/pdf_page_reference.ipynb
This guide shows how to use LlamaIndex to get inline page-number citations in a streamed response.
It combines the page-number metadata produced by the PDF loader with the indexing and query abstractions, which pass that metadata to the LLM alongside the retrieved text so the model can cite it.
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/citation/pdf_page_reference.ipynb" target="_parent">Open in Colab</a>
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-openai
!pip install llama-index
# Only the reader and the vector index are used in this notebook.
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.llms.openai import OpenAI
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Download Data
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
Load document and build index
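# Each PDF page is loaded as a separate document with its page number in the metadata.
reader = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"])
data = reader.load_data()
index = VectorStoreIndex.from_documents(data)
# Pass the LLM configured above explicitly; stream tokens and retrieve the top 3 chunks.
query_engine = index.as_query_engine(
    llm=llm, streaming=True, similarity_top_k=3
)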
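Before querying, it can help to confirm that page numbers are actually present in the loaded documents. The sketch below assumes the PDF loader stores the page number under a `page_label` metadata key, which is worth verifying against your installed version. `summarize_pages` is a hypothetical helper (not part of LlamaIndex), and the stand-in objects only mimic the `.metadata` dict of a loaded document.

```python
from types import SimpleNamespace


def summarize_pages(documents, key="page_label"):
    """Return the page labels found in each document's metadata, in order.

    `key` is an assumption: adjust it if your loader uses another field name.
    Documents without that key are skipped.
    """
    labels = []
    for doc in documents:
        label = doc.metadata.get(key)
        if label is not None:
            labels.append(str(label))
    return labels


# Stand-in objects that mimic loaded documents (hypothetical data, for illustration only).
sample_docs = [
    SimpleNamespace(metadata={"page_label": "1", "file_name": "lyft_2021.pdf"}),
    SimpleNamespace(metadata={"page_label": "2", "file_name": "lyft_2021.pdf"}),
    SimpleNamespace(metadata={"file_name": "no_page.txt"}),
]

page_labels = summarize_pages(sample_docs)
print(page_labels)
```

In the notebook, calling `summarize_pages(data)` on the loaded Lyft filing should list one label per page if the assumption about the metadata key holds.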
Stream response with page citation
response = query_engine.query(
    "What was the impact of COVID? Show statements in bullet form and show"
    " page reference after each statement."
)
response.print_response_stream()
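If you also need the full answer text once streaming finishes (for example, to log it), one option is to iterate over the tokens yourself instead of calling `print_response_stream()`. This is a minimal sketch that assumes the streaming response exposes its tokens through a `response_gen` attribute; `consume_stream` and `fake_tokens` are hypothetical names, and the placeholder tokens are not real model output.

```python
def consume_stream(token_iter):
    """Print tokens as they arrive and return the full text once the stream ends."""
    pieces = []
    for token in token_iter:
        print(token, end="", flush=True)
        pieces.append(token)
    print()  # finish the line after the last token
    return "".join(pieces)


def fake_tokens():
    # Hypothetical stand-in for a streamed LLM response.
    yield from ["- first ", "statement ", "(page 1)"]


full_text = consume_stream(fake_tokens())
```

In the notebook this would be called as `consume_stream(response.response_gen)`, assuming that attribute is available. A token stream can only be consumed once, so use this in place of `print_response_stream()`, not after it.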
Inspect source nodes
# Print each retrieved chunk with its metadata (including the page number) and similarity score.
for node in response.source_nodes:
    print("-----")
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")
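The page references in the streamed answer are written by the LLM, so it is reasonable to compare them against the pages that were actually retrieved. The sketch below collects the unique page labels from the source nodes. It assumes each source node exposes `.node.metadata` as in the loop above and that the page number lives under `page_label`; `cited_pages` is a hypothetical helper, and the sample objects are stand-ins with made-up values.

```python
from types import SimpleNamespace


def cited_pages(source_nodes, key="page_label"):
    """Return the unique page labels of the retrieved nodes, in retrieval order."""
    seen = []
    for item in source_nodes:
        label = item.node.metadata.get(key)
        if label is not None and label not in seen:
            seen.append(label)
    return seen


# Stand-ins for retrieved nodes (hypothetical values, for illustration only).
sample_nodes = [
    SimpleNamespace(node=SimpleNamespace(metadata={"page_label": "12"}), score=0.83),
    SimpleNamespace(node=SimpleNamespace(metadata={"page_label": "7"}), score=0.81),
    SimpleNamespace(node=SimpleNamespace(metadata={"page_label": "12"}), score=0.79),
]

pages = cited_pages(sample_nodes)
print("Retrieved pages:", ", ".join(pages))
```

With the real response, `cited_pages(response.source_nodes)` gives the set of pages the answer could legitimately cite; a page number in the answer that is not in this list deserves a second look.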