cookbook/pageIndex_chat_quickstart.ipynb
Similarity-based RAG built on vector databases has shown clear limitations in recent AI applications, and reasoning-based (or agentic) retrieval has become increasingly important.
PageIndex Chat is an AI assistant that lets you chat with multiple very long documents without worrying about limited context windows or context rot. It is built on PageIndex, a vectorless, reasoning-based RAG framework that gives more transparent and reliable results, much like a human expert would.
You can now access PageIndex Chat through its API or SDK.
This notebook demonstrates a simple, minimal example of document analysis with the PageIndex Chat API, using NVIDIA's recently released 10-Q report.
%pip install -q --upgrade pageindex
from pageindex import PageIndexClient
# Get your PageIndex API key from https://dash.pageindex.ai/api-keys
PAGEINDEX_API_KEY = "Your API KEY"
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)
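Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you export the key in an environment variable named `PAGEINDEX_API_KEY` (a name chosen here for illustration; the SDK is not assumed to read it on its own), that fails early with a clear message when the key is missing. The helper `load_api_key` is hypothetical.

```python
import os


def load_api_key(var_name: str = "PAGEINDEX_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention for this notebook;
    the PageIndex SDK is not assumed to read it automatically.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail before any API call is made, with an actionable message.
        raise RuntimeError(
            f"Set the {var_name} environment variable to your PageIndex API key."
        )
    return key
```

You would then create the client with `pi_client = PageIndexClient(api_key=load_api_key())`.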
import os
import requests
pdf_url = "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/13e6981b-95ed-4aac-a602-ebc5865d0590.pdf"
pdf_path = os.path.join("../data", pdf_url.split('/')[-1])
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)
# Download the PDF; raise on HTTP errors instead of silently saving an error page
response = requests.get(pdf_url, timeout=60)
response.raise_for_status()
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Downloaded {pdf_url} to {pdf_path}")
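`pdf_url.split('/')[-1]` works for this URL, but for a URL carrying a query string (for example a signed `?token=...` link) it would keep the query in the filename. A small sketch using the standard library's `urllib.parse` to derive the local filename from the URL path only; the helper name `local_pdf_path` and the `report.pdf` fallback are hypothetical.

```python
import os
import posixpath
from urllib.parse import unquote, urlparse


def local_pdf_path(url: str, data_dir: str = "../data", default: str = "report.pdf") -> str:
    """Build a local path for a downloaded PDF from the URL's path component."""
    # urlparse separates the path from any query string or fragment;
    # unquote turns percent-escapes such as %20 back into characters.
    name = posixpath.basename(unquote(urlparse(url).path))
    # Fall back to a default name when the URL path is empty or ends with "/".
    return os.path.join(data_dir, name or default)
```

For the NVIDIA URL above this yields the same path as the original cell, so it is a drop-in replacement for the `pdf_path` line.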
doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print('Document Submitted:', doc_id)
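from pprint import pprint
doc_info = pi_client.get_document(doc_id)
pprint(doc_info)
# The document may still be processing; re-run this cell until the status is 'completed'
if doc_info['status'] == 'completed':
    print(f"\nDocument ready! ({doc_info['pageNum']} pages)")
elif doc_info['status'] == 'processing':
    print("\nDocument is still processing. Please wait and check again.")
else:
    print(f"\nUnexpected document status: {doc_info['status']}")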
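Rather than re-running the status cell by hand, you can poll until processing finishes. A minimal sketch under stated assumptions: it takes the fetch function as a parameter so you can pass `pi_client.get_document` from this notebook, it only knows the `'processing'` and `'completed'` statuses shown above (any other status is treated as final and returned), and the 5-second interval and 10-minute timeout are arbitrary choices. The helper `wait_for_document` is hypothetical, not part of the SDK.

```python
import time
from typing import Any, Callable, Dict


def wait_for_document(
    get_document: Callable[[str], Dict[str, Any]],
    doc_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Dict[str, Any]:
    """Poll get_document(doc_id) until it stops reporting 'processing'.

    Raises TimeoutError if the document is still processing once the
    timeout has elapsed. Any non-'processing' status is returned as-is.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = get_document(doc_id)
        if info.get("status") != "processing":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Document {doc_id} still processing after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Usage: `doc_info = wait_for_document(pi_client.get_document, doc_id)`. Passing the function in, instead of the client, keeps the helper independent of the SDK and easy to test with a stub.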
query = "What is the revenue? Also show me which page I can find it on."
# Stream the answer and print each chunk as soon as it arrives
for chunk in pi_client.chat_completions(
    messages=[{"role": "user", "content": query}],
    doc_id=doc_id,
    stream=True
):
    print(chunk, end='', flush=True)
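The streaming loop prints the answer but does not keep it. If you want the full text afterwards (for example to save it or quote it in a report), a small sketch that echoes each chunk and also accumulates it. It assumes, as the loop above does by printing them directly, that each streamed chunk is a string; the helper `print_and_collect` is hypothetical.

```python
from typing import Iterable, List


def print_and_collect(chunks: Iterable[str]) -> str:
    """Echo streamed text chunks as they arrive and return the joined answer."""
    parts: List[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # end the line once the stream is exhausted
    return "".join(parts)
```

Usage: `answer = print_and_collect(pi_client.chat_completions(messages=[{"role": "user", "content": query}], doc_id=doc_id, stream=True))`.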