Document QA with PageIndex Chat API

cookbook/pageIndex_chat_quickstart.ipynb


<p align="center">Reasoning-based RAG&nbsp; ◦ &nbsp;No Vector DB&nbsp; ◦ &nbsp;No Chunking&nbsp; ◦ &nbsp;Human-like Retrieval</p>
<p align="center"> <a href="https://vectify.ai">🏠 Homepage</a>&nbsp; • &nbsp; <a href="https://chat.pageindex.ai">🖥️ Platform</a>&nbsp; • &nbsp; <a href="https://docs.pageindex.ai/quickstart">📚 API Docs</a>&nbsp; • &nbsp; <a href="https://github.com/VectifyAI/PageIndex">📦 GitHub</a>&nbsp; • &nbsp; <a href="https://discord.com/invite/VuXuf29EUj">💬 Discord</a>&nbsp; • &nbsp; <a href="https://ii2abc2jejf.typeform.com/to/tK3AXl8T">✉️ Contact</a>&nbsp; </p>

Similarity-based RAG built on vector databases has shown significant limitations in recent AI applications, and reasoning-based or agentic retrieval has become increasingly important as a result.

PageIndex Chat is an AI assistant that lets you chat with multiple very long documents without worrying about limited context windows or context rot. It is built on PageIndex, a vectorless, reasoning-based RAG framework that aims to give more transparent and reliable results, much as a human expert would.

You can now access PageIndex Chat through the API or the SDK.

📝 Notebook Overview

This notebook demonstrates a minimal example of document analysis with the PageIndex Chat API, using the recently released NVIDIA 10-Q report.

Install PageIndex SDK

python
%pip install -q --upgrade pageindex

Set up PageIndex

python
from pageindex import PageIndexClient

# Get your PageIndex API key from https://dash.pageindex.ai/api-keys
PAGEINDEX_API_KEY = "Your API KEY"
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)
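
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the key from an environment variable. The helper `load_api_key` and the variable name `PAGEINDEX_API_KEY` are illustrative assumptions and not part of the PageIndex SDK.

```python
import os


def load_api_key(env_var: str = "PAGEINDEX_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    the SDK itself only needs the key passed to PageIndexClient.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your PageIndex API key."
        )
    return key


# Usage in the notebook (PageIndexClient is imported in the setup cell):
# pi_client = PageIndexClient(api_key=load_api_key())
```

Failing fast with a clear message is usually easier to debug than an authentication error surfacing later, during upload.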

Upload a document

python
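import os
import requests

# NVIDIA 10-Q filing used as the example document
pdf_url = "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/13e6981b-95ed-4aac-a602-ebc5865d0590.pdf"
pdf_path = os.path.join("../data", pdf_url.split("/")[-1])
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)

# Download the PDF and fail early on an HTTP error instead of saving an error page
response = requests.get(pdf_url, timeout=60)
response.raise_for_status()
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Downloaded {pdf_url}")

# Submit the local file to PageIndex for processing
doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print("Document Submitted:", doc_id)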

Check the processing status

python
from pprint import pprint

# Fetch the document's metadata, including its processing status
doc_info = pi_client.get_document(doc_id)
pprint(doc_info)

if doc_info['status'] == 'completed':
    print(f"\nDocument ready! ({doc_info['pageNum']} pages)")
elif doc_info['status'] == 'processing':
    print("\nDocument is still processing. Please wait and check again.")
else:
    print(f"\nUnexpected status: {doc_info['status']}")
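
Instead of re-running the status cell by hand, you could poll until processing finishes. The following is a minimal sketch: `wait_until_ready` is a hypothetical helper, not an SDK function, and it takes any zero-argument callable so that it does not depend on SDK details. It assumes the returned object has a `status` key with the values `completed` and `processing` seen above, and it treats any other value as an error, which is an assumption.

```python
import time


def wait_until_ready(fetch_info, poll_seconds=5.0, timeout_seconds=600.0):
    """Call fetch_info() repeatedly until it reports status 'completed'.

    fetch_info: zero-argument callable returning a mapping with a 'status' key.
    Returns the final mapping, or raises on timeout or an unexpected status.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = fetch_info()
        status = info["status"]
        if status == "completed":
            return info
        if status != "processing":
            # Assumption: only 'completed' and 'processing' are expected here.
            raise RuntimeError(f"Unexpected document status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Document was still processing when the timeout expired.")
        time.sleep(poll_seconds)


# Usage in the notebook:
# doc_info = wait_until_ready(lambda: pi_client.get_document(doc_id))
# print(f"Document ready! ({doc_info['pageNum']} pages)")
```

The default interval and timeout are arbitrary; long filings may need a larger timeout.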

Ask a question about this document

python
query = "What is the revenue? Also show me which page I can find it on."

for chunk in pi_client.chat_completions(
    messages=[{"role": "user", "content": query}],
    doc_id=doc_id,
    stream=True
):
    print(chunk, end='', flush=True)
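
The loop above prints the answer as it streams but does not keep it. If you also want the full text afterwards, for logging or post-processing, one option is to accumulate the chunks while printing them. `stream_and_collect` below is a hypothetical helper, and it assumes each chunk can be converted to text with `str()`, as the `print` call above suggests.

```python
def stream_and_collect(chunks, echo=True):
    """Print each streamed chunk as it arrives and return the full answer.

    chunks: any iterable of printable pieces, such as the generator
    returned by a streaming chat call.
    """
    parts = []
    for chunk in chunks:
        text = str(chunk)
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Usage in the notebook:
# answer = stream_and_collect(
#     pi_client.chat_completions(
#         messages=[{"role": "user", "content": query}],
#         doc_id=doc_id,
#         stream=True,
#     )
# )
```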