PDF Embedding (v1)

examples/pdf_embedding/README.md

This example builds an embedding index from local PDF files. It converts PDFs to markdown, chunks the text, embeds each chunk, and stores the results in Postgres (pgvector). It also provides a simple query demo.
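To make the chunking step concrete, here is a minimal sketch of splitting converted text into overlapping windows. It is an illustration only, not the code in `main.py`: the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical, and the example's actual chunking strategy may differ (for instance, it may split on markdown structure rather than on character counts).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small demonstration with a tiny window so the overlap is visible.
demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(demo_chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap keeps a sentence that straddles a boundary present in both neighboring chunks, at the cost of embedding some text twice.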

If this example is helpful, please give CocoIndex a star ⭐ on GitHub.

Prerequisite

Install Postgres if you don't already have it. The index is stored with pgvector, so that extension must be available in your database.

Run

Install dependencies:

```sh
pip install -e .
```

Set a database URL (or use .env):

```sh
export POSTGRES_URL="postgres://cocoindex:cocoindex@localhost/cocoindex"
```
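If the connection fails, it can help to check how the URL breaks down into user, host, port, and database. The following is a small sketch using only the Python standard library. The helper name `describe_postgres_url` is hypothetical and not part of this example, and the fallback to port 5432 assumes the Postgres default.

```python
from urllib.parse import urlparse


def describe_postgres_url(url: str) -> dict:
    """Break a Postgres connection URL into its parts (illustrative helper)."""
    parts = urlparse(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        # Assumption: fall back to the default Postgres port when none is given.
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }


info = describe_postgres_url("postgres://cocoindex:cocoindex@localhost/cocoindex")
print(info)
# {'user': 'cocoindex', 'host': 'localhost', 'port': 5432, 'database': 'cocoindex'}
```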

Build/update the index:

```sh
cocoindex update main.py
```

Query:

```sh
python main.py query "what is attention?"
```

Note: this example does not create a vector index, so each query performs a sequential scan over all stored embeddings.
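To illustrate what a sequential scan means for a query, the sketch below scores the query vector against every stored row and keeps the best matches. It runs in memory as a stand-in for what the database does without an index. The names `cosine_similarity` and `sequential_scan` are hypothetical, the vectors are toy data, and cosine similarity is an assumption: the metric this example actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequential_scan(query: list[float], rows: list[tuple[str, list[float]]], top_k: int = 3):
    """Score every (text, embedding) row against the query; cost grows with len(rows)."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional "embeddings" standing in for stored chunks.
rows = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
best = sequential_scan([1.0, 0.0], rows, top_k=2)
print([text for _, text in best])  # ['chunk a', 'chunk c']
```

Every query touches every row, which is fine for a small set of PDFs. For larger collections, pgvector offers approximate indexes (HNSW and IVFFlat) that avoid the full scan.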