This example extracts metadata (title, authors, abstract) from PDF papers, stores it in Postgres, and builds embeddings for semantic search.
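As a rough picture of what one extracted record holds, here is a minimal sketch. The class name `PaperMetadata` and the `embedding_text` helper are hypothetical and not taken from `main.py`. Likewise, the choice to embed title plus abstract is an assumption, and the example's actual schema and embedding input may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PaperMetadata:
    """One extracted record per PDF (hypothetical shape, for illustration only)."""

    title: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""

    def embedding_text(self) -> str:
        # Assumption: embed title and abstract together so a semantic query can
        # match either; authors stay as plain structured fields.
        return f"{self.title}\n\n{self.abstract}".strip()


paper = PaperMetadata(
    title="Graph Attention Networks",
    authors=["Author A", "Author B"],
    abstract="We present graph attention networks...",
)
print(paper.embedding_text())
```

Keeping authors out of the embedded text is one reasonable design: names carry little semantic signal, and they are easier to filter on as a separate column.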
If you find this helpful, we'd appreciate a star ⭐ on the CocoIndex GitHub repository.
Prerequisites: a running Postgres instance with the pgvector extension. If you don't have one, start a local instance with the compose file in this repo:
docker compose -f ../../dev/postgres.yaml up -d
You will also need OPENAI_API_KEY (used for metadata extraction) and POSTGRES_URL (used to connect to Postgres); both are set in the steps below.
Install dependencies:
pip install -e .
Set environment variables:
export OPENAI_API_KEY="your_key"
export POSTGRES_URL="postgres://cocoindex:cocoindex@localhost/cocoindex"
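If you want to confirm both variables are in place before running the pipeline, a small pre-flight check like the following can help. It is not part of the example. `load_config` is a hypothetical helper, and the fallback to port 5432 simply reflects the standard Postgres default when the URL omits a port.

```python
import os
from urllib.parse import urlparse


def load_config(env=None):
    """Check the variables this example expects and parse POSTGRES_URL."""
    env = os.environ if env is None else env

    # Fail early with a clear message instead of partway through indexing.
    missing = [name for name in ("OPENAI_API_KEY", "POSTGRES_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")

    url = urlparse(env["POSTGRES_URL"])
    if url.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme in POSTGRES_URL: {url.scheme!r}")

    # Return connection details without the password, so they are safe to log.
    return {
        "user": url.username,
        "host": url.hostname,
        "port": url.port or 5432,  # standard Postgres port when none is given
        "database": url.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(load_config())
```

With the values exported above, this would report user `cocoindex`, host `localhost`, port 5432, and database `cocoindex`.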
This example uses the coco_examples_v1 schema by default to avoid clashing with the legacy example tables.
Build or update the index with either of the following commands:
cocoindex update main
or
python main.py
Run a query:
python main.py query "graph neural networks"
Note: this example does not create a vector index, so each query performs a sequential scan of the table.
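To illustrate what a sequential scan means for semantic search, the sketch below compares the query embedding against every stored embedding and keeps the best matches. It is a pure-Python illustration with toy vectors, not the example's code. In the example itself this comparison presumably happens inside Postgres via pgvector, and the distance metric it uses is not stated here, so cosine similarity is an assumption. `sequential_top_k` is a hypothetical name.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sequential_top_k(query_vec, rows, k=5):
    """Score every row against the query, as an unindexed scan does, and keep the best k.

    rows: iterable of (title, embedding) pairs.
    Returns a list of (score, title), highest score first.
    """
    scored = ((cosine(query_vec, emb), title) for title, emb in rows)
    return heapq.nlargest(k, scored)


rows = [("paper a", [1.0, 0.0]), ("paper b", [0.0, 1.0]), ("paper c", [1.0, 1.0])]
for score, title in sequential_top_k([1.0, 0.0], rows, k=2):
    print(f"{score:.3f}  {title}")
```

The cost grows linearly with the number of stored papers, which is fine for a small demo corpus. An approximate index such as pgvector's HNSW or IVFFlat avoids scoring every row, at the cost of results that are not guaranteed to be exact.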