fern/docs/pages/api-guide/ingestion.mdx
Documents are ingested by sending a POST request to /v1/artifacts/ingest. Once ingested, they are chunked, embedded, and stored in the vector store for retrieval.
Upload a file:
curl -X POST http://localhost:8080/v1/artifacts/ingest \
  -F "file=@/path/to/document.pdf"
Target a specific collection:
curl -X POST http://localhost:8080/v1/artifacts/ingest \
  -F "file=@/path/to/document.pdf" \
  -F "collection=my-collection"
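The upload can also be scripted. The sketch below mirrors the curl call using only the Python standard library, building the multipart/form-data body by hand. The helper names `build_multipart` and `ingest_file` are illustrative and not part of PrivateGPT, and the response is returned as raw bytes because its shape is not described on this page.

```python
import uuid
import urllib.request
from pathlib import Path


def build_multipart(fields, file_field, filename, content):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def ingest_file(path, collection=None, base_url="http://localhost:8080"):
    """POST one file to /v1/artifacts/ingest, optionally into a collection."""
    path = Path(path)
    # The "collection" field is only sent when a collection is given.
    fields = {"collection": collection} if collection else {}
    body, content_type = build_multipart(fields, "file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/v1/artifacts/ingest",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()  # raw bytes; response format is not assumed
```

With a server running locally, `ingest_file("/path/to/document.pdf", collection="my-collection")` would send the same two form fields as the curl command.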
List the artifacts in a collection:
curl "http://localhost:8080/v1/artifacts/list?collection=my-collection"
Delete an artifact from a collection:
curl -X POST http://localhost:8080/v1/artifacts/delete \
  -H "Content-Type: application/json" \
  -d '{"collection": "my-collection", "artifact": "<artifact-id>"}'
Remove all ingested data:
make wipe
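The list and delete calls shown earlier map onto two small request builders. This is a minimal sketch using the Python standard library; `list_request`, `delete_request`, and `send` are hypothetical helper names, and responses are returned unparsed since their format is not documented here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples


def list_request(collection):
    """Build the GET request for /v1/artifacts/list."""
    query = urllib.parse.urlencode({"collection": collection})
    return urllib.request.Request(f"{BASE_URL}/v1/artifacts/list?{query}", method="GET")


def delete_request(collection, artifact_id):
    """Build the POST request for /v1/artifacts/delete with a JSON body."""
    payload = json.dumps({"collection": collection, "artifact": artifact_id}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/artifacts/delete",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Separating request construction from sending makes it possible to inspect the URL and payload before anything is deleted, for example `send(delete_request("my-collection", "<artifact-id>"))`.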
To ingest an entire folder from the command line, enable local ingestion in your settings:
data:
  local_ingestion:
    enabled: true
    allow_ingest_from: ["*"]
Then run:
make ingest /path/to/folder
Watch mode (re-ingest on file changes):
make ingest /path/to/folder -- --watch
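Conceptually, re-ingesting on file changes means noticing which files are new or modified and ingesting them again. The sketch below illustrates that idea by comparing modification-time snapshots of a folder. It is an illustration only, not how PrivateGPT's watcher is implemented, and `snapshot` and `changed_files` are hypothetical helper names.

```python
import os


def snapshot(folder):
    """Map every file under `folder` to its last-modified time in nanoseconds."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def changed_files(before, after):
    """Return the sorted paths that are new or modified between two snapshots."""
    return sorted(path for path, mtime in after.items() if before.get(path) != mtime)
```

A polling loop would take a snapshot, wait, take another, and pass each path from `changed_files` to whatever performs the ingestion.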
PrivateGPT handles plain text natively. The following formats are also supported with built-in parsers:
.pdf · .docx · .pptx · .ppt · .pptm · .hwp · .epub · .md · .csv · .json · .ipynb · .mbox · .jpg · .jpeg · .png · .mp3 · .mp4
Any other file type is read as plain text.
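The rule above can be expressed as a simple lookup, which is handy for checking a folder before ingesting it. This is a sketch built only from the extension list on this page; `handling_for` is a hypothetical helper, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed on this page as having built-in parsers.
PARSED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".ppt", ".pptm", ".hwp", ".epub", ".md",
    ".csv", ".json", ".ipynb", ".mbox", ".jpg", ".jpeg", ".png", ".mp3", ".mp4",
}


def handling_for(filename):
    """Describe how a file would be read, based on its final extension."""
    # Assumption: extension matching ignores case (".PDF" is treated like ".pdf").
    suffix = Path(filename).suffix.lower()
    return "built-in parser" if suffix in PARSED_EXTENSIONS else "plain text"
```

For example, `handling_for("report.pdf")` returns `"built-in parser"`, while `handling_for("notes.txt")` returns `"plain text"`.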