docs/tools/prepare-pdf-for-ai.md
Extracts content from PDF files and structures it as JSON optimized for ingestion by large language models (LLMs) and AI frameworks like LlamaIndex. Each page's content is extracted and organized into a structured format ready for RAG pipelines, chatbots, or semantic search systems.
A single input file produces filename_llm.json. Multiple files produce a pdf-for-ai.zip archive.

The tool uses PyMuPDF's LlamaIndex integration to extract page-level content with metadata, producing output that can be loaded directly into AI frameworks.
This tool has no configurable options. All pages are extracted with full text and metadata.
Single file: filename_llm.json

Multiple files: pdf-for-ai.zip containing one _llm.json per input PDF.

The JSON output follows the LlamaIndex document schema with per-page text content and metadata fields.
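As a rough illustration of consuming the output, the sketch below reads either a single _llm.json file or a pdf-for-ai.zip archive using only the Python standard library. The exact JSON layout is not specified here, so the code assumes each file holds a top-level list of per-page records with a "text" key and a "metadata" key. The helper names `load_pages` and `page_texts` are hypothetical and not part of the tool.

```python
import json
import zipfile
from pathlib import Path


def load_pages(path):
    """Return {json_name: parsed_json} for a _llm.json file or a pdf-for-ai.zip archive."""
    path = Path(path)
    results = {}
    if zipfile.is_zipfile(path):
        # Archive case: read every member whose name ends in _llm.json.
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.endswith("_llm.json"):
                    results[name] = json.loads(archive.read(name).decode("utf-8"))
    else:
        # Single-file case: the path is the JSON document itself.
        results[path.name] = json.loads(path.read_text(encoding="utf-8"))
    return results


def page_texts(records):
    """Collect the text of each page record.

    ASSUMPTION: records is a list of dicts with a "text" key; adjust this
    if the real output nests the page text differently.
    """
    return [record.get("text", "") for record in records]
```

With that assumed layout, `page_texts(load_pages("report_llm.json")["report_llm.json"])` would yield one string per page, ready to pass to a chunker or embedding step. The file name `report_llm.json` is only an example.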