plugins/_document_query/README.md
Loads, parses, and indexes local and remote documents and answers questions over them, with configurable timeouts and thread-safe parsers.
See default_config.yaml for all options. Key settings:
| Setting | Default | Description |
|---|---|---|
| fetch_timeout | 30 | HTTP fetch timeout (seconds) |
| fetch_retries | 3 | HTTP retry attempts |
| max_remote_bytes | 52428800 | Max remote document size in bytes (50 MiB) |
| per_document_timeout | 60 | Max time to parse a single document (seconds) |
| gather_timeout | 120 | Max total time for all documents combined (seconds) |
| parser_concurrency | 1 | Max concurrent parser jobs across all chats in one process |
| context_intro_chunks | 2 | Leading chunks included per document for title/abstract grounding |
| chunk_size | 1000 | Text splitter chunk size |
| chunk_overlap | 100 | Text splitter overlap |
| max_index_chunks | 1200 | Max indexed chunks before adaptive chunk sizing applies; 0 means no cap |
| search_threshold | 0.5 | Similarity search threshold |
| liteparse_enabled | true | Prefer LiteParse before legacy parser fallbacks |
| liteparse_num_workers | 2 | Max LiteParse OCR workers per parser job |
| liteparse_ocr_auto_disable_pages | 30 | Disable OCR for PDFs at or above this effective page count |
| thread_offload | true | Offload synchronous parsers to a thread pool |
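The timeout and concurrency settings above interact: each document gets its own `per_document_timeout`, `parser_concurrency` caps how many parser jobs run at once, and `gather_timeout` bounds the whole batch. The following is a minimal asyncio sketch of how these could compose, assuming an asyncio-based pipeline; `fake_parse`, `parse_one`, and `parse_all` are hypothetical names, not the plugin's API.

```python
import asyncio
import time
from typing import Callable, List, Sequence


def fake_parse(doc: str) -> str:
    """Hypothetical stand-in for a blocking (synchronous) parser."""
    time.sleep(0.01)
    return doc.upper()


async def parse_one(doc: str, parser: Callable[[str], str],
                    sem: asyncio.Semaphore, per_document_timeout: float) -> str:
    async with sem:  # cap concurrent parser jobs (parser_concurrency)
        # thread_offload: run the sync parser in the default thread pool so the
        # event loop stays responsive. On timeout we stop waiting, but the
        # worker thread itself cannot be killed and runs to completion.
        return await asyncio.wait_for(asyncio.to_thread(parser, doc),
                                      per_document_timeout)


async def parse_all(docs: Sequence[str], parser: Callable[[str], str] = fake_parse,
                    per_document_timeout: float = 60, gather_timeout: float = 120,
                    parser_concurrency: int = 1) -> List[object]:
    # For a process-wide limit the semaphore would be created once and shared
    # across chats; it is created per call here to keep the sketch self-contained.
    sem = asyncio.Semaphore(parser_concurrency)
    jobs = [parse_one(d, parser, sem, per_document_timeout) for d in docs]
    # return_exceptions=True: a slow or failing document yields an exception
    # object in its slot instead of failing the whole batch.
    # gather_timeout bounds the batch as a whole; exceeding it raises TimeoutError.
    return await asyncio.wait_for(asyncio.gather(*jobs, return_exceptions=True),
                                  gather_timeout)
```

Because a timed-out thread keeps running in the background, a per-document timeout only frees the caller, not the CPU; that limitation is one reason to run crash-prone native parsers out of process instead.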
LiteParse is installed into the Agent Zero framework runtime from hooks.py during plugin install/startup. If installation fails, the plugin logs the error and continues with the legacy parser fallbacks.
LiteParse always runs in a child process so native parser and OCR failures stay isolated from the Web UI process.
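The isolation idea can be illustrated with a fresh interpreter launched via `subprocess`. This is a minimal sketch assuming a subprocess-based design; how the plugin actually launches LiteParse is not described here, `run_isolated` is a hypothetical helper, and `os.abort()` stands in for a native parser or OCR crash.

```python
import subprocess
import sys


def run_isolated(script: str, timeout: float = 60.0) -> str:
    """Run `script` in a fresh Python interpreter and return its stdout.

    A crash in the child (segfault, abort, OOM kill) surfaces here as a
    non-zero return code instead of taking down the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,  # on expiry the child is killed and TimeoutExpired is raised
    )
    if proc.returncode != 0:
        # Keep only the tail of stderr so the error message stays readable.
        raise RuntimeError(
            f"child exited with code {proc.returncode}: {proc.stderr.strip()[-500:]}"
        )
    return proc.stdout
```

For example, `run_isolated("import os; os.abort()")` raises `RuntimeError` in the parent while the parent process keeps running, which is the property that protects the Web UI.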
| Parser | MIME Types | Backend |
|---|---|---|
| LiteParseParser | PDF, Office/OpenDocument, images | LiteParse |
| PdfParser | application/pdf | PyMuPDF + Tesseract OCR fallback |
| HtmlParser | text/html | Markdownify transformer |
| TextParser | text/*, application/json, YAML, XML, TOML, JS, TS, shell | Direct read |
| ImageParser | image/* | UnstructuredLoader |
| UnstructuredParser | * (catch-all) | UnstructuredLoader hi-res |
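One way to read the table together with `liteparse_enabled` is as an ordered fallback chain: LiteParse first when it is enabled and installed, then the matching legacy parsers, then the catch-all. The sketch below assumes that ordering and uses `fnmatch` for MIME wildcards. The routing lists, the concrete MIME strings for Office/OpenDocument, YAML, TOML, JS, and shell, and the helpers `candidate_parsers` and `parse_with_fallback` are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

# Hypothetical routing table; exact MIME strings are assumptions.
LITEPARSE: Tuple[str, List[str]] = ("LiteParseParser", [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.*",
    "application/vnd.oasis.opendocument.*",
    "image/*",
])
LEGACY: List[Tuple[str, List[str]]] = [
    ("PdfParser", ["application/pdf"]),
    ("HtmlParser", ["text/html"]),  # listed before TextParser, which also matches text/*
    ("TextParser", ["text/*", "application/json", "application/xml",
                    "application/x-yaml", "application/toml",
                    "application/javascript", "application/x-sh"]),
    ("ImageParser", ["image/*"]),
    ("UnstructuredParser", ["*"]),  # catch-all
]


def candidate_parsers(mime: str, liteparse_enabled: bool = True) -> List[str]:
    """Return the names of all parsers matching `mime`, in the order to try them."""
    table = ([LITEPARSE] if liteparse_enabled else []) + LEGACY
    mime = mime.lower()
    return [name for name, patterns in table
            if any(fnmatchcase(mime, p) for p in patterns)]


def parse_with_fallback(data: bytes, mime: str,
                        impls: Dict[str, Callable[[bytes], str]],
                        liteparse_enabled: bool = True) -> Tuple[str, str]:
    """Try each candidate parser in turn; return (parser name, extracted text)."""
    errors: List[str] = []
    for name in candidate_parsers(mime, liteparse_enabled):
        impl = impls.get(name)
        if impl is None:  # e.g. LiteParse failed to install
            continue
        try:
            return name, impl(data)
        except Exception as exc:  # record the failure and fall through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))
```

With LiteParse enabled, `candidate_parsers("application/pdf")` yields `["LiteParseParser", "PdfParser", "UnstructuredParser"]`, so a LiteParse failure degrades to the PyMuPDF path rather than failing the document.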