skills/liteparse/references/choosing_a_parser.md
Use this guide to pick the right document tool among the skills in the scientific-agent-skills repo, or to decide when to escalate to LlamaParse in the cloud.
```mermaid
flowchart TD
    start["User has a document task"]
    start --> q1{"Need PDF merge, split, forms, or encryption utilities?"}
    q1 -->|yes| pdfSkill["pdf skill"]
    q1 -->|no| q2{"Need Markdown, audio, video, EPUB, or Azure table extraction?"}
    q2 -->|yes| markitdown["markitdown skill"]
    q2 -->|no| q3{"Need bounding boxes, fast local parse, or page PNGs for agents?"}
    q3 -->|yes| liteparse["liteparse skill"]
    q3 -->|no| q4{"Complex tables, handwriting, or production cloud pipeline?"}
    q4 -->|yes| llamaparse["LlamaParse cloud"]
    q4 -->|no| liteparse
```
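The flowchart is an ordered series of yes/no checks in which the first "yes" decides the tool, and anything that falls through every question lands on LiteParse. A minimal sketch of that routing logic follows, assuming a task can be described by four booleans. `DocumentTask`, its field names, and `choose_parser` are hypothetical illustrations and not part of any skill's API; the returned strings simply name the skills in the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentTask:
    """Hypothetical task description; each flag mirrors one flowchart question."""
    needs_pdf_utilities: bool = False      # q1: merge, split, forms, encryption
    needs_markdown_or_media: bool = False  # q2: Markdown, audio, video, EPUB, Azure tables
    needs_layout_or_images: bool = False   # q3: bounding boxes, fast local parse, page PNGs
    is_hard_or_production: bool = False    # q4: complex tables, handwriting, cloud pipeline


def choose_parser(task: DocumentTask) -> str:
    """Walk the flowchart top to bottom; the first 'yes' answer wins."""
    if task.needs_pdf_utilities:
        return "pdf"
    if task.needs_markdown_or_media:
        return "markitdown"
    if task.needs_layout_or_images:
        return "liteparse"
    if task.is_hard_or_production:
        return "llamaparse"
    # Default branch of the chart: q4 -->|no| liteparse
    return "liteparse"


if __name__ == "__main__":
    print(choose_parser(DocumentTask(needs_markdown_or_media=True)))  # markitdown
    print(choose_parser(DocumentTask()))  # liteparse
```

Because the checks are ordered, a task that needs both PDF utilities and Markdown output routes to the pdf skill, just as it would when following the chart from the top.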
| Criterion | LiteParse | MarkItDown | pdf skill | LlamaParse |
|---|---|---|---|---|
| Primary output | Layout text + JSON with bboxes | Markdown | PDF bytes / extracted text | Structured markdown / JSON (cloud) |
| Runs locally | Yes | Yes | Yes | No (cloud API) |
| Bounding boxes | Yes | No | Limited | Yes (cloud) |
| OCR | Tesseract + optional HTTP OCR | Yes (images/PDF) | Via external tools | Advanced |
| Page screenshots | Yes (PNG) | No | Image extract only | Varies |
| Office → text | Via LibreOffice convert | Native converters | N/A | Yes |
| Audio / video / EPUB | No | Yes | No | Some formats |
| PDF merge / split / forms | No | No | Yes | No |
| Best for | RAG grounding, agent vision, batch PDF corpus | LLM-friendly Markdown pipelines | PDF manipulation | Hard documents at scale |
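The table can also be read as a capability matrix: list the features a task requires and keep only the tools that provide all of them. The sketch below encodes a subset of the rows above. The capability keys (`"local"`, `"bboxes"`, and so on) and the `tools_supporting` helper are hypothetical names chosen for illustration. As a conservative assumption, qualified cells such as "Limited", "Varies", and "Some formats" are treated as unsupported, and the OCR row is omitted because every tool offers some form of OCR.

```python
# Hypothetical encoding of the comparison table; qualified cells count as "no".
CAPABILITIES: dict[str, frozenset[str]] = {
    "liteparse": frozenset({"local", "bboxes", "screenshots", "office_to_text"}),  # Office via LibreOffice
    "markitdown": frozenset({"local", "office_to_text", "audio_video_epub"}),
    "pdf": frozenset({"local", "pdf_manipulation"}),
    "llamaparse": frozenset({"bboxes", "office_to_text"}),  # cloud API, not local
}


def tools_supporting(*required: str) -> list[str]:
    """Return the tools (sorted by name) that provide every required capability."""
    needed = set(required)
    known = set().union(*CAPABILITIES.values())
    unknown = needed - known
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    # Subset test: keep a tool only if it covers all of `needed`.
    return sorted(name for name, caps in CAPABILITIES.items() if needed <= caps)


if __name__ == "__main__":
    print(tools_supporting("local", "bboxes"))  # ['liteparse']
    print(tools_supporting("office_to_text"))   # ['liteparse', 'llamaparse', 'markitdown']
```

Unknown capability names raise an error rather than silently matching nothing, which makes typos in the requirement list easy to catch.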
LlamaParse documentation: https://docs.cloud.llamaindex.ai/llamaparse/overview
Common pipelines:
Avoid running both LiteParse and MarkItDown on the same file unless two distinct consumers need different outputs (for example, one needs bounding-box coordinates and another needs Markdown).