packages/docling-slim/README.md
Lightweight SDK for parsing documents with minimal dependencies and opt-in extras
Docling Slim is a minimal-dependency version of Docling that allows you to install only the components you need. It provides the core document processing functionality with ~50MB of base dependencies, and you can add specific features through optional extras.
- docling (recommended): If you want the full-featured experience with all standard capabilities
- docling-slim: If you need fine-grained control over dependencies or want to minimize installation size

We recommend most users install the full-featured docling package instead:
pip install docling
The docling package includes all standard features and the CLI tools, and it is the easiest way to get started. Visit the main Docling documentation for complete guides and examples.
# PDF support with local models
pip install "docling-slim[format-pdf,models-local]"
# Office formats only
pip install "docling-slim[format-office]"
# PDF + CLI
pip install "docling-slim[format-pdf,cli]"
# Docling service client for using the Docling Serve API
pip install "docling-slim[service-client]"
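Each command above follows the same pattern: the package name followed by a comma-separated list of extras in square brackets. When the set of extras is decided at build or deploy time, that string can be assembled programmatically. The following is a minimal sketch; `build_requirement` and `pip_command` are hypothetical helpers for illustration, not part of Docling.

```python
import shlex


def build_requirement(extras, package="docling-slim"):
    """Return a pip requirement string such as 'docling-slim[cli,format-pdf]'."""
    # Drop duplicates and sort so the result is deterministic.
    unique = sorted(set(extras))
    if not unique:
        return package
    return f"{package}[{','.join(unique)}]"


def pip_command(extras):
    """Return a shell-safe 'pip install' command line for the given extras."""
    # shlex.quote protects the square brackets from shell globbing (e.g. in zsh).
    return f"pip install {shlex.quote(build_requirement(extras))}"


print(pip_command(["format-pdf", "cli"]))
```

Sorting the extras is only a convenience for reproducible output; pip does not care about their order.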
| Extra | Description | Use Case |
|---|---|---|
| `standard` | All standard features (same as docling package) | Full-featured usage |
| `all` | All available extras | Complete installation |
| Extra | Description | Use Case |
|---|---|---|
| `cli` | Command-line interface (typer, rich) | CLI tools (docling, docling-tools) |
| Extra | Description | Use Case |
|---|---|---|
| `convert-core` | Core conversion components (numpy, pillow, scipy) | Basic document conversion |
| `extract-core` | Structured information extraction | Data extraction from documents |
| Extra | Description | Use Case |
|---|---|---|
| `format-pdf` | PDF parsing (pypdfium2 + docling-parse) | PDF documents |
| `format-pdf-pypdfium2` | PDF rendering only | Lightweight PDF support |
| `format-pdf-docling` | Advanced PDF parsing | Complex PDF layouts |
| Extra | Description | Use Case |
|---|---|---|
| `format-office` | All Office formats | Microsoft Office documents |
| `format-docx` | Microsoft Word documents | .docx files |
| `format-pptx` | Microsoft PowerPoint | .pptx files |
| `format-xlsx` | Microsoft Excel | .xlsx files |
| Extra | Description | Use Case |
|---|---|---|
| `format-web` | HTML and Markdown | Web content |
| `format-html` | HTML parsing | Web pages and HTML files |
| `format-markdown` | Markdown parsing | .md files |
| Extra | Description | Use Case |
|---|---|---|
| `format-latex` | LaTeX documents | .tex files |
| `format-xml-xbrl` | XBRL financial reports | Financial documents |
| `format-html-render` | HTML rendering with Playwright | Dynamic web content |
| `format-audio` | Audio transcription (Whisper) | .wav, .mp3 files |
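To decide which format extras a project needs, it can help to scan the files it will process. The sketch below maps file extensions to the extras named in the format tables above. The `extras_for` helper is hypothetical, and the `.pdf`, `.html`, and `.htm` entries are assumptions, since the tables name those formats without listing extensions.

```python
from pathlib import Path

# Extension-to-extra mapping based on the format tables above.
# ".pdf", ".html" and ".htm" are assumed; the tables do not list them explicitly.
EXTRA_BY_SUFFIX = {
    ".pdf": "format-pdf",
    ".docx": "format-docx",
    ".pptx": "format-pptx",
    ".xlsx": "format-xlsx",
    ".html": "format-html",
    ".htm": "format-html",
    ".md": "format-markdown",
    ".tex": "format-latex",
    ".wav": "format-audio",
    ".mp3": "format-audio",
}


def extras_for(paths):
    """Return (sorted extras needed, sorted unrecognized suffixes) for the given file names."""
    needed, unknown = set(), set()
    for path in paths:
        suffix = Path(path).suffix.lower()
        extra = EXTRA_BY_SUFFIX.get(suffix)
        if extra is not None:
            needed.add(extra)
        else:
            unknown.add(suffix)
    return sorted(needed), sorted(unknown)


needed, unknown = extras_for(["report.PDF", "slides.pptx", "notes.md", "data.bin"])
print("extras:", ",".join(needed))
print("unrecognized suffixes:", unknown)
```

Where several Office or web formats are needed at once, the grouped extras (`format-office`, `format-web`) are likely the simpler choice.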
| Extra | Description | Use Case |
|---|---|---|
| `feat-ocr-rapidocr` | RapidOCR (lightweight) | Fast OCR |
| `feat-ocr-rapidocr-onnx` | RapidOCR with ONNX runtime | Optimized OCR |
| `feat-ocr-easyocr` | EasyOCR | Multi-language OCR |
| `feat-ocr-tesserocr` | Tesseract OCR | High-accuracy OCR |
| `feat-ocr-mac` | macOS native OCR | macOS only |
| Extra | Description | Use Case |
|---|---|---|
| `models-local` | Local PyTorch models | GPU/CPU inference |
| `models-remote` | Remote model serving (Triton) | Production deployments |
| `models-onnxruntime` | ONNX Runtime acceleration | Optimized inference |
| `models-vlm-inline` | Vision Language Models | Image understanding, inline processing |
| Extra | Description | Use Case |
|---|---|---|
| `feat-chunking` | Document chunking | RAG applications |
| `service-client` | Docling service client | Remote processing |
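Because extras are opt-in, it is easy to end up with an environment that lacks a dependency a feature relies on. The following sketch checks whether the distributions that the tables above name explicitly are installed, using the standard library's `importlib.metadata`. The `missing_distributions` helper is hypothetical, and the mapping covers only the packages mentioned in the tables, which may not be the complete dependency set of each extra.

```python
from importlib import metadata

# Distribution names exactly as they appear in the tables above.
# This is likely an incomplete view of what each extra installs.
DISTS_BY_EXTRA = {
    "cli": ["typer", "rich"],
    "convert-core": ["numpy", "pillow", "scipy"],
    "format-pdf": ["pypdfium2", "docling-parse"],
}


def missing_distributions(dists_by_extra):
    """Map each extra to the listed distributions that are not installed."""
    missing = {}
    for extra, dists in dists_by_extra.items():
        absent = []
        for name in dists:
            try:
                metadata.version(name)  # raises if the distribution is absent
            except metadata.PackageNotFoundError:
                absent.append(name)
        missing[extra] = absent
    return missing


for extra, absent in missing_distributions(DISTS_BY_EXTRA).items():
    status = "ok" if not absent else "missing: " + ", ".join(absent)
    print(f"{extra}: {status}")
```

An empty list for an extra means every listed distribution was found; it does not prove the extra itself was requested at install time.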
MIT License - See LICENSE