Docling Slim

Lightweight SDK for parsing documents with minimal dependencies and opt-in extras

Docling Slim is a minimal-dependency version of Docling that allows you to install only the components you need. It provides the core document processing functionality with ~50MB of base dependencies, and you can add specific features through optional extras.

When to Use Docling Slim

  • Use docling (recommended) if you want the full-featured experience with all standard capabilities.
  • Use docling-slim if you need fine-grained control over dependencies or want to minimize installation size.

For Most Users: Use the Main Docling Package

We recommend most users install the full-featured docling package instead:

```bash
pip install docling
```

The docling package includes all standard features and the CLI tools, and it is the easiest way to get started. Visit the main Docling documentation for complete guides and examples.

Installation

With Specific Features

```bash
# Quote the requirement so shells such as zsh do not treat the brackets as a glob pattern.

# PDF support with local models
pip install 'docling-slim[format-pdf,models-local]'

# Office formats only
pip install 'docling-slim[format-office]'

# PDF + CLI
pip install 'docling-slim[format-pdf,cli]'

# Docling service client for using the Docling Serve API
pip install 'docling-slim[service-client]'
```
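
The commands above use the standard pip form `package[extra1,extra2]`. As a minimal sketch, the helper below assembles such a requirement string from a list of extras, which may be handy when install commands are generated by a script. The function name `build_requirement` is hypothetical and not part of Docling.

```python
import shlex


def build_requirement(extras, package="docling-slim"):
    """Return a pip requirement string such as 'docling-slim[format-pdf,cli]'.

    Hypothetical helper for illustration; it is not part of Docling.
    """
    # Drop empty entries and duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(e.strip() for e in extras if e.strip()))
    if not unique:
        return package
    return f"{package}[{','.join(unique)}]"


if __name__ == "__main__":
    requirement = build_requirement(["format-pdf", "models-local"])
    # shlex.quote adds the quoting needed to paste the command into a POSIX shell.
    print("pip install", shlex.quote(requirement))
```

With no extras the helper returns the bare package name, which corresponds to the base installation.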

Available Extras

Convenience Bundles

| Extra | Description | Use Case |
|---|---|---|
| standard | All standard features (same as docling package) | Full-featured usage |
| all | All available extras | Complete installation |
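
To see which extras an installed distribution actually declares, the standard library can read its `Provides-Extra` metadata. The following is a sketch that assumes the distribution is published under the name `docling-slim`; the function name `declared_extras` is hypothetical.

```python
from importlib import metadata


def declared_extras(distribution="docling-slim"):
    """Return the sorted extras declared by an installed distribution.

    Returns an empty list when the distribution is not installed
    or declares no extras.
    """
    try:
        info = metadata.metadata(distribution)
    except metadata.PackageNotFoundError:
        return []
    # get_all returns None when the field is absent.
    return sorted(info.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    for extra in declared_extras():
        print(extra)
```

The printed names can then be compared against the tables in this section.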

CLI

| Extra | Description | Use Case |
|---|---|---|
| cli | Command-line interface (typer, rich) | CLI tools (docling, docling-tools) |
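
Because the CLI dependencies are optional, a script may want to check for them before relying on the command-line tools. A minimal sketch, assuming the import names match the package names listed above (typer, rich); `missing_modules` is a hypothetical helper:

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # typer and rich are the packages the table lists for the cli extra.
    missing = missing_modules(["typer", "rich"])
    if missing:
        print("CLI dependencies not installed:", ", ".join(missing))
    else:
        print("CLI dependencies are available")
```

The same check can be applied to the packages named for other extras, such as numpy, pillow (imported as `PIL`), and scipy.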

Core Components

| Extra | Description | Use Case |
|---|---|---|
| convert-core | Core conversion components (numpy, pillow, scipy) | Basic document conversion |
| extract-core | Structured information extraction | Data extraction from documents |

Format Support

PDF Formats

| Extra | Description | Use Case |
|---|---|---|
| format-pdf | PDF parsing (pypdfium2 + docling-parse) | PDF documents |
| format-pdf-pypdfium2 | PDF rendering only | Lightweight PDF support |
| format-pdf-docling | Advanced PDF parsing | Complex PDF layouts |

Office Formats (office = docx + pptx + xlsx)

| Extra | Description | Use Case |
|---|---|---|
| format-office | All Office formats | Microsoft Office documents |
| format-docx | Microsoft Word documents | .docx files |
| format-pptx | Microsoft PowerPoint | .pptx files |
| format-xlsx | Microsoft Excel | .xlsx files |

Web Formats (web = html + markdown)

| Extra | Description | Use Case |
|---|---|---|
| format-web | HTML and Markdown | Web content |
| format-html | HTML parsing | Web pages and HTML files |
| format-markdown | Markdown parsing | .md files |
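
The Office and Web headings describe two bundles (office = docx + pptx + xlsx, web = html + markdown). The sketch below expands those bundles into their individual extras, for example to compare two install configurations. The mapping is copied from the headings and is an assumption about how the bundles are composed; `expand_extras` is a hypothetical helper.

```python
# Bundle contents as stated in the section headings
# (office = docx + pptx + xlsx, web = html + markdown).
BUNDLES = {
    "format-office": ("format-docx", "format-pptx", "format-xlsx"),
    "format-web": ("format-html", "format-markdown"),
}


def expand_extras(extras):
    """Replace each bundle with its members and return a sorted, duplicate-free list."""
    expanded = set()
    for extra in extras:
        # Extras that are not bundles are kept unchanged.
        expanded.update(BUNDLES.get(extra, (extra,)))
    return sorted(expanded)


if __name__ == "__main__":
    print(expand_extras(["format-office", "format-html"]))
```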

Other Formats

| Extra | Description | Use Case |
|---|---|---|
| format-latex | LaTeX documents | .tex files |
| format-xml-xbrl | XBRL financial reports | Financial documents |
| format-html-render | HTML rendering with Playwright | Dynamic web content |
| format-audio | Audio transcription (Whisper) | .wav, .mp3 files |
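
When the set of input files is known in advance, the format extras can be chosen from the file extensions. The following is a rough sketch: the extensions come from the Use Case columns in the format tables, while the `.pdf` and `.html` entries are assumptions inferred from the descriptions. `extras_for` is a hypothetical helper, not a Docling function.

```python
from pathlib import Path

# Extensions taken from the "Use Case" columns of the format tables;
# ".pdf" and ".html" are assumed from the descriptions rather than listed there.
EXTRA_BY_SUFFIX = {
    ".pdf": "format-pdf",
    ".docx": "format-docx",
    ".pptx": "format-pptx",
    ".xlsx": "format-xlsx",
    ".html": "format-html",
    ".md": "format-markdown",
    ".tex": "format-latex",
    ".wav": "format-audio",
    ".mp3": "format-audio",
}


def extras_for(paths):
    """Return the sorted extras needed for the given file names.

    Files with an unrecognized suffix are ignored.
    """
    suffixes = (Path(p).suffix.lower() for p in paths)
    return sorted({EXTRA_BY_SUFFIX[s] for s in suffixes if s in EXTRA_BY_SUFFIX})


if __name__ == "__main__":
    print(extras_for(["report.pdf", "slides.pptx", "notes.md"]))
```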

OCR Engines

| Extra | Description | Use Case |
|---|---|---|
| feat-ocr-rapidocr | RapidOCR (lightweight) | Fast OCR |
| feat-ocr-rapidocr-onnx | RapidOCR with ONNX runtime | Optimized OCR |
| feat-ocr-easyocr | EasyOCR | Multi-language OCR |
| feat-ocr-tesserocr | Tesseract OCR | High-accuracy OCR |
| feat-ocr-mac | macOS native OCR | macOS only |

Models

| Extra | Description | Use Case |
|---|---|---|
| models-local | Local PyTorch models | GPU/CPU inference |
| models-remote | Remote model serving (Triton) | Production deployments |
| models-onnxruntime | ONNX Runtime acceleration | Optimized inference |
| models-vlm-inline | Vision Language Models | Image understanding, inline processing |

Other Features

| Extra | Description | Use Case |
|---|---|---|
| feat-chunking | Document chunking | RAG applications |
| service-client | Docling service client | Remote processing |

License

MIT License - See LICENSE