CLI reference

This page documents Docling's command-line tools. It is generated by `scripts/render_cli_reference.py` from the live Typer apps; do not edit it by hand.

docling

Usage

`docling [OPTIONS] source`

Arguments

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `source` | text | yes | PDF files to convert. Can be local file / directory paths or URL. |
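
As a minimal illustration of the usage line above, the sketch below assembles one `docling` invocation for each kind of source the description mentions: a local file, a directory, and a URL. The file name, directory, and URL are placeholders rather than values from the Docling documentation, `basic_command` is a hypothetical helper, and the commands are only printed, not executed.

```python
import shlex

# Placeholder sources (hypothetical): a local file, a directory, and a URL.
SOURCES = ["report.pdf", "scans/", "https://example.com/paper.pdf"]


def basic_command(source: str) -> list[str]:
    """Return the argv for `docling [OPTIONS] source` with no options set."""
    return ["docling", source]


if __name__ == "__main__":
    for src in SOURCES:
        # shlex.join renders the argv as a copy-pasteable shell command.
        print(shlex.join(basic_command(src)))
```

With no options given, the defaults listed under Options apply, for example Markdown output written to the current directory.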

Options

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `--from` | docx, pptx, html, image, pdf, asciidoc, md, csv, xlsx, xml_uspto, xml_jats, xml_xbrl, mets_gbs, json_docling, audio, vtt, latex (repeatable) | | Input formats to accept. Defaults to all supported formats. |
| `--to` | md, json, yaml, html, html_split_page, text, doctags, vtt (repeatable) | | Specify output formats. Defaults to Markdown. |
| `--show-layout / --no-show-layout` | flag | `false` | If enabled, the page images will show the bounding-boxes of the items. |
| `--headers` | text | | Specify HTTP request headers used when fetching URL input sources, in the form of a JSON string. |
| `--image-export-mode` | placeholder, embedded, referenced | `embedded` | Image export mode for image-capable document outputs (JSON, YAML, HTML, HTML split-page, and Markdown). Text, DocTags, and WebVTT outputs do not export images. With placeholder, only the position of the image is marked in the output. In embedded mode, the image is embedded as a base64-encoded string. In referenced mode, the image is exported in PNG format and referenced from the main exported document. |
| `--pipeline` | legacy, standard, vlm, asr | `standard` | Choose the pipeline to process PDF or image files. |
| `--vlm-model` | text | `granite_docling` | Choose the VLM preset to use with PDF or image files. Available presets: smoldocling, granite_docling, deepseek_ocr, granite_vision, pixtral, got_ocr, phi4, qwen, nanonets_ocr2, gemma_12b, gemma_27b, dolphin, glm_ocr, lightonocr, falcon_ocr. |
| `--asr-model` | whisper_tiny, whisper_small, whisper_medium, whisper_base, whisper_large, whisper_turbo, whisper_tiny_mlx, whisper_small_mlx, whisper_medium_mlx, whisper_base_mlx, whisper_large_mlx, whisper_turbo_mlx, whisper_tiny_native, whisper_small_native, whisper_medium_native, whisper_base_native, whisper_large_native, whisper_turbo_native | `whisper_tiny` | Choose the ASR model to use with audio/video files. |
| `--ocr / --no-ocr` | flag | `true` | If enabled, the bitmap content will be processed using OCR. |
| `--force-ocr / --no-force-ocr` | flag | `false` | Replace any existing text with OCR generated text over the full content. |
| `--tables / --no-tables` | flag | `true` | If enabled, the table structure model will be used to extract table information. |
| `--ocr-engine` | text | `auto` | The OCR engine to use. When `--allow-external-plugins` is not set, the available values are: auto, easyocr, kserve_v2_ocr, ocrmac, rapidocr, tesserocr, tesseract. Use the option `--show-external-plugins` to see the options allowed with external plugins. |
| `--ocr-lang` | text | | Provide a comma-separated list of languages used by the OCR engine. Note that each OCR engine has different values for the language names. |
| `--psm` | integer | | Page Segmentation Mode for the OCR engine (0-13). |
| `--pdf-backend` | pypdfium2, docling_parse, dlparse_v1, dlparse_v2, dlparse_v4 | `docling_parse` | The PDF backend to use. |
| `--pdf-password` | text | | Password for protected PDF documents. |
| `--table-mode` | fast, accurate | `accurate` | The mode to use in the table structure model. |
| `--enrich-code / --no-enrich-code` | flag | `false` | Enable the code enrichment model in the pipeline. |
| `--enrich-formula / --no-enrich-formula` | flag | `false` | Enable the formula enrichment model in the pipeline. |
| `--enrich-picture-classes / --no-enrich-picture-classes` | flag | `false` | Enable the picture classification enrichment model in the pipeline. |
| `--enrich-picture-description / --no-enrich-picture-description` | flag | `false` | Enable the picture description model in the pipeline. |
| `--enrich-chart-extraction / --no-enrich-chart-extraction` | flag | `false` | Enable chart data extraction from bar, pie, and line charts. |
| `--artifacts-path` | path | | If provided, the location of the model artifacts. |
| `--enable-remote-services / --no-enable-remote-services` | flag | `false` | Must be enabled when using models connecting to remote services. |
| `--allow-external-plugins / --no-allow-external-plugins` | flag | `false` | Must be enabled for loading modules from third-party plugins. |
| `--show-external-plugins / --no-show-external-plugins` | flag | `false` | List the third-party plugins which are available when the option `--allow-external-plugins` is set. |
| `--abort-on-error / --no-abort-on-error` | flag | `false` | If enabled, the processing will be aborted when the first error is encountered. |
| `--output` | path | `.` | Output directory where results are saved. |
| `--verbose / -v` | integer | `0` | Set the verbosity level. `-v` for info logging, `-vv` for debug logging. |
| `--debug-visualize-cells / --no-debug-visualize-cells` | flag | `false` | Enable debug output which visualizes the PDF cells. |
| `--debug-visualize-ocr / --no-debug-visualize-ocr` | flag | `false` | Enable debug output which visualizes the OCR cells. |
| `--debug-visualize-layout / --no-debug-visualize-layout` | flag | `false` | Enable debug output which visualizes the layout clusters. |
| `--debug-visualize-tables / --no-debug-visualize-tables` | flag | `false` | Enable debug output which visualizes the table cells. |
| `--version` | flag | | Show version information. |
| `--document-timeout` | float | | The timeout for processing each document, in seconds. |
| `--num-threads` | integer | `4` | Number of threads. |
| `--device` | auto, cpu, cuda, mps, xpu | `auto` | Accelerator device. |
| `--logo` | flag | | Docling logo. |
| `--page-batch-size` | integer | `4` | Number of pages processed in one batch. Default: 4. |
| `--profiling / --no-profiling` | flag | `false` | If enabled, it summarizes profiling details for all conversion stages. |
| `--save-profiling / --no-save-profiling` | flag | `false` | If enabled, it saves the profiling summaries to json. |
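
Two conventions in the table above are easy to get wrong when scripting the CLI: options marked "(repeatable)" are normally passed by repeating the flag once per value, and `--headers` takes a single JSON string. The sketch below shows one way to assemble such a command from Python. `build_docling_argv` is a hypothetical helper, not part of Docling; the URL, header value, and output directory are placeholders; only the flags `--to`, `--headers`, and `--output` come from the reference.

```python
import json
import shlex


def build_docling_argv(source, to_formats=("md",), headers=None, output="."):
    """Assemble an argv list for the `docling` command.

    Hypothetical helper for illustration; only the flags it emits
    (--to, --headers, --output) come from the options table.
    """
    argv = ["docling"]
    # Repeatable options: repeat the flag once per value.
    for fmt in to_formats:
        argv += ["--to", fmt]
    # --headers expects one JSON string, so serialize the mapping.
    if headers:
        argv += ["--headers", json.dumps(headers)]
    argv += ["--output", output, source]
    return argv


if __name__ == "__main__":
    argv = build_docling_argv(
        "https://example.com/paper.pdf",  # placeholder URL
        to_formats=("md", "json"),
        headers={"Authorization": "Bearer <token>"},  # placeholder header
        output="out",  # placeholder directory
    )
    # Print a shell-quoted version of the command instead of running it.
    print(shlex.join(argv))
    # To run it for real (requires Docling to be installed):
    #   import subprocess; subprocess.run(argv, check=True)
```

Building the command as a list and quoting it only for display avoids shell-escaping problems with the JSON string, which contains quotes and spaces.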

docling-tools

Usage

`docling-tools [OPTIONS] COMMAND [ARGS]...`

Subcommands

| Command | Description |
| --- | --- |
| `docling-tools models` | |

docling-tools models

Usage

`docling-tools models [OPTIONS] COMMAND [ARGS]...`

Subcommands

| Command | Description |
| --- | --- |
| `docling-tools models download` | |
| `docling-tools models download-hf-repo` | |

docling-tools models download

Usage

`docling-tools models download [OPTIONS] [MODELS]:[layout|tableformer|tableformerv2|code_formula|picture_classifier|smolvlm|granitedocling|granitedocling_mlx|smoldocling|smoldocling_mlx|granite_vision|granite_chart_extraction|granite_chart_extraction_v4|rapidocr|easyocr]...`

Arguments

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `MODELS` | layout, tableformer, tableformerv2, code_formula, picture_classifier, smolvlm, granitedocling, granitedocling_mlx, smoldocling, smoldocling_mlx, granite_vision, granite_chart_extraction, granite_chart_extraction_v4, rapidocr, easyocr | no | Models to download (default behavior: a predefined set of models will be downloaded). |

Options

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `-o / --output-dir` | path | `/Users/dol/.cache/docling/models` | The directory where to download the models. |
| `--force / --no-force` | flag | `false` | If true, the download will be forced. |
| `--all` | flag | `false` | If true, all available models will be downloaded (mutually exclusive with passing specific models). |
| `-q / --quiet` | flag | `false` | No extra output is generated, the CLI prints only the directory with the cached models. |
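
The constraint that `--all` cannot be combined with named models can be checked before the tool is ever launched. The following sketch builds a `docling-tools models download` command and enforces that rule up front. `build_download_argv` is a hypothetical helper and `./models` is a placeholder directory; the model names and flags are taken from the tables above.

```python
import shlex


def build_download_argv(models=(), output_dir=None, all_models=False, quiet=False):
    """Assemble an argv list for `docling-tools models download`.

    Hypothetical helper; the flags (--all, -o, -q) come from the options table.
    """
    # The reference states --all is mutually exclusive with naming models.
    if all_models and models:
        raise ValueError("--all cannot be combined with specific models")
    argv = ["docling-tools", "models", "download"]
    if all_models:
        argv.append("--all")
    if output_dir is not None:
        argv += ["-o", output_dir]
    if quiet:
        argv.append("-q")
    # Positional MODELS arguments go last; an empty tuple means "default set".
    argv += list(models)
    return argv


if __name__ == "__main__":
    argv = build_download_argv(
        models=("layout", "tableformer"),
        output_dir="./models",  # placeholder directory
        quiet=True,
    )
    # Print the command rather than running it, since it would download models.
    print(shlex.join(argv))
```

With `-q`, the reference says the tool prints only the directory holding the cached models, so a wrapper script could presumably capture that output and hand it to `docling --artifacts-path`; treat that pairing as an assumption to verify against your installed version.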

docling-tools models download-hf-repo

Usage

`docling-tools models download-hf-repo [OPTIONS] MODELS...`

Arguments

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `MODELS` | text | yes | Specific models to download from HuggingFace identified by their repo id. For example: `docling-project/docling-models`. |

Options

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `-o / --output-dir` | path | `/Users/dol/.cache/docling/models` | The directory where to download the models. |
| `--force / --no-force` | flag | `false` | If true, the download will be forced. |
| `-q / --quiet` | flag | `false` | No extra output is generated, the CLI prints only the directory with the cached models. |
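
Unlike `docling-tools models download`, this subcommand requires at least one `MODELS` value. A minimal sketch of a wrapper that enforces this and passes several repo ids as positional arguments follows. `build_hf_download_argv` is a hypothetical helper, and `your-org/your-model` is a made-up placeholder; only `docling-project/docling-models` appears in the reference.

```python
import shlex

# The first repo id is the example from the reference; the second is a placeholder.
REPO_IDS = ["docling-project/docling-models", "your-org/your-model"]


def build_hf_download_argv(repo_ids, output_dir=None, force=False):
    """Assemble an argv list for `docling-tools models download-hf-repo`."""
    if not repo_ids:
        # MODELS is marked as required in the arguments table.
        raise ValueError("at least one HuggingFace repo id is required")
    argv = ["docling-tools", "models", "download-hf-repo"]
    if output_dir is not None:
        argv += ["-o", output_dir]
    if force:
        argv.append("--force")
    # Each repo id is passed as its own positional argument.
    return argv + list(repo_ids)


if __name__ == "__main__":
    # Print the command only; running it would start a download.
    print(shlex.join(build_hf_download_argv(REPO_IDS, force=True)))
```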