import { Callout } from 'nextra/components'
DocsGPT uses Docling as the default parser layer for many document formats. OCR is optional and controlled by two settings:
DOCLING_OCR_ENABLED=false
DOCLING_OCR_ATTACHMENTS_ENABLED=false
- `DOCLING_OCR_ENABLED`: OCR behavior for Source Docs ingestion.
- `DOCLING_OCR_ATTACHMENTS_ENABLED`: OCR behavior for chat attachments uploaded from the message box.

Source Docs ingestion:

1. Files are uploaded through `/api/upload`.
2. Processing runs in the worker (`ingest_worker`).
3. `SimpleDirectoryReader` parses files with `get_default_file_extractor`.
4. OCR is controlled by `DOCLING_OCR_ENABLED`.

Chat attachments:

1. Files are uploaded through `/api/store_attachment`.
2. `attachment_worker` parses and stores the attachment in Postgres (`attachments` table).
3. OCR is controlled by `DOCLING_OCR_ATTACHMENTS_ENABLED`.

Docling OCR behavior differs between PDFs and images.
By default, Docling parser classes use RapidOCR options (language default: english).
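Each ingestion path described above consults its own flag. The following is a minimal sketch of that mapping, not DocsGPT's actual settings code: the helper names `env_flag` and `ocr_enabled_for`, the path labels `source_docs` and `attachments`, and the set of accepted truthy strings are all assumptions made for illustration.

```python
import os

# Strings treated as "true" (an assumption; the real parser may accept others).
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from an environment mapping (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def ocr_enabled_for(path, environ=None):
    """Return whether OCR applies to the given path.

    `path` is "source_docs" or "attachments" (labels invented for this sketch).
    """
    flag_by_path = {
        "source_docs": "DOCLING_OCR_ENABLED",
        "attachments": "DOCLING_OCR_ATTACHMENTS_ENABLED",
    }
    return env_flag(flag_by_path[path], environ=environ)
```

With only `DOCLING_OCR_ENABLED=true` set, `ocr_enabled_for("source_docs")` would be true while `ocr_enabled_for("attachments")` stays false, which is why the two flags need to be set independently.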
When attachments are used in chat, behavior depends on the selected model/provider.
This means OCR quality is especially important for text fallback paths and for models without native attachment support.
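To illustrate why the text fallback depends on OCR, here is a rough sketch of how a chat request might choose between sending the file natively and sending its parsed text. It is only an assumption about the general shape of such logic: the `Attachment` class, `build_attachment_input`, and the `supported_mime_types` parameter are hypothetical names and do not come from DocsGPT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attachment:
    """Hypothetical record for a stored attachment."""
    mime_type: str
    path: str
    parsed_text: Optional[str]  # text extracted at upload time; may come from OCR


def build_attachment_input(attachment, supported_mime_types):
    """Pick native file input when the model supports it, else fall back to text."""
    if attachment.mime_type in supported_mime_types:
        # Native path: the model reads the file itself, so OCR output is not used.
        return {
            "kind": "file",
            "path": attachment.path,
            "mime_type": attachment.mime_type,
        }
    # Fallback path: only the parsed text reaches the model, so a scanned
    # document that was parsed without OCR may contribute little or nothing.
    return {"kind": "text", "text": attachment.parsed_text or ""}
```

In this sketch, a scanned PDF sent to a model without native PDF support yields an empty `text` value unless OCR produced `parsed_text` at upload time.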
For most OCR-enabled use cases, enable both flags:
DOCLING_OCR_ENABLED=true
DOCLING_OCR_ATTACHMENTS_ENABLED=true
After changing these settings, restart the API and Celery worker.
PARSE_IMAGE_REMOTE=true.