Turn scanned documents into searchable, copyable PDFs. The tool uses Tesseract OCR to recognize text in images and overlays an invisible text layer on top of the original pages, preserving their visual appearance while making the content fully searchable.
.txt file.

| Setting | Values | Default | Purpose |
|---|---|---|---|
| Resolution | Standard (192 DPI), High (288 DPI), Ultra (384 DPI) | High | Higher resolution improves accuracy on small text but takes longer |
| Binarize Image | On / Off | Off | Converts the page to pure black and white to increase contrast; works best on clean scans |
| Character Whitelist Preset | None, Alphanumeric, Numbers + Currency, Letters Only, Numbers Only, Invoice, Forms, Custom | None | Restricts recognized characters to improve accuracy for specific document types |
| Character Whitelist | Free text | Empty | Manual character set when preset is set to Custom |
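
To make the settings above more concrete, here is a minimal sketch of how they could map onto a Tesseract command line. It is not the tool's actual implementation. `OcrSettings`, `PRESET_CHARS`, `effective_whitelist`, `tesseract_args`, and `binarize` are hypothetical names. The character sets shown for the presets are illustrative guesses, and presets whose contents are not obvious from their name (Numbers + Currency, Invoice, Forms) are left out. The fixed-threshold binarization is likewise an assumption; the tool may use a different method. Only the `--dpi` option, the `tessedit_char_whitelist` variable, and the `pdf` output config are standard Tesseract CLI features.

```python
from dataclasses import dataclass
from string import ascii_letters, digits

# DPI values taken from the settings table.
RESOLUTION_DPI = {"Standard": 192, "High": 288, "Ultra": 384}

# Illustrative character sets only (assumption): the tool's real preset
# contents are not documented in this section.
PRESET_CHARS = {
    "None": "",
    "Numbers Only": digits,
    "Letters Only": ascii_letters,
    "Alphanumeric": ascii_letters + digits,
}


@dataclass
class OcrSettings:
    """Hypothetical container mirroring the table's settings and defaults."""
    resolution: str = "High"
    binarize: bool = False
    whitelist_preset: str = "None"
    whitelist: str = ""


def effective_whitelist(settings: OcrSettings) -> str:
    """Return the characters to allow; the free-text field applies only to Custom."""
    if settings.whitelist_preset == "Custom":
        return settings.whitelist
    # Raises KeyError for presets this sketch does not define.
    return PRESET_CHARS[settings.whitelist_preset]


def tesseract_args(input_image: str, output_base: str, settings: OcrSettings) -> list[str]:
    """Build an argument list for the Tesseract CLI that writes a searchable PDF."""
    args = ["tesseract", input_image, output_base,
            "--dpi", str(RESOLUTION_DPI[settings.resolution])]
    chars = effective_whitelist(settings)
    if chars:
        # Restrict recognition to the given characters.
        args += ["-c", f"tessedit_char_whitelist={chars}"]
    # The "pdf" config makes Tesseract emit a PDF with an invisible text layer.
    args.append("pdf")
    return args


def binarize(pixels: list[int], threshold: int = 128) -> list[int]:
    """Map 8-bit grayscale values to pure black (0) or white (255).

    A fixed threshold is the simplest possible approach and is assumed here.
    """
    return [255 if p >= threshold else 0 for p in pixels]
```

With the defaults, `tesseract_args("page.png", "out", OcrSettings())` yields `["tesseract", "page.png", "out", "--dpi", "288", "pdf"]`. Passing the arguments as a list (for example to `subprocess.run`) avoids shell-quoting problems when a custom whitelist contains spaces or punctuation.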