% SPDX-FileCopyrightText: 2022 James R. Barlow
% SPDX-License-Identifier: CC-BY-SA-4.0
Changes
- Fixed #47 "convert() got an unexpected keyword argument 'dpi'" by upgrading to img2pdf 0.2

New features
- … --deskew and --clean-final disable this mode, necessarily.
- --tesseract-pagesegmode allows you to pass page segmentation arguments to Tesseract OCR. This helps with two-column text and other layouts that confuse Tesseract.

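Since --tesseract-pagesegmode passes its value through to Tesseract, a minimal Python sketch of how a page segmentation mode might be appended to a Tesseract command line follows. The helper name build_tesseract_args is hypothetical and is not OCRmyPDF's code; the sketch also assumes the Tesseract 3.0x syntax, in which the mode is given as -psm N after the output base name (later Tesseract releases spell it --psm).

```python
import shlex

# Page segmentation modes documented for Tesseract 3.0x (0 through 10).
VALID_PSM = range(0, 11)


def build_tesseract_args(image, output_base, language="eng", pagesegmode=None):
    """Return a tesseract argv list (hypothetical helper for illustration)."""
    args = ["tesseract", image, output_base, "-l", language]
    if pagesegmode is not None:
        if pagesegmode not in VALID_PSM:
            raise ValueError(f"invalid page segmentation mode: {pagesegmode}")
        # Tesseract 3.0x spells the option -psm; later releases use --psm.
        args += ["-psm", str(pagesegmode)]
    return args


if __name__ == "__main__":
    # Mode 1 = automatic page segmentation with orientation and script detection.
    argv = build_tesseract_args("page-001.png", "page-001", pagesegmode=1)
    print(shlex.join(argv))
```

The sketch only checks that the mode falls in the documented 0 to 10 range and leaves the choice of mode to the user, since the right mode depends on the page layout.
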
Changes
- Fixed #20: uppercase .PDF extension not accepted
- --pdf-renderer=auto lets OCRmyPDF pick the best PDF renderer. Currently it always chooses the 'hocrtransform' renderer, but that behavior may change.

New features
- … pip package manager
- … ocrmypdf to /usr/local/bin or equivalent for system-wide access and easier typing
- … (--help)
- … (--pdf-renderer tesseract)
- … (--title, etc.)
- … (--skip-big)
- … (--tesseract-timeout)

Changes
- New, robust rewrite in Python 3.4+ with ruffus pipelines
- Now uses Ghostscript 9.14's improved color conversion model to preserve PDF colors
- OCR text is now rendered in the PDF as invisible text. Previous versions of OCRmyPDF incorrectly rendered visible text with an image on top.
- All "tasks" in the pipeline can be executed in parallel on any available CPUs, increasing performance
- The -o DPI argument has been phased out in favor of --oversample DPI, in case we need -o OUTPUTFILE in the future
- Removed several dependencies, so it's easier to install. We no longer use:
- Some new external dependencies are required or optional, compared to v2.x:

Release candidates

- rc9:
  - #118: report error if Ghostscript iccprofiles are missing
  - #111: PDF rasterized to palette file
- rc8:
  - #111: exception thrown if PDF is missing DocumentInfo dictionary
- rc7:
- rc6:
- rc5:
- rc4:
- rc3: skipping version number intentionally to avoid confusion with Tesseract
- rc2: first release for public testing to test-PyPI, GitHub
- rc1: testing release process

- The ./OCRmyPDF.sh script is still available for now
- -vvv is no longer supported
- config.sh has been removed. Instead, you can feed a file to the arguments for common settings:

      ocrmypdf input.pdf output.pdf @settings.txt

where settings.txt contains one argument per line, for example:

    -l
    deu
    --author
    A. Merkel
    --pdf-renderer
    tesseract

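The @settings.txt syntax matches how Python's argparse behaves when a parser is created with fromfile_prefix_chars="@". The sketch below assumes OCRmyPDF's parser is configured that way (an assumption, not stated in these notes) and uses a cut-down, hypothetical set of options that mirror the settings.txt example; make_parser and parse_with_settings_file are illustrative names only.

```python
import argparse
import os
import tempfile


def make_parser():
    # fromfile_prefix_chars="@" tells argparse to expand "@file" into the
    # file's lines, one argument per line (argparse's default behaviour).
    # These options are hypothetical stand-ins mirroring settings.txt.
    parser = argparse.ArgumentParser(prog="ocrmypdf", fromfile_prefix_chars="@")
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    parser.add_argument("-l", "--language")
    parser.add_argument("--author")
    parser.add_argument("--pdf-renderer")
    return parser


def parse_with_settings_file(settings_lines):
    """Write settings_lines to a temporary file and parse it via @file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(settings_lines) + "\n")
        # The file must still exist while argparse expands "@path".
        return make_parser().parse_args(["input.pdf", "output.pdf", "@" + path])


if __name__ == "__main__":
    ns = parse_with_settings_file(
        ["-l", "deu", "--author", "A. Merkel", "--pdf-renderer", "tesseract"]
    )
    print(ns.language, ns.author, ns.pdf_renderer)
```

With argparse's default line handling, each line becomes exactly one argument, so a value containing spaces such as A. Merkel needs no quoting; by the same rule, a blank line would become an empty argument.
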
Fixes
Notes and known issues
- --pdf-renderer tesseract will output files with an incorrect page size in Tesseract 3.03, due to a bug in Tesseract.