mupdf/docs/reference/cxx-and-derived-bindings.rst
.. Copyright (C) 2001-2025 Artifex Software, Inc. .. All Rights Reserved.
.. meta:: :description: MuPDF documentation :keywords: MuPDF, pdf, epub
.. We define crude substitutions that implement simple expand/contract blocks in html. Unfortunately it doesn't seem possible to pass parameters to substitutions so we can't specify text to be shown next to html's details triangle.
.. |expand_begin| raw:: html
<details>
<summary><strong>Show/hide</strong></summary>
.. |expand_end| raw:: html
</details>
Auto-generated abstracted :title:`C++`, :title:`Python` and :title:`C#`
versions of the :title:`MuPDF C API` are available.
The C++ API is machine-generated from the C API header files and adds various abstractions such as automatic contexts and automatic reference counting.
The Python and C# APIs are generated from the C++ API using SWIG, so automatically include the C++ API's abstractions.
.. graphviz::
digraph
{
size="4,4";
labeljust=l;
"MuPDF C API" [shape="rectangle"]
"MuPDF C++ API" [shape="rectangle"]
"SWIG" [shape="oval"]
"MuPDF Python API" [shape="rectangle"]
"MuPDF C# API" [shape="rectangle"]
"MuPDF C API" -> "MuPDF C++ API" [label=" Parse C headers with libclang,\l generate abstractions.\l"]
"MuPDF C++ API" -> "SWIG" [label=" Parse C++ headers with SWIG."]
"SWIG" -> "MuPDF Python API"
"SWIG" -> "MuPDF C# API"
}
Basics
* Auto-generated from the MuPDF C API's header files.
* Everything is in C++ namespace ``mupdf``.
* No functions or methods take ``fz_context*`` arguments.
  (Automatically-generated per-thread contexts are used internally.)
* All MuPDF ``setjmp()``/``longjmp()``-based exceptions are converted into C++ exceptions.
Low-level C++ API
* The MuPDF C API is provided as low-level C++ functions with ``ll_`` prefixes.
* No ``fz_context*`` arguments.
* MuPDF exceptions are converted into C++ exceptions.
Class-aware C++ API
C++ wrapper classes wrap most ``fz_*`` and ``pdf_*`` C structs:
* Class names are camel-case versions of the wrapped struct's
name, for example ``fz_document``'s wrapper class is ``mupdf::FzDocument``.
* Classes automatically handle reference counting of the underlying C structs,
so there is no need for manual calls to ``fz_keep_*()`` and ``fz_drop_*()``, and
class instances can be treated as values and copied arbitrarily.
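The naming rule above can be sketched in a few lines of Python. This is only an illustration of the camel-case convention: the helper name ``wrapper_class_name`` is hypothetical and is not part of the bindings, and the real generator may special-case some names.

```python
def wrapper_class_name(struct_name: str) -> str:
    """Return the camel-case wrapper class name for a MuPDF C struct name.

    Hypothetical helper for illustration only; not part of the MuPDF bindings.
    """
    # Split on underscores and capitalise each part: fz_stext_page -> FzStextPage.
    return ''.join(part.capitalize() for part in struct_name.split('_'))


for name in ('fz_document', 'fz_stext_page', 'fz_color_params'):
    print(f'{name} -> mupdf::{wrapper_class_name(name)}')
```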
Class-aware functions and methods take and return wrapper class instances
instead of MuPDF C structs:
* No ``fz_context*`` arguments.
* MuPDF exceptions are converted into C++ exceptions.
* Class-aware functions have the same names as the underlying C API function.
* Args that are pointers to a MuPDF struct will be changed to take a reference to
the corresponding wrapper class.
* Where a MuPDF function returns a pointer to a struct, the class-aware C++
wrapper will return a wrapper class instance by value.
* Class-aware functions that have a C++ wrapper class as their first parameter
are also provided as a member function of the wrapper class, with the same
name as the class-aware function.
* Wrapper classes are defined in ``mupdf/platform/c++/include/mupdf/classes.h``.
* Class-aware functions are declared in ``mupdf/platform/c++/include/mupdf/classes2.h``.
* Wrapper classes for reference-counted MuPDF structs:

  * The C++ wrapper classes will have a public ``m_internal`` member that is a
    pointer to the underlying MuPDF struct.
  * If a MuPDF C function returns a null pointer to a MuPDF struct, the
    class-aware C++ wrapper will return an instance of the wrapper class with a
    null ``m_internal`` member.
  * The C++ wrapper class will have an ``operator bool()`` that returns true if
    the ``m_internal`` member is non-null.

    [Introduced 2024-07-08.]
Usually it is more convenient to use the class-aware C++ API rather than the
low-level C++ API.
C++ Exceptions
C++ exceptions use classes for each ``FZ_ERROR_*`` enum, all derived from a class
``mupdf::FzErrorBase`` which in turn derives from ``std::exception``.

For example if MuPDF C code does ``fz_throw(ctx, FZ_ERROR_GENERIC, "something failed")``, this will appear as a C++ exception with type
``mupdf::FzErrorGeneric``. Its ``what()`` method will return ``code=2: something failed``, and it will have a public member ``m_code`` set to ``FZ_ERROR_GENERIC``.
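The shape of this hierarchy can be modelled in plain Python. The classes below are stand-ins written for this illustration, not the classes exported by the bindings; the numeric value ``2`` is taken from the ``code=2`` text in the example above, and ``failing_call()`` is a hypothetical function that plays the role of a wrapper around C code calling ``fz_throw()``.

```python
FZ_ERROR_GENERIC = 2  # Value taken from the "code=2" text in the example above.


class FzErrorBase(Exception):
    """Stand-in for mupdf::FzErrorBase; carries the MuPDF error code."""

    def __init__(self, code: int, text: str):
        super().__init__(f'code={code}: {text}')
        self.m_code = code

    def what(self) -> str:
        # Mirrors the what() text format described above.
        return str(self)


class FzErrorGeneric(FzErrorBase):
    """Stand-in for mupdf::FzErrorGeneric."""

    def __init__(self, text: str):
        super().__init__(FZ_ERROR_GENERIC, text)


def failing_call():
    # Hypothetical: plays the role of a wrapper whose C code calls fz_throw().
    raise FzErrorGeneric('something failed')


try:
    failing_call()
except FzErrorBase as e:  # Catching the base class also catches FzErrorGeneric.
    print(type(e).__name__, e.m_code, e.what())
```

Catching the base class handles every error code in one place, while catching a derived class such as ``FzErrorGeneric`` selects a single code.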
Example wrappers
The MuPDF C API function ``fz_new_buffer_from_page()`` is available as these
C++ functions/methods:
.. code-block:: c++
// MuPDF C function.
fz_buffer *fz_new_buffer_from_page(fz_context *ctx, fz_page *page, const fz_stext_options *options);
// MuPDF C++ wrappers.
namespace mupdf
{
// Low-level wrapper:
::fz_buffer *ll_fz_new_buffer_from_page(::fz_page *page, const ::fz_stext_options *options);
// Class-aware wrapper:
FzBuffer fz_new_buffer_from_page(const FzPage& page, FzStextOptions& options);
// Method in wrapper class FzPage:
struct FzPage
{
...
FzBuffer fz_new_buffer_from_page(FzStextOptions& options);
...
};
}
Extensions beyond the basic C API
Some generated classes have extra ``begin()`` and ``end()`` methods to allow
standard C++ iteration:
|expand_begin|
.. code-block:: c++
#include "mupdf/classes.h"
#include "mupdf/functions.h"
#include <iostream>
void show_stext(mupdf::FzStextPage& page)
{
for (mupdf::FzStextPage::iterator it_page: page)
{
mupdf::FzStextBlock block = *it_page;
for (mupdf::FzStextBlock::iterator it_block: block)
{
mupdf::FzStextLine line = *it_block;
for (mupdf::FzStextLine::iterator it_line: line)
{
mupdf::FzStextChar stextchar = *it_line;
fz_stext_char* c = stextchar.m_internal;
using namespace mupdf;
std::cout << "FzStextChar("
<< "c=" << c->c
<< " color=" << c->color
<< " origin=" << c->origin
<< " quad=" << c->quad
<< " size=" << c->size
<< " font_name=" << c->font->name
<< "\n";
}
}
}
}
|expand_end|
There are various custom class methods and constructors.
There are extra functions for generating a text representation of 'POD' (plain old data) structs and their C++ wrapper classes.
For example for ``fz_rect`` we provide these functions:
.. code-block:: c++
std::ostream& operator<< (std::ostream& out, const fz_rect& rhs);
std::ostream& operator<< (std::ostream& out, const FzRect& rhs);
std::string to_string_fz_rect(const fz_rect& s);
std::string to_string(const fz_rect& s);
std::string FzRect::to_string() const;
These each generate text such as: ``(x0=90.51 y0=160.65 x1=501.39 y1=1215.6)``
Runtime environment variables
All builds
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
* **MUPDF_mt_ctx**
Controls support for multi-threading on startup.
* If set with value ``0``, a single ``fz_context*`` is used for all threads; this
might give a small performance increase in single-threaded programmes, but
will be unsafe in multi-threaded programmes.
* Otherwise each thread has its own ``fz_context*``.
One can instead call ``mupdf::reinit_singlethreaded()`` on startup to force
single-threaded mode. This should be done before any other use of MuPDF.
Debug builds only
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Debug builds contain diagnostics/checking code that is activated via these
environment variables:
* **MUPDF_check_refs**
If ``1``, generated code checks MuPDF struct reference counts at
runtime.
* **MUPDF_check_error_stack**
If ``1``, generated code outputs a diagnostic if a MuPDF function changes the
current ``fz_context``'s error stack depth.
* **MUPDF_trace**
If ``1`` or ``2``, class-aware code outputs a diagnostic each time it calls a
MuPDF function (apart from keep/drop functions).
If ``2``, low-level wrappers output a diagnostic each time they are
called. We also show arg POD and pointer values.
* **MUPDF_trace_director**
If ``1``, generated code outputs a diagnostic when doing special
handling of MuPDF structs containing function pointers.
* **MUPDF_trace_exceptions**
If ``1``, generated code outputs diagnostics when it converts MuPDF
``setjmp()``/``longjmp()`` exceptions into C++ exceptions.
* **MUPDF_trace_keepdrop**
If ``1``, generated code outputs diagnostics for calls to ``*_keep_*()`` and
``*_drop_*()``.
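One way to apply these variables is to build an environment for a child process that runs a debug build. The sketch below assumes this approach; ``mupdf_debug_env`` is a hypothetical helper, not part of MuPDF, and only the variable names come from the list above.

```python
import os


def mupdf_debug_env(base=None, **settings):
    """Return a copy of an environment with MUPDF_* variables set.

    Hypothetical helper. Keyword names are the variable names from the list
    above without the MUPDF_ prefix, for example trace=2 sets MUPDF_trace=2.
    """
    env = dict(os.environ if base is None else base)
    for name, value in settings.items():
        env[f'MUPDF_{name}'] = str(value)
    return env


env = mupdf_debug_env(check_refs=1, trace=2, trace_keepdrop=1)
# The result can be passed to a child process, for example:
#   subprocess.run([sys.executable, 'my_script.py'], env=env)
print(sorted(name for name in env if name.startswith('MUPDF_')))
```

Setting the variables in a child's environment, rather than in the running process, avoids depending on when the bindings read them; that timing is an assumption here, not something stated above for the debug variables.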
Limitations
* We do not wrap variadic functions such as ``fz_write_printf()``.
* Global instances of C++ wrapper classes are not supported.

  This is because:

  * C++ wrapper class destructors generally call MuPDF functions (for example
    ``fz_drop_*()``).
  * The C++ bindings use internal thread-local objects to allow per-thread
    ``fz_context``'s to be efficiently obtained for use with underlying MuPDF
    functions.
  * C++ globals are destructed after thread-local objects are destructed.

  So if a global instance of a C++ wrapper class is created, its destructor
  will attempt to get a ``fz_context*`` using internal thread-local objects
  which will have already been destroyed.

  We attempt to display a diagnostic when this happens, but this cannot be relied on as behaviour is formally undefined.
* A Python module called ``mupdf``.
* A C# namespace called ``mupdf``.
* Auto-generated from the C++ MuPDF API using SWIG, so inherits the abstractions of the C++ API:

  * No ``fz_context*`` arguments.
  * No need to call ``fz_keep_*()`` or ``fz_drop_*()``, and we have value-semantics for class instances.
* Output parameters are returned as tuples.
For example MuPDF C function ``fz_read_best()`` has prototype::

    fz_buffer *fz_read_best(fz_context *ctx, fz_stream *stm, size_t initial, int *truncated);

The class-aware Python wrapper is::

    mupdf.fz_read_best(stm, initial)

and returns ``(buffer, truncated)``, where ``buffer`` is a SWIG proxy for a
``mupdf::FzBuffer`` instance and ``truncated`` is an integer.
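The out-param convention can be illustrated without MuPDF. In this sketch both functions are hypothetical: ``c_style_divide`` imitates a C function that writes a second result through a pointer, and ``divide`` shows the wrapper style described above, where that value joins the return tuple.

```python
def c_style_divide(a, b, remainder_out):
    """C-style stand-in: returns the quotient, writes the remainder via an out-param."""
    remainder_out[0] = a % b
    return a // b


def divide(a, b):
    """Wrapper in the style described above: the out-param joins the return tuple."""
    remainder_out = [0]  # A one-element list plays the role of `int *remainder`.
    quotient = c_style_divide(a, b, remainder_out)
    return quotient, remainder_out[0]


quotient, remainder = divide(17, 5)
print(quotient, remainder)  # 3 2
```

Callers unpack the tuple directly, in the same way that ``mupdf.fz_read_best(stm, initial)`` is unpacked into ``buffer`` and ``truncated``.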
Allows implementation of mutool in Python - see
`mupdf:scripts/mutool.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mutool.py>`_
and
`mupdf:scripts/mutool_draw.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mutool_draw.py>`_.
Provides text representation of simple 'POD' structs:
.. code-block:: python
rect = mupdf.FzRect(...)
print(rect) # Will output text such as: (x0=90.51 y0=160.65 x1=501.39 y1=215.6)
This works for classes where the C++ API defines a to_string() method as described above.
Python classes have a ``__str__()`` method, and an identical ``__repr__()`` method.

C# classes have a ``ToString()`` method.

Uses SWIG Director classes to allow C function pointers in MuPDF structs to call Python code.
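The pairing of ``__str__()`` with an identical ``__repr__()`` can be modelled with a small stand-in class. ``Rect`` below is hypothetical and only imitates the text format shown in the ``print(rect)`` example above; it is not the ``mupdf.FzRect`` class.

```python
from dataclasses import dataclass


@dataclass(repr=False)
class Rect:
    """Stand-in for a POD wrapper such as mupdf.FzRect; not the real class."""
    x0: float
    y0: float
    x1: float
    y1: float

    def __str__(self):
        return f'(x0={self.x0} y0={self.y0} x1={self.x1} y1={self.y1})'

    # Same text for repr(), so instances also read well inside containers.
    __repr__ = __str__


rect = Rect(90.51, 160.65, 501.39, 215.6)
print(rect)    # (x0=90.51 y0=160.65 x1=501.39 y1=215.6)
print([rect])  # [(x0=90.51 y0=160.65 x1=501.39 y1=215.6)]
```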
The Python ``mupdf`` module is available on the `Python Package Index (PyPI) website <https://pypi.org/>`_.

Install with ``pip install mupdf``.

Auto-generated documentation for the C, C++ and Python APIs is available at: https://ghostscript.com/~julian/mupdf-bindings/
All content is generated from the comments in MuPDF header files.
This documentation is generated from an internal development tree, so may contain features that are not yet publicly available.
It is updated only intermittently.
Using the Python API
Minimal Python code that uses the ``mupdf`` module::
import mupdf
document = mupdf.FzDocument('foo.pdf')
A simple example Python test script (run by ``scripts/mupdfwrap.py -t``) is:
* `scripts/mupdfwrap_test.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mupdfwrap_test.py>`_
More detailed usage of the Python API can be found in:
* `scripts/mutool.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mutool.py>`_
* `scripts/mutool_draw.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mutool_draw.py>`_
**Example Python code that shows all available information about a document's Stext blocks, lines and characters**:
|expand_begin|
::

    #!/usr/bin/env python3

    import mupdf

    def show_stext(document):
        '''
        Shows all available information about Stext blocks, lines and characters.
        '''
        for p in range(document.fz_count_pages()):
            page = document.fz_load_page(p)
            stextpage = mupdf.FzStextPage(page, mupdf.FzStextOptions())
            for block in stextpage:
                block_ = block.m_internal
                print(f'block: type={block_.type} bbox={block_.bbox}')
                for line in block:
                    line_ = line.m_internal
                    print(f' line: wmode={line_.wmode}'
                            + f' dir={line_.dir}'
                            + f' bbox={line_.bbox}'
                            )
                    for char in line:
                        char_ = char.m_internal
                        print(f' char: {chr(char_.c)!r} c={char_.c:4} color={char_.color}'
                                + f' origin={char_.origin}'
                                + f' quad={char_.quad}'
                                + f' size={char_.size:6.2f}'
                                + f' font=('
                                    + f'is_mono={char_.font.flags.is_mono}'
                                    + f' is_bold={char_.font.flags.is_bold}'
                                    + f' is_italic={char_.font.flags.is_italic}'
                                    + f' ft_substitute={char_.font.flags.ft_substitute}'
                                    + f' ft_stretch={char_.font.flags.ft_stretch}'
                                    + f' fake_bold={char_.font.flags.fake_bold}'
                                    + f' fake_italic={char_.font.flags.fake_italic}'
                                    + f' has_opentype={char_.font.flags.has_opentype}'
                                    + f' invalid_bbox={char_.font.flags.invalid_bbox}'
                                    + f' name={char_.font.name}'
                                    + f')'
                                )

    document = mupdf.FzDocument('foo.pdf')
    show_stext(document)
|expand_end|
Basic PDF viewers written in Python and C#
* `scripts/mupdfwrap_gui.py <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mupdfwrap_gui.py>`_
* `scripts/mupdfwrap_gui.cs <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/scripts/mupdfwrap_gui.cs>`_

Build and run with:

* ``./scripts/mupdfwrap.py -b all --test-python-gui``
* ``./scripts/mupdfwrap.py -b --csharp all --test-csharp-gui``

General requirements
* Windows, Linux, MacOS or OpenBSD.
* Build should take place inside a Python `venv <https://docs.python.org/3.8/library/venv.html>`_.
* `libclang Python interface onto <https://libclang.readthedocs.io/en/latest/index.html>`_ the `clang C/C++ parser <https://clang.llvm.org/>`_.
* `swig <https://swig.org/>`_, for Python and C# bindings.
* `Mono <https://www.mono-project.com/>`_, for C# bindings on platforms
  other than Windows.
Setting up
Windows only
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Install Python.

  * Use the Python Windows installer from the python.org website: http://www.python.org/downloads
  * Don't use other installers such as the Microsoft Store Python package.
  * A default installation is sufficient.
  * Debug binaries are required for debug builds of the MuPDF Python API.
  * If "Customize Installation" is chosen, make sure to include "py launcher" so
    that the ``py`` command will be available.
* Install Visual Studio 2019. Later versions may not work with MuPDF's solution and build files.
All platforms
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Get the latest version of MuPDF in git.
.. code-block:: shell
git clone --recursive git://git.ghostscript.com/mupdf.git
Create and enter a `Python venv <https://docs.python.org/3.8/library/venv.html>`_ and upgrade pip.
Windows.
.. code-block:: bat
py -m venv pylocal
.\pylocal\Scripts\activate
python -m pip install --upgrade pip
Linux, MacOS, OpenBSD
.. code-block:: shell
python3 -m venv pylocal
. pylocal/bin/activate
python -m pip install --upgrade pip
General build flags
In all of the commands below, one can set environment variables to control
the build of the underlying MuPDF C API, for example ``USE_SYSTEM_LIBJPEG=yes``.
In addition, ``XCXXFLAGS`` can be used to set additional C++ compiler flags when
building the C++ and Python bindings (the name is analogous to the ``XCFLAGS``
used by MuPDF's makefile when compiling the core library).
Building and installing the Python bindings using ``pip``
Windows, Linux, MacOS.
.. code-block:: shell
cd mupdf && pip install -vv .
OpenBSD.
Building using ``pip`` is not supported because ``libclang`` is not
available from pypi.org, so ``pip`` will fail to install prerequisites from
``pyproject.toml``.

Instead one can run ``setup.py`` directly:

.. code-block:: shell

    cd mupdf && python setup.py install
Building the Python bindings
* Windows, Linux, MacOS.
.. code-block:: shell
pip install libclang swig setuptools
cd mupdf && python scripts/mupdfwrap.py -b all
* OpenBSD.
``libclang`` is not available from pypi.org, but we can instead use
the system ``py3-llvm`` package.
.. code-block:: shell
sudo pkg_add py3-llvm
pip install swig setuptools
cd mupdf && python scripts/mupdfwrap.py -b all
Building the C++ bindings
Windows, Linux, MacOS.
.. code-block:: shell
pip install libclang setuptools
cd mupdf && python scripts/mupdfwrap.py -b m01
OpenBSD.
libclang is not available from pypi.org, but we can instead use
the system py3-llvm package.
.. code-block:: shell
sudo pkg_add py3-llvm
pip install setuptools
cd mupdf && python scripts/mupdfwrap.py -b m01
Building the C# bindings
* Windows.
.. code-block:: shell
pip install libclang swig setuptools
cd mupdf && python scripts/mupdfwrap.py -b --csharp all
* Linux.
.. code-block:: shell
sudo apt install mono-devel
pip install libclang swig
cd mupdf && python scripts/mupdfwrap.py -b --csharp all
* MacOS.
Building the C# bindings on MacOS is not currently supported.
* OpenBSD.
.. code-block:: shell
sudo pkg_add py3-llvm mono
pip install swig setuptools
cd mupdf && python scripts/mupdfwrap.py -b --csharp all
Using the bindings
To use the bindings, one has to tell the OS where to find the MuPDF runtime files.
C++ and C# bindings:
Windows.
.. code-block:: shell
set PATH=.../mupdf/build/shared-release-x64-py3.11;%PATH%
Replace x64 with x32 if using 32-bit.
Replace 3.11 with the appropriate python version number.
Linux, OpenBSD.
.. code-block:: shell
LD_LIBRARY_PATH=.../mupdf/build/shared-release
(LD_LIBRARY_PATH must be an absolute path.)
MacOS.
.. code-block:: shell
DYLD_LIBRARY_PATH=.../mupdf/build/shared-release
Python bindings:
If the bindings have been built and installed using pip install,
they will already be available within the venv.
Otherwise:
Windows.
.. code-block:: shell
set PYTHONPATH=.../mupdf/build/shared-release-x64-py3.11
Replace x64 with x32 if using 32-bit.
Replace 3.11 with the appropriate python version number.
Linux, MacOS, OpenBSD.
.. code-block:: shell
PYTHONPATH=.../mupdf/build/shared-release
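The per-platform settings above can be summarised as a small lookup. This is a sketch only: ``runtime_env`` and its parameters are hypothetical, and the directory names and variables are those shown in the examples above.

```python
def runtime_env(mupdf_root, system, cpu='x64', python_version='3.11'):
    """Return {variable: directory} for locating the MuPDF runtime files.

    Hypothetical helper. `system` is 'windows', 'linux', 'openbsd' or 'macos';
    `mupdf_root` should be an absolute path to the MuPDF checkout.
    """
    if system == 'windows':
        build = f'{mupdf_root}/build/shared-release-{cpu}-py{python_version}'
        # PATH is used by the C++ and C# bindings, PYTHONPATH by Python.
        # The directory should be prepended to any existing PATH value.
        return {'PATH': build, 'PYTHONPATH': build}
    build = f'{mupdf_root}/build/shared-release'
    library_var = 'DYLD_LIBRARY_PATH' if system == 'macos' else 'LD_LIBRARY_PATH'
    return {library_var: build, 'PYTHONPATH': build}


for variable, directory in runtime_env('/home/user/mupdf', 'linux').items():
    print(f'{variable}={directory}')
```

Running ``scripts/mupdfwrap.py`` with one of its test arguments remains the authoritative way to see the exact settings for a particular build.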
Notes
* Running tests.
Basic tests can be run by appending args to the ``scripts/mupdfwrap.py``
command.
This will also demonstrate how to set environment variables such as
``PYTHONPATH`` or ``LD_LIBRARY_PATH`` to the MuPDF build directory.
* Python tests.
* ``--test-python``
* ``--test-python-gui``
* C# tests.
* ``--test-csharp``
* ``--test-csharp-gui``
* C++ tests.
* ``--test-cpp``
* C++ bindings and ``NDEBUG``.
When building client code that uses the C++ bindings, ``NDEBUG`` must
be defined/undefined to match how the C++ bindings were built. By
default the C++ bindings are a release build with ``NDEBUG`` defined, so
usually client code must also be built with ``NDEBUG`` defined. Otherwise
there will be build errors for missing C++ destructors, for example
``mupdf::FzMatrix::~FzMatrix()``.
[This is because we define some destructors in debug builds only; this allows
internal reference counting checks.]
* Specifying the location of Visual Studio's ``devenv.com`` on Windows.
``scripts/mupdfwrap.py`` looks for Visual Studio's ``devenv.com`` in
standard locations; this can be overridden with:
.. code-block:: shell
python scripts/mupdfwrap.py -b --devenv <devenv.com-location> ...
* Specifying compilers.
On non-Windows, we use ``cc`` and ``c++`` as default C and C++ compilers;
override by setting environment variables ``$CC`` and ``$CXX``.
* OpenBSD ``libclang``.
* ``libclang`` cannot be installed with pip on OpenBSD - wheels are not
  available and building from source fails.

  However, unlike on other platforms, the system python-clang package
  (``py3-llvm``) is integrated with the system's libclang and can be
  used directly.

  So the above examples use ``pkg_add py3-llvm``.
* Alternatives to Python package ``libclang`` generally do not work.

  For example pypi.org's `clang <https://pypi.org/project/clang/>`_, or
  Debian's `python-clang <https://packages.debian.org/search?keywords=python+clang&searchon=names&suite=stable&section=all>`_.

  These are inconvenient to use because they require explicit setting of
  ``LD_LIBRARY_PATH`` to point to the correct libclang dynamic library.
* Debug builds.
One can specify a debug build using the ``-d <build-directory>`` arg
before ``-b``.
.. code-block:: shell
python ./scripts/mupdfwrap.py -d build/shared-debug -b ...
* Debug builds of the Python and C# bindings on Windows have not been
  tested. There may be issues with requiring a debug version of the Python
  interpreter, for example ``python311_d.lib``.
* C# build failure: ``cstring.i not implemented for this target`` and/or
  ``Unknown directive '%cstring_output_allocate'``.

  This is probably because SWIG does not include support for C#. This
  has been seen in the past but as of 2023-07-19 pypi.org's default swig
  seems ok.

  A possible solution is to install SWIG using the system package
  manager, for example ``sudo apt install swig`` on Linux, or use
  ``./scripts/mupdfwrap.py --swig-windows-auto ...`` on Windows.
* C# omissions.

  Some functions are omitted from the C# API due to C# restrictions, for
  example functions that return ``void*`` and have out-params (because tuples
  cannot contain ``void*`` items). These will be marked with comments in the
  generated ``mupdf.cs`` file.
* More information about running ``scripts/mupdfwrap.py``.
* Run ``python ./scripts/mupdfwrap.py -h``.
* Read the doc-string at the beginning of ``scripts/wrap/__main__.py``.
How ``scripts/mupdfwrap.py`` builds the APIs
Building the MuPDF C API
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* On Unix, runs ``make`` on MuPDF's ``Makefile`` with ``shared=yes``.
* On Windows, runs ``devenv.com`` on ``.sln`` and ``.vcxproj`` files within MuPDF's `platform/win32/ <https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/tree/platform/win32>`_
  directory.
Generation of the MuPDF C++ API
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Uses clang-python to parse MuPDF's C API.
* Generates C++ code that wraps the basic C interface, converting MuPDF
  ``setjmp()``/``longjmp()`` exceptions into C++ exceptions and automatically
  handling ``fz_context``'s internally.
* Generates C++ wrapper classes for each ``fz_*`` and ``pdf_*`` struct, and uses various
  heuristics to define constructors, methods and static methods that call
  ``fz_*()`` and ``pdf_*()`` functions. These classes' constructors and destructors
  automatically handle reference counting so class instances can be copied
  arbitrarily.
* C header file comments are copied into the generated C++ header files.
* Compiles and links the generated C++ code to create shared libraries.
Generation of the MuPDF Python and C# APIs
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

* Uses SWIG to parse the previously-generated C++ headers and generate C++, Python and C# code.
* Defines some custom-written Python and C# functions and methods, for example so that out-params are returned as tuples.
* If SWIG is version 4+, C++ comments are converted into Python doc-comments.
* Compiles and links the SWIG-generated C++ code to create shared libraries.
Building auto-generated MuPDF API documentation
Build HTML documentation for the C, C++ and Python APIs (using Doxygen and pydoc):
.. code-block:: shell
python ./scripts/mupdfwrap.py --doc all
This will generate the following tree:
.. code-block:: text
mupdf/docs/generated/
index.html
c/
c++/
python/
All content is ultimately generated from the MuPDF C header file comments.
As of 2022-2-5, it looks like ``swig -doxygen`` (swig-4.0.2) ignores
single-line ``/** ... */`` comments, so the generated Python code (and
hence also Pydoc documentation) is missing information.
Generated files
All generated files are within the MuPDF checkout.
C++ headers for the MuPDF C++ API are in ``platform/c++/include/``.
Files required at runtime are in ``build/shared-release/``.
Details
.. code-block:: text
mupdf/
build/
shared-release/ [Unix runtime files.]
libmupdf.so [MuPDF C API, not MacOS.]
libmupdf.dylib [MuPDF C API, MacOS.]
libmupdfcpp.so [MuPDF C++ API.]
mupdf.py [MuPDF Python API.]
_mupdf.so [MuPDF Python API internals.]
mupdf.cs [MuPDF C# API.]
mupdfcsharp.so [MuPDF C# API internals.]
shared-debug/
[as shared-release but debug build.]
shared-release-x*-py*/ [Windows runtime files.]
mupdfcpp.dll [MuPDF C and C++ API, x32.]
mupdfcpp64.dll [MuPDF C and C++ API, x64.]
mupdf.py [MuPDF Python API.]
_mupdf.pyd [MuPDF Python API internals.]
mupdf.cs [MuPDF C# API.]
mupdfcsharp.dll [MuPDF C# API internals.]
platform/
c++/
include/ [MuPDF C++ API header files.]
mupdf/
classes.h
classes2.h
exceptions.h
functions.h
internal.h
implementation/ [MuPDF C++ implementation source files.]
classes.cpp
classes2.cpp
exceptions.cpp
functions.cpp
internal.cpp
generated.pickle [Information from clang parse step, used by later stages.]
windows_mupdf.def [List of MuPDF public global data, used when linking mupdfcpp.dll.]
python/ [SWIG Python files.]
mupdfcpp_swig.i [SWIG input file.]
mupdfcpp_swig.i.cpp [SWIG output file.]
csharp/ [SWIG C# files.]
mupdf.cs [SWIG output file, no out-params helpers.]
mupdfcpp_swig.i [SWIG input file.]
mupdfcpp_swig.i.cpp [SWIG output file.]
win32/
Release/ [Windows 32-bit .dll, .lib, .exp, .pdb etc.]
x64/
Release/ [Windows 64-bit .dll, .lib, .exp, .pdb etc.]
mupdfcpp64.dll [Copied to build/shared-release*/mupdfcpp64.dll]
mupdfpyswig.dll [Copied to build/shared-release*/_mupdf.pyd]
mupdfcpp64.lib
mupdfpyswig.lib
win32-vs-upgrade/ [used instead of win32/ if PYMUPDF_SETUP_MUPDF_VS_UPGRADE is '1'.]
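As a rough check that a build has produced the runtime files needed by the Python bindings, the names in the tree above can be compared with the contents of a build directory. The helpers below are hypothetical and cover only the Python-related files listed in that tree.

```python
import os


def expected_python_runtime_files(system, cpu='x64'):
    """Return runtime file names for the Python bindings, per the tree above.

    Hypothetical helper. `system` is 'windows', 'macos' or another Unix name.
    """
    if system == 'windows':
        cpp_dll = 'mupdfcpp64.dll' if cpu == 'x64' else 'mupdfcpp.dll'
        return [cpp_dll, 'mupdf.py', '_mupdf.pyd']
    c_library = 'libmupdf.dylib' if system == 'macos' else 'libmupdf.so'
    return [c_library, 'libmupdfcpp.so', 'mupdf.py', '_mupdf.so']


def missing_files(build_dir, system, cpu='x64'):
    """Return the expected file names that are not present in build_dir."""
    return [
        name
        for name in expected_python_runtime_files(system, cpu)
        if not os.path.exists(os.path.join(build_dir, name))
    ]


print(expected_python_runtime_files('linux'))
print(expected_python_runtime_files('windows', 'x64'))
```

An empty list from ``missing_files()`` suggests that the directory is usable as ``PYTHONPATH``; it does not verify that the files were built consistently.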
Required predefined macros
Code that will use the MuPDF DLL must be built with ``FZ_DLL_CLIENT``
predefined.
The MuPDF DLL itself is built with ``FZ_DLL`` predefined.
DLLs
There is no separate C library; instead the C and C++ APIs are
both in ``mupdfcpp.dll``, which is built by running devenv on
``platform/win32/mupdf.sln``.

The Python SWIG library is called ``_mupdf.pyd`` which, despite the name, is a
standard Windows DLL, built from ``platform/python/mupdfcpp_swig.i.cpp``.
DLL export of functions and data
On Windows, ``include/mupdf/fitz/export.h`` defines ``FZ_FUNCTION`` and
``FZ_DATA`` to ``__declspec(dllexport)`` and/or ``__declspec(dllimport)``
depending on whether ``FZ_DLL`` or ``FZ_DLL_CLIENT`` are defined.
All MuPDF C headers prefix declarations of public global data with ``FZ_DATA``.
In generated C++ code:
* Data declarations and definitions are prefixed with ``FZ_DATA``.
* Function declarations and definitions are prefixed with ``FZ_FUNCTION``.
* Class method declarations and definitions are prefixed with ``FZ_FUNCTION``.
When building ``mupdfcpp.dll`` on Windows we link with the auto-generated
``platform/c++/windows_mupdf.def`` file; this lists all C public global data.
For reasons that are not fully understood, we don't seem to need to tag
C functions with ``FZ_FUNCTION``, but this is required for C++ functions
otherwise we get unresolved symbols when building MuPDF client code.
Building the DLLs
We build Windows binaries by running ``devenv.com`` directly.

Building ``_mupdf.pyd`` is tricky because it needs to be built with a
specific ``Python.h`` and linked with a specific ``python.lib``. This is
done by setting environment variables ``MUPDF_PYTHON_INCLUDE_PATH`` and
``MUPDF_PYTHON_LIBRARY_PATH`` when running ``devenv.com``, which are referenced
by ``platform/win32/mupdfpyswig.vcxproj``. Thus one cannot easily build
``_mupdf.pyd`` directly from the Visual Studio GUI.

[In the git history there is code that builds ``_mupdf.pyd`` by running the
Windows compiler and linker ``cl.exe`` and ``link.exe`` directly, which avoids
the complications of going via devenv, at the expense of needing to know where
``cl.exe`` and ``link.exe`` are.]
Wrapper functions
Wrappers for a MuPDF function ``fz_foo()`` are available in multiple forms:
* Functions in the ``mupdf`` namespace.
* ``mupdf::ll_fz_foo()``
* Low-level wrapper:
* Does not take ``fz_context*`` arg.
* Translates MuPDF exceptions into C++ exceptions.
* Takes/returns pointers to MuPDF structs.
* Code that uses these functions will need to make explicit calls to
``fz_keep_*()`` and ``fz_drop_*()``.
* ``mupdf::fz_foo()``
* High-level class-aware wrapper:
* Does not take ``fz_context*`` arg.
* Translates MuPDF exceptions into C++ exceptions.
* Takes references to C++ wrapper class instances instead of pointers to
MuPDF structs.
* Where applicable, returns C++ wrapper class instances instead of
pointers to MuPDF structs.
* Code that uses these functions does not need to call ``fz_keep_*()``
and ``fz_drop_*()`` - C++ wrapper class instances take care of reference
counting internally.
* Class methods
* Where ``fz_foo()`` has a first arg (ignoring any ``fz_context*`` arg) that
takes a pointer to a MuPDF struct ``foo_bar``, it is generally available as a
member function of the wrapper class ``mupdf::FooBar``:
* ``mupdf::FooBar::fz_foo()``
* Apart from being a member function, this is identical to class-aware
wrapper ``mupdf::fz_foo()``, for example taking references to wrapper classes
instead of pointers to MuPDF structs.
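The three forms listed above follow mechanically from the C function name and the struct type of its first argument. The sketch below shows that derivation; ``wrapper_forms`` is a hypothetical helper, and the real generator applies further rules that are not modelled here.

```python
def wrapper_forms(function_name, first_arg_struct=None):
    """Return the C++ names under which a MuPDF C function is wrapped.

    Hypothetical helper for illustration. `first_arg_struct` is the C struct
    type of the first non-context argument, for example 'fz_page', or None.
    """
    forms = [
        f'mupdf::ll_{function_name}',  # Low-level wrapper.
        f'mupdf::{function_name}',     # Class-aware wrapper.
    ]
    if first_arg_struct is not None:
        # Camel-case the struct name to get the wrapper class: fz_page -> FzPage.
        class_name = ''.join(p.capitalize() for p in first_arg_struct.split('_'))
        forms.append(f'mupdf::{class_name}::{function_name}')  # Member function.
    return forms


for form in wrapper_forms('fz_new_buffer_from_page', 'fz_page'):
    print(form)
```

For ``fz_new_buffer_from_page()`` this yields the same three names as the declarations in the earlier example wrappers.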
Constructors using MuPDF functions
Wrapper class constructors are created for each MuPDF function that returns an instance of a MuPDF struct.

Sometimes two such functions do not have different arg types so C++ overloading cannot distinguish between them as constructors (because C++ constructors do not have names).

We cope with this in two ways:

* Create a static method that returns a new instance of the wrapper class by value.
* Define an enum within the wrapper class, and provide a constructor that takes an instance of this enum to specify which MuPDF function to use.
Default constructors
All wrapper classes have a default constructor.
* For POD classes each member is set to a default value with ``this->foo =
{};``. Arrays are initialised by setting all bytes to zero using
``memset()``.
* For non-POD classes, class member ``m_internal`` is set to ``nullptr``.
* Some classes' default constructors are customized, for example:
* The default constructor for ``fz_color_params`` wrapper
``mupdf::FzColorParams`` sets state to a copy of
``fz_default_color_params``.
* The default constructor for ``fz_md5`` wrapper ``mupdf::FzMd5`` sets
state using ``fz_md5_init()``.
* These are described in class definition comments in
``platform/c++/include/mupdf/classes.h``.
Raw constructors
Many wrapper classes have constructors that take a pointer to the underlying
MuPDF C struct. These are usually for internal use only. They do not call
``fz_keep_*()`` - it is expected that any supplied MuPDF struct is already
owned.
POD wrapper classes
Class wrappers for MuPDF structs default to having a ``m_internal`` member which
points to an instance of the wrapped struct. This works well for MuPDF structs
which support reference counting, because we can automatically create copy
constructors, ``operator=`` functions and destructors that call the associated
``fz_keep_*()`` and ``fz_drop_*()`` functions.
However where a MuPDF struct does not support reference counting and contains
simple data, it is not safe to copy a pointer to the struct, so the class
wrapper will be a POD class. This is done in one of two ways:
* ``m_internal`` is an instance of the MuPDF struct, not a pointer.
* Sometimes we provide members that give direct access to fields in
``m_internal``.
* An 'inline' POD - there is no ``m_internal`` member; instead the wrapper class
contains the same members as the MuPDF struct. This can be a little more
convenient to use.
Extra static methods
Where relevant, a wrapper class can have static methods that wrap selected MuPDF
functions. For example ``FzMatrix`` does this for ``fz_concat()``, ``fz_scale()`` etc,
because these return the result by value rather than modifying a ``fz_matrix``
instance.
Miscellaneous custom wrapper classes
The wrapper for ``fz_outline_item`` does not contain a ``fz_outline_item`` by
value or pointer. Instead it defines C++-style member equivalents to
``fz_outline_item``'s fields, to simplify usage from C++ and Python/C#.
The fields are initialised from a ``fz_outline_item`` when the wrapper class
is constructed. In this particular case there is no need to hold on to a
``fz_outline_item``, and the use of ``std::string`` ensures that value semantics
can work.
Extra functions in C++, Python and C#
---------------------------------------------------------------
[These functions are available as low-level functions, class-aware
functions and class methods.]
.. code-block:: c++
/**
C++ alternative to ``fz_lookup_metadata()`` that returns a ``std::string``
or calls ``fz_throw()`` if not found.
*/
FZ_FUNCTION std::string fz_lookup_metadata2(fz_context* ctx, fz_document* doc, const char* key);
/**
C++ alternative to ``pdf_lookup_metadata()`` that returns a ``std::string``
or calls ``fz_throw()`` if not found.
*/
FZ_FUNCTION std::string pdf_lookup_metadata2(fz_context* ctx, pdf_document* doc, const char* key);
/**
C++ alternative to ``fz_md5_pixmap()`` that returns the digest by value.
*/
FZ_FUNCTION std::vector<unsigned char> fz_md5_pixmap2(fz_context* ctx, fz_pixmap* pixmap);
/**
C++ alternative to fz_md5_final() that returns the digest by value.
*/
FZ_FUNCTION std::vector<unsigned char> fz_md5_final2(fz_md5* md5);
/** */
FZ_FUNCTION long long fz_pixmap_samples_int(fz_context* ctx, fz_pixmap* pixmap);
/**
Provides simple (but slow) access to pixmap data from Python and C#.
*/
FZ_FUNCTION int fz_samples_get(fz_pixmap* pixmap, int offset);
/**
Provides simple (but slow) write access to pixmap data from Python and
C#.
*/
FZ_FUNCTION void fz_samples_set(fz_pixmap* pixmap, int offset, int value);
/**
C++ alternative to fz_highlight_selection() that returns quads in a
std::vector.
*/
FZ_FUNCTION std::vector<fz_quad> fz_highlight_selection2(fz_context* ctx, fz_stext_page* page, fz_point a, fz_point b, int max_quads);
struct fz_search_page2_hit
{
    fz_quad quad;
    int mark;
};
/**
C++ alternative to fz_search_page() that returns information in a std::vector.
*/
FZ_FUNCTION std::vector<fz_search_page2_hit> fz_search_page2(fz_context* ctx, fz_document* doc, int number, const char* needle, int hit_max);
/**
C++ alternative to fz_string_from_text_language() that returns information in a std::string.
*/
FZ_FUNCTION std::string fz_string_from_text_language2(fz_text_language lang);
/**
C++ alternative to fz_get_glyph_name() that returns information in a std::string.
*/
FZ_FUNCTION std::string fz_get_glyph_name2(fz_context* ctx, fz_font* font, int glyph);
/**
Extra struct containing fz_install_load_system_font_funcs()'s args,
which we wrap with virtual_fnptrs set to allow use from Python/C# via
Swig Directors.
*/
typedef struct fz_install_load_system_font_funcs_args
{
    fz_load_system_font_fn* f;
    fz_load_system_cjk_font_fn* f_cjk;
    fz_load_system_fallback_font_fn* f_fallback;
} fz_install_load_system_font_funcs_args;
/**
Alternative to fz_install_load_system_font_funcs() that takes args in a
struct, to allow use from Python/C# via Swig Directors.
*/
FZ_FUNCTION void fz_install_load_system_font_funcs2(fz_context* ctx, fz_install_load_system_font_funcs_args* args);
/** Internal singleton state to allow Swig Director class to find
fz_install_load_system_font_funcs_args class wrapper instance. */
FZ_DATA extern void* fz_install_load_system_font_funcs2_state;
/** Helper for calling ``fz_document_handler::open`` function pointer via
Swig from Python/C#. */
FZ_FUNCTION fz_document* fz_document_handler_open(fz_context* ctx, const fz_document_handler *handler, fz_stream* stream, fz_stream* accel, fz_archive* dir, void* recognize_state);
/** Helper for calling a ``fz_document_handler::recognize`` function
pointer via Swig from Python/C#. */
FZ_FUNCTION int fz_document_handler_recognize(fz_context* ctx, const fz_document_handler *handler, const char *magic);
/** Swig-friendly wrapper for pdf_choice_widget_options(), returns the
options directly in a vector. */
FZ_FUNCTION std::vector<std::string> pdf_choice_widget_options2(fz_context* ctx, pdf_annot* tw, int exportval);
/** Swig-friendly wrapper for fz_new_image_from_compressed_buffer(),
uses specified ``decode`` and ``colorkey`` if they are not null (in which
case we assert that they have size ``2*fz_colorspace_n(colorspace)``). */
FZ_FUNCTION fz_image* fz_new_image_from_compressed_buffer2(
fz_context* ctx,
int w,
int h,
int bpc,
fz_colorspace* colorspace,
int xres,
int yres,
int interpolate,
int imagemask,
const std::vector<float>& decode,
const std::vector<int>& colorkey,
fz_compressed_buffer* buffer,
fz_image* mask
);
/** Swig-friendly wrapper for pdf_rearrange_pages(). */
void pdf_rearrange_pages2(
fz_context* ctx,
pdf_document* doc,
const std::vector<int>& pages,
pdf_clean_options_structure structure
);
/** Swig-friendly wrapper for pdf_subset_fonts(). */
void pdf_subset_fonts2(fz_context *ctx, pdf_document *doc, const std::vector<int>& pages);
/** Swig-friendly and typesafe way to do fz_snprintf(fmt, value). ``fmt``
must end with one of 'efg' otherwise we throw an exception. */
std::string fz_format_double(fz_context* ctx, const char* fmt, double value);
struct fz_font_ucs_gid
{
    unsigned long ucs;
    unsigned int gid;
};
/** SWIG-friendly wrapper for fz_enumerate_font_cmap(). */
std::vector<fz_font_ucs_gid> fz_enumerate_font_cmap2(fz_context* ctx, fz_font* font);
/** SWIG-friendly wrapper for pdf_set_annot_callout_line(). */
void pdf_set_annot_callout_line2(fz_context *ctx, pdf_annot *annot, std::vector<fz_point>& callout);
/** SWIG-friendly wrapper for fz_decode_barcode_from_display_list(),
avoiding leak of the returned string. */
std::string fz_decode_barcode_from_display_list2(fz_context *ctx, fz_barcode_type *type, fz_display_list *list, fz_rect subarea, int rotate);
/** SWIG-friendly wrapper for fz_decode_barcode_from_pixmap(), avoiding
leak of the returned string. */
std::string fz_decode_barcode_from_pixmap2(fz_context *ctx, fz_barcode_type *type, fz_pixmap *pix, int rotate);
/** SWIG-friendly wrapper for fz_decode_barcode_from_page(), avoiding
leak of the returned string. */
std::string fz_decode_barcode_from_page2(fz_context *ctx, fz_barcode_type *type, fz_page *page, fz_rect subarea, int rotate);
Python/C# bindings details
---------------------------------------------------------------
Extra Python functions
Access to raw C arrays
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The following functions can be used from Python to get access to raw data:

* ``mupdf.bytes_getitem(array, index)``: Gives access to individual items
  in an array of ``unsigned char``, for example in the data returned by
  ``mupdf::FzPixmap``'s ``samples()`` method.
* ``mupdf.floats_getitem(array, index)``: Gives access to individual items in an
  array of ``float``, for example in ``fz_stroke_state``'s ``float dash_list[32]``
  array. Generated with SWIG code ``carrays.i`` and ``array_functions(float, floats);``.
* ``mupdf.python_buffer_data(b)``: Returns a SWIG wrapper for a ``const unsigned char*``
  pointing to a Python buffer instance's raw data. For example ``b`` can
  be a Python ``bytes`` or ``bytearray`` instance.
* ``mupdf.python_mutable_buffer_data(b)``: Returns a SWIG wrapper for an ``unsigned char*``
  pointing to a Python buffer instance's raw data. For example ``b`` can
  be a Python ``bytearray`` instance.

[These functions are implemented internally using SWIG's ``carrays.i`` and
``pybuffer.i``.]
Python differences from C API
[The functions described below are also available as class methods.]
Custom methods
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Python and C# code does not easily handle functions that return raw data, for example
as an ``unsigned char*`` that is not a zero-terminated string. Sometimes we provide a
C++ method that returns a ``std::vector`` by value, so that Python and C# code can
wrap it in a systematic way.
For example ``FzMd5::fz_md5_final2()``.
For all functions described below, there is also a ``ll_*`` variant that
takes/returns raw MuPDF structs instead of wrapper classes.
New functions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
* ``fz_buffer_extract_copy()``: Returns copy of buffer data as a Python ``bytes``.
* ``fz_buffer_storage_memoryview(buffer, writable)``: Returns a readonly/writable Python memoryview onto ``buffer``.
Relies on ``buffer`` existing and not changing size while the memory view is used.
* ``fz_pixmap_samples_memoryview()``: Returns Python ``memoryview`` onto ``fz_pixmap`` data.
* ``fz_lookup_metadata2(fzdocument, key)``: Returns the key's value or raises an exception if not found.
* ``pdf_lookup_metadata2(pdfdocument, key)``: Returns the key's value or raises an exception if not found.
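
The difference between a copy and a memoryview can be seen with Python's built-in types alone. The sketch below does not call MuPDF at all: a plain ``bytearray`` named ``storage`` is a hypothetical stand-in for storage owned by a buffer or pixmap, so it only illustrates the aliasing behaviour, not the bindings themselves.

```python
# Stand-in for MuPDF-owned storage; no MuPDF function is called here.
storage = bytearray(b'\x00\x01\x02\x03')

writable_view = memoryview(storage)               # Shares memory with storage.
readonly_view = memoryview(storage).toreadonly()  # Same memory, writes refused.

writable_view[0] = 255   # Writes through to the underlying storage.
copy = bytes(storage)    # Independent snapshot, unaffected by later writes.
writable_view[1] = 254   # Visible through both views, but not in the copy.

try:
    readonly_view[0] = 1
    readonly_write_failed = False
except TypeError:
    # A read-only memoryview rejects item assignment.
    readonly_write_failed = True
```

A ``bytearray`` refuses to resize while a view onto it exists. The text above does not state any such safeguard for ``fz_buffer``, which is presumably why the caller is asked to keep the buffer alive and unchanged in size.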
Implemented in Python
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
* ``fz_format_output_path()``
* ``fz_story_positions()``
* ``pdf_dict_getl()``
* ``pdf_dict_putl()``
Non-standard API or implementation
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
* ``fz_buffer_extract()``: Returns a *copy* of the original buffer data as a Python ``bytes``. Still clears the buffer.
* ``fz_buffer_storage()``: Returns ``(size, data)`` where ``data`` is a low-level SWIG representation of the buffer's storage.
* ``fz_convert_color()``: No ``float* fv`` param, instead returns ``(rgb0, rgb1, rgb2, rgb3)``.
* ``fz_fill_text()``: ``color`` arg is tuple/list of 1-4 floats.
* ``fz_lookup_metadata(fzdocument, key)``: Returns the key's value or ``None`` if not found.
* ``fz_new_buffer_from_copied_data()``: Takes a Python ``bytes`` (or other Python buffer) instance.
* ``fz_set_error_callback()``: Takes a Python callable; no ``void* user`` arg.
* ``fz_set_warning_callback()``: Takes a Python callable; no ``void* user`` arg.
* ``fz_warn()``: Takes single Python ``str`` arg.
* ``pdf_dict_putl_drop()``: Always raises exception because not useful with automatic ref-counts.
* ``pdf_load_field_name()``: Uses extra C++ function ``pdf_load_field_name2()`` which returns ``std::string`` by value.
* ``pdf_lookup_metadata(pdfdocument, key)``: Returns the key's value or ``None`` if not found.
* ``pdf_set_annot_color()``: Takes single ``color`` arg which must be float or tuple of 1-4 floats.
* ``pdf_set_annot_interior_color()``: Takes single ``color`` arg which must be float or tuple of 1-4 floats.
* ``fz_install_load_system_font_funcs()``: Takes Python callbacks with no ``ctx`` arg,
which can return ``None``, ``fz_font*`` or a ``mupdf.FzFont``.
Example usage (from ``scripts/mupdfwrap_test.py:test_install_load_system_font()``)::

    def font_f(name, bold, italic, needs_exact_metrics):
        print(f'font_f(): Looking for font: {name=} {bold=} {italic=} {needs_exact_metrics=}.')
        return mupdf.fz_new_font_from_file(...)
    def f_cjk(name, ordering, serif):
        print(f'f_cjk(): Looking for font: {name=} {ordering=} {serif=}.')
        return None
    def f_fallback(script, language, serif, bold, italic):
        print(f'f_fallback(): looking for font: {script=} {language=} {serif=} {bold=} {italic=}.')
        return None
    mupdf.fz_install_load_system_font_funcs(font_f, f_cjk, f_fallback)

Making MuPDF function pointers call Python code
Overview
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
For MuPDF structs with function pointers, we provide a second C++ wrapper class for use by the Python bindings.
The second wrapper class has a ``2`` suffix, for example ``PdfFilterOptions2``.
This second wrapper class has a virtual method for each function pointer, so
it can be used as a `SWIG Director class <https://swig.org/Doc4.0/SWIGDocumentation.html#SWIGPlus_target_language_callbacks>`_.
Overriding a virtual method in Python results in the Python method being called when MuPDF C code calls the corresponding function pointer.
One needs to activate the use of a Python method as a callback by calling the
special method ``use_virtual_<method-name>()``. [It might be possible in future
to remove the need to do this.]
It may be possible to use similar techniques in C# but this has not been tried.
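
The activation step can be pictured with a small pure-Python model. This is only a sketch of the idea, not the generated binding code: ``FilterOptionsModel``, its ``text_filter_fnptr`` slot and the ``call_from_c()`` helper are hypothetical stand-ins for a ``...2`` wrapper class, a MuPDF function pointer and the C code that calls it.

```python
class FilterOptionsModel:
    """Hypothetical stand-in for a generated ``...2`` wrapper class."""

    def __init__(self):
        # Models the C struct's function pointer; None means 'not set'.
        self.text_filter_fnptr = None

    def use_virtual_text_filter(self):
        # Point the function-pointer slot at a trampoline that calls the
        # (possibly overridden) virtual method.
        self.text_filter_fnptr = lambda *args: self.text_filter(*args)

    def text_filter(self, *args):
        raise NotImplementedError('text_filter() was not overridden')


def call_from_c(options, *args):
    """Models C code that only calls a function pointer if it is set."""
    if options.text_filter_fnptr is None:
        return None
    return options.text_filter_fnptr(*args)


class DropEverything(FilterOptionsModel):
    def text_filter(self, *args):
        return 1


inactive = DropEverything()       # Override exists but was never activated.
active = DropEverything()
active.use_virtual_text_filter()  # Override is now reachable from 'C'.
```

In this model, overriding the method alone has no effect; the override is only called once the slot has been pointed at the trampoline.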
Callback args
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Python callbacks have args that are more low-level than in the rest of the Python API:

* Callbacks generally have a first arg that is a SWIG representation of a MuPDF
  ``fz_context*``.
* Where the underlying MuPDF function pointer has an arg that is a pointer to a
  MuPDF struct, unlike elsewhere in the MuPDF bindings we do not translate this
  into an instance of the corresponding wrapper class. Instead Python callbacks
  will see a SWIG representation of the low-level C pointer.

  * It is not safe to construct a Python wrapper class instance directly from
    such a SWIG representation of a C pointer, because it will break MuPDF's
    reference counting - Python/C++ constructors that take a raw pointer to a
    MuPDF struct do not call ``fz_keep_*()`` but the corresponding Python/C++
    destructor will call ``fz_drop_*()``.
  * It might be safe to create a wrapper class instance using an explicit call
    to ``mupdf.fz_keep_*()``, but this has not been tried.

* As of 2023-02-03, exceptions from Python callbacks are propagated back
  through the Python, C++, C, C++ and Python layers. The resulting Python
  exception will have the original exception text, but the original Python
  backtrace is lost.
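
The reference-counting hazard described above can be shown with a toy model. Nothing here is MuPDF API: ``RawStruct``, ``keep()``, ``drop()`` and ``Wrapper`` are hypothetical stand-ins for a reference-counted C struct, ``fz_keep_*()``, ``fz_drop_*()`` and a wrapper class whose constructor adopts a raw pointer.

```python
class RawStruct:
    """Hypothetical stand-in for a reference-counted MuPDF C struct."""

    def __init__(self):
        self.refs = 1  # The creator (here, the 'C code') owns one reference.


def keep(raw):
    """Models fz_keep_*(): take an extra reference."""
    raw.refs += 1
    return raw


def drop(raw):
    """Models fz_drop_*(): release one reference."""
    raw.refs -= 1


class Wrapper:
    """Models a wrapper class: adopts a raw pointer without calling keep()."""

    def __init__(self, raw):
        self.raw = raw

    def close(self):
        # Stands in for the C++ destructor, which always drops.
        drop(self.raw)


# Unsafe: the callback wraps a borrowed pointer directly.
borrowed = RawStruct()
Wrapper(borrowed).close()
unsafe_refs = borrowed.refs  # The caller's own reference has been released.

# Balanced in this model (untested against MuPDF, as noted above).
borrowed2 = RawStruct()
Wrapper(keep(borrowed2)).close()
balanced_refs = borrowed2.refs  # The caller's reference is intact.
```

In the unsafe case the count reaches zero while the C caller still believes it holds a reference, which in real code would mean a use-after-free.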
Exceptions in callbacks
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Python exceptions in Director callbacks are propagated back through the language layers (from Python to C++ to C, then back to C++ and finally to Python).
For convenience we add a text representation of the original Python backtrace
to the exception text, but the C layer's ``fz_try``/``fz_catch`` exception handling only
holds 256 characters of exception text, so this backtrace information may be
truncated by the time the exception reaches the original Python code's ``except ...`` block.
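
The effect of the size limit can be sketched without MuPDF. The model below makes two assumptions that the text above does not confirm: that the backtrace is appended after the exception text, and that the limit acts as a simple cut at 256 characters. ``message_with_backtrace()`` and ``through_c_layer()`` are hypothetical helpers, not part of the bindings.

```python
import traceback

MAX_MESSAGE = 256  # Size quoted in the text above; treated as a character limit.


def message_with_backtrace(exc):
    """Return the exception text followed by a formatted Python backtrace."""
    backtrace = ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    return f'{exc}\n{backtrace}'


def through_c_layer(message):
    """Models a fixed-size message buffer by truncating the text."""
    return message[:MAX_MESSAGE]


def failing_callback():
    raise ValueError('bad glyph in text_filter()')


try:
    failing_callback()
except ValueError as e:
    full = message_with_backtrace(e)
    received = through_c_layer(full)
```

Under these assumptions the original exception text survives at the start of ``received``, while the tail of a long backtrace is the part that gets cut.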
Example
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Here is an example PDF filter written in Python that removes alternating items:
Details
|expand_begin|
.. code-block:: python

    import mupdf

    def test_filter(path):
        class MyFilter( mupdf.PdfFilterOptions2):
            def __init__( self):
                super().__init__()
                # Activate the Python override as the C callback.
                self.use_virtual_text_filter()
                self.recurse = 1
                self.sanitize = 1
                self.state = 1
                self.ascii = True
            def text_filter( self, ctx, ucsbuf, ucslen, trm, ctm, bbox):
                print( f'text_filter(): ctx={ctx} ucsbuf={ucsbuf} ucslen={ucslen} trm={trm} ctm={ctm} bbox={bbox}')
                # Remove every other item.
                self.state = 1 - self.state
                return self.state

        filter_ = MyFilter()

        document = mupdf.PdfDocument(path)
        for p in range(document.pdf_count_pages()):
            page = document.pdf_load_page(p)
            print( f'Running document.pdf_filter_page_contents on page {p}')
            document.pdf_begin_operation('test filter')
            document.pdf_filter_page_contents(page, filter_)
            document.pdf_end_operation()

        document.pdf_save_document('foo.pdf', mupdf.PdfWriteOptions())

|expand_end|
.. External links