A document serializer (or simply serializer) is a Docling abstraction that is initialized with a given DoclingDocument and returns a textual representation of that document.
Besides the document serializer, Docling defines similar abstractions for several document subcomponents, for example: text serializer, table serializer, picture serializer, list serializer, inline serializer, and more.
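As a rough illustration of how a document serializer can build on component serializers, the following self-contained sketch dispatches each item of a toy document to a text or table serializer. It does not use Docling: ToyDocument, TextItem, TableItem, TextSerializer, TableSerializer and DocSerializer are hypothetical stand-ins, and the Markdown-like output is an assumption chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for document components (not Docling's API).
@dataclass
class TextItem:
    text: str


@dataclass
class TableItem:
    rows: List[List[str]]  # first row is treated as the header


@dataclass
class ToyDocument:
    items: List[Union[TextItem, TableItem]] = field(default_factory=list)


class TextSerializer:
    """Component serializer for a single text item."""

    def serialize(self, item: TextItem) -> str:
        return item.text


class TableSerializer:
    """Component serializer rendering a table as a pipe table."""

    def serialize(self, item: TableItem) -> str:
        header, *body = item.rows
        lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)


class DocSerializer:
    """Document serializer: bound to one document, delegates per item type."""

    def __init__(self, doc: ToyDocument):
        self.doc = doc
        self.text_serializer = TextSerializer()
        self.table_serializer = TableSerializer()

    def serialize(self) -> str:
        parts = []
        for item in self.doc.items:
            if isinstance(item, TableItem):
                parts.append(self.table_serializer.serialize(item))
            else:
                parts.append(self.text_serializer.serialize(item))
        # Separate the serialized components with blank lines.
        return "\n\n".join(parts)


doc = ToyDocument([TextItem("Results"), TableItem([["a", "b"], ["1", "2"]])])
print(DocSerializer(doc).serialize())
```

Because the component serializers are ordinary attributes, a different table serializer could be plugged in without changing how text items are handled; that separation is the kind of flexibility the component abstractions appear to target.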
Last but not least, a serializer provider is a wrapper that abstracts the document serialization strategy from the document instance.
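One reading of a serializer provider, assumed here, is an object that hands out a serializer for a given document, so that callers never hard-code the serialization strategy. In the sketch below, SerializerProvider, its get_serializer() method, MarkdownStrategy and PlainTextStrategy are hypothetical names, not Docling classes.

```python
from dataclasses import dataclass
from typing import List, Protocol


# Hypothetical minimal document: a title plus paragraphs (not Docling's API).
@dataclass
class ToyDocument:
    title: str
    paragraphs: List[str]


class DocSerializer(Protocol):
    def serialize(self) -> str: ...


class MarkdownStrategy:
    def __init__(self, doc: ToyDocument):
        self.doc = doc

    def serialize(self) -> str:
        return "\n\n".join([f"# {self.doc.title}", *self.doc.paragraphs])


class PlainTextStrategy:
    def __init__(self, doc: ToyDocument):
        self.doc = doc

    def serialize(self) -> str:
        return "\n".join([self.doc.title.upper(), *self.doc.paragraphs])


class SerializerProvider:
    """Hypothetical provider: builds a serializer of one strategy for any document."""

    def __init__(self, serializer_cls):
        self.serializer_cls = serializer_cls

    def get_serializer(self, doc: ToyDocument) -> DocSerializer:
        return self.serializer_cls(doc=doc)


def render(doc: ToyDocument, provider: SerializerProvider) -> str:
    # Client code depends only on the provider, never on a concrete serializer.
    return provider.get_serializer(doc).serialize()


doc = ToyDocument("Intro", ["First paragraph.", "Second paragraph."])
print(render(doc, SerializerProvider(MarkdownStrategy)))
print(render(doc, SerializerProvider(PlainTextStrategy)))
```

Swapping MarkdownStrategy for PlainTextStrategy changes the output without touching either the document or render().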
To enable both flexibility for downstream applications and out-of-the-box utility, Docling defines a serialization class hierarchy, providing:
- base types for the above abstractions: BaseDocSerializer, as well as BaseTextSerializer, BaseTableSerializer, etc., and BaseSerializerProvider, and
- concrete subclasses of those base types, e.g. MarkdownDocSerializer.

You can review all methods required to define the above base classes here.
From a client perspective, the most relevant method is BaseDocSerializer.serialize(), which returns the textual representation together with metadata on which document components contributed to that serialization.
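The class hierarchy and the shape of the serialize() result can be approximated with Python's abc module: an abstract base type fixes the interface, a concrete subclass fixes one output format, and the result carries the text plus references to the components that contributed to it. All names here (BaseDocSerializerSketch, MarkdownDocSerializerSketch, ToySerializationResult, and refs such as "#/texts/0") are hypothetical, and the real base classes define more methods than this sketch does.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins, not Docling's API.
@dataclass
class Item:
    ref: str    # hypothetical identifier for the component, e.g. "#/texts/0"
    label: str  # "title" or "paragraph" in this toy model
    text: str


@dataclass
class ToyDocument:
    items: List[Item]


@dataclass
class ToySerializationResult:
    text: str
    # Metadata: which components actually contributed to `text`.
    contributing_refs: List[str] = field(default_factory=list)


class BaseDocSerializerSketch(ABC):
    """Abstract base type: fixes the interface, not the output format."""

    def __init__(self, doc: ToyDocument):
        self.doc = doc

    @abstractmethod
    def serialize(self) -> ToySerializationResult:
        ...


class MarkdownDocSerializerSketch(BaseDocSerializerSketch):
    """Concrete subclass: one specific, Markdown-like output format."""

    def serialize(self) -> ToySerializationResult:
        parts, refs = [], []
        for item in self.doc.items:
            if not item.text.strip():
                continue  # empty items contribute nothing, so they are not recorded
            prefix = "# " if item.label == "title" else ""
            parts.append(prefix + item.text)
            refs.append(item.ref)
        return ToySerializationResult(text="\n\n".join(parts), contributing_refs=refs)


doc = ToyDocument([
    Item("#/texts/0", "title", "Report"),
    Item("#/texts/1", "paragraph", "   "),
    Item("#/texts/2", "paragraph", "All tests passed."),
])
result = MarkdownDocSerializerSketch(doc).serialize()
print(result.text)
print(result.contributing_refs)
```

Skipping the whitespace-only item shows why the metadata is useful: the contributing refs tell a caller exactly which components ended up in the text.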
DoclingDocument export methods
Docling provides predefined serializers for Markdown, HTML, and DocTags.
The respective DoclingDocument export methods (e.g. export_to_markdown()) are provided as user shorthands: internally, they directly instantiate the respective serializer and delegate to it.
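The shorthand relationship can be pictured with a toy document whose export method only builds a serializer and returns its text. ToyDocument, ToyMarkdownSerializer and ToyResult are hypothetical; the sketch mirrors the delegation described above, not the actual signature or internals of DoclingDocument.export_to_markdown().

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins, not Docling's API.
@dataclass
class ToyResult:
    text: str


class ToyMarkdownSerializer:
    """Hypothetical serializer bound to one document."""

    def __init__(self, doc: "ToyDocument"):
        self.doc = doc

    def serialize(self) -> ToyResult:
        return ToyResult(text="\n\n".join(self.doc.paragraphs))


@dataclass
class ToyDocument:
    paragraphs: List[str]

    def export_to_markdown(self) -> str:
        # Shorthand: instantiate the serializer and delegate to it.
        return ToyMarkdownSerializer(doc=self).serialize().text


doc = ToyDocument(["Hello.", "World."])
# Both paths produce the same text; the export method only saves a step.
assert doc.export_to_markdown() == ToyMarkdownSerializer(doc=doc).serialize().text
print(doc.export_to_markdown())
```

Keeping the export method this thin leaves the format logic in a single place, the serializer, which is presumably why the export methods can remain simple shorthands.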
For an example showcasing how to use serializers, see here.