docs/data_anno_synth/x_anylabeling.en.md
The document parsing capabilities of the PaddleOCR-VL series are now integrated with the X-AnyLabeling annotation platform.
X-AnyLabeling is an industrial-grade all-in-one intelligent annotation platform from CVHub that unifies training, inference, and annotation. Its PaddleOCR panel lets developers run layout parsing, text recognition, formula recognition, table recognition, and seal recognition on images and PDFs with PaddleOCR-VL series models, then review, edit, copy, and export the results.
PaddleOCR-VL is exposed in X-AnyLabeling through two integration modes:
Download the prebuilt binary for your platform from the official release page:
Alternatively, install via pip:
pip install x-anylabeling
After launch, open the PaddleOCR panel from the PaddleOCR icon in the left toolbar, or with the shortcut Ctrl+4.
Once opened, the initial PaddleOCR panel looks like this:
The X-AnyLabeling client supports the PaddleOCR official API out of the box. The first time you open the PaddleOCR panel without API credentials configured, a PPOCR API Settings dialog appears automatically. You can reopen it later via the gear button at the top of the right-hand result panel.
To obtain the API_KEY:
- Copy the TOKEN (access token).
- Paste it into the API_KEY field of PPOCR API Settings, and confirm.

The API_KEY is used for request authentication, and the official service offers a free quota of tens of thousands of pages per day. The configuration is persisted locally:
${workspace}/xanylabeling_data/paddleocr/api_settings.json
By default, ${workspace} is the user home directory ~; if X-AnyLabeling is launched with --work-dir, that directory is used instead.
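The path rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not X-AnyLabeling's own code: the function name `api_settings_path` and its `work_dir` parameter are hypothetical, and the layout of the JSON file itself is not assumed.

```python
from pathlib import Path
from typing import Optional


def api_settings_path(work_dir: Optional[str] = None) -> Path:
    """Return the expected location of api_settings.json.

    `work_dir` stands in for the value passed to --work-dir (hypothetical
    parameter name). When it is omitted, the user home directory is used,
    matching the default described in the text.
    """
    workspace = Path(work_dir).expanduser() if work_dir else Path.home()
    return workspace / "xanylabeling_data" / "paddleocr" / "api_settings.json"


if __name__ == "__main__":
    # Default workspace: the user home directory.
    print(api_settings_path())
    # Custom workspace, as if launched with --work-dir /data/anno.
    print(api_settings_path("/data/anno"))
```

A helper like this is handy when a script needs to check whether credentials are already configured before launching the client.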
The right-hand Parsing Model dropdown currently supports these official API options:
- PaddleOCR-VL-1.5 (API)
- PaddleOCR-VL (API)

In the PaddleOCR panel, click + New Parsing at the top of the left panel to import files. Imported files are copied into the local PaddleOCR working directory and added to the parsing queue automatically.
Supported file types:
| Type | Extensions |
|---|---|
| PDF documents | .pdf |
| Images | .bmp, .cif, .gif, .jpeg, .jpg, .png, .tif, .tiff, .webp |
Typical real-world inputs include textbook pages, paper screenshots, receipts, scanned contracts, tabular documents, and government/enterprise materials.
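Before importing a large batch, it can help to pre-filter files against the extension table above. The following is a minimal sketch: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_supported` are hypothetical, and treating extensions case-insensitively is an assumption rather than documented behavior of the panel.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions copied from the "Supported file types" table.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf",
    ".bmp", ".cif", ".gif", ".jpeg", ".jpg",
    ".png", ".tif", ".tiff", ".webp",
})


def is_supported(path: str) -> bool:
    """Check the file extension against the table (case-insensitive by assumption)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, unsupported), preserving input order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for p in paths:
        (supported if is_supported(p) else unsupported).append(p)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_supported(["contract.pdf", "receipt.JPG", "notes.docx"])
    print("import:", ok)
    print("skip:", skipped)
```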
The Parsing Model dropdown on the right lets you switch between versions of the PaddleOCR-VL series:
- PaddleOCR-VL-1.5 (API) (recommended): achieves 94.5% accuracy on OmniDocBench v1.5, supports polygon localization for greater robustness in scanned, tilted, curved, screen-captured, and complex-lighting scenarios, and adds seal recognition plus text detection and recognition capabilities.
- PaddleOCR-VL (API): the initial version.

Once a model is selected, X-AnyLabeling starts parsing automatically. The model recognizes and structures text, formulas, tables, charts, seals, and other elements.
The colored dot at the lower-left corner of each left-side file item indicates parsing status:
When parsing finishes, X-AnyLabeling shows a side-by-side view: the original document on the left, the model's parsed result on the right. You can review the text, formulas, table structure, chart information, and seal results block by block:
- To edit a block, click its Correct button to enter edit mode.
- To copy a block's content, click the Copy button for that block.

Dedicated editors are provided for different content types:
| Editor | Used for |
|---|---|
| Rich text editor | Plain text, titles, footers, seals, and other non-table/non-formula blocks |
| LaTeX formula editor | display_formula, formula, formula_number, algorithm — edits source with live preview |
| Table editor | table blocks and content recognized as table structure — supports cell editing, row/column add/remove |
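The mapping in the table can be read as a simple dispatch on the block label. The sketch below is hypothetical and is not how X-AnyLabeling implements editor selection: the function `editor_for` and the returned editor names are invented for illustration, and only the labels named in the table are used. Content "recognized as table structure" without a `table` label is not covered here.

```python
# Labels listed in the table for the LaTeX formula editor.
LATEX_LABELS = frozenset({"display_formula", "formula", "formula_number", "algorithm"})
# Label listed in the table for the table editor.
TABLE_LABELS = frozenset({"table"})


def editor_for(label: str) -> str:
    """Pick an editor name for a block label (illustrative names only)."""
    if label in LATEX_LABELS:
        return "latex"
    if label in TABLE_LABELS:
        return "table"
    # Everything else (plain text, titles, footers, seals, ...) falls back
    # to the rich text editor, as the first table row describes.
    return "rich_text"


if __name__ == "__main__":
    for label in ("display_formula", "table", "seal", "title"):
        print(label, "->", editor_for(label))
```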
In addition to Document Parsing, X-AnyLabeling also offers a JSON view:
- Document Parsing view: card-based display of layout blocks, text, formulas, tables, and images. Best for human review.
- JSON view: the full structured result. Best for data processing, training-sample construction, evaluation aggregation, and integration with downstream systems.

After reviewing and correcting the results, you can export the annotations for downstream use:
In the Document Parsing view, the download button on the right toolbar exports a ZIP containing Markdown and related resources; in the JSON view, it exports the full JSON.
X-AnyLabeling's PaddleOCR panel stores imported files and parsing results in a local working directory:
${workspace}/xanylabeling_data/paddleocr/
├── api_settings.json
├── ui_state.json
├── files/
│   ├── example.pdf
│   ├── image.png
│   ├── __PDF_example/
│   │   ├── page_001.png
│   │   └── page_002.png
│   └── __BLOCK_IMAGES_image.png/
│       └── page_001_block_0001.png
└── jsons/
    ├── example.pdf.json
    └── image.png.json
Deleting a file from the left-hand list also removes its source file, local JSON, PDF preview pages, and block crops.
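To see which on-disk artifacts belong to one imported file, the directory layout above can be turned into a small lookup. This is a sketch under stated assumptions: the naming rules (`__PDF_` plus the file stem, `__BLOCK_IMAGES_` plus the full file name, and `<name>.json` under `jsons/`) are inferred only from the example tree, and the function `artifacts_for` is hypothetical. It computes candidate paths without touching the disk, so callers should check existence themselves.

```python
from pathlib import Path
from typing import List


def artifacts_for(paddleocr_dir: Path, filename: str) -> List[Path]:
    """List candidate paths tied to one imported file.

    Naming is inferred from the example tree and may not hold in every
    X-AnyLabeling version. Nothing is read from or written to disk.
    """
    files_dir = paddleocr_dir / "files"
    return [
        files_dir / filename,                          # copied source file
        paddleocr_dir / "jsons" / f"{filename}.json",  # local parsing result
        files_dir / f"__PDF_{Path(filename).stem}",    # PDF preview pages
        files_dir / f"__BLOCK_IMAGES_{filename}",      # block crops
    ]


if __name__ == "__main__":
    root = Path.home() / "xanylabeling_data" / "paddleocr"
    for path in artifacts_for(root, "example.pdf"):
        status = "exists" if path.exists() else "missing"
        print(f"{status:8} {path}")
```

Because the PDF preview directory only applies to PDF inputs, a missing entry for an image file is expected rather than an error.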