Layout analysis is a technique used to extract structured information from document images. It is primarily used to convert complex document layouts into machine-readable data formats. This technology has broad applications in document management, information extraction, and data digitization. Layout analysis combines Optical Character Recognition (OCR), image processing, and machine learning algorithms to identify and extract text blocks, titles, paragraphs, images, tables, and other layout elements from documents. This process generally includes three main steps: layout analysis, element analysis, and data formatting. The final result is structured document data, which enhances the efficiency and accuracy of data processing. <b>PP-StructureV3 improves upon the general layout analysis v1 pipeline by enhancing layout region detection, table recognition, and formula recognition. It also adds capabilities such as multi-column reading order recovery, chart understanding, and result conversion to Markdown files. It performs excellently across various document types and can handle complex document data.</b> This pipeline also provides flexible service deployment options, supporting invocation using multiple programming languages on various hardware. In addition, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset and integrate the trained models seamlessly.
<b>The PP-StructureV3 pipeline consists of the following seven modules or sub-pipelines. Each module or sub-pipeline can be trained and inferred independently and contains multiple models. For more details, please click the corresponding links to view the documentation.</b>
In this pipeline, you can choose the model to use based on the benchmark data below.
<details> <summary><b>Document Image Orientation Classification Module:</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>Top-1 Acc (%)</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>PP-LCNet_x1_0_doc_ori</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Pretrained Model</a></td> <td>99.06</td> <td>2.62 / 0.59</td> <td>3.24 / 1.19</td> <td>7</td> <td>Document image classification model based on PP-LCNet_x1_0, supporting four categories: 0°, 90°, 180°, and 270°</td> </tr> </tbody> </table> </details> <details> <summary><b>Text Image Rectification Module (Optional):</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>CER</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>UVDoc</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Pretrained Model</a></td> <td>0.179</td> <td>19.05 / 19.05</td> <td>- / 869.82</td> <td>30.3</td> <td>High-precision text image rectification model</td> </tr> </tbody> </table> </details> <details> <summary><b>Layout Detection Module:</b></summary> <b>The layout detection model covers 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure-table title, chart, sidebar text, and lists of references.</b> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>mAP(0.5) (%)</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Introduction</th> </tr> </thead> <tbody> <tr> <td>PP-DocLayout_plus-L</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrained.pdparams">Pretrained Model</a></td> <td>83.2</td> <td>53.03 / 17.23</td> <td>634.62 / 378.32</td> <td>126.01</td> <td>A higher-precision layout region localization model trained with RT-DETR-L on a self-built dataset covering Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient books, and research reports</td> </tr> </tbody> </table>The inference time only includes the model inference time and does not include the time for pre- or post-processing.
In the inference time columns labeled [Normal Mode / High-Performance Mode], the Normal Mode values correspond to local Paddle inference engines. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>.
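If you need to pin a specific engine rather than rely on this automatic selection, you can pass the <code>engine</code> parameter explicitly. A minimal sketch, using the values listed for <code>engine</code> in the parameter tables below:

```python
from paddleocr import PPStructureV3

# Force the dynamic-graph Paddle engine for all modules instead of the
# automatic per-model choice; "paddle_static" and "transformers" are the
# other documented options.
pipeline = PPStructureV3(engine="paddle_dynamic")
```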
<details><summary> 👉 Details of Model List</summary>❗ The list above includes the <b>4 core models</b> that are the key models supported by the text recognition module. The module supports <b>12 models</b> in total, including several predefined models covering different categories. The complete model list is as follows:
Before using the PP-StructureV3 pipeline locally, please make sure you have completed the installation of the wheel package according to the installation guide. If you prefer to install dependencies selectively, please refer to the relevant instructions in the installation documentation. The corresponding dependency group for this pipeline is <code>doc-parser</code>. After installation, you can use it via command line or Python integration.
Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.
Use a single command to quickly experience the PP-StructureV3 pipeline:
```bash
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png

# Use --use_doc_orientation_classify to enable document orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_orientation_classify True

# Use --use_doc_unwarping to enable the document unwarping module
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_unwarping True

# Use --use_textline_orientation to enable text line orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_textline_orientation True

# Use --device to specify a GPU for inference
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --device gpu
```
The examples above use local Paddle inference engines by default. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>. To run them, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following command:
```bash
# Use the transformers engine for inference.
# Some models are still being adapted for this engine. For now, disable formula
# recognition and switch the wireless table structure recognition model:
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png \
    --engine transformers --use_formula_recognition False --wireless_table_structure_recognition_model_name SLANeXt_wireless
```
<b>Description:</b> <b>Local path</b>, e.g., a local path to an image or PDF file: <code>/root/data/img.jpg</code>; <b>URL</b>, e.g., an online image or PDF: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">example</a>; <b>local directory</b>: a directory containing images to predict, e.g., <code>/root/data/</code> (currently, directories containing PDFs are not supported; PDFs must be specified by file path).
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>save_path</code></td> <td><b>Meaning:</b>Path to save inference results.<b>Description:</b> If not set, results will not be saved locally.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_name</code></td> <td><b>Meaning:</b>Name of the layout detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the layout detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.<b>Description:</b> Any value between <code>0-1</code>. If not set, the default value is used, which is <code>0.5</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.<b>Description:</b> If not set, the parameter will default to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Unclip ratio for detected boxes in layout detection model.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>1.0</code>.
<td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>The merging mode for the detection boxes output by the model in layout detection.<b>Description:</b>
<ul> <li><b>large</b>: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed;</li> <li><b>small</b>: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed;</li> <li><b>union</b>: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained;</li> </ul>If not set, the default is <code>large</code>. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the chart parsing model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the chart parsing model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the chart parsing model.<b>Description:</b> If not set, the default batch size is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>region_detection_model_name</code></td> <td><b>Meaning:</b>Name of the region detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>region_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the region detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the document orientation classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document orientation classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td><b>Meaning:</b>Name of the document unwarping model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document unwarping model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the text detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limitation for text detection.<b>Description:</b> Any integer > <code>0</code>. If not set, the default value will be <code>960</code>.
</td> <td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td><b>Meaning:</b>Type of the image side length limit for text detection.<b>Description:</b> Supports <code>min</code> and <code>max</code>; <code>min</code> ensures the shortest side of the image is not less than <code>text_det_limit_side_len</code>, while <code>max</code> ensures the longest side does not exceed it. If not set, the default value will be <code>max</code>.
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for text detection. Pixels with scores above this value in the probability map are considered text.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.3</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Box threshold for text detection. A bounding box is considered text if the average score of pixels inside is greater than this value.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.6</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for text detection. The higher the value, the larger the expansion area.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>2.0</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_model_name</code></td> <td><b>Meaning:</b>Name of the text line orientation model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text line orientation model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_batch_size</code></td> <td><b>Meaning:</b>Batch size for the text line orientation model.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the text recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the text recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for text recognition.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for text recognition. Only results above this value will be kept.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.0</code> (no threshold).
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>table_classification_model_name</code></td> <td><b>Meaning:</b>Name of the table classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_classification_model_dir</code></td> <td><b>Meaning:</b>Directory of the table classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wired table structure recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the wired table structure recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table structure recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the wireless table structure recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wired table cell detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the wired table cell detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table cell detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the wireless table cell detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the table orientation classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory of the table orientation classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the seal text detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the seal text detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.<b>Description:</b> Any integer > <code>0</code>. If not set, the default is <code>736</code>.
</td> <td><code>int</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for the image side in seal text detection.<b>Description:</b> Supports <code>min</code> and <code>max</code>; <code>min</code> ensures the shortest side is ≥ <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is ≤ it. If not set, the default is <code>min</code>.
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold. Pixels with scores above this value in the probability map are considered text.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.2</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Box threshold. Boxes with average pixel scores above this value are considered text regions.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.6</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. Higher value means larger expansion area.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.5</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the seal text recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the seal text recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for seal text recognition.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Recognition score threshold. Text results above this value will be kept.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.0</code> (no threshold).
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the formula recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the formula recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size of the formula recognition model.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to load and use the document orientation classification module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to load and use the document unwarping module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to load and use the text line orientation classification module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to load and use seal text recognition subpipeline.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to load and use table recognition subpipeline.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to load and use formula recognition subpipeline.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to load and use the chart parsing module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to load and use the document region detection module.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If not set, the initialized default value will be used, which is <code>False</code> by default.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If not set, the initialized default value will be used, which is <code>['number','footnote','header','header_image','footer','footer_image','aside_text']</code> by default.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>Device for inference.<b>Description:</b> You can specify a device ID:
<ul> <li><b>CPU</b>: e.g., <code>cpu</code> means using CPU for inference;</li> <li><b>GPU</b>: e.g., <code>gpu:0</code> means GPU 0</li> <li><b>NPU</b>: e.g., <code>npu:0</code> means NPU 0</li> <li><b>XPU</b>: e.g., <code>xpu:0</code> means XPU 0</li> <li><b>MLU</b>: e.g., <code>mlu:0</code> means MLU 0</li> <li><b>DCU</b>: e.g., <code>dcu:0</code> means DCU 0</li> <li><b>MetaX GPU</b>: e.g., <code>metax_gpu:0</code> means MetaX GPU 0</li> <li><b>Iluvatar GPU</b>: e.g., <code>iluvatar_gpu:0</code> means Iluvatar GPU 0</li> </ul>If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.
For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.
</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>fp32</code> or <code>fp16</code>.</td> <td><code>str</code></td> <td><code>fp32</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.
</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str</code></td> <td></td> </tr> </tbody> </table> </details>The inference result will be printed in the terminal. The default output of the PP-StructureV3 pipeline is as follows:
<details><summary> 👉Click to expand</summary> <pre> <code> {'res': {'input_path': 'pp_structure_v3_demo.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_general_ocr': True, 'use_seal_recognition': True, 'use_table_recognition': True, 'use_formula_recognition': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9853514432907104, 'coordinate': [770.9531, 776.6814, 1122.6057, 1058.7322]}, {'cls_id': 1, 'label': 'image', 'score': 0.9848673939704895, 'coordinate': [775.7434, 202.27979, 1502.8113, 686.02136]}, {'cls_id': 2, 'label': 'text', 'score': 0.983731746673584, 'coordinate': [1152.3197, 1113.3275, 1503.3029, 1346.586]}, {'cls_id': 2, 'label': 'text', 'score': 0.9832221865653992, 'coordinate': [1152.5602, 801.431, 1503.8436, 986.3563]}, {'cls_id': 2, 'label': 'text', 'score': 0.9829439520835876, 'coordinate': [9.549545, 849.5713, 359.1173, 1058.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9811657667160034, 'coordinate': [389.58298, 1137.2659, 740.66235, 1346.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9775941371917725, 'coordinate': [9.1302185, 201.85, 359.0409, 339.05692]}, {'cls_id': 2, 'label': 'text', 'score': 0.9750366806983948, 'coordinate': [389.71454, 752.96924, 740.544, 889.92456]}, {'cls_id': 2, 'label': 'text', 'score': 0.9738152027130127, 'coordinate': [389.94565, 298.55988, 740.5585, 435.5124]}, {'cls_id': 2, 'label': 'text', 'score': 0.9737328290939331, 'coordinate': [771.50256, 1065.4697, 1122.2582, 1178.7324]}, {'cls_id': 2, 'label': 'text', 'score': 0.9728517532348633, 'coordinate': [1152.5154, 993.3312, 1503.2349, 1106.327]}, {'cls_id': 2, 'label': 'text', 'score': 0.9725610017776489, 'coordinate': [9.372787, 1185.823, 359.31738, 1298.7227]}, {'cls_id': 2, 'label': 'text', 'score': 0.9724331498146057, 'coordinate': [389.62848, 610.7389, 740.83234, 746.2377]}, {'cls_id': 2, 'label': 'text', 'score': 0.9720287322998047, 'coordinate': [389.29898, 897.0936, 741.41516, 1034.6616]}, {'cls_id': 2, 'label': 'text', 'score': 0.9713053703308105, 'coordinate': [10.323685, 1065.4663, 359.6786, 1178.8872]}, {'cls_id': 2, 'label': 'text', 'score': 0.9689728021621704, 'coordinate': [9.336395, 537.6609, 359.2901, 652.1881]}, {'cls_id': 2, 'label': 'text', 'score': 0.9684857130050659, 'coordinate': [10.7608185, 345.95068, 358.93616, 434.64087]}, {'cls_id': 2, 'label': 'text', 'score': 0.9681928753852844, 'coordinate': [9.674866, 658.89075, 359.56528, 770.4319]}, {'cls_id': 2, 'label': 'text', 'score': 0.9634978175163269, 'coordinate': [770.9464, 1281.1785, 1122.6522, 1346.7156]}, {'cls_id': 2, 'label': 'text', 'score': 0.96304851770401, 'coordinate': [390.0113, 201.28055, 740.1684, 291.53073]}, {'cls_id': 2, 'label': 'text', 'score': 0.962053120136261, 'coordinate': [391.21393, 1040.952, 740.5046, 1130.32]}, {'cls_id': 2, 'label': 'text', 'score': 0.9565253853797913, 'coordinate': [10.113251, 777.1482, 359.439, 842.437]}, {'cls_id': 2, 'label': 'text', 'score': 0.9497362375259399, 'coordinate': [390.31357, 537.86285, 740.47595, 603.9285]}, {'cls_id': 2, 'label': 'text', 'score': 0.9371236562728882, 'coordinate': [10.2034, 1305.9753, 359.5958, 1346.7295]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9338151216506958, 'coordinate': [791.6062, 1200.8479, 1103.3257, 1259.9324]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9326773285865784, 'coordinate': [408.0737, 457.37024, 718.9509, 516.63464]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9274250864982605, 
'coordinate': [29.448685, 456.6762, 340.99194, 515.6999]}, {'cls_id': 2, 'label': 'text', 'score': 0.8742568492889404, 'coordinate': [1154.7095, 777.3624, 1330.3086, 794.5853]}, {'cls_id': 2, 'label': 'text', 'score': 0.8442489504814148, 'coordinate': [586.49316, 160.15454, 927.468, 179.64203]}, {'cls_id': 11, 'label': 'doc_title', 'score': 0.8332607746124268, 'coordinate': [133.80017, 37.41908, 1380.8601, 124.1429]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.6770150661468506, 'coordinate': [812.1718, 705.1199, 1484.6973, 747.1692]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[133, 35], ..., [133, 131]], ...,
[[ 13, 754],
...,
[ 13, 777]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['助力双方交往', '搭建友谊桥梁', '本报记者', '沈小晓', '任', '彦', '黄培昭', '身着中国传统民族服装的厄立特里亚青', '厄立特里亚高等教育与研究院合作建立,开', '年依次登台表演中国民族舞、现代舞、扇子舞', '设了中国语言课程和中国文化课程,注册学', '等,曼妙的舞姿赢得现场观众阵阵掌声。这', '生2万余人次。10余年来,厄特孔院已成为', '是日前厄立特里亚高等教育与研究院孔子学', '当地民众了解中国的一扇窗口。', '院(以下简称"厄特孔院")举办"喜迎新年"中国', '黄鸣飞表示,随着来学习中文的人日益', '歌舞比赛的场景。', '增多,阿斯马拉大学教学点已难以满足教学', '中国和厄立特里亚传统友谊深厚。近年', '需要。2024年4月,由中企蜀道集团所属四', '来,在高质量共建"一带一路"框架下,中厄两', '川路桥承建的孔院教学楼项目在阿斯马拉开', '国人文交流不断深化,互利合作的民意基础', '工建设,预计今年上半年竣工,建成后将为厄', '日益深厚。', '特孔院提供全新的办学场地。', '“学好中文,我们的', '“在中国学习的经历', '未来不是梦”', '让我看到更广阔的世界”', '多年来,厄立特里亚广大赴华留学生和', '培训人员积极投身国家建设,成为助力该国', '发展的人才和厄中友好的见证者和推动者。', '在厄立特里亚全国妇女联盟工作的约翰', '娜·特韦尔德·凯莱塔就是其中一位。她曾在', '中华女子学院攻读硕士学位,研究方向是女', '性领导力与社会发展。其间,她实地走访中国', '多个地区,获得了观察中国社会发展的第一', '在厄立特里亚不久前举办的第六届中国风筝文化节上,当地小学生体验风筝制作。', '手资料。', '中国驻厄立特里亚大使馆供图', '“这是中文歌曲初级班,共有32人。学', '“不管远近都是客人,请不用客气;相约', '瓦的北红海省博物馆。', '生大部分来自首都阿斯马拉的中小学,年龄', '好了在一起,我们欢迎你"在一场中厄青', '博物馆二层陈列着一个发掘自阿杜利', '最小的仅有6岁。"尤斯拉告诉记者。', '年联谊活动上,四川路桥中方员工同当地大', '斯古城的中国古代陶制酒器,罐身上写着', '尤斯拉今年23岁,是厄立特里亚一所公立', '学生合唱《北京欢迎你》。厄立特里亚技术学', '“万”“和""禅”“山"等汉字。“这件文物证', '学校的艺术老师。她12岁开始在厄特孔院学', '院计算机科学与工程专业学生鲁夫塔·谢拉', '明,很早以前我们就通过海上丝绸之路进行', '习中文,在2017年第十届"汉语桥"世界中学生', '是其中一名演唱者,她很早便在孔院学习中', '贸易往来与文化交流。这也是厄立特里亚', '中文比赛中获得厄立特里亚赛区第一名,并和', '文,一直在为去中国留学作准备。“这句歌词', '与中国友好交往历史的有力证明。"北红海', '同伴代表厄立特里亚前往中国参加决赛,获得', '是我们两国人民友谊的生动写照。无论是投', '省博物馆研究与文献部负责人伊萨亚斯·特', '团体优胜奖。2022年起,尤斯拉开始在厄特孔', '身于厄立特里亚基础设施建设的中企员工,', '斯法兹吉说。', '院兼职教授中文歌曲,每周末两个课时。中国', '还是在中国留学的厄立特里亚学子,两国人', '厄立特里亚国家博物馆考古学和人类学', '文化博大精深,我希望我的学生们能够通过中', '民携手努力,必将推动两国关系不断向前发', '研究员菲尔蒙·特韦尔德十分喜爱中国文', '文歌曲更好地理解中国文化。"她说。', '穆卢盖塔密切关注中国在经济、科技、教', '展。"鲁夫塔说。', '化。他表示:“学习彼此的语言和文化,将帮', '“姐姐,你想去中国吗?"“非常想!我想', '育等领域的发展,“中国在科研等方面的实力', '厄立特里亚高等教育委员会主任助理萨', '助厄中两国人民更好地理解彼此,助力双方', '去看故宫、爬长城。"尤斯拉的学生中有一对', '与日俱增。在中国学习的经历让我看到更广', '马瑞表示:“每年我们都会组织学生到中国访', '交往,搭建友谊桥梁。"', '能歌善舞的姐妹,姐姐露娅今年15岁,妹妹', '阔的世界,从中受益匪浅。', '问学习,目前有超过5000名厄立特里亚学生', '厄立特里亚国家博物馆馆长塔吉丁·努', '莉娅14岁,两人都已在厄特孔院学习多年,', '23岁的莉迪亚·埃斯蒂法诺斯已在厄特', '在中国留学。学习中国的教育经验,有助于', '里达姆·优素福曾多次访问中国,对中华文明', '中文说得格外流利。', '孔院学习3年,在中国书法、中国画等方面表', '提升厄立特里亚的教育水平。”', '的传承与创新、现代化博物馆的建设与发展', '露娅对记者说:“这些年来,怀着对中文', '现十分优秀,在2024年厄立特里亚赛区的', '“共同向世界展示非', '印象深刻。“中国博物馆不仅有许多保存完好', '和中国文化的热爱,我们姐妹俩始终相互鼓', '“汉语桥"比赛中获得一等奖。莉迪亚说:“学', '的文物,还充分运用先进科技手段进行展示,', '励,一起学习。我们的中文一天比一天好,还', '习中国书法让我的内心变得安宁和纯粹。我', '洲和亚洲的灿烂文明”', '帮助人们更好理解中华文明。"塔吉丁说,厄', '学会了中文歌和中国舞。我们一定要到中国', '也喜欢中国的服饰,希望未来能去中国学习,', '立特里亚与中国都拥有悠久的文明,始终相', '去。学好中文,我们的未来不是梦!"', '把中国不同民族元素融入服装设计中,创作', '从阿斯马拉出发,沿着蜿蜒曲折的盘山', '互理解、相互尊重。我希望未来与中国同行', '据厄特孔院中方院长黄鸣飞介绍,这所', '出更多精美作品,也把厄特文化分享给更多', '公路一路向东寻找丝路印迹。驱车两个小', '加强合作,共同向世界展示非洲和亚洲的灿', '孔院成立于2013年3月,由贵州财经大学和', '的中国朋友。”', '时,记者来到位于厄立特里亚港口城市马萨', '烂文明。”', '谈起在中国求学的经历,约翰娜记忆犹', '新:“中国的发展在当今世界是独一无二的。', '沿着中国特色社会主义道路坚定前行,中国', '创造了发展奇迹,这一切都离不开中国共产党', '的领导。中国的发展经验值得许多国家学习', '借鉴,”', '正在西南大学学习的厄立特里亚博士生', '穆卢盖塔·泽穆伊对中国怀有深厚感情。8', '年前,在北京师范大学获得硕士学位后,穆卢', '盖塔在社交媒体上写下这样一段话:“这是我', '人生的重要一步,自此我拥有了一双坚固的', '鞋子.赋予我穿越荆棘的力量。”', '“鲜花曾告诉我你怎样走过,大地知道你', '心中的每一个角落"厄立特里亚阿斯马拉', '大学综合楼二层,一阵优美的歌声在走廊里回', '响。循着熟悉的旋律轻轻推开一间教室的门,', '学生们正跟着老师学唱中文歌曲《同一首歌》。', '这是厄特孔院阿斯马拉大学教学点的一', '节中文歌曲课。为了让学生们更好地理解歌', '词大意,老师尤斯拉·穆罕默德萨尔·侯赛因逐', '字翻译和解释歌词。随着伴奏声响起,学生们', '边唱边随着节拍摇动身体,现场气氛热烈。'], 'rec_scores': array([0.99972075, ..., 0.96241361]), 'rec_polys': array([[[133, 35],
...,
[133, 131]],
...,
[[ 13, 754],
...,
[ 13, 777]]], dtype=int16), 'rec_boxes': array([[133, ..., 131],
...,
[ 13, ..., 777]], dtype=int16)}}}
</code></pre></details>
For explanation of the result parameters, refer to 2.2 Python Script Integration.
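If you saved results with <code>save_to_json</code> (shown in the Python examples below), you can post-process them with the standard library. A minimal sketch, assuming the saved files live under <code>./output</code> and the field names match the sample output above:

```python
import json
from collections import Counter
from pathlib import Path

# Count the detected layout labels in every JSON result saved under ./output.
for json_path in Path("output").glob("*.json"):
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    res = data.get("res", data)  # some versions wrap the result in a "res" key
    boxes = res["layout_det_res"]["boxes"]
    print(json_path.name, Counter(box["label"] for box in boxes))
```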
<b>Note:</b> Due to the large size of the default model in the pipeline, the inference speed may be slow. You can refer to the model list in Section 1 to replace it with a faster model.
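For example, with the Python integration described below, you can configure lighter-weight models and disable subpipelines you do not need. A sketch, assuming the model names taken from the model lists (such as <code>PP-DocLayout-S</code> and <code>PP-OCRv5_mobile_det</code>/<code>PP-OCRv5_mobile_rec</code>) are available in your installation:

```python
from paddleocr import PPStructureV3

# A lighter-weight configuration sketch: smaller layout and OCR models,
# with two optional subpipelines switched off. Verify these model names
# against the model list for your PaddleOCR version.
pipeline = PPStructureV3(
    layout_detection_model_name="PP-DocLayout-S",
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    use_formula_recognition=False,
    use_chart_recognition=False,
)
```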
The command line method is for quick testing and visualization. In actual projects, you usually need to integrate the model via code. You can perform pipeline inference with just a few lines of code as shown below:
```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# pipeline = PPStructureV3(lang="en") # Set the lang parameter to use the English text recognition model. For other supported languages, see Section 5: Appendix. By default, both Chinese and English text recognition models are enabled.
# pipeline = PPStructureV3(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True) # Use use_textline_orientation to enable/disable the text line orientation classification model
# pipeline = PPStructureV3(device="gpu") # Use device to specify a GPU for model inference
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
    res.save_to_word(save_path="output")  # Save the current image's result in Word format
```
The example above uses local Paddle inference engines by default. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>. To run it, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following code:
```python
from paddleocr import PPStructureV3

# Some models are still being adapted for this engine. For now, disable formula
# recognition and switch the wireless table structure recognition model:
pipeline = PPStructureV3(
    engine="transformers",
    use_formula_recognition=False,
    wireless_table_structure_recognition_model_name="SLANeXt_wireless",
)
# pipeline = PPStructureV3(lang="en") # Set the lang parameter to use the English text recognition model. For other supported languages, see Section 5: Appendix. By default, both Chinese and English text recognition models are enabled.
# pipeline = PPStructureV3(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True) # Use use_textline_orientation to enable/disable the text line orientation classification model
# pipeline = PPStructureV3(device="gpu") # Use device to specify a GPU for model inference
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
```
For PDF files, each page will be processed individually and generate a separate Markdown file. If you want to convert the entire PDF to a single Markdown file, use the following method:
```python
from pathlib import Path
from paddleocr import PPStructureV3

input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

pipeline = PPStructureV3()
output = pipeline.predict(input=input_file)

markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge the per-page Markdown fragments into a single document.
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save any images referenced by the Markdown next to the output file.
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
Note:

- The default text recognition model used by PP-StructureV3 is a Chinese-English recognition model, which has limited accuracy for purely English text. For English-only scenarios, you can set the <code>text_recognition_model_name</code> parameter to an English model such as <code>en_PP-OCRv4_mobile_rec</code> for better recognition performance (see the sketch after these notes). For other languages, refer to the model list above and select the appropriate language-specific recognition model.
- In the example code, the parameters <code>use_doc_orientation_classify</code>, <code>use_doc_unwarping</code>, and <code>use_textline_orientation</code> all default to <code>False</code>, meaning document orientation classification, document image unwarping, and text line orientation classification are disabled. You can set them to <code>True</code> manually if needed.
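A minimal sketch of the English-only setup described in the first note (assuming <code>en_PP-OCRv4_mobile_rec</code> is available in your installation; the input path is hypothetical):

```python
from paddleocr import PPStructureV3

# Swap in an English text recognition model for English-only documents.
pipeline = PPStructureV3(text_recognition_model_name="en_PP-OCRv4_mobile_rec")
output = pipeline.predict("./english_document.png")  # hypothetical input
for res in output:
    res.save_to_markdown(save_path="output")
```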
The basic Python integration script above performs the following steps:
<details><summary>(1) Instantiate <code>PPStructureV3</code> to create the pipeline object. The parameter descriptions are as follows:</summary> <table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr> </thead> <tbody> <tr> <td><code>layout_detection_model_name</code></td> <td><b>Meaning:</b>Name of the layout detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the layout detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.<b>Description:</b>
<ul> <li><b>float</b>: Any float between <code>0-1</code>;</li> <li><b>dict</b>: <code>{0:0.1}</code> where the key is the class ID and the value is the threshold for that class;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>0.5</code>.</li> </ul> </td> <td><code>float|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.<b>Description:</b> If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for the bounding boxes from the layout detection model.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>Tuple[float,float]</b>: Expansion ratios in horizontal and vertical directions;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>tuple</b> values, e.g., <code>{0: (1.1, 2.0)}</code> means width is expanded 1.1× and height 2.0× for class 0 boxes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>1.0</code>.</li> </ul> </td> <td><code>float|Tuple[float,float]|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>Filtering method for overlapping boxes in layout detection.<b>Description:</b>
<ul> <li><b>str</b>: Options include <code>large</code>, <code>small</code>, and <code>union</code> to retain the larger box, smaller box, or both;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>str</b> values, e.g., <code>{0: "large", 2: "small"}</code> means using different modes for different classes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value <code>large</code>.</li> </ul> </td> <td><code>str|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the chart parsing model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the chart parsing model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the chart parsing model.<b>Description:</b> If set to <code>None</code>, the default is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>region_detection_model_name</code></td> <td><b>Meaning:</b>Name of the region detection model for sub-modules in document layout.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>region_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the region detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the document orientation classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document orientation classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td><b>Meaning:</b>Name of the document unwarping model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document unwarping model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the text detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limitation for text detection.<b>Description:</b>
<ul> <li><b>int</b>: Any integer greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>960</code>.</li> </ul> </td> <td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td> <b>Meaning:</b>Limit type for text detection.<b>Description:</b>
<ul> <li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures the shortest side is no less than <code>text_det_limit_side_len</code>, while <code>max</code> ensures the longest side is no greater than it;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>max</code>.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for detection. Pixels in the output probability map with scores above this value are considered as text pixels.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>0.3</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Bounding box threshold. If the average score of all pixels inside the box exceeds this threshold, it is considered a text region.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>0.6</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for text detection. The larger the value, the more the text region is expanded.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>2.0</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_model_name</code></td> <td><b>Meaning:</b>Name of the textline orientation model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_model_dir</code></td> <td><b>Meaning:</b>Directory path of the textline orientation model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_batch_size</code></td> <td><b>Meaning:</b>Batch size for the textline orientation model.<b>Description:</b> If set to <code>None</code>, the default batch size is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the text recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the text recognition model.<b>Description:</b> If set to <code>None</code>, the default batch size is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for text recognition. Only results with scores above this threshold will be retained.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>0.0</code> (no threshold).</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_classification_model_name</code></td> <td><b>Meaning:</b>Name of the table classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_classification_model_dir</code></td> <td><b>Meaning:</b>Directory path of the table classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wired table structure recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wired table structure recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table structure recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wireless table structure recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wired table cell detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wired table cell detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table cell detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wireless table cell detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the table orientation classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory of the table orientation classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the seal text detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the seal text detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.<b>Description:</b>
<ul> <li><b>int</b>: Any integer greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>736</code>.</li> </ul> </td> <td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for seal text detection image side length.<b>Description:</b>
<ul> <li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures the shortest side is no less than <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is no greater than <code>seal_det_limit_side_len</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>min</code>.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for detection. Pixels with scores greater than this value in the probability map are considered text pixels.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.2</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Bounding box threshold. If the average score of all pixels inside a detection box exceeds this threshold, it is considered a text region.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.6</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. The larger the value, the larger the expanded area.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.5</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the default value is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for seal text recognition. Text results with scores above this threshold will be retained.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.0</code> (no threshold).</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the formula recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the formula recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the formula recognition model.<b>Description:</b> If set to <code>None</code>, the default value is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to enable the document orientation classification module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to enable the document image unwarping module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to use the text line orientation classification module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to enable the seal text recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to enable the table recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to enable the formula recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to load and use the chart parsing module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to load and use the document region detection module.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If set to <code>None</code>, the default value is <code>['number','footnote','header','header_image','footer','footer_image','aside_text']</code>.</td>
<td><code>list|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>Device used for inference.<b>Description:</b> Supports specifying device ID:
<ul> <li><b>CPU</b>: e.g., <code>cpu</code> means using CPU for inference;</li> <li><b>GPU</b>: e.g., <code>gpu:0</code> means using GPU 0;</li> <li><b>NPU</b>: e.g., <code>npu:0</code> means using NPU 0;</li> <li><b>XPU</b>: e.g., <code>xpu:0</code> means using XPU 0;</li> <li><b>MLU</b>: e.g., <code>mlu:0</code> means using MLU 0;</li> <li><b>DCU</b>: e.g., <code>dcu:0</code> means using DCU 0;</li> <li><b>MetaX GPU</b>: e.g., <code>metax_gpu:0</code> means using MetaX GPU 0;</li> <li><b>Iluvatar GPU</b>: e.g., <code>iluvatar_gpu:0</code> means using Iluvatar GPU 0;</li> <li><b>None</b>: If set to <code>None</code>, the value set at pipeline initialization is used. During initialization, local GPU device 0 is preferred; if it is unavailable, the CPU device is used.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine_config</code></td> <td><b>Meaning:</b> Inference-engine configuration. <b>Description:</b> Recommended to be set together with <code>engine</code>. For supported fields, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.
For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.
</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>"fp32"</code> or <code>"fp16"</code>.</td> <td><code>str</code></td> <td><code>"fp32"</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.
</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> </tbody> </table> </details> <details><summary>(2) Call the <code>predict()</code> method of the PP-StructureV3 pipeline object for inference. This method returns a result list. The pipeline also provides a <code>predict_iter()</code> method. Both methods accept the same parameters and return the same type of results. The only difference is that <code>predict_iter()</code> returns a <code>generator</code> that allows incremental processing and retrieval of prediction results, which is useful for handling large datasets or saving memory. Choose the method that fits your needs. Below are the parameters of the <code>predict()</code> method:</summary> <table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr> </thead> <tr> <td><code>input</code></td> <td><b>Meaning:</b>Input data to be predicted. Required.<b>Description:</b> Supports multiple types:
<ul> <li><b>Python Var</b>: Image data represented as <code>numpy.ndarray</code>;</li> <li><b>str</b>: Local path to image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL</b> to image or PDF, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">example</a>; <b>directory</b> containing image files, e.g., <code>/root/data/</code> (directories containing PDFs are not supported; use the full file path for a PDF);</li> <li><b>list</b>: Elements can be any of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li> </ul> </td> <td><code>Python Var|str|list</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to use document orientation classification during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to use document image unwarping during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to use textline orientation classification during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to use the seal text recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to use the table recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to use the formula recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to use the chart parsing module during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to use the document region detection module during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td> <td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|Tuple[float,float]|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_wired_table_cells_trans_to_html</code></td> <td><b>Meaning:</b>Whether to enable direct conversion of wired table cell detection results to HTML.<b>Description:</b> If enabled, HTML will be constructed directly based on the geometric relationship of wired table cell detection results.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_wireless_table_cells_trans_to_html</code></td> <td><b>Meaning:</b>Whether to enable direct conversion of wireless table cell detection results to HTML.<b>Description:</b> If enabled, HTML will be constructed directly based on the geometric relationship of wireless table cell detection results.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_table_orientation_classify</code></td> <td><b>Meaning:</b>Whether to enable table orientation classification.<b>Description:</b> When enabled, it can correct the orientation and correctly complete table recognition if the table in the image is rotated by 90/180/270 degrees.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>use_ocr_results_with_table_cells</code></td> <td><b>Meaning:</b>Whether to enable OCR within cell segmentation.<b>Description:</b> When enabled, OCR detection results will be segmented and re-recognized based on cell prediction results to avoid text loss.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>use_e2e_wired_table_rec_model</code></td> <td><b>Meaning:</b>Whether to enable end-to-end wired table recognition mode.<b>Description:</b> If enabled, the cell detection model will not be used, and only the table structure recognition model will be used.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_e2e_wireless_table_rec_model</code></td> <td><b>Meaning:</b>Whether to enable end-to-end wireless table recognition mode.<b>Description:</b> If enabled, the cell detection model will not be used, and only the table structure recognition model will be used.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td> <td><code>list|None</code></td> <td><code>None</code></td> </tr> </table> </details> <details><summary>(3) Process the prediction results: each prediction result corresponds to a Result object, which supports printing, saving as image, or saving as a <code>json</code> file:</summary> <table> <thead> <tr> <th>Method</th> <th>Description</th> <th>Parameter</th> <th>Type</th> <th>Parameter Description</th> <th>Default</th> </tr> </thead> <tr> <td rowspan="3"><code>print()</code></td> <td rowspan="3">Print result to terminal</td> <td><code>format_json</code></td> <td><code>bool</code></td> <td>Whether to format output as indented <code>JSON</code>.</td> <td><code>True</code></td> </tr> <tr> <td><code>indent</code></td> <td><code>int</code></td> <td>Indentation level to beautify the <code>JSON</code> output. Only effective when <code>format_json=True</code>.</td> <td>4</td> </tr> <tr> <td><code>ensure_ascii</code></td> <td><code>bool</code></td> <td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When <code>True</code>, all non-ASCII characters are escaped. When <code>False</code>, original characters are retained. Only effective when <code>format_json=True</code>.</td> <td><code>False</code></td> </tr> <tr> <td rowspan="3"><code>save_to_json()</code></td> <td rowspan="3">Save result as a JSON file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file. If a directory is given, the saved file is named after the input file.</td> <td>None</td> </tr> <tr> <td><code>indent</code></td> <td><code>int</code></td> <td>Indentation level for beautified <code>JSON</code> output. Only effective when <code>format_json=True</code>.</td> <td>4</td> </tr> <tr> <td><code>ensure_ascii</code></td> <td><code>bool</code></td> <td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>.
Only effective when <code>format_json=True</code>.</td> <td><code>False</code></td> </tr> <tr> <td><code>save_to_img()</code></td> <td>Save intermediate visualization results as PNG image files</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_word()</code></td> <td>Save the layout parsing results as a Word (.docx) format file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_markdown()</code></td> <td>Save each page of an image or PDF file as a markdown file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_html()</code></td> <td>Save tables in the file as HTML format</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_xlsx()</code></td> <td>Save tables in the file as XLSX format</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>concatenate_markdown_pages()</code></td> <td>Concatenate multiple markdown pages into a single document</td> <td><code>markdown_list</code></td> <td><code>list</code></td> <td>List of markdown data for each page.</td> <td>Returns the merged markdown text and image list.</td> </tr> </table> <ul> <li> Calling <code>print()</code> will print the result to the terminal. Explanation of the printed content:</li> <ul> <li><code>input_path</code>: <code>(str)</code> Input path of the image or PDF to be predicted</li> <li><code>page_index</code>: <code>(Union[int, None])</code> If input is a PDF, indicates the page number; otherwise <code>None</code></li> <li><code>page_count</code>: <code>(Union[int, None])</code> If the input is a PDF file, it indicates the total number of pages in the PDF; otherwise, it is <code>None</code>.</li> <li><code>width</code>: <code>(int)</code> The width of the original input image.</li> <li><code>height</code>: <code>(int)</code> The height of the original input image.</li> <li><code>model_settings</code>: <code>(Dict[str, bool])</code> Model parameters configured for the pipeline</li> <ul> <li><code>use_doc_preprocessor</code>: <code>(bool)</code> Whether to enable document preprocessor sub-pipeline</li> <li><code>use_seal_recognition</code>: <code>(bool)</code> Whether to enable seal text recognition sub-pipeline</li> <li><code>use_table_recognition</code>: <code>(bool)</code> Whether to enable table recognition sub-pipeline</li> <li><code>use_formula_recognition</code>: <code>(bool)</code> Whether to enable formula recognition sub-pipeline</li> <li><code>format_block_content</code>: <code>(bool)</code> Controls whether to format the <code>block_content</code> into Markdown format</li> <li><code>markdown_ignore_labels</code>: <code>(List[str])</code> Labels of layout regions that need to be ignored in Markdown</li> </ul> </li> <li><code>doc_preprocessor_res</code>: <code>(Dict[str, Union[List[float], str]])</code> Document preprocessing result dictionary, only exists if <code>use_doc_preprocessor=True</code></li> <ul> <li><code>input_path</code>: <code>(str)</code> Image path accepted by document preprocessor, <code>None</code> if input is 
<code>numpy.ndarray</code></li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict[str, bool])</code> Model configuration for the document preprocessor</li> <ul> <li><code>use_doc_orientation_classify</code>: <code>(bool)</code> Whether to enable document orientation classification</li> <li><code>use_doc_unwarping</code>: <code>(bool)</code> Whether to enable image unwarping</li> </ul> <li><code>angle</code>: <code>(int)</code> Predicted angle result if orientation classification is enabled</li> </ul> <li><code>parsing_res_list</code>: <code>(List[Dict])</code> A list of parsing results, where each element is a dictionary. The order of the list is the reading order after parsing.</li> <ul> <li><code>block_bbox</code>: <code>(np.ndarray)</code> The bounding box of the layout area.</li> <li><code>block_label</code>: <code>(str)</code> The label of the layout area, such as <code>text</code>, <code>table</code>, etc.</li> <li><code>block_content</code>: <code>(str)</code> The content within the layout area.</li> <li><code>block_id</code>: <code>(int)</code> The index of the layout area, used to display the layout sorting result.</li> <li><code>block_order</code>: <code>(int)</code> The order of the layout area, used to display the reading order of the layout. For non-ordered parts, the default value is <code>None</code>.</li> </ul> <li><code>overall_ocr_res</code>: <code>(Dict[str, Union[List[str], List[float], numpy.ndarray]])</code> Dictionary of global OCR results</li> <ul> <li><code>input_path</code>: <code>(Union[str, None])</code> OCR sub-pipeline input path; <code>None</code> if input is <code>numpy.ndarray</code></li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict)</code> OCR model configuration</li> <li><code>dt_polys</code>: <code>(List[numpy.ndarray])</code> List of polygons for text detection. 
Each box is a numpy array with shape (4, 2), dtype int16</li> <li><code>dt_scores</code>: <code>(List[float])</code> Confidence scores for detection boxes</li> <li><code>text_det_params</code>: <code>(Dict[str, Dict[str, int, float]])</code> Text detection module parameters</li> <ul> <li><code>limit_side_len</code>: <code>(int)</code> Side length limit for image preprocessing</li> <li><code>limit_type</code>: <code>(str)</code> Limit processing method</li> <li><code>thresh</code>: <code>(float)</code> Threshold for text pixel classification</li> <li><code>box_thresh</code>: <code>(float)</code> Threshold for text detection boxes</li> <li><code>unclip_ratio</code>: <code>(float)</code> Unclip ratio for expanding boxes</li> <li><code>text_type</code>: <code>(str)</code> Text detection type, currently fixed as "general"</li> </ul> <li><code>text_type</code>: <code>(str)</code> Text detection type, currently fixed as "general"</li> <li><code>textline_orientation_angles</code>: <code>(List[int])</code> Orientation classification results for text lines</li> <li><code>text_rec_score_thresh</code>: <code>(float)</code> Threshold for text recognition filtering</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts filtered by score threshold</li> <li><code>rec_scores</code>: <code>(List[float])</code> Recognition scores filtered by threshold</li> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Filtered detection boxes, same format as <code>dt_polys</code></li> </ul> <li><code>formula_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of formula recognition results</li> <ul> <li><code>rec_formula</code>: <code>(str)</code> Recognized formula string</li> <li><code>rec_polys</code>: <code>(numpy.ndarray)</code> Bounding box for the formula, shape (4, 2), dtype int16</li> <li><code>formula_region_id</code>: <code>(int)</code> Region ID of the formula</li> </ul> <li><code>seal_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of seal text recognition results</li> <ul> <li><code>input_path</code>: <code>(str)</code> Input path for the seal image</li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict)</code> Model configuration for seal text recognition</li> <li><code>dt_polys</code>: <code>(List[numpy.ndarray])</code> Seal detection boxes, same format as <code>dt_polys</code></li> <li><code>text_det_params</code>: <code>(Dict[str, Dict[str, int, float]])</code> Detection parameters, same as above</li> <li><code>text_type</code>: <code>(str)</code> Detection type, currently fixed as "seal"</li> <li><code>text_rec_score_thresh</code>: <code>(float)</code> Score threshold for recognition</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts filtered by score</li> <li><code>rec_scores</code>: <code>(List[float])</code> Recognition scores filtered by threshold</li> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Filtered seal boxes, same format as <code>dt_polys</code></li> <li><code>rec_boxes</code>: <code>(numpy.ndarray)</code> Rectangle boxes, shape (n, 4), dtype int16</li> </ul> <li><code>table_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of table recognition results</li> <ul> <li><code>cell_box_list</code>: <code>(List[numpy.ndarray])</code> Bounding boxes of table cells</li> <li><code>pred_html</code>: 
<code>(str)</code> Table as an HTML string</li> <li><code>table_ocr_pred</code>: <code>(Dict)</code> OCR results for the table</li> <ul> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Detected cell boxes</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts for cells</li> <li><code>rec_scores</code>: <code>(List[float])</code> Confidence scores for cell recognition</li> <li><code>rec_boxes</code>: <code>(numpy.ndarray)</code> Rectangle boxes for detection, shape (n, 4), dtype int16</li> </ul> </ul> </ul> </li> <li>Calling <code>save_to_json()</code> saves the above content to the specified <code>save_path</code>. If it’s a directory, the saved path will be <code>save_path/{your_img_basename}_res.json</code>. If it’s a file, it saves directly. Numpy arrays are converted to lists since JSON doesn't support them.</li> <li>Calling <code>save_to_img()</code> saves visual results to the specified <code>save_path</code>. If a directory, various visualizations such as layout detection, OCR, and reading order are saved. If a file, only the last image is saved and others are overwritten.</li> <li>Calling <code>save_to_markdown()</code> saves converted markdown files to <code>save_path/{your_img_basename}.md</code>. For PDF input, it's recommended to specify a directory to avoid file overwriting.</li> <li>Calling <code>concatenate_markdown_pages()</code> merges multi-page markdown results from the <code>PP-StructureV3 pipeline</code> into a single document and returns the merged content.</li>Additionally, you can access the prediction results and visual images through the following attributes:
<table> <thead> <tr> <th>Attribute</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code>json</code></td> <td>Get the prediction result in <code>json</code> format</td> </tr> <tr> <td rowspan="2"><code>img</code></td> <td rowspan="2">Get visualized image results as a <code>dict</code></td> </tr> <tr> </tr> <tr> <td rowspan="3"><code>markdown</code></td> <td rowspan="3">Get markdown results as a <code>dict</code></td> </tr> <tr> </tr> <tr> </tr> </tbody> </table> <ul> <li>The <code>json</code> attribute returns the prediction result as a dictionary, which is consistent with the content saved using the <code>save_to_json()</code> method.</li> <li>The <code>img</code> attribute returns the prediction result as a dictionary. The keys include <code>layout_det_res</code>, <code>overall_ocr_res</code>, <code>text_paragraphs_ocr_res</code>, <code>formula_res_region1</code>, <code>table_cell_img</code>, and <code>seal_res_region1</code>, each corresponding to a visualized <code>Image.Image</code> object for layout detection, OCR, text paragraph, formula, table, and seal results. If optional modules are not used, the dictionary only contains <code>layout_det_res</code>.</li> <li>The <code>markdown</code> attribute returns the prediction result as a dictionary. The keys include <code>markdown_texts</code>, <code>markdown_images</code>, and <code>page_continuation_flags</code>, where the values represent the markdown text, displayed images (<code>Image.Image</code> objects), and a boolean tuple indicating whether the first and last elements of the current page are paragraph boundaries.</li> </ul> </details>If the pipeline meets your requirements for inference speed and accuracy, you can proceed with development integration or deployment.
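To put the result-handling methods above together, here is a minimal sketch (it assumes the `PPStructureV3` pipeline class shown in 2.2 Python script mode and a local `demo.pdf`; paths and values are illustrative only):

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# predict_iter() returns a generator, so the pages of a large PDF can be
# processed one at a time instead of being held in memory all at once.
markdown_list = []
for res in pipeline.predict_iter("./demo.pdf"):
    res.print()                               # print the structured result
    res.save_to_json(save_path="output")      # one JSON file per page
    res.save_to_markdown(save_path="output")  # one Markdown file per page
    markdown_list.append(res.markdown)        # collect per-page markdown dicts

# Merge the per-page Markdown results into a single document; per the table
# above, this returns the merged markdown text and image list.
merged = pipeline.concatenate_markdown_pages(markdown_list)
```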
If you want to directly use the pipeline in your Python project, refer to the example code in 2.2 Python script mode.
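For instance, a short sketch of overriding a few of the instantiation parameters documented in (1) above (the parameter names come from that table; the chosen values are illustrative only):

```python
from paddleocr import PPStructureV3

# Illustrative configuration; any parameter left unset keeps its documented default.
pipeline = PPStructureV3(
    device="gpu:0",                     # use GPU 0; see the device parameter above
    use_doc_orientation_classify=True,  # enable the document orientation classification module
    use_seal_recognition=True,          # enable the seal text recognition sub-pipeline
    text_rec_score_thresh=0.5,          # keep only text results scoring above 0.5
)

for res in pipeline.predict("./demo.jpg"):
    res.save_to_img(save_path="output")  # save visualization images
```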
In addition, PaddleOCR provides two other deployment options described in detail below:
🚀 High-Performance Inference: In production environments, many applications have strict performance requirements (especially response speed) to ensure system efficiency and smooth user experience. PaddleOCR offers a high-performance inference option that deeply optimizes model inference and pre/post-processing for significant end-to-end acceleration. For detailed high-performance inference workflow, refer to High Performance Inference.
☁️ Service Deployment: Service-based deployment is common in production. It encapsulates the inference logic as a service, allowing clients to access it via network requests to obtain results. For detailed instructions on service deployment, refer to Service Deployment.
Below is the API reference and multi-language service invocation examples for basic service deployment:
<details><summary>API Reference</summary> <p>For the main operations provided by the service:</p> <ul> <li>The HTTP request method is POST.</li> <li>Both the request body and response body are JSON data (JSON objects).</li> <li>When the request is processed successfully, the response status code is <code>200</code>, and the attributes of the response body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>logId</code></td> <td><code>string</code></td> <td>The UUID of the request.</td> </tr> <tr> <td><code>errorCode</code></td> <td><code>integer</code></td> <td>Error code. Fixed as <code>0</code>.</td> </tr> <tr> <td><code>errorMsg</code></td> <td><code>string</code></td> <td>Error message. Fixed as <code>"Success"</code>.</td> </tr> <tr> <td><code>result</code></td> <td><code>object</code></td> <td>The result of the operation.</td> </tr> </tbody> </table> <ul> <li>When the request is not processed successfully, the attributes of the response body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>logId</code></td> <td><code>string</code></td> <td>The UUID of the request.</td> </tr> <tr> <td><code>errorCode</code></td> <td><code>integer</code></td> <td>Error code. Same as the response status code.</td> </tr> <tr> <td><code>errorMsg</code></td> <td><code>string</code></td> <td>Error message.</td> </tr> </tbody> </table> <p>The main operations provided by the service are as follows:</p> <ul> <li><b><code>infer</code></b></li> </ul> <p>Perform layout parsing.</p> <p><code>POST /layout-parsing</code></p> <ul> <li>The attributes of the request body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> <th>Required</th> </tr> </thead> <tbody> <tr> <td><code>file</code></td> <td><code>string</code></td> <td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed. To remove the page limit, please add the following configuration to the pipeline configuration file:
<pre><code>Serving:
  extra:
    max_num_input_imgs: null
</code></pre></td> <td>Yes</td> </tr> <tr> <td><code>fileType</code></td> <td><code>integer</code> | <code>null</code></td> <td>File type. <code>0</code> represents a PDF file, and <code>1</code> represents an image file. If this attribute is missing from the request body, the file type will be inferred based on the URL.</td> <td>No</td> </tr> <tr> <td><code>useDocOrientationClassify</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_doc_orientation_classify</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useDocUnwarping</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_doc_unwarping</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTextlineOrientation</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_textline_orientation</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useSealRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_seal_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTableRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_table_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useFormulaRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_formula_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useChartRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_chart_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useRegionDetection</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_region_detection</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>formatBlockContent</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>format_block_content</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutThreshold</code></td> <td><code>number</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_threshold</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutNms</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_nms</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutUnclipRatio</code></td> <td><code>number</code> | <code>array</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_unclip_ratio</code> parameter of the pipeline object's <code>predict</code>
method.</td> <td>No</td> </tr> <tr> <td><code>layoutMergeBboxesMode</code></td> <td><code>string</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_merge_bboxes_mode</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetLimitSideLen</code></td> <td><code>integer</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_limit_side_len</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetLimitType</code></td> <td><code>string</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_limit_type</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetBoxThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_box_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetUnclipRatio</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_unclip_ratio</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textRecScoreThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_rec_score_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetLimitSideLen</code></td> <td><code>integer</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_limit_side_len</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetLimitType</code></td> <td><code>string</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_limit_type</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetBoxThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_box_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetUnclipRatio</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_unclip_ratio</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealRecScoreThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_rec_score_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useWiredTableCellsTransToHtml</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the 
<code>use_wired_table_cells_trans_to_html</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useWirelessTableCellsTransToHtml</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_wireless_table_cells_trans_to_html</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTableOrientationClassify</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_table_orientation_classify</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useOcrResultsWithTableCells</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_ocr_results_with_table_cells</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useE2eWiredTableRecModel</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_e2e_wired_table_rec_model</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useE2eWirelessTableRecModel</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_e2e_wireless_table_rec_model</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>markdownIgnoreLabels</code></td> <td><code>array</code> | <code>null</code></td> <td>Please refer to the description of the <code>markdown_ignore_labels</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>prettifyMarkdown</code></td> <td><code>boolean</code></td> <td>Whether to output beautified Markdown text. The default is <code>true</code>.</td> <td>No</td> </tr> <tr> <td><code>showFormulaNumber</code></td> <td><code>boolean</code></td> <td>Whether to include formula numbers in the output Markdown text. The default is <code>false</code>.</td> <td>No</td> </tr> <tr> <td><code>outputFormats</code></td> <td><code>array</code> | <code>null</code></td> <td>Optional list of extra formats to return. Currently only <code>"docx"</code> is supported.</td> <td>No</td> </tr> <tr> <td><code>visualize</code></td> <td><code>boolean</code> | <code>null</code></td> <td> Whether to return the final visualization image and intermediate images during the processing. <ul style="margin: 0 0 0 1em; padding-left: 0em;"> <li>If <code>true</code> is provided: return images.</li> <li>If <code>false</code> is provided: do not return any images.</li> <li>If this parameter is omitted from the request body, or if <code>null</code> is explicitly passed, the behavior will follow the value of <code>Serving.visualize</code> in the pipeline configuration.</li> </ul>For example, adding the following setting to the pipeline config file:
<pre><code>Serving:
  visualize: False
</code></pre>will disable image return by default. This behavior can be overridden by explicitly setting the <code>visualize</code> parameter in the request.
If <code>visualize</code> is set neither in the request body nor in the configuration file (i.e., it is <code>null</code> in the request and absent from the configuration), images are returned by default.
</td> <td>No</td> </tr> </tbody> </table>
<ul> <li>When the request is processed successfully, the <code>result</code> in the response body has the following attributes:</li> </ul>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>layoutParsingResults</code></td> <td><code>array</code></td> <td>The layout parsing results. The array length is 1 for image input; for PDF input, it equals the number of pages actually processed, with each element holding the result of one page.</td> </tr> <tr> <td><code>dataInfo</code></td> <td><code>object</code></td> <td>Information about the input data.</td> </tr> </tbody> </table>
<p>Each element in <code>layoutParsingResults</code> is an <code>object</code> with the following attributes:</p>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>prunedResult</code></td> <td><code>object</code></td> <td>A simplified version of the <code>res</code> field in the JSON representation of the result generated by the <code>predict</code> method of the pipeline object, with the <code>input_path</code> and <code>page_index</code> fields removed.</td> </tr> <tr> <td><code>markdown</code></td> <td><code>object</code></td> <td>The Markdown result.</td> </tr> <tr> <td><code>outputImages</code></td> <td><code>object</code> | <code>null</code></td> <td>See the description of the <code>img</code> attribute of the result of the pipeline prediction. The images are in JPEG format and are Base64-encoded.</td> </tr> <tr> <td><code>inputImage</code></td> <td><code>string</code> | <code>null</code></td> <td>The input image. The image is in JPEG format and is Base64-encoded.</td> </tr> <tr> <td><code>exports</code></td> <td><code>object</code> | <code>null</code></td> <td>Additional export results. Present only when <code>outputFormats</code> is specified in the request; for example, <code>{"docx": {"content": "..."}}</code>, where <code>content</code> is the Base64-encoded file content.</td> </tr> </tbody> </table>
<p><code>markdown</code> is an <code>object</code> with the following attributes:</p>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>text</code></td> <td><code>string</code></td> <td>The Markdown text.</td> </tr> <tr> <td><code>images</code></td> <td><code>object</code></td> <td>A key-value mapping from the relative paths of Markdown images to the Base64-encoded images.</td> </tr> <tr> <td><code>isStart</code></td> <td><code>boolean</code></td> <td>Whether the first element on the current page is the start of a segment.</td> </tr> <tr> <td><code>isEnd</code></td> <td><code>boolean</code></td> <td>Whether the last element on the current page is the end of a segment.</td> </tr> </tbody> </table> </details>
<details><summary>Multi-language Service Call Examples</summary>
<details>
<summary>Python</summary>
<pre><code class="language-python">import base64
import requests
import pathlib

API_URL = "http://localhost:8080/layout-parsing"  # Service URL

image_path = "./demo.jpg"

# Encode the local image with Base64
with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "file": image_data,  # Base64-encoded file content or file URL
    "fileType": 1,  # File type. 1 represents an image file
}

# Call the API
response = requests.post(API_URL, json=payload)

# Process the response data
assert response.status_code == 200
result = response.json()["result"]
print("\nDetected layout elements:")
for i, res in enumerate(result["layoutParsingResults"]):
    print(res["prunedResult"])
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"Markdown document saved at {md_dir / 'doc.md'}")
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")
</code></pre></details>
<details><summary>C++</summary>
<pre><code class="language-cpp">#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
#include "nlohmann/json.hpp" // https://github.com/nlohmann/json
#include "base64.hpp" // https://github.com/tobiaslocker/base64

int main() {
    httplib::Client client("localhost", 8080);

    const std::string filePath = "./demo.jpg";

    std::ifstream file(filePath, std::ios::binary | std::ios::ate);
    if (!file) {
        std::cerr << "Error opening file: " << filePath << std::endl;
        return 1;
    }

    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        std::cerr << "Error reading file." << std::endl;
        return 1;
    }

    std::string bufferStr(buffer.data(), static_cast<size_t>(size));
    std::string encodedFile = base64::to_base64(bufferStr);

    nlohmann::json jsonObj;
    jsonObj["file"] = encodedFile;
    jsonObj["fileType"] = 1;

    auto response = client.Post("/layout-parsing", jsonObj.dump(), "application/json");

    if (response && response->status == 200) {
        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
        auto result = jsonResponse["result"];
        if (!result.is_object() || !result.contains("layoutParsingResults")) {
            std::cerr << "Unexpected response format." << std::endl;
            return 1;
        }
        const auto& results = result["layoutParsingResults"];
        for (size_t i = 0; i < results.size(); ++i) {
            const auto& res = results[i];
            if (res.contains("prunedResult")) {
                std::cout << "Layout result [" << i << "]: " << res["prunedResult"].dump() << std::endl;
            }
            if (res.contains("outputImages") && res["outputImages"].is_object()) {
                for (auto& [imgName, imgBase64] : res["outputImages"].items()) {
                    std::string outputPath = imgName + "_" + std::to_string(i) + ".jpg";
                    std::string decodedImage = base64::from_base64(imgBase64.get<std::string>());
                    std::ofstream outFile(outputPath, std::ios::binary);
                    if (outFile.is_open()) {
                        outFile.write(decodedImage.c_str(), decodedImage.size());
                        outFile.close();
                        std::cout << "Saved image: " << outputPath << std::endl;
                    } else {
                        std::cerr << "Failed to save image: " << outputPath << std::endl;
                    }
                }
            }
        }
    } else {
        std::cerr << "Request failed." << std::endl;
        if (response) {
            std::cerr << "HTTP status: " << response->status << std::endl;
            std::cerr << "Response body: " << response->body << std::endl;
        }
        return 1;
    }

    return 0;
}
</code></pre></details>
<details><summary>Java</summary>
<pre><code class="language-java">import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;

public class Main {
    public static void main(String[] args) throws IOException {
        String API_URL = "http://localhost:8080/layout-parsing";
        String imagePath = "./demo.jpg";

        File file = new File(imagePath);
        byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
        String base64Image = Base64.getEncoder().encodeToString(fileContent);

        ObjectMapper objectMapper = new ObjectMapper();
        ObjectNode payload = objectMapper.createObjectNode();
        payload.put("file", base64Image);
        payload.put("fileType", 1);

        OkHttpClient client = new OkHttpClient();
        MediaType JSON = MediaType.get("application/json; charset=utf-8");
        RequestBody body = RequestBody.create(JSON, payload.toString());
        Request request = new Request.Builder()
                .url(API_URL)
                .post(body)
                .build();

        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                String responseBody = response.body().string();
                JsonNode root = objectMapper.readTree(responseBody);
                JsonNode result = root.get("result");

                JsonNode layoutParsingResults = result.get("layoutParsingResults");
                for (int i = 0; i < layoutParsingResults.size(); i++) {
                    JsonNode item = layoutParsingResults.get(i);
                    int finalI = i;

                    JsonNode prunedResult = item.get("prunedResult");
                    System.out.println("Pruned Result [" + i + "]: " + prunedResult.toString());

                    JsonNode outputImages = item.get("outputImages");
                    outputImages.fieldNames().forEachRemaining(imgName -> {
                        try {
                            String imgBase64 = outputImages.get(imgName).asText();
                            byte[] imgBytes = Base64.getDecoder().decode(imgBase64);
                            String imgPath = imgName + "_" + finalI + ".jpg";
                            try (FileOutputStream fos = new FileOutputStream(imgPath)) {
                                fos.write(imgBytes);
                                System.out.println("Saved image: " + imgPath);
                            }
                        } catch (IOException e) {
                            System.err.println("Failed to save image: " + e.getMessage());
                        }
                    });
                }
            } else {
                System.err.println("Request failed with HTTP code: " + response.code());
            }
        }
    }
}
</code></pre></details>
<details><summary>Go</summary>
<pre><code class="language-go">package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "path/filepath"
)

func main() {
    API_URL := "http://localhost:8080/layout-parsing"
    filePath := "./demo.jpg"

    fileBytes, err := ioutil.ReadFile(filePath)
    if err != nil {
        fmt.Printf("Error reading file: %v\n", err)
        return
    }
    fileData := base64.StdEncoding.EncodeToString(fileBytes)

    payload := map[string]interface{}{
        "file":     fileData,
        "fileType": 1,
    }
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Printf("Error marshaling payload: %v\n", err)
        return
    }

    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Printf("Error creating request: %v\n", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")

    res, err := client.Do(req)
    if err != nil {
        fmt.Printf("Error sending request: %v\n", err)
        return
    }
    defer res.Body.Close()

    if res.StatusCode != http.StatusOK {
        fmt.Printf("Unexpected status code: %d\n", res.StatusCode)
        return
    }

    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Printf("Error reading response: %v\n", err)
        return
    }

    type Markdown struct {
        Text   string            `json:"text"`
        Images map[string]string `json:"images"`
    }
    type LayoutResult struct {
        PrunedResult map[string]interface{} `json:"prunedResult"`
        Markdown     Markdown               `json:"markdown"`
        OutputImages map[string]string      `json:"outputImages"`
        InputImage   *string                `json:"inputImage"`
    }
    type Response struct {
        Result struct {
            LayoutParsingResults []LayoutResult `json:"layoutParsingResults"`
            DataInfo             interface{}    `json:"dataInfo"`
        } `json:"result"`
    }

    var respData Response
    if err := json.Unmarshal(body, &respData); err != nil {
        fmt.Printf("Error parsing response: %v\n", err)
        return
    }

    for i, res := range respData.Result.LayoutParsingResults {
        fmt.Printf("Result %d - prunedResult: %+v\n", i, res.PrunedResult)

        mdDir := fmt.Sprintf("markdown_%d", i)
        os.MkdirAll(mdDir, 0755)
        mdFile := filepath.Join(mdDir, "doc.md")
        if err := os.WriteFile(mdFile, []byte(res.Markdown.Text), 0644); err != nil {
            fmt.Printf("Error writing markdown file: %v\n", err)
        } else {
            fmt.Printf("Markdown document saved at %s\n", mdFile)
        }
        for path, imgBase64 := range res.Markdown.Images {
            fullPath := filepath.Join(mdDir, path)
            os.MkdirAll(filepath.Dir(fullPath), 0755)
            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
            if err != nil {
                fmt.Printf("Error decoding markdown image: %v\n", err)
                continue
            }
            if err := os.WriteFile(fullPath, imgBytes, 0644); err != nil {
                fmt.Printf("Error saving markdown image: %v\n", err)
            }
        }

        for name, imgBase64 := range res.OutputImages {
            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
            if err != nil {
                fmt.Printf("Error decoding output image %s: %v\n", name, err)
                continue
            }
            filename := fmt.Sprintf("%s_%d.jpg", name, i)
            if err := os.WriteFile(filename, imgBytes, 0644); err != nil {
                fmt.Printf("Error saving output image %s: %v\n", filename, err)
            } else {
                fmt.Printf("Output image saved at %s\n", filename)
            }
        }
    }
}
</code></pre></details>
<details><summary>C#</summary>
<pre><code class="language-csharp">using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class Program
{
    static readonly string API_URL = "http://localhost:8080/layout-parsing";
    static readonly string inputFilePath = "./demo.jpg";

    static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();

        byte[] fileBytes = File.ReadAllBytes(inputFilePath);
        string fileData = Convert.ToBase64String(fileBytes);

        var payload = new JObject
        {
            { "file", fileData },
            { "fileType", 1 }
        };
        var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json");

        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
        response.EnsureSuccessStatusCode();

        string responseBody = await response.Content.ReadAsStringAsync();
        JObject jsonResponse = JObject.Parse(responseBody);

        JArray layoutParsingResults = (JArray)jsonResponse["result"]["layoutParsingResults"];
        for (int i = 0; i < layoutParsingResults.Count; i++)
        {
            var res = layoutParsingResults[i];
            Console.WriteLine($"[{i}] prunedResult:\n{res["prunedResult"]}");

            JObject outputImages = res["outputImages"] as JObject;
            if (outputImages != null)
            {
                foreach (var img in outputImages)
                {
                    string imgName = img.Key;
                    string base64Img = img.Value?.ToString();
                    if (!string.IsNullOrEmpty(base64Img))
                    {
                        string imgPath = $"{imgName}_{i}.jpg";
                        byte[] imageBytes = Convert.FromBase64String(base64Img);
                        File.WriteAllBytes(imgPath, imageBytes);
                        Console.WriteLine($"Output image saved at {imgPath}");
                    }
                }
            }
        }
    }
}
</code></pre></details>
<details><summary>Node.js</summary>
<pre><code class="language-js">const axios = require('axios');
const fs = require('fs');
const path = require('path');

const API_URL = 'http://localhost:8080/layout-parsing';
const imagePath = './demo.jpg';
const fileType = 1;

function encodeImageToBase64(filePath) {
  const bitmap = fs.readFileSync(filePath);
  return Buffer.from(bitmap).toString('base64');
}

const payload = {
  file: encodeImageToBase64(imagePath),
  fileType: fileType
};

axios.post(API_URL, payload)
  .then(response => {
    const results = response.data.result.layoutParsingResults;
    results.forEach((res, index) => {
      console.log(`\n[${index}] prunedResult:`);
      console.log(res.prunedResult);

      const outputImages = res.outputImages;
      if (outputImages) {
        Object.entries(outputImages).forEach(([imgName, base64Img]) => {
          const imgPath = `${imgName}_${index}.jpg`;
          fs.writeFileSync(imgPath, Buffer.from(base64Img, 'base64'));
          console.log(`Output image saved at ${imgPath}`);
        });
      } else {
        console.log(`[${index}] No outputImages.`);
      }
    });
  })
  .catch(error => {
    console.error('Error during API request:', error.message || error);
  });
</code></pre></details>
<details><summary>PHP</summary>
<pre><code class="language-php"><?php

$API_URL = "http://localhost:8080/layout-parsing";
$image_path = "./demo.jpg";

$image_data = base64_encode(file_get_contents($image_path));
$payload = array("file" => $image_data, "fileType" => 1);

$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$result = json_decode($response, true)["result"]["layoutParsingResults"];
foreach ($result as $i => $item) {
    echo "[$i] prunedResult:\n";
    print_r($item["prunedResult"]);

    if (!empty($item["outputImages"])) {
        foreach ($item["outputImages"] as $img_name => $img_base64) {
            $output_image_path = "{$img_name}_{$i}.jpg";
            file_put_contents($output_image_path, base64_decode($img_base64));
            echo "Output image saved at $output_image_path\n";
        }
    } else {
        echo "No outputImages found for item $i\n";
    }
}
?>
</code></pre></details>
</details>
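For multi-page PDF input, a paragraph may be split across a page boundary; the `isStart`/`isEnd` flags of each page's `markdown` object (described above) tell you whether the page begins or ends a segment. The snippet below is a minimal sketch, not part of the official API, of one possible way to stitch the per-page Markdown into a single document. It assumes `result` holds the parsed `result` field from the Python example above; the variable names are illustrative, and the exact joining policy is up to you:

```python
# Stitch per-page Markdown using the isStart/isEnd segment flags.
markdown_pages = [res["markdown"] for res in result["layoutParsingResults"]]

parts = []
prev_ended = True  # Treat the first page as starting a new segment.
for md in markdown_pages:
    if prev_ended or md["isStart"]:
        # The page begins a new segment: keep it as a separate part.
        parts.append(md["text"])
    else:
        # The page continues the previous segment: join without a break.
        parts[-1] = parts[-1].rstrip() + md["text"]
    prev_ended = md["isEnd"]

full_markdown = "\n\n".join(parts)
```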
If the default model weights provided by the PP-StructureV3 pipeline do not meet your accuracy or speed requirements, you can try fine-tuning the existing models with your own domain-specific or application-specific data to improve performance in your scenario.
Since the PP-StructureV3 pipeline contains multiple modules, an unsatisfactory result may originate from any one of them. Analyze the cases with poor extraction quality, visualize the intermediate results to identify which module is at fault (see the sketch below), and then refer to the fine-tuning tutorial linked in the corresponding row of the table that follows.
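Before fine-tuning, it usually pays to confirm which module is failing. A minimal sketch using the local Python inference API: `./bad_case.jpg` is a placeholder for one of your failing samples, and `save_to_img` / `save_to_json` are the result object's save methods, which write the visualizations of the intermediate stages (layout boxes, OCR detections, table cells, and so on) for manual inspection:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# Run the pipeline on a sample that produced a poor extraction result.
output = pipeline.predict("./bad_case.jpg")
for res in output:
    # Save one visualization image per intermediate stage into ./debug_output,
    # so you can see where the error is first introduced.
    res.save_to_img("debug_output")
    # Save the structured result as JSON for a field-by-field look.
    res.save_to_json("debug_output")
```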
<table> <thead> <tr> <th>Scenario</th> <th>Fine-tuning Module</th> <th>Fine-tuning Reference Link</th> </tr> </thead> <tbody> <tr> <td>Inaccurate layout detection, such as missing seals or tables</td> <td>Layout Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Inaccurate table structure recognition</td> <td>Table Structure Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/table_structure_recognition.html#4-secondary-development">Link</a></td> </tr> <tr> <td>Inaccurate formula recognition</td> <td>Formula Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/formula_recognition.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Missing seal text detection</td> <td>Seal Text Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/seal_text_detection.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Missing text detection</td> <td>Text Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/text_detection.html#4-custom-development">Link</a></td> </tr> <tr> <td>Incorrect text recognition results</td> <td>Text Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/text_recognition.html#v-secondary-development">Link</a></td> </tr> <tr> <td>Incorrect correction of vertical or rotated text lines</td> <td>Text Line Orientation Classification Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/textline_orientation_classification.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Incorrect correction of full image orientation</td> <td>Document Image Orientation Classification Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Inaccurate image distortion correction</td> <td>Text Image Rectification Module</td> <td>Fine-tuning not supported yet</td> </tr> </tbody> </table>
Once you have completed fine-tuning with your private dataset, you will obtain the local model weights. You can then use the fine-tuned weights by customizing the pipeline configuration file.
You can call the `export_paddlex_config_to_yaml` method of the `PPStructureV3` object in PaddleOCR to export the current pipeline configuration to a YAML file:
```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
```
In the exported file, locate the `model_dir` field of the module you fine-tuned and replace `null` with the local path to your trained weights, for example:

```yaml
......
SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayout_plus-L
    model_dir: null # Replace with the path to the fine-tuned layout detection model weights
......
SubPipelines:
  GeneralOCR:
    pipeline_name: OCR
    text_type: general
    use_doc_preprocessor: False
    use_textline_orientation: False
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv5_server_det
        model_dir: null # Replace with the path to the fine-tuned text detection model weights
        limit_side_len: 960
        limit_type: max
        max_side_limit: 4000
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5
      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv5_server_rec
        model_dir: null # Replace with the path to the fine-tuned text recognition model weights
        batch_size: 1
        score_thresh: 0
......
```
The pipeline configuration file not only includes parameters supported by the PaddleOCR CLI and Python API but also allows for more advanced configurations. For more details, refer to the corresponding pipeline usage tutorial in the PaddleX Pipeline Usage Overview, and adjust the configurations as needed based on your requirements.
After modifying the configuration file, specify the path to the updated pipeline configuration using the `--paddlex_config` parameter on the command line. PaddleOCR will load its content as the pipeline configuration. Example:

```bash
paddleocr pp_structurev3 --paddlex_config PP-StructureV3.yaml ...
```
When using the Python API, pass the path to the pipeline configuration file via the `paddlex_config` parameter when creating the pipeline object. PaddleOCR will load its content as the pipeline configuration. Example:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3(paddlex_config="PP-StructureV3.yaml")
```