Layout analysis is a technique used to extract structured information from document images. It is primarily used to convert complex document layouts into machine-readable data formats. This technology has broad applications in document management, information extraction, and data digitization. Layout analysis combines Optical Character Recognition (OCR), image processing, and machine learning algorithms to identify and extract text blocks, titles, paragraphs, images, tables, and other layout elements from documents. This process generally includes three main steps: layout analysis, element analysis, and data formatting. The final result is structured document data, which enhances the efficiency and accuracy of data processing. <b>PP-StructureV3 improves upon the general layout analysis v1 pipeline by enhancing layout region detection, table recognition, and formula recognition. It also adds capabilities such as multi-column reading order recovery, chart understanding, and result conversion to Markdown files. It performs excellently across various document types and can handle complex document data.</b> This pipeline also provides flexible service deployment options, supporting invocation using multiple programming languages on various hardware. In addition, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset and integrate the trained models seamlessly.
<b>The PP-StructureV3 pipeline consists of the following seven modules or sub-pipelines. Each module or sub-pipeline can be trained and inferred independently and contains multiple models. For more details, please click the corresponding links to view the documentation.</b>
In this pipeline, you can choose the model to use based on the benchmark data below.
<details> <summary><b>Document Image Orientation Classification Module:</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>Top-1 Acc (%)</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>PP-LCNet_x1_0_doc_ori</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Pretrained Model</a></td> <td>99.06</td> <td>2.62 / 0.59</td> <td>3.24 / 1.19</td> <td>7</td> <td>Document image classification model based on PP-LCNet_x1_0, supporting four categories: 0°, 90°, 180°, and 270°</td> </tr> </tbody> </table> </details> <details> <summary><b>Text Image Rectification Module (Optional):</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>CER</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>UVDoc</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Pretrained Model</a></td> <td>0.179</td> <td>19.05 / 19.05</td> <td>- / 869.82</td> <td>30.3</td> <td>High-precision text image rectification model</td> </tr> </tbody> </table> </details> <details> <summary><b>Layout Detection Module:</b></summary> <b>The layout detection model covers 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure-table title, chart, sidebar text, and lists of references.</b> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>mAP(0.5) (%)</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Introduction</th> </tr> </thead> <tbody> <tr> <td>PP-DocLayout_plus-L</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-DocLayout_plus-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout_plus-L_pretrained.pdparams">Pretrained Model</a></td> <td>83.2</td> <td>53.03 / 17.23</td> <td>634.62 / 378.32</td> <td>126.01</td> <td>A higher-precision layout region localization model trained with RT-DETR-L on a self-built dataset covering Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient books, and research reports</td> </tr> </tbody> </table>The inference time only includes the model inference time and does not include the time for pre- or post-processing.
In the inference time columns labeled [Normal Mode / High-Performance Mode], the Normal Mode values correspond to local Paddle inference engines. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>.
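If you need to pin a specific engine rather than rely on this automatic selection, you can pass the <code>engine</code> parameter explicitly. A minimal sketch, using the values listed for <code>engine</code> in the parameter tables below:

```python
from paddleocr import PPStructureV3

# Force the dynamic-graph Paddle engine for all modules instead of the
# automatic per-model choice; "paddle_static" and "transformers" are the
# other documented options.
pipeline = PPStructureV3(engine="paddle_dynamic")
```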
<details><summary> 👉 Details of Model List</summary>❗ The list above includes the <b>4 core models</b> that are the key models supported by the text recognition module. The module supports <b>12 models</b> in total, including several predefined models covering different categories. The complete model list is as follows:
Before using the PP-StructureV3 pipeline locally, please make sure you have completed the installation of the wheel package according to the installation guide. If you prefer to install dependencies selectively, please refer to the relevant instructions in the installation documentation. The corresponding dependency group for this pipeline is <code>doc-parser</code>. After installation, you can use it via command line or Python integration.
Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.
Use a single command to quickly experience the PP-StructureV3 pipeline:
```bash
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png

# Use --use_doc_orientation_classify to enable document orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_orientation_classify True

# Use --use_doc_unwarping to enable the document unwarping module
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_unwarping True

# Use --use_textline_orientation to enable text line orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_textline_orientation True

# Use --device to specify a GPU for inference
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --device gpu
```
The examples above use local Paddle inference engines by default. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>. To run them, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following command:
```bash
# Use the transformers engine for inference.
# Some models are still being adapted for this engine. For now, disable formula
# recognition and switch the wireless table structure recognition model:
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png \
    --engine transformers --use_formula_recognition False --wireless_table_structure_recognition_model_name SLANeXt_wireless
```
<b>Description:</b> <b>Local path</b>, e.g., a local path to an image or PDF file: <code>/root/data/img.jpg</code>; <b>URL</b>, e.g., an online image or PDF: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">example</a>; <b>local directory</b>: a directory containing images to predict, e.g., <code>/root/data/</code> (currently, directories containing PDFs are not supported; PDFs must be specified by file path).
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>save_path</code></td> <td><b>Meaning:</b>Path to save inference results.<b>Description:</b> If not set, results will not be saved locally.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_name</code></td> <td><b>Meaning:</b>Name of the layout detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the layout detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.<b>Description:</b> Any value between <code>0-1</code>. If not set, the default value is used, which is <code>0.5</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.<b>Description:</b> If not set, the parameter will default to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Unclip ratio for detected boxes in layout detection model.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>1.0</code>.
<td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>The merging mode for the detection boxes output by the model in layout detection.<b>Description:</b>
<ul> <li><b>large</b>: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed;</li> <li><b>small</b>: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed;</li> <li><b>union</b>: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained;</li> </ul>If not set, the default is <code>large</code>. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the chart parsing model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the chart parsing model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>chart_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the chart parsing model.<b>Description:</b> If not set, the default batch size is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>region_detection_model_name</code></td> <td><b>Meaning:</b>Name of the region detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>region_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the region detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the document orientation classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document orientation classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td><b>Meaning:</b>Name of the document unwarping model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document unwarping model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the text detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limitation for text detection.<b>Description:</b> Any integer > <code>0</code>. If not set, the default value will be <code>960</code>.
</td> <td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td><b>Meaning:</b>Type of the image side length limit for text detection.<b>Description:</b> Supports <code>min</code> and <code>max</code>; <code>min</code> ensures the shortest side of the image is not less than <code>text_det_limit_side_len</code>, while <code>max</code> ensures the longest side does not exceed it. If not set, the default value will be <code>max</code>.
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for text detection. Pixels with scores above this value in the probability map are considered text.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.3</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Box threshold for text detection. A bounding box is considered text if the average score of pixels inside is greater than this value.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.6</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for text detection. The higher the value, the larger the expansion area.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>2.0</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_model_name</code></td> <td><b>Meaning:</b>Name of the text line orientation model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text line orientation model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>textline_orientation_batch_size</code></td> <td><b>Meaning:</b>Batch size for the text line orientation model.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the text recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the text recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for text recognition.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for text recognition. Only results above this value will be kept.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.0</code> (no threshold).
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>table_classification_model_name</code></td> <td><b>Meaning:</b>Name of the table classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_classification_model_dir</code></td> <td><b>Meaning:</b>Directory of the table classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wired table structure recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the wired table structure recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table structure recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the wireless table structure recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wired table cell detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wired_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the wired table cell detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table cell detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the wireless table cell detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the table orientation classification model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>table_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory of the table orientation classification model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the seal text detection model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory of the seal text detection model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.<b>Description:</b> Any integer > <code>0</code>. If not set, the default is <code>736</code>.
</td> <td><code>int</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for the image side in seal text detection.<b>Description:</b> Supports <code>min</code> and <code>max</code>; <code>min</code> ensures the shortest side is ≥ <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is ≤ it. If not set, the default is <code>min</code>.
</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold. Pixels with scores above this value in the probability map are considered text.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.2</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Box threshold. Boxes with average pixel scores above this value are considered text regions.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.6</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. Higher value means larger expansion area.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.5</code>.
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the seal text recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the seal text recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for seal text recognition.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Recognition score threshold. Text results above this value will be kept.<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.0</code> (no threshold).
</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the formula recognition model.<b>Description:</b> If not set, the default model will be used.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory of the formula recognition model.<b>Description:</b> If not set, the official model will be downloaded.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>formula_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size of the formula recognition model.<b>Description:</b> If not set, the default is <code>1</code>.</td>
<td><code>int</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to load and use the document orientation classification module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to load and use the document unwarping module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to load and use the text line orientation classification module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to load and use seal text recognition subpipeline.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to load and use table recognition subpipeline.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to load and use formula recognition subpipeline.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to load and use the chart parsing module.<b>Description:</b> If not set, the default is <code>False</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to load and use the document region detection module.<b>Description:</b> If not set, the default is <code>True</code>.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If not set, the initialized default value will be used, which is <code>False</code> by default.</td>
<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If not set, the initialized default value will be used, which is <code>['number','footnote','header','header_image','footer','footer_image','aside_text']</code> by default.</td>
<td><code>str</code></td> <td></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>Device for inference.<b>Description:</b> You can specify a device ID:
<ul> <li><b>CPU</b>: e.g., <code>cpu</code> means using CPU for inference;</li> <li><b>GPU</b>: e.g., <code>gpu:0</code> means GPU 0</li> <li><b>NPU</b>: e.g., <code>npu:0</code> means NPU 0</li> <li><b>XPU</b>: e.g., <code>xpu:0</code> means XPU 0</li> <li><b>MLU</b>: e.g., <code>mlu:0</code> means MLU 0</li> <li><b>DCU</b>: e.g., <code>dcu:0</code> means DCU 0</li> <li><b>MetaX GPU</b>: e.g., <code>metax_gpu:0</code> means MetaX GPU 0</li> <li><b>Iluvatar GPU</b>: e.g., <code>iluvatar_gpu:0</code> means Iluvatar GPU 0</li> </ul>If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.
For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.
</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>fp32</code> or <code>fp16</code>.</td> <td><code>str</code></td> <td><code>fp32</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.
</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str</code></td> <td></td> </tr> </tbody> </table> </details>The inference result will be printed in the terminal. The default output of the PP-StructureV3 pipeline is as follows:
<details><summary> 👉Click to expand</summary> <pre> <code> {'res': {'input_path': 'pp_structure_v3_demo.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_general_ocr': True, 'use_seal_recognition': True, 'use_table_recognition': True, 'use_formula_recognition': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9853514432907104, 'coordinate': [770.9531, 776.6814, 1122.6057, 1058.7322]}, {'cls_id': 1, 'label': 'image', 'score': 0.9848673939704895, 'coordinate': [775.7434, 202.27979, 1502.8113, 686.02136]}, {'cls_id': 2, 'label': 'text', 'score': 0.983731746673584, 'coordinate': [1152.3197, 1113.3275, 1503.3029, 1346.586]}, {'cls_id': 2, 'label': 'text', 'score': 0.9832221865653992, 'coordinate': [1152.5602, 801.431, 1503.8436, 986.3563]}, {'cls_id': 2, 'label': 'text', 'score': 0.9829439520835876, 'coordinate': [9.549545, 849.5713, 359.1173, 1058.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9811657667160034, 'coordinate': [389.58298, 1137.2659, 740.66235, 1346.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9775941371917725, 'coordinate': [9.1302185, 201.85, 359.0409, 339.05692]}, {'cls_id': 2, 'label': 'text', 'score': 0.9750366806983948, 'coordinate': [389.71454, 752.96924, 740.544, 889.92456]}, {'cls_id': 2, 'label': 'text', 'score': 0.9738152027130127, 'coordinate': [389.94565, 298.55988, 740.5585, 435.5124]}, {'cls_id': 2, 'label': 'text', 'score': 0.9737328290939331, 'coordinate': [771.50256, 1065.4697, 1122.2582, 1178.7324]}, {'cls_id': 2, 'label': 'text', 'score': 0.9728517532348633, 'coordinate': [1152.5154, 993.3312, 1503.2349, 1106.327]}, {'cls_id': 2, 'label': 'text', 'score': 0.9725610017776489, 'coordinate': [9.372787, 1185.823, 359.31738, 1298.7227]}, {'cls_id': 2, 'label': 'text', 'score': 0.9724331498146057, 'coordinate': [389.62848, 610.7389, 740.83234, 746.2377]}, {'cls_id': 2, 'label': 'text', 'score': 0.9720287322998047, 'coordinate': [389.29898, 897.0936, 741.41516, 1034.6616]}, {'cls_id': 2, 'label': 'text', 'score': 0.9713053703308105, 'coordinate': [10.323685, 1065.4663, 359.6786, 1178.8872]}, {'cls_id': 2, 'label': 'text', 'score': 0.9689728021621704, 'coordinate': [9.336395, 537.6609, 359.2901, 652.1881]}, {'cls_id': 2, 'label': 'text', 'score': 0.9684857130050659, 'coordinate': [10.7608185, 345.95068, 358.93616, 434.64087]}, {'cls_id': 2, 'label': 'text', 'score': 0.9681928753852844, 'coordinate': [9.674866, 658.89075, 359.56528, 770.4319]}, {'cls_id': 2, 'label': 'text', 'score': 0.9634978175163269, 'coordinate': [770.9464, 1281.1785, 1122.6522, 1346.7156]}, {'cls_id': 2, 'label': 'text', 'score': 0.96304851770401, 'coordinate': [390.0113, 201.28055, 740.1684, 291.53073]}, {'cls_id': 2, 'label': 'text', 'score': 0.962053120136261, 'coordinate': [391.21393, 1040.952, 740.5046, 1130.32]}, {'cls_id': 2, 'label': 'text', 'score': 0.9565253853797913, 'coordinate': [10.113251, 777.1482, 359.439, 842.437]}, {'cls_id': 2, 'label': 'text', 'score': 0.9497362375259399, 'coordinate': [390.31357, 537.86285, 740.47595, 603.9285]}, {'cls_id': 2, 'label': 'text', 'score': 0.9371236562728882, 'coordinate': [10.2034, 1305.9753, 359.5958, 1346.7295]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9338151216506958, 'coordinate': [791.6062, 1200.8479, 1103.3257, 1259.9324]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9326773285865784, 'coordinate': [408.0737, 457.37024, 718.9509, 516.63464]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9274250864982605, 
'coordinate': [29.448685, 456.6762, 340.99194, 515.6999]}, {'cls_id': 2, 'label': 'text', 'score': 0.8742568492889404, 'coordinate': [1154.7095, 777.3624, 1330.3086, 794.5853]}, {'cls_id': 2, 'label': 'text', 'score': 0.8442489504814148, 'coordinate': [586.49316, 160.15454, 927.468, 179.64203]}, {'cls_id': 11, 'label': 'doc_title', 'score': 0.8332607746124268, 'coordinate': [133.80017, 37.41908, 1380.8601, 124.1429]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.6770150661468506, 'coordinate': [812.1718, 705.1199, 1484.6973, 747.1692]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[133, 35], ..., [133, 131]], ...,
[[ 13, 754],
...,
[ 13, 777]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['助力双方交往', '搭建友谊桥梁', '本报记者', '沈小晓', '任', '彦', '黄培昭', '身着中国传统民族服装的厄立特里亚青', '厄立特里亚高等教育与研究院合作建立,开', '年依次登台表演中国民族舞、现代舞、扇子舞', '设了中国语言课程和中国文化课程,注册学', '等,曼妙的舞姿赢得现场观众阵阵掌声。这', '生2万余人次。10余年来,厄特孔院已成为', '是日前厄立特里亚高等教育与研究院孔子学', '当地民众了解中国的一扇窗口。', '院(以下简称"厄特孔院")举办"喜迎新年"中国', '黄鸣飞表示,随着来学习中文的人日益', '歌舞比赛的场景。', '增多,阿斯马拉大学教学点已难以满足教学', '中国和厄立特里亚传统友谊深厚。近年', '需要。2024年4月,由中企蜀道集团所属四', '来,在高质量共建"一带一路"框架下,中厄两', '川路桥承建的孔院教学楼项目在阿斯马拉开', '国人文交流不断深化,互利合作的民意基础', '工建设,预计今年上半年竣工,建成后将为厄', '日益深厚。', '特孔院提供全新的办学场地。', '“学好中文,我们的', '“在中国学习的经历', '未来不是梦”', '让我看到更广阔的世界”', '多年来,厄立特里亚广大赴华留学生和', '培训人员积极投身国家建设,成为助力该国', '发展的人才和厄中友好的见证者和推动者。', '在厄立特里亚全国妇女联盟工作的约翰', '娜·特韦尔德·凯莱塔就是其中一位。她曾在', '中华女子学院攻读硕士学位,研究方向是女', '性领导力与社会发展。其间,她实地走访中国', '多个地区,获得了观察中国社会发展的第一', '在厄立特里亚不久前举办的第六届中国风筝文化节上,当地小学生体验风筝制作。', '手资料。', '中国驻厄立特里亚大使馆供图', '“这是中文歌曲初级班,共有32人。学', '“不管远近都是客人,请不用客气;相约', '瓦的北红海省博物馆。', '生大部分来自首都阿斯马拉的中小学,年龄', '好了在一起,我们欢迎你"在一场中厄青', '博物馆二层陈列着一个发掘自阿杜利', '最小的仅有6岁。"尤斯拉告诉记者。', '年联谊活动上,四川路桥中方员工同当地大', '斯古城的中国古代陶制酒器,罐身上写着', '尤斯拉今年23岁,是厄立特里亚一所公立', '学生合唱《北京欢迎你》。厄立特里亚技术学', '“万”“和""禅”“山"等汉字。“这件文物证', '学校的艺术老师。她12岁开始在厄特孔院学', '院计算机科学与工程专业学生鲁夫塔·谢拉', '明,很早以前我们就通过海上丝绸之路进行', '习中文,在2017年第十届"汉语桥"世界中学生', '是其中一名演唱者,她很早便在孔院学习中', '贸易往来与文化交流。这也是厄立特里亚', '中文比赛中获得厄立特里亚赛区第一名,并和', '文,一直在为去中国留学作准备。“这句歌词', '与中国友好交往历史的有力证明。"北红海', '同伴代表厄立特里亚前往中国参加决赛,获得', '是我们两国人民友谊的生动写照。无论是投', '省博物馆研究与文献部负责人伊萨亚斯·特', '团体优胜奖。2022年起,尤斯拉开始在厄特孔', '身于厄立特里亚基础设施建设的中企员工,', '斯法兹吉说。', '院兼职教授中文歌曲,每周末两个课时。中国', '还是在中国留学的厄立特里亚学子,两国人', '厄立特里亚国家博物馆考古学和人类学', '文化博大精深,我希望我的学生们能够通过中', '民携手努力,必将推动两国关系不断向前发', '研究员菲尔蒙·特韦尔德十分喜爱中国文', '文歌曲更好地理解中国文化。"她说。', '穆卢盖塔密切关注中国在经济、科技、教', '展。"鲁夫塔说。', '化。他表示:“学习彼此的语言和文化,将帮', '“姐姐,你想去中国吗?"“非常想!我想', '育等领域的发展,“中国在科研等方面的实力', '厄立特里亚高等教育委员会主任助理萨', '助厄中两国人民更好地理解彼此,助力双方', '去看故宫、爬长城。"尤斯拉的学生中有一对', '与日俱增。在中国学习的经历让我看到更广', '马瑞表示:“每年我们都会组织学生到中国访', '交往,搭建友谊桥梁。"', '能歌善舞的姐妹,姐姐露娅今年15岁,妹妹', '阔的世界,从中受益匪浅。', '问学习,目前有超过5000名厄立特里亚学生', '厄立特里亚国家博物馆馆长塔吉丁·努', '莉娅14岁,两人都已在厄特孔院学习多年,', '23岁的莉迪亚·埃斯蒂法诺斯已在厄特', '在中国留学。学习中国的教育经验,有助于', '里达姆·优素福曾多次访问中国,对中华文明', '中文说得格外流利。', '孔院学习3年,在中国书法、中国画等方面表', '提升厄立特里亚的教育水平。”', '的传承与创新、现代化博物馆的建设与发展', '露娅对记者说:“这些年来,怀着对中文', '现十分优秀,在2024年厄立特里亚赛区的', '“共同向世界展示非', '印象深刻。“中国博物馆不仅有许多保存完好', '和中国文化的热爱,我们姐妹俩始终相互鼓', '“汉语桥"比赛中获得一等奖。莉迪亚说:“学', '的文物,还充分运用先进科技手段进行展示,', '励,一起学习。我们的中文一天比一天好,还', '习中国书法让我的内心变得安宁和纯粹。我', '洲和亚洲的灿烂文明”', '帮助人们更好理解中华文明。"塔吉丁说,厄', '学会了中文歌和中国舞。我们一定要到中国', '也喜欢中国的服饰,希望未来能去中国学习,', '立特里亚与中国都拥有悠久的文明,始终相', '去。学好中文,我们的未来不是梦!"', '把中国不同民族元素融入服装设计中,创作', '从阿斯马拉出发,沿着蜿蜒曲折的盘山', '互理解、相互尊重。我希望未来与中国同行', '据厄特孔院中方院长黄鸣飞介绍,这所', '出更多精美作品,也把厄特文化分享给更多', '公路一路向东寻找丝路印迹。驱车两个小', '加强合作,共同向世界展示非洲和亚洲的灿', '孔院成立于2013年3月,由贵州财经大学和', '的中国朋友。”', '时,记者来到位于厄立特里亚港口城市马萨', '烂文明。”', '谈起在中国求学的经历,约翰娜记忆犹', '新:“中国的发展在当今世界是独一无二的。', '沿着中国特色社会主义道路坚定前行,中国', '创造了发展奇迹,这一切都离不开中国共产党', '的领导。中国的发展经验值得许多国家学习', '借鉴,”', '正在西南大学学习的厄立特里亚博士生', '穆卢盖塔·泽穆伊对中国怀有深厚感情。8', '年前,在北京师范大学获得硕士学位后,穆卢', '盖塔在社交媒体上写下这样一段话:“这是我', '人生的重要一步,自此我拥有了一双坚固的', '鞋子.赋予我穿越荆棘的力量。”', '“鲜花曾告诉我你怎样走过,大地知道你', '心中的每一个角落"厄立特里亚阿斯马拉', '大学综合楼二层,一阵优美的歌声在走廊里回', '响。循着熟悉的旋律轻轻推开一间教室的门,', '学生们正跟着老师学唱中文歌曲《同一首歌》。', '这是厄特孔院阿斯马拉大学教学点的一', '节中文歌曲课。为了让学生们更好地理解歌', '词大意,老师尤斯拉·穆罕默德萨尔·侯赛因逐', '字翻译和解释歌词。随着伴奏声响起,学生们', '边唱边随着节拍摇动身体,现场气氛热烈。'], 'rec_scores': array([0.99972075, ..., 0.96241361]), 'rec_polys': array([[[133, 35],
...,
[133, 131]],
...,
[[ 13, 754],
...,
[ 13, 777]]], dtype=int16), 'rec_boxes': array([[133, ..., 131],
...,
[ 13, ..., 777]], dtype=int16)}}}
</code></pre></details>
For explanation of the result parameters, refer to 2.2 Python Script Integration.
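If you saved results with <code>save_to_json</code> (shown in the Python examples below), you can post-process them with the standard library. A minimal sketch, assuming the saved files live under <code>./output</code> and the field names match the sample output above:

```python
import json
from collections import Counter
from pathlib import Path

# Count the detected layout labels in every JSON result saved under ./output.
for json_path in Path("output").glob("*.json"):
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    res = data.get("res", data)  # some versions wrap the result in a "res" key
    boxes = res["layout_det_res"]["boxes"]
    print(json_path.name, Counter(box["label"] for box in boxes))
```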
<b>Note:</b> Due to the large size of the default model in the pipeline, the inference speed may be slow. You can refer to the model list in Section 1 to replace it with a faster model.
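For example, with the Python integration described below, you can configure lighter-weight models and disable subpipelines you do not need. A sketch, assuming the model names taken from the model lists (such as <code>PP-DocLayout-S</code> and <code>PP-OCRv5_mobile_det</code>/<code>PP-OCRv5_mobile_rec</code>) are available in your installation:

```python
from paddleocr import PPStructureV3

# A lighter-weight configuration sketch: smaller layout and OCR models,
# with two optional subpipelines switched off. Verify these model names
# against the model list for your PaddleOCR version.
pipeline = PPStructureV3(
    layout_detection_model_name="PP-DocLayout-S",
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    use_formula_recognition=False,
    use_chart_recognition=False,
)
```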
The command line method is for quick testing and visualization. In actual projects, you usually need to integrate the model via code. You can perform pipeline inference with just a few lines of code as shown below:
```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# pipeline = PPStructureV3(lang="en") # Set the lang parameter to use the English text recognition model. For other supported languages, see Section 5: Appendix. By default, both Chinese and English text recognition models are enabled.
# pipeline = PPStructureV3(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True) # Use use_textline_orientation to enable/disable the text line orientation classification model
# pipeline = PPStructureV3(device="gpu") # Use device to specify a GPU for model inference
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
    res.save_to_word(save_path="output")  # Save the current image's result in Word format
```
The example above uses local Paddle inference engines by default. Each module selects the appropriate local Paddle inference engine according to its default model name: models that support only the dynamic graph use <code>paddle_dynamic</code>, while models that support both static and dynamic graphs prefer <code>paddle_static</code>. To run it, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following code:
```python
from paddleocr import PPStructureV3

# Some models are still being adapted for this engine. For now, disable formula
# recognition and switch the wireless table structure recognition model:
pipeline = PPStructureV3(
    engine="transformers",
    use_formula_recognition=False,
    wireless_table_structure_recognition_model_name="SLANeXt_wireless",
)
# pipeline = PPStructureV3(lang="en") # Set the lang parameter to use the English text recognition model. For other supported languages, see Section 5: Appendix. By default, both Chinese and English text recognition models are enabled.
# pipeline = PPStructureV3(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True) # Use use_textline_orientation to enable/disable the text line orientation classification model
# pipeline = PPStructureV3(device="gpu") # Use device to specify a GPU for model inference
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
```
For PDF files, each page will be processed individually and generate a separate Markdown file. If you want to convert the entire PDF to a single Markdown file, use the following method:
```python
from pathlib import Path
from paddleocr import PPStructureV3

input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

pipeline = PPStructureV3()
output = pipeline.predict(input=input_file)

markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge the per-page Markdown fragments into a single document.
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save any images referenced by the Markdown next to the output file.
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
Note:

- The default text recognition model used by PP-StructureV3 is a Chinese-English recognition model, which has limited accuracy for purely English text. For English-only scenarios, you can set the <code>text_recognition_model_name</code> parameter to an English model such as <code>en_PP-OCRv4_mobile_rec</code> for better recognition performance (see the sketch after these notes). For other languages, refer to the model list above and select the appropriate language-specific recognition model.
- In the example code, the parameters <code>use_doc_orientation_classify</code>, <code>use_doc_unwarping</code>, and <code>use_textline_orientation</code> all default to <code>False</code>, meaning document orientation classification, document image unwarping, and text line orientation classification are disabled. You can set them to <code>True</code> manually if needed.
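A minimal sketch of the English-only setup described in the first note (assuming <code>en_PP-OCRv4_mobile_rec</code> is available in your installation; the input path is hypothetical):

```python
from paddleocr import PPStructureV3

# Swap in an English text recognition model for English-only documents.
pipeline = PPStructureV3(text_recognition_model_name="en_PP-OCRv4_mobile_rec")
output = pipeline.predict("./english_document.png")  # hypothetical input
for res in output:
    res.save_to_markdown(save_path="output")
```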
The basic Python integration script above performs the following steps:
<details><summary>(1) Instantiate <code>PPStructureV3</code> to create the pipeline object. The parameter descriptions are as follows:</summary> <table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr> </thead> <tbody> <tr> <td><code>layout_detection_model_name</code></td> <td><b>Meaning:</b>Name of the layout detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the layout detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.<b>Description:</b>
<ul> <li><b>float</b>: Any float between <code>0-1</code>;</li> <li><b>dict</b>: <code>{0:0.1}</code> where the key is the class ID and the value is the threshold for that class;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>0.5</code>.</li> </ul> </td> <td><code>float|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.<b>Description:</b> If set to <code>None</code>, the parameter will default to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for the bounding boxes from the layout detection model.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>Tuple[float,float]</b>: Expansion ratios in horizontal and vertical directions;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>tuple</b> values, e.g., <code>{0: (1.1, 2.0)}</code> means width is expanded 1.1× and height 2.0× for class 0 boxes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>1.0</code>.</li> </ul> </td> <td><code>float|Tuple[float,float]|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>Filtering method for overlapping boxes in layout detection.<b>Description:</b>
<ul> <li><b>str</b>: Options include <code>large</code>, <code>small</code>, and <code>union</code> to retain the larger box, smaller box, or both;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>str</b> values, e.g., <code>{0: "large", 2: "small"}</code> means using different modes for different classes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value <code>large</code>.</li> </ul> </td> <td><code>str|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the chart parsing model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the chart parsing model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>chart_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the chart parsing model.<b>Description:</b> If set to <code>None</code>, the default is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>region_detection_model_name</code></td> <td><b>Meaning:</b>Name of the region detection model for sub-modules in document layout.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>region_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the region detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the document orientation classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document orientation classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td><b>Meaning:</b>Name of the document unwarping model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document unwarping model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the text detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limitation for text detection.<b>Description:</b>
<ul> <li><b>int</b>: Any integer greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>960</code>.</li> </ul> </td> <td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td> <b>Meaning:</b>Limit type for text detection.<b>Description:</b>
<ul> <li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures the shortest side is no less than <code>text_det_limit_side_len</code>, while <code>max</code> ensures the longest side is no greater than it;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>max</code>.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for detection. Pixels in the output probability map with scores above this value are considered as text pixels.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>0.3</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Bounding box threshold. If the average score of all pixels inside the box exceeds this threshold, it is considered a text region.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>0.6</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for text detection. The larger the value, the more the text region is expanded.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value of <code>2.0</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_model_name</code></td> <td><b>Meaning:</b>Name of the textline orientation model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_model_dir</code></td> <td><b>Meaning:</b>Directory path of the textline orientation model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>textline_orientation_batch_size</code></td> <td><b>Meaning:</b>Batch size for the textline orientation model.<b>Description:</b> If set to <code>None</code>, the default batch size is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the text recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the text recognition model.<b>Description:</b> If set to <code>None</code>, the default batch size is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for text recognition. Only results with scores above this threshold will be retained.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>0.0</code> (no threshold).</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_classification_model_name</code></td> <td><b>Meaning:</b>Name of the table classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_classification_model_dir</code></td> <td><b>Meaning:</b>Directory path of the table classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wired table structure recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wired table structure recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table structure recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_structure_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wireless table structure recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wired table cell detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wired_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wired table cell detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_name</code></td> <td><b>Meaning:</b>Name of the wireless table cell detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>wireless_table_cells_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the wireless table cell detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the table orientation classification model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>table_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory of the table orientation classification model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the seal text detection model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the seal text detection model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.<b>Description:</b>
<ul> <li><b>int</b>: Any integer greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>736</code>.</li> </ul> </td> <td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for seal text detection image side length.<b>Description:</b>
<ul> <li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures the shortest side is no less than <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is no greater than <code>seal_det_limit_side_len</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>min</code>.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for detection. Pixels with scores greater than this value in the probability map are considered text pixels.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.2</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Bounding box threshold. If the average score of all pixels inside a detection box exceeds this threshold, it is considered a text region.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.6</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. The larger the value, the larger the expanded area.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.5</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the seal text recognition model.<b>Description:</b> If set to <code>None</code>, the default value is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for seal text recognition. Text results with scores above this threshold will be retained.<b>Description:</b>
<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.0</code> (no threshold).</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the formula recognition model.<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the formula recognition model.<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>formula_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the formula recognition model.<b>Description:</b> If set to <code>None</code>, the default value is <code>1</code>.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to enable the document orientation classification module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to enable the document image unwarping module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to use the text line orientation classification module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to enable the seal text recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to enable the table recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to enable the formula recognition sub-pipeline.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to load and use the chart parsing module.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to load and use the document region detection module.<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If set to <code>None</code>, the default value is <code>False</code>.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If set to <code>None</code>, the default value is <code>['number','footnote','header','header_image','footer','footer_image','aside_text']</code>.</td>
<td><code>list|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>Device used for inference.<b>Description:</b> Supports specifying device ID:
<ul> <li><b>CPU</b>: e.g., <code>cpu</code> means using CPU for inference;</li> <li><b>GPU</b>: e.g., <code>gpu:0</code> means using GPU 0;</li> <li><b>NPU</b>: e.g., <code>npu:0</code> means using NPU 0;</li> <li><b>XPU</b>: e.g., <code>xpu:0</code> means using XPU 0;</li> <li><b>MLU</b>: e.g., <code>mlu:0</code> means using MLU 0;</li> <li><b>DCU</b>: e.g., <code>dcu:0</code> means using DCU 0;</li> <li><b>MetaX GPU</b>: e.g., <code>metax_gpu:0</code> means using MetaX GPU 0;</li> <li><b>Iluvatar GPU</b>: e.g., <code>iluvatar_gpu:0</code> means using Iluvatar GPU 0;</li> <li><b>None</b>: If set to <code>None</code>, the value set at pipeline initialization is used. During initialization, local GPU device 0 is preferred; if it is unavailable, the CPU device is used.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine_config</code></td> <td><b>Meaning:</b> Inference-engine configuration. <b>Description:</b> Recommended to be set together with <code>engine</code>. For supported fields, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.
For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.
</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>"fp32"</code> or <code>"fp16"</code>.</td> <td><code>str</code></td> <td><code>"fp32"</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.
</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> </tbody> </table> </details> <details><summary>(2) Call the <code>predict()</code> method of the PP-StructureV3 pipeline object for inference. This method returns a result list. The pipeline also provides a <code>predict_iter()</code> method. Both methods accept the same parameters and return the same type of results. The only difference is that <code>predict_iter()</code> returns a <code>generator</code> that allows incremental processing and retrieval of prediction results, which is useful for handling large datasets or saving memory. Choose the method that fits your needs. Below are the parameters of the <code>predict()</code> method:</summary> <table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr> </thead> <tr> <td><code>input</code></td> <td><b>Meaning:</b>Input data to be predicted. Required.<b>Description:</b> Supports multiple types:
<ul> <li><b>Python Var</b>: Image data represented as <code>numpy.ndarray</code>;</li> <li><b>str</b>: Local path to image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL</b> to image or PDF, e.g., <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/demo_paper.png">example</a>; <b>directory</b> containing image files, e.g., <code>/root/data/</code> (directories containing PDFs are not supported; use the full file path for a PDF);</li> <li><b>list</b>: Elements can be any of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li> </ul> </td> <td><code>Python Var|str|list</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to use document orientation classification during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to use document image unwarping during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_textline_orientation</code></td> <td><b>Meaning:</b>Whether to use textline orientation classification during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_seal_recognition</code></td> <td><b>Meaning:</b>Whether to use the seal text recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_table_recognition</code></td> <td><b>Meaning:</b>Whether to use the table recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_formula_recognition</code></td> <td><b>Meaning:</b>Whether to use the formula recognition sub-pipeline during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_chart_recognition</code></td> <td><b>Meaning:</b>Whether to use the chart parsing module during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_region_detection</code></td> <td><b>Meaning:</b>Whether to use the document region detection module during inference.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>format_block_content</code></td> <td><b>Meaning:</b>Whether to format the content in <code>block_content</code> as Markdown.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td> <td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|Tuple[float,float]|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_side_len</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_limit_type</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_box_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_det_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_rec_score_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td>
<td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_wired_table_cells_trans_to_html</code></td> <td><b>Meaning:</b>Whether to enable direct conversion of wired table cell detection results to HTML.<b>Description:</b> If enabled, HTML will be constructed directly based on the geometric relationship of wired table cell detection results.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_wireless_table_cells_trans_to_html</code></td> <td><b>Meaning:</b>Whether to enable direct conversion of wireless table cell detection results to HTML.<b>Description:</b> If enabled, HTML will be constructed directly based on the geometric relationship of wireless table cell detection results.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_table_orientation_classify</code></td> <td><b>Meaning:</b>Whether to enable table orientation classification.<b>Description:</b> When enabled, it can correct the orientation and correctly complete table recognition if the table in the image is rotated by 90/180/270 degrees.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>use_ocr_results_with_table_cells</code></td> <td><b>Meaning:</b>Whether to enable OCR within cell segmentation.<b>Description:</b> When enabled, OCR detection results will be segmented and re-recognized based on cell prediction results to avoid text loss.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>use_e2e_wired_table_rec_model</code></td> <td><b>Meaning:</b>Whether to enable end-to-end wired table recognition mode.<b>Description:</b> If enabled, the cell detection model will not be used, and only the table structure recognition model will be used.</td>
<td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>use_e2e_wireless_table_rec_model</code></td> <td><b>Meaning:</b>Whether to enable end-to-end wireless table recognition mode.<b>Description:</b> If enabled, the cell detection model will not be used, and only the table structure recognition model will be used.</td>
<td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>markdown_ignore_labels</code></td> <td><b>Meaning:</b>Layout tags that need to be ignored in Markdown.<b>Description:</b> If set to <code>None</code>, the instantiation value is used; otherwise, this parameter takes precedence.</td> <td><code>list|None</code></td> <td><code>None</code></td> </tr> </table> </details> <details><summary>(3) Process the prediction results: each prediction result corresponds to a Result object, which supports printing, saving as image, or saving as a <code>json</code> file:</summary> <table> <thead> <tr> <th>Method</th> <th>Description</th> <th>Parameter</th> <th>Type</th> <th>Parameter Description</th> <th>Default</th> </tr> </thead> <tr> <td rowspan="3"><code>print()</code></td> <td rowspan="3">Print result to terminal</td> <td><code>format_json</code></td> <td><code>bool</code></td> <td>Whether to format output as indented <code>JSON</code>.</td> <td><code>True</code></td> </tr> <tr> <td><code>indent</code></td> <td><code>int</code></td> <td>Indentation level to beautify the <code>JSON</code> output. Only effective when <code>format_json=True</code>.</td> <td>4</td> </tr> <tr> <td><code>ensure_ascii</code></td> <td><code>bool</code></td> <td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When <code>True</code>, all non-ASCII characters are escaped. When <code>False</code>, original characters are retained. Only effective when <code>format_json=True</code>.</td> <td><code>False</code></td> </tr> <tr> <td rowspan="3"><code>save_to_json()</code></td> <td rowspan="3">Save result as a JSON file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file. If a directory is given, the saved file is named after the input file.</td> <td>None</td> </tr> <tr> <td><code>indent</code></td> <td><code>int</code></td> <td>Indentation level for beautified <code>JSON</code> output. Only effective when <code>format_json=True</code>.</td> <td>4</td> </tr> <tr> <td><code>ensure_ascii</code></td> <td><code>bool</code></td> <td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>.
Only effective when <code>format_json=True</code>.</td> <td><code>False</code></td> </tr> <tr> <td><code>save_to_img()</code></td> <td>Save intermediate visualization results as PNG image files</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_word()</code></td> <td>Save the layout parsing results as a Word (.docx) format file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_markdown()</code></td> <td>Save each page of an image or PDF file as a markdown file</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_html()</code></td> <td>Save tables in the file as HTML format</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>save_to_xlsx()</code></td> <td>Save tables in the file as XLSX format</td> <td><code>save_path</code></td> <td><code>str</code></td> <td>Path to save the file, supports directory or file path.</td> <td>None</td> </tr> <tr> <td><code>concatenate_markdown_pages()</code></td> <td>Concatenate multiple markdown pages into a single document</td> <td><code>markdown_list</code></td> <td><code>list</code></td> <td>List of markdown data for each page.</td> <td>Returns the merged markdown text and image list.</td> </tr> </table> <ul> <li> Calling <code>print()</code> will print the result to the terminal. Explanation of the printed content:</li> <ul> <li><code>input_path</code>: <code>(str)</code> Input path of the image or PDF to be predicted</li> <li><code>page_index</code>: <code>(Union[int, None])</code> If input is a PDF, indicates the page number; otherwise <code>None</code></li> <li><code>page_count</code>: <code>(Union[int, None])</code> If the input is a PDF file, it indicates the total number of pages in the PDF; otherwise, it is <code>None</code>.</li> <li><code>width</code>: <code>(int)</code> The width of the original input image.</li> <li><code>height</code>: <code>(int)</code> The height of the original input image.</li> <li><code>model_settings</code>: <code>(Dict[str, bool])</code> Model parameters configured for the pipeline</li> <ul> <li><code>use_doc_preprocessor</code>: <code>(bool)</code> Whether to enable document preprocessor sub-pipeline</li> <li><code>use_seal_recognition</code>: <code>(bool)</code> Whether to enable seal text recognition sub-pipeline</li> <li><code>use_table_recognition</code>: <code>(bool)</code> Whether to enable table recognition sub-pipeline</li> <li><code>use_formula_recognition</code>: <code>(bool)</code> Whether to enable formula recognition sub-pipeline</li> <li><code>format_block_content</code>: <code>(bool)</code> Controls whether to format the <code>block_content</code> into Markdown format</li> <li><code>markdown_ignore_labels</code>: <code>(List[str])</code> Labels of layout regions that need to be ignored in Markdown</li> </ul> </li> <li><code>doc_preprocessor_res</code>: <code>(Dict[str, Union[List[float], str]])</code> Document preprocessing result dictionary, only exists if <code>use_doc_preprocessor=True</code></li> <ul> <li><code>input_path</code>: <code>(str)</code> Image path accepted by document preprocessor, <code>None</code> if input is 
<code>numpy.ndarray</code></li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict[str, bool])</code> Model configuration for the document preprocessor</li> <ul> <li><code>use_doc_orientation_classify</code>: <code>(bool)</code> Whether to enable document orientation classification</li> <li><code>use_doc_unwarping</code>: <code>(bool)</code> Whether to enable image unwarping</li> </ul> <li><code>angle</code>: <code>(int)</code> Predicted angle result if orientation classification is enabled</li> </ul> <li><code>parsing_res_list</code>: <code>(List[Dict])</code> A list of parsing results, where each element is a dictionary. The order of the list is the reading order after parsing.</li> <ul> <li><code>block_bbox</code>: <code>(np.ndarray)</code> The bounding box of the layout area.</li> <li><code>block_label</code>: <code>(str)</code> The label of the layout area, such as <code>text</code>, <code>table</code>, etc.</li> <li><code>block_content</code>: <code>(str)</code> The content within the layout area.</li> <li><code>block_id</code>: <code>(int)</code> The index of the layout area, used to display the layout sorting result.</li> <li><code>block_order</code>: <code>(int)</code> The order of the layout area, used to display the reading order of the layout. For non-ordered parts, the default value is <code>None</code>.</li> </ul> <li><code>overall_ocr_res</code>: <code>(Dict[str, Union[List[str], List[float], numpy.ndarray]])</code> Dictionary of global OCR results</li> <ul> <li><code>input_path</code>: <code>(Union[str, None])</code> OCR sub-pipeline input path; <code>None</code> if input is <code>numpy.ndarray</code></li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict)</code> OCR model configuration</li> <li><code>dt_polys</code>: <code>(List[numpy.ndarray])</code> List of polygons for text detection. 
Each box is a numpy array with shape (4, 2), dtype int16</li> <li><code>dt_scores</code>: <code>(List[float])</code> Confidence scores for detection boxes</li> <li><code>text_det_params</code>: <code>(Dict[str, Dict[str, int, float]])</code> Text detection module parameters</li> <ul> <li><code>limit_side_len</code>: <code>(int)</code> Side length limit for image preprocessing</li> <li><code>limit_type</code>: <code>(str)</code> Limit processing method</li> <li><code>thresh</code>: <code>(float)</code> Threshold for text pixel classification</li> <li><code>box_thresh</code>: <code>(float)</code> Threshold for text detection boxes</li> <li><code>unclip_ratio</code>: <code>(float)</code> Unclip ratio for expanding boxes</li> <li><code>text_type</code>: <code>(str)</code> Text detection type, currently fixed as "general"</li> </ul> <li><code>text_type</code>: <code>(str)</code> Text detection type, currently fixed as "general"</li> <li><code>textline_orientation_angles</code>: <code>(List[int])</code> Orientation classification results for text lines</li> <li><code>text_rec_score_thresh</code>: <code>(float)</code> Threshold for text recognition filtering</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts filtered by score threshold</li> <li><code>rec_scores</code>: <code>(List[float])</code> Recognition scores filtered by threshold</li> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Filtered detection boxes, same format as <code>dt_polys</code></li> </ul> <li><code>formula_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of formula recognition results</li> <ul> <li><code>rec_formula</code>: <code>(str)</code> Recognized formula string</li> <li><code>rec_polys</code>: <code>(numpy.ndarray)</code> Bounding box for the formula, shape (4, 2), dtype int16</li> <li><code>formula_region_id</code>: <code>(int)</code> Region ID of the formula</li> </ul> <li><code>seal_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of seal text recognition results</li> <ul> <li><code>input_path</code>: <code>(str)</code> Input path for the seal image</li> <li><code>page_index</code>: <code>None</code> since input is <code>numpy.ndarray</code></li> <li><code>model_settings</code>: <code>(Dict)</code> Model configuration for seal text recognition</li> <li><code>dt_polys</code>: <code>(List[numpy.ndarray])</code> Seal detection boxes, same format as <code>dt_polys</code></li> <li><code>text_det_params</code>: <code>(Dict[str, Dict[str, int, float]])</code> Detection parameters, same as above</li> <li><code>text_type</code>: <code>(str)</code> Detection type, currently fixed as "seal"</li> <li><code>text_rec_score_thresh</code>: <code>(float)</code> Score threshold for recognition</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts filtered by score</li> <li><code>rec_scores</code>: <code>(List[float])</code> Recognition scores filtered by threshold</li> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Filtered seal boxes, same format as <code>dt_polys</code></li> <li><code>rec_boxes</code>: <code>(numpy.ndarray)</code> Rectangle boxes, shape (n, 4), dtype int16</li> </ul> <li><code>table_res_list</code>: <code>(List[Dict[str, Union[numpy.ndarray, List[float], str]]])</code> List of table recognition results</li> <ul> <li><code>cell_box_list</code>: <code>(List[numpy.ndarray])</code> Bounding boxes of table cells</li> <li><code>pred_html</code>: 
<code>(str)</code> Table as an HTML string</li> <li><code>table_ocr_pred</code>: <code>(Dict)</code> OCR results for the table</li> <ul> <li><code>rec_polys</code>: <code>(List[numpy.ndarray])</code> Detected cell boxes</li> <li><code>rec_texts</code>: <code>(List[str])</code> Recognized texts for cells</li> <li><code>rec_scores</code>: <code>(List[float])</code> Confidence scores for cell recognition</li> <li><code>rec_boxes</code>: <code>(numpy.ndarray)</code> Rectangle boxes for detection, shape (n, 4), dtype int16</li> </ul> </ul> </ul> </li> <li>Calling <code>save_to_json()</code> saves the above content to the specified <code>save_path</code>. If it’s a directory, the saved path will be <code>save_path/{your_img_basename}_res.json</code>. If it’s a file, it saves directly. Numpy arrays are converted to lists since JSON doesn't support them.</li> <li>Calling <code>save_to_img()</code> saves visual results to the specified <code>save_path</code>. If a directory, various visualizations such as layout detection, OCR, and reading order are saved. If a file, only the last image is saved and others are overwritten.</li> <li>Calling <code>save_to_markdown()</code> saves converted markdown files to <code>save_path/{your_img_basename}.md</code>. For PDF input, it's recommended to specify a directory to avoid file overwriting.</li> <li>Calling <code>concatenate_markdown_pages()</code> merges multi-page markdown results from the <code>PP-StructureV3 pipeline</code> into a single document and returns the merged content.</li>Additionally, you can access the prediction results and visual images through the following attributes:
<table> <thead> <tr> <th>Attribute</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code>json</code></td> <td>Get the prediction result in <code>json</code> format</td> </tr> <tr> <td rowspan="2"><code>img</code></td> <td rowspan="2">Get visualized image results as a <code>dict</code></td> </tr> <tr> </tr> <tr> <td rowspan="3"><code>markdown</code></td> <td rowspan="3">Get markdown results as a <code>dict</code></td> </tr> <tr> </tr> <tr> </tr> </tbody> </table> <ul> <li>The <code>json</code> attribute returns the prediction result as a dictionary, which is consistent with the content saved using the <code>save_to_json()</code> method.</li> <li>The <code>img</code> attribute returns the prediction result as a dictionary. The keys include <code>layout_det_res</code>, <code>overall_ocr_res</code>, <code>text_paragraphs_ocr_res</code>, <code>formula_res_region1</code>, <code>table_cell_img</code>, and <code>seal_res_region1</code>, each corresponding to a visualized <code>Image.Image</code> object for layout detection, OCR, text paragraph, formula, table, and seal results. If optional modules are not used, the dictionary only contains <code>layout_det_res</code>.</li> <li>The <code>markdown</code> attribute returns the prediction result as a dictionary. The keys include <code>markdown_texts</code>, <code>markdown_images</code>, and <code>page_continuation_flags</code>, where the values represent the markdown text, displayed images (<code>Image.Image</code> objects), and a boolean tuple indicating whether the first and last elements of the current page are paragraph boundaries.</li> </ul> </details>If the pipeline meets your requirements for inference speed and accuracy, you can proceed with development integration or deployment.
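To put the result-handling methods above together, here is a minimal sketch (it assumes the `PPStructureV3` pipeline class shown in 2.2 Python script mode and a local `demo.pdf`; paths and values are illustrative only):

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# predict_iter() returns a generator, so the pages of a large PDF can be
# processed one at a time instead of being held in memory all at once.
markdown_list = []
for res in pipeline.predict_iter("./demo.pdf"):
    res.print()                               # print the structured result
    res.save_to_json(save_path="output")      # one JSON file per page
    res.save_to_markdown(save_path="output")  # one Markdown file per page
    markdown_list.append(res.markdown)        # collect per-page markdown dicts

# Merge the per-page Markdown results into a single document; per the table
# above, this returns the merged markdown text and image list.
merged = pipeline.concatenate_markdown_pages(markdown_list)
```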
If you want to directly use the pipeline in your Python project, refer to the example code in 2.2 Python script mode.
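For instance, a short sketch of overriding a few of the instantiation parameters documented in (1) above (the parameter names come from that table; the chosen values are illustrative only):

```python
from paddleocr import PPStructureV3

# Illustrative configuration; any parameter left unset keeps its documented default.
pipeline = PPStructureV3(
    device="gpu:0",                     # use GPU 0; see the device parameter above
    use_doc_orientation_classify=True,  # enable the document orientation classification module
    use_seal_recognition=True,          # enable the seal text recognition sub-pipeline
    text_rec_score_thresh=0.5,          # keep only text results scoring above 0.5
)

for res in pipeline.predict("./demo.jpg"):
    res.save_to_img(save_path="output")  # save visualization images
```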
In addition, PaddleOCR provides two other deployment options described in detail below:
🚀 High-Performance Inference: In production environments, many applications have strict performance requirements (especially response speed) to ensure system efficiency and smooth user experience. PaddleOCR offers a high-performance inference option that deeply optimizes model inference and pre/post-processing for significant end-to-end acceleration. For detailed high-performance inference workflow, refer to High Performance Inference.
☁️ Service Deployment: Service-based deployment is common in production. It encapsulates the inference logic as a service, allowing clients to access it via network requests to obtain results. For detailed instructions on service deployment, refer to Service Deployment.
Below is the API reference and multi-language service invocation examples for basic service deployment:
<details><summary>API Reference</summary> <p>For the main operations provided by the service:</p> <ul> <li>The HTTP request method is POST.</li> <li>Both the request body and response body are JSON data (JSON objects).</li> <li>When the request is processed successfully, the response status code is <code>200</code>, and the attributes of the response body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>logId</code></td> <td><code>string</code></td> <td>The UUID of the request.</td> </tr> <tr> <td><code>errorCode</code></td> <td><code>integer</code></td> <td>Error code. Fixed as <code>0</code>.</td> </tr> <tr> <td><code>errorMsg</code></td> <td><code>string</code></td> <td>Error message. Fixed as <code>"Success"</code>.</td> </tr> <tr> <td><code>result</code></td> <td><code>object</code></td> <td>The result of the operation.</td> </tr> </tbody> </table> <ul> <li>When the request is not processed successfully, the attributes of the response body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>logId</code></td> <td><code>string</code></td> <td>The UUID of the request.</td> </tr> <tr> <td><code>errorCode</code></td> <td><code>integer</code></td> <td>Error code. Same as the response status code.</td> </tr> <tr> <td><code>errorMsg</code></td> <td><code>string</code></td> <td>Error message.</td> </tr> </tbody> </table> <p>The main operations provided by the service are as follows:</p> <ul> <li><b><code>infer</code></b></li> </ul> <p>Perform layout parsing.</p> <p><code>POST /layout-parsing</code></p> <ul> <li>The attributes of the request body are as follows:</li> </ul> <table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> <th>Required</th> </tr> </thead> <tbody> <tr> <td><code>file</code></td> <td><code>string</code></td> <td>The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the above file types. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed. To remove the page limit, please add the following configuration to the pipeline configuration file:
<pre><code>Serving:
  extra:
    max_num_input_imgs: null
</code></pre></td> <td>Yes</td> </tr> <tr> <td><code>fileType</code></td> <td><code>integer</code> | <code>null</code></td> <td>File type. <code>0</code> represents a PDF file, and <code>1</code> represents an image file. If this attribute is missing from the request body, the file type will be inferred based on the URL.</td> <td>No</td> </tr> <tr> <td><code>useDocOrientationClassify</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_doc_orientation_classify</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useDocUnwarping</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_doc_unwarping</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTextlineOrientation</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_textline_orientation</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useSealRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_seal_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTableRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_table_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useFormulaRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_formula_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useChartRecognition</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_chart_recognition</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useRegionDetection</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>use_region_detection</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>formatBlockContent</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>format_block_content</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutThreshold</code></td> <td><code>number</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_threshold</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutNms</code></td> <td><code>boolean</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_nms</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>layoutUnclipRatio</code></td> <td><code>number</code> | <code>array</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_unclip_ratio</code> parameter of the pipeline object's <code>predict</code>
method.</td> <td>No</td> </tr> <tr> <td><code>layoutMergeBboxesMode</code></td> <td><code>string</code> | <code>object</code> | <code>null</code></td> <td>Please refer to the description of the <code>layout_merge_bboxes_mode</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetLimitSideLen</code></td> <td><code>integer</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_limit_side_len</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetLimitType</code></td> <td><code>string</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_limit_type</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetBoxThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_box_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textDetUnclipRatio</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_det_unclip_ratio</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>textRecScoreThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>text_rec_score_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetLimitSideLen</code></td> <td><code>integer</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_limit_side_len</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetLimitType</code></td> <td><code>string</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_limit_type</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetBoxThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_box_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealDetUnclipRatio</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_det_unclip_ratio</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>sealRecScoreThresh</code></td> <td><code>number</code> | <code>null</code></td> <td>Please refer to the description of the <code>seal_rec_score_thresh</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useWiredTableCellsTransToHtml</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the 
<code>use_wired_table_cells_trans_to_html</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useWirelessTableCellsTransToHtml</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_wireless_table_cells_trans_to_html</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useTableOrientationClassify</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_table_orientation_classify</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useOcrResultsWithTableCells</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_ocr_results_with_table_cells</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useE2eWiredTableRecModel</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_e2e_wired_table_rec_model</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>useE2eWirelessTableRecModel</code></td> <td><code>boolean</code></td> <td>Please refer to the description of the <code>use_e2e_wireless_table_rec_model</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>markdownIgnoreLabels</code></td> <td><code>array</code> | <code>null</code></td> <td>Please refer to the description of the <code>markdown_ignore_labels</code> parameter of the pipeline object's <code>predict</code> method.</td> <td>No</td> </tr> <tr> <td><code>prettifyMarkdown</code></td> <td><code>boolean</code></td> <td>Whether to output beautified Markdown text. The default is <code>true</code>.</td> <td>No</td> </tr> <tr> <td><code>showFormulaNumber</code></td> <td><code>boolean</code></td> <td>Whether to include formula numbers in the output Markdown text. The default is <code>false</code>.</td> <td>No</td> </tr> <tr> <td><code>outputFormats</code></td> <td><code>array</code> | <code>null</code></td> <td>Optional list of extra formats to return. Currently only <code>"docx"</code> is supported.</td> <td>No</td> </tr> <tr> <td><code>visualize</code></td> <td><code>boolean</code> | <code>null</code></td> <td> Whether to return the final visualization image and intermediate images during the processing. <ul style="margin: 0 0 0 1em; padding-left: 0em;"> <li>If <code>true</code> is provided: return images.</li> <li>If <code>false</code> is provided: do not return any images.</li> <li>If this parameter is omitted from the request body, or if <code>null</code> is explicitly passed, the behavior will follow the value of <code>Serving.visualize</code> in the pipeline configuration.</li> </ul>For example, adding the following setting to the pipeline config file:
<pre><code>Serving:
  visualize: False
</code></pre>will disable image return by default. This behavior can be overridden by explicitly setting the <code>visualize</code> parameter in the request.
If <code>visualize</code> is set neither in the request body nor in the configuration file (i.e., it is <code>null</code> in the request and absent from the configuration), images are returned by default.
</td> <td>No</td> </tr> </tbody> </table>
<ul> <li>When the request is processed successfully, the <code>result</code> in the response body has the following attributes:</li> </ul>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>layoutParsingResults</code></td> <td><code>array</code></td> <td>The layout parsing results. The array length is 1 for image input; for PDF input, it equals the number of pages actually processed, with each element holding the result of one page.</td> </tr> <tr> <td><code>dataInfo</code></td> <td><code>object</code></td> <td>Information about the input data.</td> </tr> </tbody> </table>
<p>Each element in <code>layoutParsingResults</code> is an <code>object</code> with the following attributes:</p>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>prunedResult</code></td> <td><code>object</code></td> <td>A simplified version of the <code>res</code> field in the JSON representation of the result generated by the <code>predict</code> method of the pipeline object, with the <code>input_path</code> and <code>page_index</code> fields removed.</td> </tr> <tr> <td><code>markdown</code></td> <td><code>object</code></td> <td>The Markdown result.</td> </tr> <tr> <td><code>outputImages</code></td> <td><code>object</code> | <code>null</code></td> <td>See the description of the <code>img</code> attribute of the result of the pipeline prediction. The images are in JPEG format and are Base64-encoded.</td> </tr> <tr> <td><code>inputImage</code></td> <td><code>string</code> | <code>null</code></td> <td>The input image. The image is in JPEG format and is Base64-encoded.</td> </tr> <tr> <td><code>exports</code></td> <td><code>object</code> | <code>null</code></td> <td>Additional export results. Present only when <code>outputFormats</code> is specified in the request; for example, <code>{"docx": {"content": "..."}}</code>, where <code>content</code> is the Base64-encoded file content.</td> </tr> </tbody> </table>
<p><code>markdown</code> is an <code>object</code> with the following attributes:</p>
<table> <thead> <tr> <th>Name</th> <th>Type</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td><code>text</code></td> <td><code>string</code></td> <td>The Markdown text.</td> </tr> <tr> <td><code>images</code></td> <td><code>object</code></td> <td>A key-value mapping from the relative paths of Markdown images to the Base64-encoded images.</td> </tr> <tr> <td><code>isStart</code></td> <td><code>boolean</code></td> <td>Whether the first element on the current page is the start of a segment.</td> </tr> <tr> <td><code>isEnd</code></td> <td><code>boolean</code></td> <td>Whether the last element on the current page is the end of a segment.</td> </tr> </tbody> </table> </details>
<details><summary>Multi-language Service Call Examples</summary>
<details>
<summary>Python</summary>
<pre><code class="language-python">import base64
import requests
import pathlib

API_URL = "http://localhost:8080/layout-parsing"  # Service URL

image_path = "./demo.jpg"

# Encode the local image with Base64
with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "file": image_data,  # Base64-encoded file content or file URL
    "fileType": 1,  # File type. 1 represents an image file
}

# Call the API
response = requests.post(API_URL, json=payload)

# Process the response data
assert response.status_code == 200
result = response.json()["result"]
print("\nDetected layout elements:")
for i, res in enumerate(result["layoutParsingResults"]):
    print(res["prunedResult"])
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"Markdown document saved at {md_dir / 'doc.md'}")
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")
</code></pre></details>
<details><summary>C++</summary>
<pre><code class="language-cpp">#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
#include "nlohmann/json.hpp" // https://github.com/nlohmann/json
#include "base64.hpp" // https://github.com/tobiaslocker/base64

int main() {
    httplib::Client client("localhost", 8080);

    const std::string filePath = "./demo.jpg";

    std::ifstream file(filePath, std::ios::binary | std::ios::ate);
    if (!file) {
        std::cerr << "Error opening file: " << filePath << std::endl;
        return 1;
    }

    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        std::cerr << "Error reading file." << std::endl;
        return 1;
    }

    std::string bufferStr(buffer.data(), static_cast<size_t>(size));
    std::string encodedFile = base64::to_base64(bufferStr);

    nlohmann::json jsonObj;
    jsonObj["file"] = encodedFile;
    jsonObj["fileType"] = 1;

    auto response = client.Post("/layout-parsing", jsonObj.dump(), "application/json");

    if (response && response->status == 200) {
        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
        auto result = jsonResponse["result"];
        if (!result.is_object() || !result.contains("layoutParsingResults")) {
            std::cerr << "Unexpected response format." << std::endl;
            return 1;
        }
        const auto& results = result["layoutParsingResults"];
        for (size_t i = 0; i < results.size(); ++i) {
            const auto& res = results[i];
            if (res.contains("prunedResult")) {
                std::cout << "Layout result [" << i << "]: " << res["prunedResult"].dump() << std::endl;
            }
            if (res.contains("outputImages") && res["outputImages"].is_object()) {
                for (auto& [imgName, imgBase64] : res["outputImages"].items()) {
                    std::string outputPath = imgName + "_" + std::to_string(i) + ".jpg";
                    std::string decodedImage = base64::from_base64(imgBase64.get<std::string>());
                    std::ofstream outFile(outputPath, std::ios::binary);
                    if (outFile.is_open()) {
                        outFile.write(decodedImage.c_str(), decodedImage.size());
                        outFile.close();
                        std::cout << "Saved image: " << outputPath << std::endl;
                    } else {
                        std::cerr << "Failed to save image: " << outputPath << std::endl;
                    }
                }
            }
        }
    } else {
        std::cerr << "Request failed." << std::endl;
        if (response) {
            std::cerr << "HTTP status: " << response->status << std::endl;
            std::cerr << "Response body: " << response->body << std::endl;
        }
        return 1;
    }

    return 0;
}
</code></pre></details>
<details><summary>Java</summary>
<pre><code class="language-java">import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;

public class Main {
    public static void main(String[] args) throws IOException {
        String API_URL = "http://localhost:8080/layout-parsing";
        String imagePath = "./demo.jpg";

        File file = new File(imagePath);
        byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
        String base64Image = Base64.getEncoder().encodeToString(fileContent);

        ObjectMapper objectMapper = new ObjectMapper();
        ObjectNode payload = objectMapper.createObjectNode();
        payload.put("file", base64Image);
        payload.put("fileType", 1);

        OkHttpClient client = new OkHttpClient();
        MediaType JSON = MediaType.get("application/json; charset=utf-8");
        RequestBody body = RequestBody.create(JSON, payload.toString());
        Request request = new Request.Builder()
                .url(API_URL)
                .post(body)
                .build();

        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                String responseBody = response.body().string();
                JsonNode root = objectMapper.readTree(responseBody);
                JsonNode result = root.get("result");

                JsonNode layoutParsingResults = result.get("layoutParsingResults");
                for (int i = 0; i < layoutParsingResults.size(); i++) {
                    JsonNode item = layoutParsingResults.get(i);
                    int finalI = i;

                    JsonNode prunedResult = item.get("prunedResult");
                    System.out.println("Pruned Result [" + i + "]: " + prunedResult.toString());

                    JsonNode outputImages = item.get("outputImages");
                    outputImages.fieldNames().forEachRemaining(imgName -> {
                        try {
                            String imgBase64 = outputImages.get(imgName).asText();
                            byte[] imgBytes = Base64.getDecoder().decode(imgBase64);
                            String imgPath = imgName + "_" + finalI + ".jpg";
                            try (FileOutputStream fos = new FileOutputStream(imgPath)) {
                                fos.write(imgBytes);
                                System.out.println("Saved image: " + imgPath);
                            }
                        } catch (IOException e) {
                            System.err.println("Failed to save image: " + e.getMessage());
                        }
                    });
                }
            } else {
                System.err.println("Request failed with HTTP code: " + response.code());
            }
        }
    }
}
</code></pre></details>
<details><summary>Go</summary>
<pre><code class="language-go">package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "path/filepath"
)

func main() {
    API_URL := "http://localhost:8080/layout-parsing"
    filePath := "./demo.jpg"

    fileBytes, err := ioutil.ReadFile(filePath)
    if err != nil {
        fmt.Printf("Error reading file: %v\n", err)
        return
    }
    fileData := base64.StdEncoding.EncodeToString(fileBytes)

    payload := map[string]interface{}{
        "file":     fileData,
        "fileType": 1,
    }
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Printf("Error marshaling payload: %v\n", err)
        return
    }

    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Printf("Error creating request: %v\n", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")

    res, err := client.Do(req)
    if err != nil {
        fmt.Printf("Error sending request: %v\n", err)
        return
    }
    defer res.Body.Close()

    if res.StatusCode != http.StatusOK {
        fmt.Printf("Unexpected status code: %d\n", res.StatusCode)
        return
    }

    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Printf("Error reading response: %v\n", err)
        return
    }

    type Markdown struct {
        Text   string            `json:"text"`
        Images map[string]string `json:"images"`
    }
    type LayoutResult struct {
        PrunedResult map[string]interface{} `json:"prunedResult"`
        Markdown     Markdown               `json:"markdown"`
        OutputImages map[string]string      `json:"outputImages"`
        InputImage   *string                `json:"inputImage"`
    }
    type Response struct {
        Result struct {
            LayoutParsingResults []LayoutResult `json:"layoutParsingResults"`
            DataInfo             interface{}    `json:"dataInfo"`
        } `json:"result"`
    }

    var respData Response
    if err := json.Unmarshal(body, &respData); err != nil {
        fmt.Printf("Error parsing response: %v\n", err)
        return
    }

    for i, res := range respData.Result.LayoutParsingResults {
        fmt.Printf("Result %d - prunedResult: %+v\n", i, res.PrunedResult)

        mdDir := fmt.Sprintf("markdown_%d", i)
        os.MkdirAll(mdDir, 0755)
        mdFile := filepath.Join(mdDir, "doc.md")
        if err := os.WriteFile(mdFile, []byte(res.Markdown.Text), 0644); err != nil {
            fmt.Printf("Error writing markdown file: %v\n", err)
        } else {
            fmt.Printf("Markdown document saved at %s\n", mdFile)
        }
        for path, imgBase64 := range res.Markdown.Images {
            fullPath := filepath.Join(mdDir, path)
            os.MkdirAll(filepath.Dir(fullPath), 0755)
            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
            if err != nil {
                fmt.Printf("Error decoding markdown image: %v\n", err)
                continue
            }
            if err := os.WriteFile(fullPath, imgBytes, 0644); err != nil {
                fmt.Printf("Error saving markdown image: %v\n", err)
            }
        }

        for name, imgBase64 := range res.OutputImages {
            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
            if err != nil {
                fmt.Printf("Error decoding output image %s: %v\n", name, err)
                continue
            }
            filename := fmt.Sprintf("%s_%d.jpg", name, i)
            if err := os.WriteFile(filename, imgBytes, 0644); err != nil {
                fmt.Printf("Error saving output image %s: %v\n", filename, err)
            } else {
                fmt.Printf("Output image saved at %s\n", filename)
            }
        }
    }
}
</code></pre></details>
<details><summary>C#</summary>
<pre><code class="language-csharp">using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class Program
{
    static readonly string API_URL = "http://localhost:8080/layout-parsing";
    static readonly string inputFilePath = "./demo.jpg";

    static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();

        byte[] fileBytes = File.ReadAllBytes(inputFilePath);
        string fileData = Convert.ToBase64String(fileBytes);

        var payload = new JObject
        {
            { "file", fileData },
            { "fileType", 1 }
        };
        var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json");

        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
        response.EnsureSuccessStatusCode();

        string responseBody = await response.Content.ReadAsStringAsync();
        JObject jsonResponse = JObject.Parse(responseBody);

        JArray layoutParsingResults = (JArray)jsonResponse["result"]["layoutParsingResults"];
        for (int i = 0; i < layoutParsingResults.Count; i++)
        {
            var res = layoutParsingResults[i];
            Console.WriteLine($"[{i}] prunedResult:\n{res["prunedResult"]}");

            JObject outputImages = res["outputImages"] as JObject;
            if (outputImages != null)
            {
                foreach (var img in outputImages)
                {
                    string imgName = img.Key;
                    string base64Img = img.Value?.ToString();
                    if (!string.IsNullOrEmpty(base64Img))
                    {
                        string imgPath = $"{imgName}_{i}.jpg";
                        byte[] imageBytes = Convert.FromBase64String(base64Img);
                        File.WriteAllBytes(imgPath, imageBytes);
                        Console.WriteLine($"Output image saved at {imgPath}");
                    }
                }
            }
        }
    }
}
</code></pre></details>
<details><summary>Node.js</summary>
<pre><code class="language-js">const axios = require('axios');
const fs = require('fs');
const path = require('path');

const API_URL = 'http://localhost:8080/layout-parsing';
const imagePath = './demo.jpg';
const fileType = 1;

function encodeImageToBase64(filePath) {
  const bitmap = fs.readFileSync(filePath);
  return Buffer.from(bitmap).toString('base64');
}

const payload = {
  file: encodeImageToBase64(imagePath),
  fileType: fileType
};

axios.post(API_URL, payload)
  .then(response => {
    const results = response.data.result.layoutParsingResults;
    results.forEach((res, index) => {
      console.log(`\n[${index}] prunedResult:`);
      console.log(res.prunedResult);

      const outputImages = res.outputImages;
      if (outputImages) {
        Object.entries(outputImages).forEach(([imgName, base64Img]) => {
          const imgPath = `${imgName}_${index}.jpg`;
          fs.writeFileSync(imgPath, Buffer.from(base64Img, 'base64'));
          console.log(`Output image saved at ${imgPath}`);
        });
      } else {
        console.log(`[${index}] No outputImages.`);
      }
    });
  })
  .catch(error => {
    console.error('Error during API request:', error.message || error);
  });
</code></pre></details>
<details><summary>PHP</summary>
<pre><code class="language-php"><?php

$API_URL = "http://localhost:8080/layout-parsing";
$image_path = "./demo.jpg";

$image_data = base64_encode(file_get_contents($image_path));
$payload = array("file" => $image_data, "fileType" => 1);

$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$result = json_decode($response, true)["result"]["layoutParsingResults"];
foreach ($result as $i => $item) {
    echo "[$i] prunedResult:\n";
    print_r($item["prunedResult"]);

    if (!empty($item["outputImages"])) {
        foreach ($item["outputImages"] as $img_name => $img_base64) {
            $output_image_path = "{$img_name}_{$i}.jpg";
            file_put_contents($output_image_path, base64_decode($img_base64));
            echo "Output image saved at $output_image_path\n";
        }
    } else {
        echo "No outputImages found for item $i\n";
    }
}
?>
</code></pre></details>
</details>
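For multi-page PDF input, a paragraph may be split across a page boundary; the `isStart`/`isEnd` flags of each page's `markdown` object (described above) tell you whether the page begins or ends a segment. The snippet below is a minimal sketch, not part of the official API, of one possible way to stitch the per-page Markdown into a single document. It assumes `result` holds the parsed `result` field from the Python example above; the variable names are illustrative, and the exact joining policy is up to you:

```python
# Stitch per-page Markdown using the isStart/isEnd segment flags.
markdown_pages = [res["markdown"] for res in result["layoutParsingResults"]]

parts = []
prev_ended = True  # Treat the first page as starting a new segment.
for md in markdown_pages:
    if prev_ended or md["isStart"]:
        # The page begins a new segment: keep it as a separate part.
        parts.append(md["text"])
    else:
        # The page continues the previous segment: join without a break.
        parts[-1] = parts[-1].rstrip() + md["text"]
    prev_ended = md["isEnd"]

full_markdown = "\n\n".join(parts)
```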
If the default model weights provided by the PP-StructureV3 pipeline do not meet your accuracy or speed requirements, you can try fine-tuning the existing models with your own domain-specific or application-specific data to improve performance in your scenario.
Since the PP-StructureV3 pipeline contains multiple modules, an unsatisfactory result may originate from any one of them. Analyze the cases with poor extraction quality, visualize the intermediate results to identify which module is at fault (see the sketch below), and then refer to the fine-tuning tutorial linked in the corresponding row of the table that follows.
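Before fine-tuning, it usually pays to confirm which module is failing. A minimal sketch using the local Python inference API: `./bad_case.jpg` is a placeholder for one of your failing samples, and `save_to_img` / `save_to_json` are the result object's save methods, which write the visualizations of the intermediate stages (layout boxes, OCR detections, table cells, and so on) for manual inspection:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# Run the pipeline on a sample that produced a poor extraction result.
output = pipeline.predict("./bad_case.jpg")
for res in output:
    # Save one visualization image per intermediate stage into ./debug_output,
    # so you can see where the error is first introduced.
    res.save_to_img("debug_output")
    # Save the structured result as JSON for a field-by-field look.
    res.save_to_json("debug_output")
```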
<table> <thead> <tr> <th>Scenario</th> <th>Fine-tuning Module</th> <th>Fine-tuning Reference Link</th> </tr> </thead> <tbody> <tr> <td>Inaccurate layout detection, such as missing seals or tables</td> <td>Layout Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Inaccurate table structure recognition</td> <td>Table Structure Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/table_structure_recognition.html#4-secondary-development">Link</a></td> </tr> <tr> <td>Inaccurate formula recognition</td> <td>Formula Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/formula_recognition.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Missing seal text detection</td> <td>Seal Text Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/seal_text_detection.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Missing text detection</td> <td>Text Detection Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/text_detection.html#4-custom-development">Link</a></td> </tr> <tr> <td>Incorrect text recognition results</td> <td>Text Recognition Module</td> <td><a href="https://paddlepaddle.github.io/PaddleOCR/main/en/version3.x/module_usage/text_recognition.html#v-secondary-development">Link</a></td> </tr> <tr> <td>Incorrect correction of vertical or rotated text lines</td> <td>Text Line Orientation Classification Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/textline_orientation_classification.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Incorrect correction of full image orientation</td> <td>Document Image Orientation Classification Module</td> <td><a href="https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html#iv-custom-development">Link</a></td> </tr> <tr> <td>Inaccurate image distortion correction</td> <td>Text Image Rectification Module</td> <td>Fine-tuning not supported yet</td> </tr> </tbody> </table>
Once you have completed fine-tuning with your private dataset, you will obtain the local model weights. You can then use the fine-tuned weights by customizing the pipeline configuration file.
You can call the `export_paddlex_config_to_yaml` method of the `PPStructureV3` object in PaddleOCR to export the current pipeline configuration to a YAML file:
```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
```
In the exported file, locate the `model_dir` field of the module you fine-tuned and replace `null` with the local path to your trained weights, for example:

```yaml
......
SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayout_plus-L
    model_dir: null # Replace with the path to the fine-tuned layout detection model weights
......
SubPipelines:
  GeneralOCR:
    pipeline_name: OCR
    text_type: general
    use_doc_preprocessor: False
    use_textline_orientation: False
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv5_server_det
        model_dir: null # Replace with the path to the fine-tuned text detection model weights
        limit_side_len: 960
        limit_type: max
        max_side_limit: 4000
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5
      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv5_server_rec
        model_dir: null # Replace with the path to the fine-tuned text recognition model weights
        batch_size: 1
        score_thresh: 0
......
```
The pipeline configuration file not only includes parameters supported by the PaddleOCR CLI and Python API but also allows for more advanced configurations. For more details, refer to the corresponding pipeline usage tutorial in the PaddleX Pipeline Usage Overview, and adjust the configurations as needed based on your requirements.
After modifying the configuration file, specify the path to the updated pipeline configuration using the `--paddlex_config` parameter on the command line. PaddleOCR will load its content as the pipeline configuration. Example:

```bash
paddleocr pp_structurev3 --paddlex_config PP-StructureV3.yaml ...
```
When using the Python API, pass the path to the pipeline configuration file via the `paddlex_config` parameter when creating the pipeline object. PaddleOCR will load its content as the pipeline configuration. Example:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3(paddlex_config="PP-StructureV3.yaml")
```