Seal Text Recognition Pipeline Usage Tutorial

1. Introduction to Seal Text Recognition Pipeline

Seal text recognition automatically extracts and recognizes the text on seals in documents or images. It is part of document processing and is used in scenarios such as contract comparison, warehouse entry and exit review, and invoice reimbursement review.

The seal text recognition pipeline recognizes the text content of seals: it extracts the text from seal images and outputs it in text form. The pipeline integrates the industry-renowned end-to-end OCR system PP-OCRv4 and supports the detection and recognition of curved seal text. It also integrates an optional layout region localization module, which locates the seal within the entire document, as well as optional document image orientation correction and distortion correction. With this pipeline, accurate text prediction at millisecond-level latency can be achieved on a CPU. The pipeline provides flexible service deployment methods that can be called from multiple programming languages on various hardware. It also supports custom development: you can train and fine-tune models on your own dataset, and the trained models can be integrated seamlessly.

The <b>seal text recognition pipeline</b> includes a seal text detection module and a text recognition module, plus three optional modules: a layout detection module, a document image orientation classification module, and a text image unwarping module.

- Seal Text Detection Module
- Text Recognition Module
- Layout Detection Module (Optional)
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)

In this pipeline, you can choose the model to use based on the benchmark data below.

The inference time only includes the model inference time and does not include the time for pre- or post-processing. In the inference time columns labeled [Regular Mode / High-Performance Mode], the Regular Mode values correspond to the local paddle_static inference engine.
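As a worked example of reading these two-value columns, the minimal sketch below splits a cell such as `17.51 / 6.35` into its Regular Mode and High-Performance Mode values and computes the resulting speedup. The helpers `parse_time_cell` and `speedup` are hypothetical names used for illustration only and are not part of PaddleOCR; treating a `-` entry as "not measured" is an assumption about how the tables mark missing values.

```python
from typing import Optional, Tuple


def parse_time_cell(cell: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a 'regular / high-performance' cell into two times in ms.

    A '-' entry is returned as None (assumed to mean "not measured").
    """
    parts = [p.strip() for p in cell.split("/")]
    if len(parts) != 2:
        raise ValueError(f"expected 'a / b', got {cell!r}")

    def to_ms(text: str) -> Optional[float]:
        return None if text == "-" else float(text)

    return to_ms(parts[0]), to_ms(parts[1])


def speedup(cell: str) -> Optional[float]:
    """Return regular_time / high_performance_time, or None if a value is missing."""
    regular, fast = parse_time_cell(cell)
    if regular is None or fast is None or fast == 0:
        return None
    return regular / fast


# CPU times of PicoDet-S_layout_17cls, copied from the 17-class layout table.
print(round(speedup("17.51 / 6.35"), 2))  # 2.76
# CPU times of UVDoc: no Regular Mode value is listed, so no speedup is reported.
print(speedup("- / 869.82"))  # None
```

Equal values in both modes, such as `964.75 / 964.75`, give a speedup of 1.0, meaning that High-Performance Mode brought no measured gain for that model and device.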

<details> <summary> <b>Layout Region Detection Module (Optional):</b></summary>

<b>Layout detection model, including 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure and table title (figure title, table title, and chart title), seal, chart, sidebar text, and reference content</b>

<b>Layout detection model, including 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, chart title, table, table title, seal, chart title, chart, header image, footer image, sidebar text</b>

❗ Listed above are the <b>4 core models</b> of the layout detection module. The module supports <b>13 models</b> in total, with several different predefined category sets; 9 of these models include the seal category. The models other than the core models listed above are as follows:

<details><summary> 👉Details of the Model List</summary>

<b>3-class layout detection model, including table, image, seal</b>

<b>17-class region detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, chart title, formula, table, table title, references, document title, footnote, header, algorithm, footer, seal</b>

<table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>mAP(0.5) (%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>PicoDet-S_layout_17cls</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PicoDet-S_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">Training Model</a></td> <td>87.4</td> <td>8.80 / 3.62</td> <td>17.51 / 6.35</td> <td>4.8</td> <td>A highly efficient layout region localization model based on the lightweight PicoDet-S model trained on a self-built dataset including Chinese and English papers, magazines, and research reports</td> </tr> <tr> <td>PicoDet-L_layout_17cls</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Training Model</a></td> <td>89.0</td> <td>12.60 / 10.27</td> <td>43.70 / 24.42</td> <td>22.6</td> <td>An efficiency-accuracy balanced layout region localization model based on PicoDet-L trained on a self-built dataset including Chinese and English papers, magazines, and research reports</td> </tr> <tr> <td>RT-DETR-H_layout_17cls</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Training Model</a></td> <td>98.3</td> <td>115.29 / 101.18</td> <td>964.75 / 964.75</td> <td>470.2</td> <td>A high precision 
layout region localization model based on RT-DETR-H trained on a self-built dataset including Chinese and English papers, magazines, and research reports</td> </tr> </tbody> </table> </details> </details> <details> <summary> <b>Document Image Orientation Classification Module (Optional):</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>Top-1 Acc (%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>PP-LCNet_x1_0_doc_ori</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Training Model</a></td> <td>99.06</td> <td>2.62 / 0.59</td> <td>3.24 / 1.19</td> <td>7</td> <td>A document image classification model based on PP-LCNet_x1_0, containing four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees</td> </tr> </tbody> </table> </details> <details> <summary> <b>Text Image Correction Module (Optional):</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>CER</th> <th>GPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Normal Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>UVDoc</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Training Model</a></td> <td>0.179</td> <td>19.05 / 19.05</td> <td>- / 869.82</td> <td>30.3</td> <td>A high precision text image correction 
model</td> </tr> </tbody> </table> </details> <details> <summary> <b>Seal Text Detection Module:</b></summary> <table> <thead> <tr> <th>Model</th><th>Model Download Link</th> <th>Detection Hmean (%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td>PP-OCRv4_server_seal_det</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv4_server_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_seal_det_pretrained.pdparams">Training Model</a></td> <td>98.40</td> <td>124.64 / 91.57</td> <td>545.68 / 439.86</td> <td>109</td> <td>PP-OCRv4 server-side seal text detection model, with higher accuracy, suitable for deployment on better servers</td> </tr> <tr> <td>PP-OCRv4_mobile_seal_det</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv4_mobile_seal_det_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_seal_det_pretrained.pdparams">Training Model</a></td> <td>96.36</td> <td>9.70 / 3.56</td> <td>50.38 / 19.64</td> <td>4.7</td> <td>PP-OCRv4 mobile-side seal text detection model, with higher efficiency, suitable for deployment on the edge</td> </tr> </tbody> </table> </details> <details> <summary><b>Text Recognition Module:</b></summary> <table> <tr> <th>Model</th><th>Model Download Link</th> <th>Recognition Avg Accuracy(%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> <tr> <td>PP-OCRv5_server_rec</td> <td><a 
href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_server_rec_pretrained.pdparams">Training Model</a></td> <td>86.38</td> <td>8.46 / 2.36</td> <td>31.21 / 31.21</td> <td>81</td> <td rowspan="2">PP-OCRv5_rec is a new generation text recognition model. This model aims to efficiently and accurately support the recognition of four major languages: Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenes like handwriting, vertical text, pinyin, and rare characters with a single model. It balances recognition effectiveness, inference speed, and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.</td> </tr> <tr> <td>PP-OCRv5_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>81.29</td> <td>5.43 / 1.46</td> <td>21.20 / 5.32</td> <td>16</td> </tr> <tr> <td>PP-OCRv4_server_rec_doc</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_doc_pretrained.pdparams">Training Model</a></td> <td>86.58</td> <td>8.69 / 2.78</td> <td>37.93 / 37.93</td> <td>182</td> <td>PP-OCRv4_server_rec_doc is trained on a mix of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec, enhancing recognition capabilities for some traditional Chinese characters, Japanese, and special characters, supporting more than 
15,000 characters. Besides improving document-related text recognition, it also enhances general text recognition capabilities</td> </tr> <tr> <td>PP-OCRv4_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>78.74</td> <td>5.26 / 1.12</td> <td>17.48 / 3.61</td> <td>10.5</td> <td>PP-OCRv4 lightweight recognition model, with high inference efficiency, can be deployed on multiple hardware devices, including edge devices</td> </tr> <tr> <td>PP-OCRv4_server_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Training Model</a></td> <td>85.19</td> <td>8.75 / 2.49</td> <td>36.93 / 36.93</td> <td>173</td> <td>PP-OCRv4 server-side model, with high inference accuracy, can be deployed on various servers</td> </tr> <tr> <td>en_PP-OCRv4_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/en_PP-OCRv4_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>70.39</td> <td>4.81 / 1.23</td> <td>17.20 / 4.18</td> <td>7.5</td> <td>An ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model, supporting English and number recognition</td> </tr> </table>

❗ Listed above are the <b>6 core models</b> of the text recognition module. The module supports <b>20 models</b> in total, including multiple multilingual text recognition models. The complete model list is as follows:

<details><summary> 👉Details of the Model List</summary>

<b>PP-OCRv5 Multi-Scene Model</b>

<table> <tr> <th>Model</th><th>Model Download Link</th> <th>Chinese Recognition Avg Accuracy(%)</th> <th>English Recognition Avg Accuracy(%)</th> <th>Traditional Chinese Recognition Avg Accuracy(%)</th> <th>Japanese Recognition Avg Accuracy(%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> <tr> <td>PP-OCRv5_server_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_server_rec_pretrained.pdparams">Training Model</a></td> <td>86.38</td> <td>64.70</td> <td>93.29</td> <td>60.35</td> <td>8.46 / 2.36</td> <td>31.21 / 31.21</td> <td>81</td> <td rowspan="2">PP-OCRv5_rec is a new generation text recognition model. This model aims to efficiently and accurately support the recognition of four major languages: Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenes like handwriting, vertical text, pinyin, and rare characters with a single model. It balances recognition effectiveness, inference speed, and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.</td> </tr> <tr> <td>PP-OCRv5_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>81.29</td> <td>66.00</td> <td>83.55</td> <td>54.65</td> <td>5.43 / 1.46</td> <td>21.20 / 5.32</td> <td>16</td> </tr> </table>

<b>Chinese Recognition Model</b>

<b>English Recognition Model</b>

<b>Multilingual Recognition Model</b>

<table> <tr> <th>Model</th><th>Model Download Link</th> <th>Recognition Avg Accuracy(%)</th> <th>GPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>CPU Inference Time (ms) [Regular Mode / High-Performance Mode]</th> <th>Model Storage Size (MB)</th> <th>Description</th> </tr> <tr> <td>korean_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/korean_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/korean_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>60.21</td> <td>3.73 / 0.98</td> <td>8.76 / 2.91</td> <td>9.6</td> <td>An ultra-lightweight Korean recognition model trained based on the PP-OCRv3 recognition model, supporting Korean and number recognition</td> </tr> <tr> <td>japan_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/japan_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/japan_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>45.69</td> <td>3.86 / 1.01</td> <td>8.62 / 2.92</td> <td>9.8</td> <td>An ultra-lightweight Japanese recognition model trained based on the PP-OCRv3 recognition model, supporting Japanese and number recognition</td> </tr> <tr> <td>chinese_cht_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/chinese_cht_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/chinese_cht_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>82.06</td> <td>3.90 / 1.16</td> <td>9.24 / 3.18</td> <td>10.8</td> <td>An ultra-lightweight Traditional Chinese recognition model trained based on the PP-OCRv3 
recognition model, supporting Traditional Chinese and number recognition</td> </tr> <tr> <td>te_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/te_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/te_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>95.88</td> <td>3.59 / 0.81</td> <td>8.28 / 6.21</td> <td>8.7</td> <td>An ultra-lightweight Telugu recognition model trained based on the PP-OCRv3 recognition model, supporting Telugu and number recognition</td> </tr> <tr> <td>ka_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/ka_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ka_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>96.96</td> <td>3.49 / 0.89</td> <td>8.63 / 2.77</td> <td>17.4</td> <td>An ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model, supporting Kannada and number recognition</td> </tr> <tr> <td>ta_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/ta_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ta_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>76.83</td> <td>3.49 / 0.86</td> <td>8.35 / 3.41</td> <td>8.7</td> <td>An ultra-lightweight Tamil recognition model trained based on the PP-OCRv3 recognition model, supporting Tamil and number recognition</td> </tr> <tr> <td>latin_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/latin_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a 
href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/latin_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>76.93</td> <td>3.53 / 0.78</td> <td>8.50 / 6.83</td> <td>8.7</td> <td>An ultra-lightweight Latin recognition model trained based on the PP-OCRv3 recognition model, supporting Latin and number recognition</td> </tr> <tr> <td>arabic_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/arabic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/arabic_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>73.55</td> <td>3.60 / 0.83</td> <td>8.44 / 4.69</td> <td>17.3</td> <td>An ultra-lightweight Arabic letter recognition model trained based on the PP-OCRv3 recognition model, supporting Arabic letters and number recognition</td> </tr> <tr> <td>cyrillic_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/cyrillic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/cyrillic_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>94.28</td> <td>3.56 / 0.79</td> <td>8.22 / 2.76</td> <td>8.7</td> <td>An ultra-lightweight Cyrillic letter recognition model trained based on the PP-OCRv3 recognition model, supporting Cyrillic letters and number recognition</td> </tr> <tr> <td>devanagari_PP-OCRv3_mobile_rec</td> <td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/devanagari_PP-OCRv3_mobile_rec_pretrained.pdparams">Training Model</a></td> <td>96.44</td> <td>3.60 / 0.78</td> <td>6.95 / 
2.87</td> <td>8.7</td> <td>An ultra-lightweight Devanagari letter recognition model trained based on the PP-OCRv3 recognition model, supporting Devanagari letters and number recognition</td> </tr> </table> </details> </details> <details> <summary> <b>Test Environment Description:</b></summary> <ul> <li><b>Performance Test Environment</b> <ul> <li><strong>Test Dataset: </strong> <ul> <li>Document Image Orientation Classification Model: Self-built internal dataset covering multiple scenarios such as documents and certificates, containing 1000 images.</li> <li>Text Image Correction Model: <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet</a>.</li> <li>Layout Region Detection Model: PaddleOCR self-built layout region detection dataset, containing 500 common document type images such as Chinese and English papers, magazines, contracts, books, exam papers, and research reports.</li> <li>3-Class Layout Detection Model: PaddleOCR self-built layout region detection dataset, containing 1154 common document type images such as Chinese and English papers, magazines, and research reports.</li> <li>17-Class Region Detection Model: PaddleOCR self-built layout region detection dataset, containing 892 common document type images such as Chinese and English papers, magazines, and research reports.</li> <li>Text Detection Model: PaddleOCR self-built Chinese dataset covering multiple scenarios such as street scenes, web images, documents, and handwriting, where detection includes 500 images.</li> <li>Chinese Recognition Model: PaddleOCR self-built Chinese dataset covering multiple scenarios such as street scenes, web images, documents, and handwriting, where text recognition includes 11,000 images.</li> <li>ch_SVTRv2_rec: <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition</a> A leaderboard evaluation set.</li> <li>ch_RepSVTR_rec: <a 
href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition</a> B leaderboard evaluation set.</li> <li>English Recognition Model: Self-built internal English dataset.</li> <li>Multilingual Recognition Model: Self-built internal multilingual dataset.</li> <li>Text Line Orientation Classification Model: Self-built internal dataset covering multiple scenarios such as documents and certificates, containing 1000 images.</li> <li>Seal Text Detection Model: Self-built internal dataset containing 500 circular seal images.</li> </ul> </li> <li><strong>Hardware Configuration:</strong> <ul> <li>GPU: NVIDIA Tesla T4</li> <li>CPU: Intel Xeon Gold 6271C @ 2.60GHz</li> </ul> </li> <li><strong>Software Environment:</strong> <ul> <li>Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 / TensorRT 8.6.1.6</li> <li>paddlepaddle-gpu 3.0.0 / paddleocr 3.0.3</li> </ul> </li> </ul> </li> <li><b>Inference Mode Description</b></li> </ul> <table border="1"> <thead> <tr> <th>Mode</th> <th>GPU Configuration</th> <th>CPU Configuration</th> <th>Acceleration Technology Combination</th> </tr> </thead> <tbody> <tr> <td>Regular Mode</td> <td>FP32 Precision / No TRT Acceleration</td> <td>FP32 Precision / 8 Threads</td> <td><code>paddle_static</code></td> </tr> <tr> <td>High-Performance Mode</td> <td>Optimal combination of prior precision type and acceleration strategy</td> <td>FP32 Precision / 8 Threads</td> <td>Select optimal prior backend (Paddle/OpenVINO/TRT, etc.)</td> </tr> </tbody> </table> </details>

<b>If you are more concerned with model accuracy, please choose a model with higher accuracy. If you are more concerned with inference speed, please choose a model with faster inference speed. If you are more concerned with model storage size, please choose a model with smaller storage size</b>.
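The trade-off described above can be sketched as a small selection routine: keep the models that fit a latency or storage budget, then take the most accurate one. This is only an illustration. `ModelStats` and `pick_model` are hypothetical names rather than PaddleOCR APIs, and the numbers are copied from the Seal Text Detection Module table (Detection Hmean, Regular Mode CPU time, storage size).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelStats:
    name: str
    accuracy: float  # Detection Hmean (%)
    cpu_ms: float    # CPU inference time in Regular Mode (ms)
    size_mb: float   # model storage size (MB)


# Values copied from the Seal Text Detection Module benchmark table.
SEAL_DET_MODELS = [
    ModelStats("PP-OCRv4_server_seal_det", 98.40, 545.68, 109.0),
    ModelStats("PP-OCRv4_mobile_seal_det", 96.36, 50.38, 4.7),
]


def pick_model(
    models: Iterable[ModelStats],
    max_cpu_ms: Optional[float] = None,
    max_size_mb: Optional[float] = None,
) -> Optional[ModelStats]:
    """Return the most accurate model within the given budgets, or None."""
    candidates = [
        m for m in models
        if (max_cpu_ms is None or m.cpu_ms <= max_cpu_ms)
        and (max_size_mb is None or m.size_mb <= max_size_mb)
    ]
    return max(candidates, key=lambda m: m.accuracy, default=None)


# No budget: accuracy decides.
print(pick_model(SEAL_DET_MODELS).name)
# A 100 ms CPU budget rules out the server model.
print(pick_model(SEAL_DET_MODELS, max_cpu_ms=100).name)
```

The selected name is the kind of value that the `seal_text_detection_model_name` parameter described later in this tutorial expects.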

2. Quick Start

Before using the seal text recognition pipeline locally, please ensure that you have completed the installation of the wheel package according to the installation tutorial. If you prefer to install dependencies selectively, please refer to the relevant instructions in the installation documentation. The corresponding dependency group for this pipeline is doc-parser. Once the installation is complete, you can experience it locally via the command line or integrate it with Python.

Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.

2.1 Command Line Experience

You can try the seal_recognition pipeline with a single command:

```bash
paddleocr seal_recognition -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False

# Use --device to run model inference on the GPU.
# This form expects the demo image to be available locally as ./seal_text_det.png.
paddleocr seal_recognition -i ./seal_text_det.png --device gpu
```

The examples above use the local paddle_static inference engine by default. To run them, first install PaddlePaddle by following PaddlePaddle Framework Installation.

If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following command:

```bash
# Use the transformers engine for inference
paddleocr seal_recognition -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --engine transformers
```

In most scenarios, the default paddle_static inference engine delivers better inference performance and is the recommended first choice.
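When the command line is driven from a script, it can help to assemble the arguments programmatically. The sketch below builds the same commands as shown above, using only the flags that appear in this section (`-i`, `--use_doc_orientation_classify`, `--use_doc_unwarping`, `--device`, `--engine`). The helpers `build_seal_command` and `run_seal_command` are hypothetical wrappers, not part of PaddleOCR, and the sketch assumes the `paddleocr` executable is on `PATH` when the command is actually run.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_seal_command(
    image: str,
    use_doc_orientation_classify: bool = False,
    use_doc_unwarping: bool = False,
    device: Optional[str] = None,
    engine: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `paddleocr seal_recognition`."""
    cmd = [
        "paddleocr", "seal_recognition", "-i", image,
        "--use_doc_orientation_classify", str(use_doc_orientation_classify),
        "--use_doc_unwarping", str(use_doc_unwarping),
    ]
    # Optional flags are only added when requested, so the CLI defaults apply otherwise.
    if device is not None:
        cmd += ["--device", device]
    if engine is not None:
        cmd += ["--engine", engine]
    return cmd


def run_seal_command(cmd: List[str]) -> int:
    """Run the command and return its exit code (requires paddleocr on PATH)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    return subprocess.run(cmd, check=False).returncode


# Print the command instead of running it, so the sketch works without PaddleOCR installed.
print(shlex.join(build_seal_command("./seal_text_det.png", device="gpu")))
print(shlex.join(build_seal_command("./seal_text_det.png", engine="transformers")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces; `shlex.join` is used here only to display the command.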

<details><summary><b>The command line supports more parameter settings. Click to expand for detailed explanations of command line parameters.</b></summary> <table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Parameter Type</th> <th>Default Value</th> </tr> </thead> <tbody> <tr> <td><code>input</code></td> <td><b>Meaning:</b> Data to be predicted (required).

<b>Description:</b> Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path).

</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>save_path</code></td> <td> <b>Meaning:</b> The path where the inference result files are saved.

<b>Description:</b> If not set, the inference results will not be saved locally.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td> <b>Meaning:</b> The name of the document orientation classification model.

<b>Description:</b> If not set, the default model in pipeline will be used.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b> The directory path of the document orientation classification model.

<b>Description:</b> If not set, the official model will be downloaded.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td> <b>Meaning:</b> The name of the text image unwarping model.

<b>Description:</b> If not set, the default model in pipeline will be used.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td> <b>Meaning:</b> The directory path of the text image unwarping model.

<b>Description:</b> If not set, the official model will be downloaded.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_name</code></td> <td> <b>Meaning:</b> The name of the layout detection model.

<b>Description:</b> If not set, the default model in pipeline will be used. </td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td> <b>Meaning:</b> The directory path of the layout detection model.

<b>Description:</b> If not set, the official model will be downloaded.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b> The name of the seal text detection model.

<b>Description:</b> If not set, the pipeline's default model will be used.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b> The directory path of the seal text detection model.

<b>Description:</b> If not set, the official model will be downloaded.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b> The name of the text recognition model.

<b>Description:</b> If not set, the default pipeline model is used.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b> The directory path of the text recognition model.

<b>Description:</b> If not set, the official model will be downloaded.</td>

<td><code>str</code></td> <td></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b> Batch size for the text recognition model.

<b>Description:</b> If not set, defaults to <code>1</code>.</td>

<td><code>int</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b> Whether to load and use the document orientation classification module.

<b>Description:</b> If not set, defaults to pipeline initialization value (<code>True</code>).</td>

<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b> Whether to load and use the text image unwarping module.

<b>Description:</b> If not set, defaults to pipeline initialization value (<code>True</code>).</td>

<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>use_layout_detection</code></td> <td> <b>Meaning:</b>Whether to load and use the layout detection module.

<b>Description:</b> If not set, the parameter will be set to the value initialized in the pipeline, which is <code>True</code> by default.</td>

<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.

<b>Description:</b> Any float between <code>0</code> and <code>1</code>. If not set, the default value of <code>0.5</code> is used.

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.

<b>Description:</b> If not set, the parameter will be set to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>

<td><code>bool</code></td> <td></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Unclip ratio for detected boxes in layout detection model.

<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>1.0</code>.

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>The merging mode for the detection boxes output by the model in layout region detection.

<b>Description:</b>

<ul> <li><b>large</b>: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed;</li> <li><b>small</b>: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed;</li> <li><b>union</b>: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained;</li> </ul>If not set, the default is <code>large</code>. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.

<b>Description:</b> Any integer > <code>0</code>. If not set, the default is <code>736</code>.

</td> <td><code>int</code></td> <td></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for image side in seal text detection.

<b>Description:</b> Supports <code>min</code> and <code>max</code>; <code>min</code> ensures the shortest side is ≥ <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is ≤ <code>seal_det_limit_side_len</code>. If not set, the default is <code>min</code>.

</td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold. Pixels with scores above this value in the probability map are considered text.

<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.2</code>.

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Box threshold. Boxes with average pixel scores above this value are considered text regions.

<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.6</code>.

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. Higher value means larger expansion area.

<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.5</code>.

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Recognition score threshold. Text results above this value will be kept.

<b>Description:</b> Any float > <code>0</code>. If not set, the default is <code>0.0</code> (no threshold).

</td> <td><code>float</code></td> <td></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>The device used for inference.

<b>Description:</b> Supports specifying a specific device ID:

<ul> <li><b>CPU</b>: For example, <code>cpu</code> indicates using the CPU for inference.</li> <li><b>GPU</b>: For example, <code>gpu:0</code> indicates using the first GPU for inference.</li> <li><b>NPU</b>: For example, <code>npu:0</code> indicates using the first NPU for inference.</li> <li><b>XPU</b>: For example, <code>xpu:0</code> indicates using the first XPU for inference.</li> <li><b>MLU</b>: For example, <code>mlu:0</code> indicates using the first MLU for inference.</li> <li><b>DCU</b>: For example, <code>dcu:0</code> indicates using the first DCU for inference.</li> <li><b>MetaX GPU</b>: For example, <code>metax_gpu:0</code> indicates using the first MetaX GPU for inference.</li> <li><b>Iluvatar GPU</b>: For example, <code>iluvatar_gpu:0</code> indicates using the first Iluvatar GPU for inference.</li> </ul>If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. </td> <td><code>str</code></td> <td></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.

<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.

For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.

</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>fp32</code> or <code>fp16</code>.</td> <td><code>str</code></td> <td><code>fp32</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.

<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.

</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str</code></td> <td></td> </tr> </tbody> </table> </details>

After running, the results will be printed to the terminal, as follows:

bash

{'res': {'input_path': './seal_text_det.png', 'model_settings': {'use_doc_preprocessor': True, 'use_layout_detection': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 16, 'label': 'seal', 'score': 0.975529670715332, 'coordinate': [6.191284, 0.16680908, 634.39325, 628.85345]}]}, 'seal_res_list': [{'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': [array([[320,  38],
       ...,
       [315,  38]]), array([[461, 347],
       ...,
       [456, 346]]), array([[439, 445],
       ...,
       [434, 444]]), array([[158, 468],
       ...,
       [154, 466]])], 'text_det_params': {'limit_side_len': 736, 'limit_type': 'min', 'thresh': 0.2, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 0.5}, 'text_type': 'seal', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['天津君和缘商贸有限公司', '发票专用章', '吗繁物', '5263647368706'], 'rec_scores': array([0.99340463, ..., 0.9916274 ]), 'rec_polys': [array([[320,  38],
       ...,
       [315,  38]]), array([[461, 347],
       ...,
       [456, 346]]), array([[439, 445],
       ...,
       [434, 444]]), array([[158, 468],
       ...,
       [154, 466]])], 'rec_boxes': array([], dtype=float64)}]}}
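
For downstream checks such as contract or invoice review, usually only the recognized strings and their confidence scores are needed. The following is a minimal sketch, assuming the result is available as a dictionary shaped like the printed output above. The helper name `collect_seal_texts` and its `min_score` argument are hypothetical, and `sample` is a hand-trimmed copy of that output that keeps only the two entries whose scores are visible in the printout.

```python
from typing import Any, Dict, List, Tuple


def collect_seal_texts(
    result: Dict[str, Any], min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs from every seal region in a result dictionary.

    Hypothetical helper: it only relies on the 'res' -> 'seal_res_list' ->
    'rec_texts' / 'rec_scores' nesting visible in the printed output.
    """
    pairs: List[Tuple[str, float]] = []
    for seal in result["res"].get("seal_res_list", []):
        texts = seal.get("rec_texts", [])
        scores = seal.get("rec_scores", [])
        # rec_texts and rec_scores are parallel sequences, one entry per text line.
        for text, score in zip(texts, scores):
            if float(score) >= min_score:
                pairs.append((text, float(score)))
    return pairs


# Hand-trimmed copy of the printed output (assumption: same nesting as above).
sample = {
    "res": {
        "input_path": "./seal_text_det.png",
        "seal_res_list": [
            {
                "rec_texts": ["天津君和缘商贸有限公司", "5263647368706"],
                "rec_scores": [0.99340463, 0.9916274],
            }
        ],
    }
}

for text, score in collect_seal_texts(sample):
    print(f"{score:.4f}  {text}")
```

Because the helper only iterates over `rec_scores`, it accepts a plain list as well as the NumPy array shown in the real output.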

The visualized results, including the seal OCR visualization, are saved under save_path.

2.2 Python Script Integration

The command line above is a quick way to try the pipeline and inspect its output. In a real project, you will usually integrate the pipeline through code, and a few lines are enough to run inference:

python

from paddleocr import SealRecognition

pipeline = SealRecognition(
    use_doc_orientation_classify=False,  # Disable the document orientation classification module
    use_doc_unwarping=False,  # Disable the document image unwarping module
)
# pipeline = SealRecognition(device="gpu")  # Alternatively, run inference on the GPU
output = pipeline.predict("./seal_text_det.png")
for res in output:
    res.print()  # Print the structured prediction result
    res.save_to_img("./output/")  # Save the visualization image
    res.save_to_json("./output/")  # Save the result as JSON

The example above uses the local paddle_static inference engine by default. To run it, first install PaddlePaddle by following PaddlePaddle Framework Installation.

If you choose transformers as the inference engine, make sure the Transformers environment is configured by following Inference Engine and Configuration, and then run the following code:

python

from paddleocr import SealRecognition

pipeline = SealRecognition(
    engine="transformers",
)
# pipeline = SealRecognition(engine="transformers", device="gpu")  # Alternatively, run inference on the GPU
output = pipeline.predict("./seal_text_det.png")
for res in output:
    res.print()  # Print the structured prediction result
    res.save_to_img("./output/")  # Save the visualization image
    res.save_to_json("./output/")  # Save the result as JSON

In most scenarios, the default paddle_static inference engine delivers better inference performance and is the recommended first choice.

In the above Python script, the following steps were executed:

(1) Instantiate a pipeline object for seal text recognition using the SealRecognition() class, with specific parameter descriptions as follows:

<table> <thead> <tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default Value</th> </tr> </thead> <tbody> <tr> <td><code>doc_orientation_classify_model_name</code></td> <td><b>Meaning:</b>Name of the document orientation classification model.

<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_orientation_classify_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document orientation classification model.

<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_name</code></td> <td><b>Meaning:</b>Name of the document unwarping model.

<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>doc_unwarping_model_dir</code></td> <td><b>Meaning:</b>Directory path of the document unwarping model.

<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_detection_model_name</code></td> <td><b>Meaning:</b>Name of the layout detection model.

<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the layout detection model.

<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_name</code></td> <td><b>Meaning:</b>Name of the seal text detection model.

<b>Description:</b> If set to <code>None</code>, the default model will be used.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_text_detection_model_dir</code></td> <td><b>Meaning:</b>Directory path of the seal text detection model.

<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_name</code></td> <td><b>Meaning:</b>Name of the text recognition model.

<b>Description:</b> If set to <code>None</code>, the pipeline default model is used.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_model_dir</code></td> <td><b>Meaning:</b>Directory path of the text recognition model.

<b>Description:</b> If set to <code>None</code>, the official model will be downloaded.</td>

<td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>text_recognition_batch_size</code></td> <td><b>Meaning:</b>Batch size for the text recognition model.

<b>Description:</b> If set to <code>None</code>, the default batch size is <code>1</code>.</td>

<td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to enable the document orientation classification module.

<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>

<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to enable the document image unwarping module.

<b>Description:</b> If set to <code>None</code>, the default value is <code>True</code>.</td>

<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_layout_detection</code></td> <td><b>Meaning:</b>Whether to load and use the layout detection module.

<b>Description:</b> If set to <code>None</code>, the parameter will be set to the value initialized in the pipeline, which is <code>True</code> by default.</td>

<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Score threshold for the layout model.

<b>Description:</b>

<ul> <li><b>float</b>: Any float between <code>0-1</code>;</li> <li><b>dict</b>: <code>{0:0.1}</code> where the key is the class ID and the value is the threshold for that class;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>0.5</code>.</li> </ul> </td> <td><code>float|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_nms</code></td> <td><b>Meaning:</b>Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection.

<b>Description:</b> If set to <code>None</code>, the parameter will be set to the value initialized in the pipeline, which is set to <code>True</code> by default.</td>

<td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for the bounding boxes from the layout detection model.

<b>Description:</b>

<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>Tuple[float,float]</b>: Expansion ratios in horizontal and vertical directions;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>tuple</b> values, e.g., <code>{0: (1.1, 2.0)}</code> means width is expanded 1.1× and height 2.0× for class 0 boxes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default of <code>1.0</code>.</li> </ul> </td> <td><code>float|Tuple[float,float]|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_merge_bboxes_mode</code></td> <td><b>Meaning:</b>Filtering method for overlapping boxes in layout detection.

<b>Description:</b>

<ul> <li><b>str</b>: Options include <code>large</code>, <code>small</code>, and <code>union</code> to retain the larger box, smaller box, or both;</li> <li><b>dict</b>: A dictionary with <b>int</b> keys representing <code>cls_id</code>, and <b>str</b> values, e.g., <code>{0: "large", 2: "small"}</code> means using different modes for different classes;</li> <li><b>None</b>: If set to <code>None</code>, uses the pipeline default value <code>large</code>.</li> </ul> </td> <td><code>str|dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_side_len</code></td> <td><b>Meaning:</b>Image side length limit for seal text detection.

<b>Description:</b>

<ul> <li><b>int</b>: Any integer greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>736</code>.</li> </ul> </td> <td><code>int|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_limit_type</code></td> <td><b>Meaning:</b>Limit type for seal text detection image side length.

<b>Description:</b>

<ul> <li><b>str</b>: Supports <code>min</code> and <code>max</code>. <code>min</code> ensures the shortest side is no less than <code>seal_det_limit_side_len</code>, while <code>max</code> ensures the longest side is no greater than <code>seal_det_limit_side_len</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>min</code>.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_thresh</code></td> <td><b>Meaning:</b>Pixel threshold for detection. Pixels with scores greater than this value in the probability map are considered text pixels.

<b>Description:</b>

<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.2</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_box_thresh</code></td> <td><b>Meaning:</b>Bounding box threshold. If the average score of all pixels inside a detection box exceeds this threshold, it is considered a text region.

<b>Description:</b>

<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.6</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_det_unclip_ratio</code></td> <td><b>Meaning:</b>Expansion ratio for seal text detection. The larger the value, the larger the expanded area.

<b>Description:</b>

<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.5</code>.</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>seal_rec_score_thresh</code></td> <td><b>Meaning:</b>Score threshold for seal text recognition. Text results with scores above this threshold will be retained.

<b>Description:</b>

<ul> <li><b>float</b>: Any float greater than <code>0</code>;</li> <li><b>None</b>: If set to <code>None</code>, the default value is <code>0.0</code> (no threshold).</li> </ul> </td> <td><code>float|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>device</code></td> <td><b>Meaning:</b>Device used for inference.

<b>Description:</b> Supports specifying device ID:

<ul> <li><b>CPU</b>: e.g., <code>cpu</code> means using CPU for inference;</li> <li><b>GPU</b>: e.g., <code>gpu:0</code> means using GPU 0;</li> <li><b>NPU</b>: e.g., <code>npu:0</code> means using NPU 0;</li> <li><b>XPU</b>: e.g., <code>xpu:0</code> means using XPU 0;</li> <li><b>MLU</b>: e.g., <code>mlu:0</code> means using MLU 0;</li> <li><b>DCU</b>: e.g., <code>dcu:0</code> means using DCU 0;</li> <li><b>MetaX GPU</b>: e.g., <code>metax_gpu:0</code> means using MetaX GPU 0;</li> <li><b>Iluvatar GPU</b>: e.g., <code>iluvatar_gpu:0</code> means using Iluvatar GPU 0;</li> <li><b>None</b>: If set to <code>None</code>, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used.</li> </ul> </td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_static</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, PaddleOCR preserves the behavior of earlier versions, which in most configurations is equivalent to <code>paddle</code>. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>engine_config</code></td> <td><b>Meaning:</b> Inference-engine configuration. <b>Description:</b> Recommended together with <code>engine</code>. 
For supported fields, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>dict|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>enable_hpi</code></td> <td><b>Meaning:</b> Whether to enable high-performance inference.</td> <td><code>bool</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_tensorrt</code></td> <td><b>Meaning:</b> Whether to enable the TensorRT subgraph engine of Paddle Inference.

<b>Description:</b> If the model does not support TensorRT acceleration, acceleration will not be used even if this flag is set.

For CUDA 11.8 versions of PaddlePaddle, the compatible TensorRT version is 8.x (x>=6). TensorRT 8.6.1.6 is recommended.

</td> <td><code>bool</code></td> <td><code>False</code></td> </tr> <tr> <td><code>precision</code></td> <td><b>Meaning:</b> Computation precision, such as <code>"fp32"</code> or <code>"fp16"</code>.</td> <td><code>str</code></td> <td><code>"fp32"</code></td> </tr> <tr> <td><code>enable_mkldnn</code></td> <td><b>Meaning:</b> Whether to enable MKL-DNN accelerated inference.

<b>Description:</b> If MKL-DNN is unavailable or the model does not support MKL-DNN acceleration, acceleration will not be used even if this flag is set.

</td> <td><code>bool</code></td> <td><code>True</code></td> </tr> <tr> <td><code>mkldnn_cache_capacity</code></td> <td> <b>Meaning:</b> MKL-DNN cache capacity. </td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>cpu_threads</code></td> <td><b>Meaning:</b> Number of threads used for inference on CPU.</td> <td><code>int</code></td> <td><code>10</code></td> </tr> <tr> <td><code>paddlex_config</code></td> <td><b>Meaning:</b> Path to the PaddleX pipeline configuration file.</td> <td><code>str|None</code></td> <td><code>None</code></td> </tr> </tbody> </table>
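
Several layout parameters in the table above accept per-class dictionaries in addition to scalar values. The following is a minimal sketch of how such a configuration might be assembled, assuming class ID `16` denotes the `seal` label as it does in the sample output earlier in this section (this may differ for other layout models). The numeric values are illustrative rather than recommended settings, and `SEAL_CLS_ID`, `pipeline_kwargs`, and `build_pipeline` are hypothetical names.

```python
from typing import Any, Dict

# Assumption: class ID 16 is the "seal" label, as in the sample output.
SEAL_CLS_ID = 16

# Keyword arguments using the scalar and dict forms described in the table.
# All numeric values below are illustrative only.
pipeline_kwargs: Dict[str, Any] = {
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
    "use_layout_detection": True,
    # dict form: class ID -> score threshold for that class
    "layout_threshold": {SEAL_CLS_ID: 0.6},
    # dict form: class ID -> (horizontal, vertical) expansion ratios
    "layout_unclip_ratio": {SEAL_CLS_ID: (1.1, 1.1)},
    # dict form: class ID -> merge mode for overlapping boxes
    "layout_merge_bboxes_mode": {SEAL_CLS_ID: "large"},
    "seal_det_limit_side_len": 736,
    "seal_det_limit_type": "min",
    "device": "cpu",
}


def build_pipeline(kwargs: Dict[str, Any]):
    """Create the pipeline from a kwargs dict (hypothetical convenience wrapper).

    The import is deferred so the configuration can be inspected or tested
    on a machine where paddleocr is not installed.
    """
    from paddleocr import SealRecognition

    return SealRecognition(**kwargs)
```

Calling `build_pipeline(pipeline_kwargs)` then yields an object that can be used like `pipeline` in the script above. Keeping the configuration in a plain dictionary makes it easy to log or to version alongside the project.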

(2) Call the <code>predict()</code> method of the Seal Text Recognition pipeline object for inference prediction. This method will return a <code>generator</code>. Below are the parameters and their descriptions for the <code>predict()</code> method:

<table> <thead> <tr> <th>Parameter</th> <th>Parameter Description</th> <th>Parameter Type</th> <th>Default Value</th> </tr> </thead> <tr> <td><code>input</code></td> <td><b>Meaning:</b>Input data to be predicted. Required.

<b>Description:</b> Supports multiple types:

<ul> <li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code>;</li> <li><b>str</b>: Local path of an image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., the network URL of an image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png">Example</a>; <b>Local directory</b>, containing images to be predicted, e.g., <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with an exact file path);</li> <li><b>list</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>.</li> </ul> </td> <td><code>Python Var|str|list</code></td> <td></td> </tr> <tr> <td><code>use_doc_orientation_classify</code></td> <td><b>Meaning:</b>Whether to use the document orientation classification module during inference.</td> <td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_doc_unwarping</code></td> <td><b>Meaning:</b>Whether to use the text image correction module during inference.</td> <td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>use_layout_detection</code></td> <td> <b>Meaning:</b>Whether to use the layout detection module during inference. </td> <td><code>bool|None</code></td> <td><code>None</code></td> </tr> <tr> <td><code>layout_threshold</code></td> <td><b>Meaning:</b>Same meaning as the instantiation parameters.