docs/update/update.en.md
Significant Model Additions:
Deployment Capability Upgrades:
Benchmark Support:
Bug Fixes:
use_chart_parsing) in the PP-StructureV3 configuration files compared to other pipelines.
Other Enhancements:
Bug Fixes:
save_vector, save_visual_info_list, load_vector, and load_visual_info_list in the PP-ChatOCRv4 class.
glossary and llm_request_interval to the translate method in the PPDocTranslation class.
Documentation Improvements:
Others:
puremagic instead of python-magic to reduce installation issues.
Key Models and Pipelines:
New MCP server: Details
Documentation Optimization: Improved the descriptions in some user guides for a smoother reading experience.
enable_mkldnn parameter was not effective, restoring the default behavior of using MKL-DNN for CPU inference.
New Features:
BOS to HuggingFace. Users can also change the environment variable PADDLE_PDX_MODEL_SOURCE to BOS to set the model download source back to Baidu Object Storage (BOS).
Bug Fixes:
export_paddlex_config_to_yaml would not function correctly in certain cases.
save_path and its documentation description.
overlap_ratio under extremely special circumstances in the PP-StructureV3 pipeline.
Documentation Improvements:
enable_mkldnn parameter in the documentation to accurately reflect the program's actual behavior.
lang and ocr_version parameters.
Others:
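The download-source change listed under New Features above can also be applied from Python instead of the shell. The following is a minimal sketch, assuming the variable is read when the library is imported (so it is set first). Only the variable name `PADDLE_PDX_MODEL_SOURCE` and the value `BOS` come from these notes; the helper name `use_bos_model_source` is hypothetical.

```python
import os

# Variable name and value as given in the release notes.
MODEL_SOURCE_ENV = "PADDLE_PDX_MODEL_SOURCE"


def use_bos_model_source() -> str:
    """Hypothetical helper: point model downloads back at Baidu Object Storage."""
    os.environ[MODEL_SOURCE_ENV] = "BOS"
    return os.environ[MODEL_SOURCE_ENV]


# Assumption: the variable must be set before paddleocr is imported so the
# library sees it when it resolves the download source.
use_bos_model_source()
# import paddleocr  # import only after the variable is set
```

Setting the variable inside the process affects only that process; exporting it in the shell is the alternative when several scripts should share the setting.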
Optimisation of certain models and model configurations:
limit_side_len in the configuration has been changed from 736 to 64.
PP-LCNet_x1_0_textline_ori with an accuracy of 99.42%. The default text line orientation classifier for OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model.
PP-LCNet_x0_25_textline_ori, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%.
Optimisation of issues present in version 3.0.0:
use_textline_orientation parameter.
Fixes for issues present in version 3.0.0:
FatalError: Process abort signal is detected by the operating system during inference.
PPStructureV3.concatenate_markdown_pages was missing.
lang and model_name when instantiating paddleocr.PaddleOCR resulted in model_name being ineffective.
PP-OCRv5: All-Scene Text Recognition Model
PP-StructureV3: General Document Parsing Solution
PP-ChatOCRv4: Intelligent Document Understanding Solution
Rebuilt Deployment Capabilities with Unified Inference Interface:
Optimized Training with PaddlePaddle Framework 3.0:
xxx.json instead of xxx.pdmodel.
Unified Model Naming:
For more details, check out the Upgrade Notes from 2.x to 3.x.
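To illustrate the model file change mentioned above (`xxx.json` instead of `xxx.pdmodel`), the sketch below checks which format an exported model directory contains. It is only an illustration: the function name `detect_model_format` is hypothetical, and it assumes the graph file sits directly in the directory and can be identified by its extension alone, with no unrelated `.json` files alongside it.

```python
import tempfile
from pathlib import Path


def detect_model_format(model_dir: str) -> str:
    """Hypothetical helper: classify a model directory by its graph file extension."""
    files = [p for p in Path(model_dir).iterdir() if p.is_file()]
    if any(p.suffix == ".json" for p in files):
        return "json"      # format described for 3.x in these notes
    if any(p.suffix == ".pdmodel" for p in files):
        return "pdmodel"   # earlier format
    return "unknown"


if __name__ == "__main__":
    # Build a throwaway directory with a placeholder file to show the call.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "xxx.json").write_text("{}")
        print(detect_model_format(d))
```

A check like this can help when migrating scripts from 2.x that hard-code the older file extension.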
12 new self-developed single models:
4 high-value multi-model combination solutions:
PaddleX, an All-in-One development tool based on PaddleOCR's advanced technology, supports low-code full-process development capabilities in the OCR field:
🎨 Rich Model One-Click Call: Integrates 17 models related to text image intelligent analysis, general OCR, general layout parsing, table recognition, formula recognition, and seal recognition into 6 pipelines, which can be quickly experienced through a simple Python API one-click call. In addition, the same set of APIs also supports a total of 200+ models in image classification, object detection, image segmentation, and time series forecasting, forming 20+ single-function modules, making it convenient for developers to use model combinations.
🚀 High Efficiency and Low Barrier to Entry: Provides two methods, unified commands and a GUI, for simple and efficient use, combination, and customization of models. Supports multiple deployment methods such as high-performance inference, service-oriented deployment, and on-device deployment. Additionally, models can be developed with seamless switching across mainstream hardware such as NVIDIA GPU, Kunlunxin XPU, Ascend NPU, Cambricon MLU, and Haiguang DCU.
Supports PP-ChatOCRv3-doc, a high-precision layout detection model based on RT-DETR, a high-efficiency layout area detection model based on PicoDet, a high-precision table structure recognition model, the text image unwarping model UVDoc, the formula recognition model LatexOCR, and a document image orientation classification model based on PP-LCNet.
English, Chinese, German, French, Japanese, and Korean are now supported. Models for more languages will continue to be updated.
attention model to inference_model