docs/version3.x/module_usage/chart_parsing.en.md
Multimodal chart parsing is a cutting-edge OCR technology that focuses on automatically converting various types of visual charts (such as bar charts, line charts, pie charts, etc.) into structured data tables with formatted output. Traditional methods rely on complex pipeline designs with chart keypoint detection models, which involve many prior assumptions and tend to lack robustness. The models in this module leverage the latest VLM (Vision-Language Model) techniques and are data-driven, learning robust features from vast real-world datasets. Application scenarios include financial analysis, academic research, business reporting, and more—for instance, quickly extracting growth trend data from financial reports, experimental comparison figures from research papers, or user distribution statistics from market surveys—empowering users to transition from “viewing charts” to “using data”.
Note: The scores above are based on internal evaluation on a test set of 1801 samples, covering various chart types (bar, line, pie, etc.) across scenarios such as financial reports, regulations, and contracts. There is currently no plan for public release.
❗ Note: The PP-Chart2Table model was upgraded on June 27, 2025. To use the previous version, please download it here.
❗ Before getting started, please install the PaddleOCR wheel package. Refer to the Installation Guide for details.
Run the following command to get started instantly:
```bash
paddleocr chart_parsing -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png'}"
```
The example above uses the <code>paddle_dynamic</code> inference engine by default. To run it, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured, and then run the following command:
```bash
# Use the transformers engine for inference
paddleocr chart_parsing -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png'}" \
    --engine transformers
```
In most scenarios, the default paddle_dynamic inference engine delivers better inference performance and is the recommended first choice.
Note: By default, PaddleOCR retrieves models from HuggingFace. If HuggingFace access is restricted in your environment, you can switch the model source to BOS by setting the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"`. Support for more mainstream sources is planned.
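For example, the model source can be switched for the current shell session before invoking the CLI (the `paddleocr` command shown in the comment is the quick-start example from above):

```shell
# Route model downloads to BOS instead of HuggingFace for this shell session
export PADDLE_PDX_MODEL_SOURCE="BOS"

# Subsequent paddleocr commands in this session will fetch models from BOS, e.g.:
# paddleocr chart_parsing -i "{'image': 'chart_parsing_02.png'}"
```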
You can also integrate the inference of the vision-language model into your own project. Please download the example image locally before running the following code:
```python
from paddleocr import ChartParsing

model = ChartParsing(model_name="PP-Chart2Table")
results = model.predict(
    input={"image": "chart_parsing_02.png"},
    batch_size=1,
)
for res in results:
    res.print()
    res.save_to_json("./output/res.json")
```
The example above uses the <code>paddle_dynamic</code> inference engine by default. To run it, first install PaddlePaddle by following PaddlePaddle Framework Installation.
If you choose transformers as the inference engine, make sure the Transformers environment is configured, and then run the following code:
```python
from paddleocr import ChartParsing

model = ChartParsing(
    model_name="PP-Chart2Table",
    engine="transformers",
)
results = model.predict(
    input={"image": "chart_parsing_02.png"},
    batch_size=1,
)
for res in results:
    res.print()
    res.save_to_json("./output/res.json")
```
In most scenarios, the default paddle_dynamic inference engine delivers better inference performance and is the recommended first choice.
The output result will be:
```
{'res': {'image': 'chart_parsing_02.png', 'result': 'Year | Avg Revenue per 5-star Hotel (Million CNY) | Avg Profit per 5-star Hotel (Million CNY)\n2018 | 104.22 | 9.87\n2019 | 99.11 | 7.47\n2020 | 57.87 | -3.87\n2021 | 68.99 | -2.9\n2022 | 56.29 | -9.48\n2023 | 87.99 | 5.96'}}
```
Explanation of output parameters:
<ul>
<li><code>image</code>: The path to the input image</li>
<li><code>result</code>: The model's prediction output</li>
</ul>

The visualized result is:
| Year | Avg Revenue per 5-star Hotel (Million CNY) | Avg Profit per 5-star Hotel (Million CNY) |
|------|--------------------------------------------|-------------------------------------------|
| 2018 | 104.22 | 9.87 |
| 2019 | 99.11 | 7.47 |
| 2020 | 57.87 | -3.87 |
| 2021 | 68.99 | -2.9 |
| 2022 | 56.29 | -9.48 |
| 2023 | 87.99 | 5.96 |
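The `result` string uses a simple pipe-separated layout (rows split by newlines, cells split by `|`), so it can be turned into structured records without extra dependencies. The snippet below is a minimal sketch; the string is copied from the sample output above:

```python
# The `result` field of the prediction is a plain-text table:
# rows are separated by newlines, cells by " | ".
result = (
    "Year | Avg Revenue per 5-star Hotel (Million CNY) | Avg Profit per 5-star Hotel (Million CNY)\n"
    "2018 | 104.22 | 9.87\n"
    "2019 | 99.11 | 7.47\n"
    "2020 | 57.87 | -3.87\n"
    "2021 | 68.99 | -2.9\n"
    "2022 | 56.29 | -9.48\n"
    "2023 | 87.99 | 5.96"
)

# Split into rows and cells, stripping surrounding whitespace
rows = [[cell.strip() for cell in line.split("|")] for line in result.splitlines()]
header, data = rows[0], rows[1:]

# One dict per data row, keyed by the column headers
records = [dict(zip(header, row)) for row in data]
print(records[0])
```

From here the records can be fed into `csv.DictWriter`, a pandas `DataFrame`, or any downstream analysis.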
Detailed explanation of related methods and parameters:

<code>ChartParsing</code> constructor parameters:

<table>
<thead>
<tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr>
</thead>
<tbody>
<tr> <td><code>model_name</code></td> <td><b>Meaning:</b> Model name. <b>Description:</b> If set to <code>None</code>, defaults to <code>PP-Chart2Table</code>.</td> <td><code>str | None</code></td> <td><code>None</code></td> </tr>
<tr> <td><code>model_dir</code></td> <td><b>Meaning:</b> Model storage path.</td> <td><code>str | None</code></td> <td><code>None</code></td> </tr>
<tr> <td><code>device</code></td> <td><b>Meaning:</b> Inference device. <b>Description:</b> Defaults to GPU 0 if available; otherwise falls back to CPU. <b>Examples:</b> <code>"cpu"</code>, <code>"gpu"</code>, <code>"npu"</code>, <code>"gpu:0"</code></td> <td><code>str | None</code></td> <td><code>None</code></td> </tr>
<tr> <td><code>engine</code></td> <td><b>Meaning:</b> Inference engine. <b>Description:</b> Supports <code>None</code> (the default), <code>paddle</code>, <code>paddle_dynamic</code>, and <code>transformers</code>. When left as <code>None</code>, local inference uses the <code>paddle_dynamic</code> engine by default. For detailed descriptions, supported values, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>str | None</code></td> <td><code>None</code></td> </tr>
<tr> <td><code>engine_config</code></td> <td><b>Meaning:</b> Inference-engine configuration. <b>Description:</b> Recommended together with <code>engine</code>. For supported fields, compatibility rules, and examples, see <a href="../inference_engine.en.md">Inference Engine and Configuration</a>.</td> <td><code>dict | None</code></td> <td><code>None</code></td> </tr>
</tbody>
</table>

<code>predict()</code> method parameters:

<table>
<thead>
<tr> <th>Parameter</th> <th>Description</th> <th>Type</th> <th>Default</th> </tr>
</thead>
<tbody>
<tr> <td><code>input</code></td> <td><b>Meaning:</b> Input data (required). <b>Description:</b> Input formats vary by model.
<ul> <li>For PP-Chart2Table: <code>{'image': image_path}</code></li> </ul> </td> <td><code>dict</code></td> <td>N/A</td> </tr>
<tr> <td><code>batch_size</code></td> <td><b>Meaning:</b> Batch size. <b>Description:</b> Any positive integer.</td> <td><code>int</code></td> <td><code>1</code></td> </tr>
</tbody>
</table>

Currently, this module supports inference only and does not yet support fine-tuning. Fine-tuning capabilities are planned for future releases.
<strong>Test Environment Description:</strong>
<ul> <li><strong>Test Data:</strong> <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.jpg">Sample Image</a></li> <li><strong>Hardware Configuration:</strong> <ul> <li>GPU: NVIDIA A100 40G</li> <li>CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz</li> </ul> </li> <li><strong>Software Environment:</strong> <ul> <li>Ubuntu 22.04 / CUDA 12.6 / cuDNN 9.5</li> <li>paddlepaddle-gpu 3.2.1 / paddleocr 3.5 / transformers 5.4.0 / torch 2.10</li> </ul> </li> </ul>