# Inference Engine
PaddleOCR 3.5 introduces a unified inference-engine configuration mechanism: use `engine` to select the underlying inference engine, and use `engine_config` to pass engine-specific settings. This mechanism applies to both individual models and pipelines.
If `engine` is not explicitly specified, the default behavior remains the same as in earlier versions: except for a few scenarios such as high-performance inference and the generative AI client request feature, PaddleOCR uses the PaddlePaddle framework for inference in most cases. If `engine` is explicitly specified, initialization uses the selected engine.
In PaddleOCR, an inference engine refers to the underlying runtime used to load and execute a model; you can think of it as "the engine actually used during model inference." When using an inference engine, users usually only need to care about two things: which engine to use (`engine`) and how to configure it (`engine_config`). The supported engines fall into two categories:
| Engine category | `engine` values | Description |
|---|---|---|
| PaddlePaddle framework | `paddle`, `paddle_static`, `paddle_dynamic` | Runs on the PaddlePaddle framework. |
| Transformers | `transformers` | Runs on Hugging Face Transformers. |
- `paddle`: The unified entry point for the PaddlePaddle framework. It selects `paddle_static` or `paddle_dynamic` according to the model type and the files in the model directory. If both are available, `paddle_static` is preferred.
- `paddle_static`: PaddlePaddle static-graph inference, suitable for scenarios that require better inference performance or more fine-grained performance tuning.
- `paddle_dynamic`: PaddlePaddle dynamic-graph inference, which is more flexible and easier to debug than static-graph inference.
- `transformers`: Hugging Face Transformers inference, making it convenient to integrate with the Hugging Face ecosystem.

When using the PaddlePaddle framework for inference, you need to install PaddlePaddle first. For installation instructions, see PaddlePaddle Framework Installation.
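The selection rule for `engine="paddle"` described above can be sketched in plain Python. This is a hypothetical illustration of the documented behavior, not PaddleOCR's actual implementation; the function name and its boolean inputs are assumptions.

```python
# Hypothetical sketch of the engine="paddle" selection rule; not PaddleOCR's actual code.
def resolve_paddle_engine(has_static_model: bool, has_dynamic_model: bool) -> str:
    """Pick the concrete Paddle engine when engine="paddle" is requested."""
    if has_static_model:
        # Static-graph files are preferred whenever they are present.
        return "paddle_static"
    if has_dynamic_model:
        return "paddle_dynamic"
    raise ValueError("model directory contains no usable Paddle model files")
```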
When using Transformers as the inference engine, you need to install Hugging Face Transformers. Example command:
```bash
python -m pip install "transformers>=5.4.0"
```
In many cases, you also need to install the underlying inference framework. For details, see the Transformers official documentation.
## `engine` and `engine_config`

### `engine`

`engine` is used to specify the inference engine. Supported values are:
| Value | Meaning | Description |
|---|---|---|
| `None` | No engine explicitly specified | Automatically determines the inference engine, keeping the behavior of PaddleOCR 3.4; in most cases, the PaddlePaddle framework is used for inference. |
| `paddle` | Unified PaddlePaddle framework entry | Automatically selects `paddle_static` or `paddle_dynamic`. |
| `paddle_static` | Static-graph inference | Uses Paddle static-graph inference. |
| `paddle_dynamic` | Dynamic-graph inference | Uses Paddle dynamic-graph inference. |
| `transformers` | Transformers inference | Uses Hugging Face Transformers inference. |
### `engine_config`

`engine_config` is used to configure the inference engine and is recommended to be used together with `engine`. Common `engine_config` fields for each engine are listed below.
#### `paddle_static`

Common fields include:

- `run_mode`: execution mode, such as `paddle`, `trt_fp32`, `trt_fp16`, and `mkldnn`;
- `device_type` / `device_id`: device type and device index;
- `cpu_threads`: number of CPU inference threads;
- `delete_pass`: list of graph-optimization passes to disable manually;
- `enable_new_ir`: whether to enable the new IR;
- `enable_cinn`: whether to enable CINN;
- `trt_cfg_setting`: low-level TensorRT configuration;
- `trt_use_dynamic_shapes`: whether to enable TensorRT dynamic shapes;
- `trt_collect_shape_range_info`: whether to collect shape range information;
- `trt_discard_cached_shape_range_info`: whether to discard existing shape range information and recollect it;
- `trt_dynamic_shapes`: dynamic-shape configuration;
- `trt_dynamic_shape_input_data`: data used to fill input tensors when collecting dynamic shapes;
- `trt_shape_range_info_path`: path to the shape-range-information file;
- `trt_allow_rebuild_at_runtime`: whether rebuilding the TensorRT engine is allowed at runtime;
- `mkldnn_cache_capacity`: oneDNN (MKL-DNN) cache capacity.

#### `paddle_dynamic`

Common fields include:

- `device_type` / `device_id`: device type and device index used during dynamic-graph execution.

#### `transformers`

Common fields include:

- `dtype`: data type used for model weights and inference, such as `float16`;
- `device_type` / `device_id`: inference device type and device index;
- `trust_remote_code`: whether to trust and execute custom code in model repositories;
- `attn_implementation`: attention implementation, such as `flash_attention_2`;
- `generation_config`: generation parameters, such as `max_new_tokens` and `temperature`;
- `model_kwargs`: extra arguments passed to the model loading API;
- `processor_kwargs`: extra arguments passed to the processor / image processor loading API;
- `tokenizer_kwargs`: a compatibility field that is merged into `processor_kwargs`.

#### Structure of `engine_config`

At the same level, `engine_config` may be either:

- a flat configuration whose keys are the selected engine's own fields (e.g. `run_mode` and `cpu_threads`); or
- a bucketed configuration keyed by engine name (`paddle_static`, `paddle_dynamic`, `transformers`), with each key mapping to a nested dict.

You must not mix bucket keys with flat keys at the same level (e.g. `{"paddle_static": {...}, "run_mode": "paddle"}` is invalid). When an engine is resolved, only the corresponding config is used: a flat config is validated as a whole, while a bucketed config contributes only the entry for that engine.
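The flat-vs-bucketed distinction can be illustrated with a small standalone sketch. The helper below is hypothetical (its name and the mixed-key check are assumptions, not part of the PaddleOCR API); only the two example configs mirror the documented shapes.

```python
# Hypothetical sketch of the flat-vs-bucketed rule; not PaddleOCR's actual code.
ENGINE_BUCKETS = {"paddle_static", "paddle_dynamic", "transformers"}

def classify_engine_config(cfg: dict) -> str:
    """Return "flat" or "bucketed"; reject configs that mix the two styles."""
    bucket_keys = set(cfg) & ENGINE_BUCKETS
    if bucket_keys and bucket_keys != set(cfg):
        raise ValueError("bucket keys must not be mixed with flat keys")
    return "bucketed" if bucket_keys else "flat"

# A flat config applies to whichever engine is resolved:
flat = {"run_mode": "mkldnn", "cpu_threads": 4}
# A bucketed config carries per-engine entries side by side:
bucketed = {"paddle_static": {"run_mode": "mkldnn"},
            "transformers": {"dtype": "float16"}}
```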
A few additional rules apply:

- `engine` and `engine_config` passed through CLI arguments or Python API initialization arguments take precedence over fields with the same names in the pipeline configuration file.
- At the top level, `engine` and `engine_config` act as global settings; `engine` and `engine_config` in submodules or sub-pipelines can override the upper-level settings.
- When `engine` is explicitly set, `enable_hpi` no longer takes effect.
- When `engine_config` is explicitly provided, compatibility arguments for the selected engine are ignored. For example, in `paddle` / `paddle_static` scenarios, compatibility arguments such as `use_tensorrt`, `precision`, `enable_mkldnn`, `mkldnn_cache_capacity`, `cpu_threads`, and `enable_cinn` no longer take effect.

Example: select the engine for a single model via the `--engine` CLI option:

```bash
paddleocr text_detection -i general_ocr_001.png --engine transformers
```
Example: use the `transformers` engine via the Python API:

```python
from paddleocr import TextDetection

model = TextDetection(
    model_name="PP-OCRv5_server_det",
    engine="transformers",
)
result = model.predict("general_ocr_001.png")
```
Example: use `paddle_static` together with `engine_config`:

```python
from paddleocr import TextDetection

model = TextDetection(
    model_name="PP-OCRv5_server_det",
    engine="paddle_static",
    engine_config={
        "device_type": "cpu",
        "cpu_threads": 4,
        "run_mode": "mkldnn",
    },
)
result = model.predict("general_ocr_001.png")
```
Example: select the engine for a whole pipeline via the `--engine` CLI option:

```bash
paddleocr ocr -i general_ocr_001.png --engine paddle_static
```
If you want to specify engine and engine_config for a specific module inside a pipeline, you can first export the configuration file, modify the corresponding module configuration, and then load it. For how to export, edit, and load the configuration file, see Using PaddleX Pipeline Configuration Files. Example:
First, export the pipeline configuration file:
```python
from paddleocr import PaddleOCR

pipeline = PaddleOCR()
pipeline.export_paddlex_config_to_yaml("ocr_config.yaml")
```
Then, set `engine` and `engine_config` specifically for the `TextDetection` module in `ocr_config.yaml`:
```yaml
pipeline_name: OCR

SubModules:
  TextDetection:
    engine: paddle_static
    engine_config:
      device_type: cpu
      cpu_threads: 4
      run_mode: mkldnn
```
Use the updated configuration file for inference:
```python
from paddleocr import PaddleOCR

pipeline = PaddleOCR(paddlex_config="ocr_config.yaml")
result = pipeline.predict("general_ocr_001.png")
```
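The override behavior this configuration achieves can be illustrated with a small standalone sketch: submodule-level settings win over the pipeline-level ones. The helper `engine_for`, the inline config dict, and the `TextRecognition` entry are all hypothetical illustrations, not PaddleOCR APIs.

```python
# Hypothetical sketch of the rule "submodule settings override global settings";
# not PaddleOCR's actual resolution code.
pipeline_cfg = {
    "engine": "paddle",  # global setting for the whole pipeline
    "SubModules": {
        "TextDetection": {"engine": "paddle_static"},  # submodule override
        "TextRecognition": {},                         # inherits the global engine
    },
}

def engine_for(cfg: dict, module: str) -> str:
    """Resolve the effective engine for one submodule."""
    sub = cfg.get("SubModules", {}).get(module, {})
    return sub.get("engine") or cfg.get("engine")
```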