Serving is a common deployment method in real-world production environments. Inference capabilities are encapsulated as services, and clients obtain inference results by sending network requests to them. Client-side code can be written in any programming language and does not need to match the server-side code. PaddleOCR recommends using PaddleX for serving. Please refer to Differences and Connections between PaddleOCR and PaddleX to understand the relationship between PaddleOCR and PaddleX.
PaddleX provides two serving solutions: basic serving and high-stability serving.
It is recommended to first use the basic serving solution for quick validation, and then evaluate whether to try more complex solutions based on actual needs.
Run the following command to install the PaddleX serving plugin via PaddleX CLI:
paddlex --install serving
Run the server via PaddleX CLI:
paddlex --serve --pipeline {PaddleX pipeline registration name or pipeline configuration file path} [{other command-line options}]
Take the general OCR pipeline as an example:
paddlex --serve --pipeline OCR
You should see information similar to the following:
INFO: Started server process [63108]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
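Once the server reports that it is running, a client can send it a request. The following is a minimal Python sketch, not the official client. It assumes that the general OCR pipeline exposes a `POST /ocr` endpoint on the default port and accepts a JSON body with a Base64-encoded `file` and a `fileType` flag; confirm these names against the API reference in the pipeline tutorial before relying on them. The helper names `build_ocr_payload` and `call_ocr` are illustrative only.

```python
import base64
import json
import urllib.request

# Assumption: the OCR pipeline serves POST /ocr on the default port 8080.
OCR_URL = "http://localhost:8080/ocr"


def build_ocr_payload(image_bytes: bytes) -> dict:
    """Build the JSON request body with the image inlined as Base64."""
    return {
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "fileType": 1,  # Assumption: 1 marks an image file, 0 a PDF file.
    }


def call_ocr(image_path: str, url: str = OCR_URL, timeout: float = 60.0) -> dict:
    """Send one image to the service and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        payload = build_ocr_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server started as above, `call_ocr("demo.jpg")` would return the parsed response body, where `demo.jpg` stands for any local image. Only the standard library is used, so the sketch runs without extra dependencies.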
To adjust configurations (such as model path, batch size, deployment device, etc.), specify --pipeline as a custom configuration file. Refer to PaddleOCR and PaddleX for the mapping between PaddleOCR pipelines and PaddleX pipeline registration names, as well as how to obtain and modify PaddleX pipeline configuration files.
The command-line options related to serving are as follows:
<table>
<thead>
<tr> <th>Name</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td><code>--pipeline</code></td> <td>PaddleX pipeline registration name or pipeline configuration file path.</td> </tr>
<tr> <td><code>--device</code></td> <td>Deployment device for the pipeline. By default, a GPU will be used if available; otherwise, a CPU will be used.</td> </tr>
<tr> <td><code>--host</code></td> <td>Hostname or IP address to which the server is bound. Defaults to <code>0.0.0.0</code>.</td> </tr>
<tr> <td><code>--port</code></td> <td>Port number on which the server listens. Defaults to <code>8080</code>.</td> </tr>
<tr> <td><code>--use_hpip</code></td> <td>If specified, uses high-performance inference. Refer to the High-Performance Inference documentation for more information.</td> </tr>
<tr> <td><code>--hpi_config</code></td> <td>High-performance inference configuration. Refer to the High-Performance Inference documentation for more information.</td> </tr>
</tbody>
</table>

The <b>"Development Integration/Deployment"</b> section in the PaddleOCR pipeline tutorial provides API references and multi-language invocation examples for the service.
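When the server is launched from a script rather than a shell, the options in the table can be assembled programmatically. The sketch below builds the argument list for `paddlex --serve` from the documented flags. The function names `build_serve_command` and `start_server` are hypothetical helpers, and device strings such as `cpu` are assumed values to be checked against the PaddleX documentation.

```python
import subprocess
from typing import List, Optional


def build_serve_command(
    pipeline: str,
    device: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    use_hpip: bool = False,
) -> List[str]:
    """Assemble the `paddlex --serve` argument list from the documented options."""
    cmd = ["paddlex", "--serve", "--pipeline", pipeline]
    if device is not None:
        cmd += ["--device", device]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if use_hpip:
        cmd.append("--use_hpip")
    return cmd


def start_server(**options) -> subprocess.Popen:
    """Start the server as a child process; requires the paddlex CLI on PATH."""
    return subprocess.Popen(build_serve_command(**options))
```

For example, `build_serve_command("OCR", host="127.0.0.1", port=8081)` produces the arguments for a server bound to the loopback interface on port 8081. Options left as `None` are omitted, so the CLI defaults from the table still apply.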
For the high-stability serving solution, please refer to the PaddleX Serving Guide. More information about PaddleX pipeline configuration files can be found in Using PaddleX Pipeline Configuration Files.
Note that the current high-stability serving solution provided by PaddleOCR has not yet received fine-grained optimization, so its performance may not match that of the 2.x version based on PaddleServing. However, the new solution fully supports the PaddlePaddle 3.0 framework. We will continue to optimize it and consider introducing more performant deployment solutions in the future.
By default, both basic serving and high-stability serving return images and other binary content inline in the response as Base64-encoded strings. When the response contains large images or a multi-page PDF, Base64 encoding can significantly inflate the payload, so you can configure the service to return pre-signed URLs instead. To enable this, edit the Serving section of the pipeline configuration file: return_urls is a top-level switch, and the object-storage settings live under Serving.extra:
Serving:
  return_urls: true
  extra:
    file_storage:
      type: bos
      endpoint: <BOS endpoint, e.g. https://bj.bcebos.com>
      ak: xxx
      sk: xxx
      bucket_name: <bucket name>
    url_expires_in: 3600  # Pre-signed URL lifetime in seconds; -1 means no expiry
For basic serving, pass the modified configuration file via paddlex --serve --pipeline. For high-stability serving, edit server/pipeline_config.yaml inside the SDK and restart the container.

Currently, URL return is only supported by the bos (Baidu Intelligent Cloud object storage) backend. The Serving.return_urls field applies to every Base64-inlined file field in the response, not just images. For the full configuration reference, notes, and use cases, see PaddleX Serving Guide - Returning Binary Content as URLs. See the Baidu Intelligent Cloud documentation for AK/SK retrieval.
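Because a binary field may arrive either as a Base64 string or as a URL depending on the server configuration, a client can handle both forms in one place. The following is a minimal sketch with hypothetical helper names (`is_url`, `load_binary_field`). It assumes that URL values start with `http://` or `https://`; a Base64 string cannot begin with either prefix, since `:` is not part of the Base64 alphabet.

```python
import base64
import urllib.request


def is_url(value: str) -> bool:
    """Return True if the field value looks like an HTTP(S) URL."""
    return value.startswith(("http://", "https://"))


def load_binary_field(value: str, timeout: float = 30.0) -> bytes:
    """Return the raw bytes of a response field, whichever form it takes."""
    if is_url(value):
        # URL form: download the content before the pre-signed URL expires.
        with urllib.request.urlopen(value, timeout=timeout) as response:
            return response.read()
    # Inline form: the field carries the content as a Base64 string.
    return base64.b64decode(value)
```

Pre-signed URLs stop working once the configured lifetime has passed, so a client that needs the content later should download it promptly rather than store the URL.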