Back to Paddleocr

PaddleOCR Text Recognition Output Schema

skills/paddleocr-text-recognition/references/output_schema.md

3.5.03.0 KB
Original Source

PaddleOCR Text Recognition Output Schema

This document defines the output envelope returned by ocr_caller.py.

By default, ocr_caller.py saves the JSON envelope to a unique file under the system temp directory and prints the absolute saved path to stderr. Use --output when you need a custom destination, or --stdout when you want to skip file saving and print JSON directly.

Output Envelope

ocr_caller.py wraps provider response in a stable structure:

json
{
  "ok": true,
  "text": "Extracted text from all pages",
  "result": { ... },  // raw provider response
  "error": null
}

On error:

json
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable message"
  }
}

Error Codes

CodeDescription
INPUT_ERRORInvalid or unusable input (arguments, file source, format, types).
CONFIG_ERRORMissing or invalid API / client configuration.
API_ERRORRequest or response handling failed (network, HTTP, body parsing, schema).

Raw Result Notes

The result field contains raw provider output.
Raw fields may vary by model version and endpoint.

Raw Result Example

json
{
  "logId": "request-uuid",
  "errorCode": 0,
  "errorMsg": "Success",
  "result": {
    "ocrResults": [
      {
        "prunedResult": {
          "rec_texts": ["First line", "Second line"],
          "rec_scores": [0.98, 0.95],
          "...": "other OCR fields"
        },
        "ocrImage": "https://...",
        "inputImage": "https://...",
        "...": "other model-specific fields"
      }
    ],
    "dataInfo": {
      "numPages": 1,
      "type": "pdf",
      "...": "other metadata"
    },
    "...": "other top-level fields"
  }
}

Stable Fields for Downstream Use

Paths are relative to the output envelope root.

  • result.result.ocrResults[n].prunedResult
    Structured OCR data for page n.

  • result.result.ocrResults[n].prunedResult.rec_texts
    Recognized text lines for page n.

  • result.result.ocrResults[n].prunedResult.rec_scores
    Confidence scores for recognized text lines.

Text Extraction

ocr_caller.py extracts top-level text from result.result.ocrResults[n].prunedResult.rec_texts, joins lines with \n, and joins pages with \n\n.

Command Examples

bash
# OCR from URL (result auto-saves to the system temp directory)
uv run scripts/ocr_caller.py --file-url "URL" --pretty

# OCR local file (result auto-saves to the system temp directory)
uv run scripts/ocr_caller.py --file-path "doc.pdf" --pretty

# OCR with explicit file type
uv run scripts/ocr_caller.py --file-url "URL" --file-type 1 --pretty

# Save result to a custom file path
uv run scripts/ocr_caller.py --file-url "URL" --output "./result.json" --pretty

# Print JSON to stdout without saving a file
uv run scripts/ocr_caller.py --file-url "URL" --stdout --pretty