docs/hybrid/research/iobject-structure.md
IObject is imported from org.verapdf.wcag.algorithms.entities.IObject (external verapdf-wcag-algs library).
Based on sample response analysis, OpenDataLoader produces the following element types:
| Type | JSON type field | Description |
|---|---|---|
| Paragraph | paragraph | Text paragraph with font info |
| Heading | heading | Section heading with level |
| Table | table | Table with rows and cells |
| Image | image | Image/figure element |
| List | list | Bulleted or numbered list |
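
As a minimal sketch of how a consumer might use the JSON type field from the table above, the following tallies elements by type and flags values that are not in the table. It assumes the output has already been parsed into a flat list of dicts. The `count_types` helper and the `elements` sample are hypothetical and not part of OpenDataLoader.

```python
from collections import Counter

# Values of the JSON "type" field listed in the table above.
KNOWN_TYPES = {"paragraph", "heading", "table", "image", "list"}


def count_types(elements):
    """Return (counts per type, sorted list of types not in KNOWN_TYPES)."""
    counts = Counter(e.get("type", "<missing>") for e in elements)
    unknown = sorted(t for t in counts if t not in KNOWN_TYPES)
    return counts, unknown


# Hypothetical sample, not real OpenDataLoader output.
elements = [
    {"type": "heading", "level": "1", "content": "Heading text"},
    {"type": "paragraph", "content": "Text content here"},
    {"type": "paragraph", "content": "More text"},
    {"type": "table row"},  # nested type, not a top-level type in the table
]

counts, unknown = count_types(elements)
```

Nested types such as "table row" and "table cell" show up as unknown here because the table lists only top-level element types.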
{
  "type": "paragraph",
  "id": 17,
  "page number": 1,
  "bounding box": [left, bottom, right, top] // PDF points, origin at bottom-left
}

{
  "type": "paragraph",
  "font": "ArialMT",
  "font size": 8.0,
  "text color": "[0.0, 0.0, 0.0, 0.7]",
  "content": "Text content here"
}

{
  "type": "heading",
  "level": "1",
  "content": "Heading text"
}

{
  "type": "table",
  "level": "1",
  "number of rows": 3,
  "number of columns": 3,
  "rows": [
    {
      "type": "table row",
      "row number": 1,
      "cells": [
        {
          "type": "table cell",
          "page number": 1,
          "bounding box": [left, bottom, right, top],
          "row number": 1,
          "column number": 1,
          "row span": 1,
          "column span": 1,
          "kids": [
            {
              "type": "paragraph",
              "content": "Cell text"
            }
          ]
        }
      ]
    }
  ]
}
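
The nested rows/cells/kids layout of the table sample can be flattened into a plain grid of strings. This is a rough sketch only: it assumes row and column numbers are 1-based as in the sample, and it copies a spanning cell's text into every slot it covers, which is an illustrative choice rather than documented behavior. The `cell_text` and `table_to_grid` helpers and the 2x2 `table` sample are hypothetical.

```python
def cell_text(cell):
    """Join the "content" of every kid of a cell that has one."""
    return " ".join(k["content"] for k in cell.get("kids", []) if "content" in k)


def table_to_grid(table):
    """Build a rows x columns grid of strings from a table element."""
    n_rows = table["number of rows"]
    n_cols = table["number of columns"]
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row in table.get("rows", []):
        for cell in row.get("cells", []):
            r0 = cell["row number"] - 1      # assumed 1-based in the JSON
            c0 = cell["column number"] - 1
            text = cell_text(cell)
            # Fill every slot covered by the cell's row/column span.
            for r in range(r0, min(r0 + cell.get("row span", 1), n_rows)):
                for c in range(c0, min(c0 + cell.get("column span", 1), n_cols)):
                    grid[r][c] = text
    return grid


def _cell(row, col, text, col_span=1):
    """Hypothetical helper that builds a cell dict shaped like the sample."""
    return {
        "type": "table cell",
        "row number": row,
        "column number": col,
        "row span": 1,
        "column span": col_span,
        "kids": [{"type": "paragraph", "content": text}],
    }


# Hypothetical 2x2 table; the first row is one cell spanning both columns.
table = {
    "type": "table",
    "number of rows": 2,
    "number of columns": 2,
    "rows": [
        {"type": "table row", "row number": 1, "cells": [_cell(1, 1, "A", col_span=2)]},
        {"type": "table row", "row number": 2, "cells": [_cell(2, 1, "B"), _cell(2, 2, "C")]},
    ],
}

grid = table_to_grid(table)
```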

- OpenDataLoader: [left, bottom, right, top] in PDF points, origin at BOTTOMLEFT
- Docling: {l, t, r, b} with coord_origin: "BOTTOMLEFT" or "TOPLEFT"
- If TOPLEFT: top = page_height - docling_t, bottom = page_height - docling_b
- If BOTTOMLEFT: [l, b, r, t] → [left, bottom, right, top]

From the codebase:

- TableBorder - Table with border-based detection
- TableBorderRow - Table row
- TableBorderCell - Table cell with contents, rowSpan, colSpan
- BoundingBox - PDF coordinates (page, left, bottom, right, top)
- Processors: TextLineProcessor, TableBorderProcessor, HeadingProcessor, ListProcessor
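
The bounding box conversion between the two coordinate conventions can be sketched as below. It assumes the Docling box arrives as a plain dict with keys `l`, `t`, `r`, `b`, and `coord_origin`. For a top-left origin the y values are flipped against the page height, so the result always has bottom <= top. The function name `docling_to_odl_bbox` is hypothetical and does not come from either codebase.

```python
def docling_to_odl_bbox(bbox, page_height):
    """Convert a Docling-style bbox dict to [left, bottom, right, top].

    The result is in PDF points with the origin at the bottom-left corner.
    """
    origin = bbox["coord_origin"]
    l, t, r, b = bbox["l"], bbox["t"], bbox["r"], bbox["b"]
    if origin == "TOPLEFT":
        # y grows downward: flip both edges against the page height.
        top = page_height - t
        bottom = page_height - b
    elif origin == "BOTTOMLEFT":
        # Already in PDF orientation: only the field order changes.
        top, bottom = t, b
    else:
        raise ValueError(f"unsupported coord_origin: {origin!r}")
    return [l, bottom, r, top]


# Hypothetical boxes describing the same region on an 800 pt tall page.
top_left = {"l": 10, "t": 20, "r": 110, "b": 70, "coord_origin": "TOPLEFT"}
bottom_left = {"l": 10, "t": 780, "r": 110, "b": 730, "coord_origin": "BOTTOMLEFT"}

converted_a = docling_to_odl_bbox(top_left, page_height=800)
converted_b = docling_to_odl_bbox(bottom_left, page_height=800)
```

Both sample boxes map to the same [left, bottom, right, top] list, which is a quick sanity check that the flip is applied only in the top-left case.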