rag/prompts/toc_extraction_continue.md
You are an expert parser and data formatter, currently in the process of building a JSON array from a multi-page table of contents (TOC). Your task is to analyze the new page of content and append the new entries to the existing JSON array.
Instructions:
1. You will be given two inputs:
   - current_page_text: The text content from the new page of the TOC.
   - existing_json: The valid JSON array you have generated from the previous pages.
2. Analyze each line of the current_page_text input.
3. For each new entry, extract three fields:
   - structure: The hierarchical index/numbering (e.g., "1", "2.1", "3.2.5"). Use null if none exists.
   - title: The clean textual title of the section or chapter.
   - page: The page number on which the section starts. Extract only the number. Use null if not present.
4. Output only the new entries, which will be appended to the existing_json array. Do not modify, reorder, or delete any of the existing entries.
5. JSON Format: The output must be a valid JSON array following this schema:
[
  {
    "structure": <string or null>,
    "title": <string>,
    "page": <number or null>
  },
  ...
]
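For reference, the schema above could be checked with a short Python sketch like the one below. The function name validate_toc_entries is hypothetical and not part of any library; the checks mirror only the three fields listed in the schema, and treating extra or missing keys as errors is an assumption.

```python
import json


def validate_toc_entries(raw: str) -> list:
    """Parse a JSON string and check each entry against the TOC schema.

    Hypothetical helper: raises ValueError on the first entry that does
    not match {"structure": str|None, "title": str, "page": number|None}.
    """
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("output must be a JSON array")

    expected_keys = {"structure", "title", "page"}
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict) or set(entry) != expected_keys:
            raise ValueError(f"entry {i}: expected exactly the keys {sorted(expected_keys)}")

        structure = entry["structure"]
        if structure is not None and not isinstance(structure, str):
            raise ValueError(f"entry {i}: structure must be a string or null")

        if not isinstance(entry["title"], str):
            raise ValueError(f"entry {i}: title must be a string")

        page = entry["page"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if page is not None and (isinstance(page, bool) or not isinstance(page, (int, float))):
            raise ValueError(f"entry {i}: page must be a number or null")

    return entries
```

A check like this rejects a numeric structure such as 3 (it must be the string "3") and a page given as the string "25".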
Input Example:
current_page_text:
3.2 Advanced Configuration ........... 25
3.3 Troubleshooting .................. 28
4 User Management .................... 30
existing_json:
[
  {"structure": "1", "title": "Introduction", "page": 1},
  {"structure": "2", "title": "Installation", "page": 5},
  {"structure": "3", "title": "Configuration", "page": 12},
  {"structure": "3.1", "title": "Basic Setup", "page": 15}
]
Expected Output For The Example:
[
  {"structure": "3.2", "title": "Advanced Configuration", "page": 25},
  {"structure": "3.3", "title": "Troubleshooting", "page": 28},
  {"structure": "4", "title": "User Management", "page": 30}
]
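Note that the example output contains only the entries from the new page. Assuming the calling code is responsible for joining them onto existing_json (an assumption; the caller is not shown here), the merge could look roughly like this sketch. The name merge_toc_pages is hypothetical.

```python
import json


def merge_toc_pages(existing_json: str, new_entries_json: str) -> list:
    """Concatenate the new page's entries onto the existing TOC array.

    Hypothetical helper: existing entries are kept as-is and in order;
    new entries are placed after them.
    """
    existing = json.loads(existing_json)
    new_entries = json.loads(new_entries_json)
    return existing + new_entries


existing = '[{"structure": "3.1", "title": "Basic Setup", "page": 15}]'
new = '[{"structure": "3.2", "title": "Advanced Configuration", "page": 25}]'
merged = merge_toc_pages(existing, new)
print([e["structure"] for e in merged])  # prints ['3.1', '3.2']
```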
Now, process the following inputs:
current_page_text:
{{ toc_page }}
existing_json:
{{ toc_json }}