src/AI/docs/json-stream-chunker-design.md
We have an AI model that receives progressive JSON internally but outputs complete, valid JSON objects each time. The property order may vary between chunks. We need to convert these complete JSON objects back into streaming chunks that, when concatenated, produce valid JSON matching the final output.
Input: JSONL file where each line is a complete JSON object representing progressive construction
Output: Chunks that when concatenated produce valid JSON structurally equivalent to the final line
Line 1: {"days": [{"subtitle": "Day"}]}
Line 2: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": []}]}
Line 3: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": [{"title": "", "type": "Sightseeing"}]}]}
Line 4: {"days": [{"activities": [{"type": "Sightseeing", "description": "Embark", "title": "Morning Game Drive"}], "subtitle": "Day 1: Arrival and Wildlife Safari"}]}
...
Observations from real data:
- Empty strings "" appear and then grow ("title": "" → "title": "Morning Game Drive")

namespace Microsoft.Maui.Essentials.AI;

/// <summary>
/// Converts complete JSON objects (from progressive AI output) back into streaming chunks.
/// </summary>
public class JsonStreamChunker
{
    /// <summary>
    /// Process one complete JSON object and return a streaming chunk.
    /// May return empty string (data pending) or a chunk to emit.
    /// </summary>
    /// <param name="completeJson">A complete, valid JSON object representing the current state.</param>
    /// <returns>A chunk to emit, or empty string if data is pending.</returns>
    public string Process(string completeJson);

    /// <summary>
    /// Finalize processing and return any remaining output.
    /// Call this after all input has been processed.
    /// </summary>
    /// <returns>Final chunk including closing brackets and any pending strings.</returns>
    public string Flush();
}
// Consuming an async stream from an AI model
public async IAsyncEnumerable<string> ConvertToChunks(IAsyncEnumerable<string> completeJsonStream)
{
    var chunker = new JsonStreamChunker();

    await foreach (var completeJson in completeJsonStream)
    {
        var chunk = chunker.Process(completeJson);
        if (!string.IsNullOrEmpty(chunk))
        {
            yield return chunk;
        }
    }

    var finalChunk = chunker.Flush();
    if (!string.IsNullOrEmpty(finalChunk))
    {
        yield return finalChunk;
    }
}
// Result: Concatenating all yielded chunks produces valid JSON
// equivalent to the final input object
Analysis of all 4 test JSONL files revealed:
| Pattern | Frequency | Notes |
|---|---|---|
| String grows | ~40-50 per file | Most common change type |
| New string appears | ~39 per file | Often 1 at a time |
| 2 new growable items at once | 0-4 per file | Requires pending list |
| Multiple values change | 0 | Never happens - confirms assumption |
| Empty array [] | Occasional | Gets populated in later chunk |
| Empty object {} | Occasional | Gets populated in later chunk |
| Non-string primitives | 0 in test data | All values are strings in these examples |
Key insight: The "only one value changes per chunk" assumption holds in all test data (where values include strings, arrays, and objects).
If a new property appears at the same level as the open string, the open string is complete. The AI moved on horizontally.
Example:
Previous: {"name": "Mat"}
Current: {"name": "Matthew", "age": 30}
"age" is a NEW sibling at same level as "name"
→ "name" is COMPLETE (emit extension "thew", then close)
→ Then emit the new sibling
If new content appears at a higher level (e.g., new array item in parent), the current container and its open string are complete. The AI moved on vertically.
Example:
Previous: {"days": [{"title": "Day 1"}]}
Current: {"days": [{"title": "Day 1"}, {"title": "Day 2"}]}
days[1] appeared at parent level (days array)
→ days[0] and everything inside it is COMPLETE
→ Close days[0], then emit new array item
If the open string's value is unchanged from the previous chunk, it is complete.
Example:
Previous: {"name": "Matthew"}
Current: {"name": "Matthew", "age": 30}
"name" value unchanged
→ "name" is COMPLETE
(Note: Sibling rule would also apply here)
If 2+ new growable items appear at the same parent level in the same chunk, add them to pending and wait for the next chunk to see which one changes.
Growable types:
- Strings
- Arrays
- Objects
Numbers, bools, and null are NOT growable - they are always complete.
Example 1 - Multiple strings:
Previous: {"count": 5}
Current: {"count": 5, "a": "Hello", "b": "World"}
2 new strings appeared at root level (siblings) - which one will grow?
→ Add BOTH to pending
→ Emit nothing for strings yet
→ Wait for next chunk to see which changes
Example 2 - String and array at same level:
Previous: {"days": [{}]}
Current: {"days": [{"subtitle": "", "activities": []}]}
1 new string (subtitle) + 1 new array (activities) at days[0] level
Total: 2 growable items at same level - which one will grow?
→ Add subtitle (string) to _pendingStrings
→ Add activities (array) to _pendingContainers
→ Emit NOTHING for either yet
→ Wait for next chunk to see which changes:
- If subtitle value changes → subtitle is active, activities was complete
- If activities gets children → activities is active, subtitle was complete
- If both unchanged → both were complete
_prevState: Dictionary<string, JsonValue>?
- Flattened path→value dictionary from last chunk (null on first call)
- Used to detect what changed
- JsonValue is a record struct: (JsonValueKind Kind, string? StringValue, string? RawValue)
- Stores the Kind, StringValue (for strings), and RawValue (raw JSON for non-strings)
- IMPORTANT: Empty containers are stored as entries with JsonValueKind.Array or JsonValueKind.Object
(so we know they exist even with no children)
_openStringPath: string?
- Path of the currently open string (no closing quote emitted)
- At most ONE string can be open at a time
- null if no string is currently open
_pendingStrings: Dictionary<string, string>
- Map of path → value for strings we haven't emitted yet
- Populated when:
- 2+ new growable items (strings, arrays, objects) appear at the SAME parent level
- OR we already have an open value and encounter a new growable item
- Resolved at start of next chunk by comparing values
- Note: pending items may or may not be siblings (different nesting levels also go to pending)
_pendingContainers: Dictionary<string, bool>
- Map of path → isArray for containers we haven't emitted yet
- Populated when 2+ new growable items appear at the same parent level
- Resolved at start of next chunk by checking if container grew (got children)
- Detection: count paths starting with container path in prev vs curr
_emittedStrings: Dictionary<string, string>
- Map of path → emitted value for strings we HAVE emitted
- Used to calculate extension: extension = current[emitted.Length..]
_openStructures: Stack<(string path, bool isArray)>
- Stack of currently open containers
- Used to properly close structures when moving to different parts of tree
- IMPORTANT: When emitting at a different level, close structures down to target path
_emittedPaths: HashSet<string>
- Tracks which paths have been emitted
- Used to know when to emit commas (if sibling already emitted, prepend comma)
  - IMPORTANT: Do NOT skip processing of existing array items just because path is in _emittedPaths
Comma placement by position:
- First property in an object: "prop":"value
- Subsequent property: ,"prop":"value
- First array item: { or "value
- Subsequent array item: ,{ or ,"value
Check _emittedPaths to see if any sibling was already emitted at the same parent level.
If yes, prepend comma. If no, don't.
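The sibling check above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the CommaRule class and both method names are hypothetical, and the parent-path helper assumes property names never contain '.' or '['.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class CommaRule
{
    // "days[0].subtitle" -> "days[0]", "days[0]" -> "days", "days" -> "" (root).
    // Assumption: property names contain neither '.' nor '['.
    public static string ParentPath(string path)
    {
        int cut = Math.Max(path.LastIndexOf('.'), path.LastIndexOf('['));
        return cut < 0 ? "" : path[..cut];
    }

    // A comma is needed when any other already-emitted path shares the same parent.
    public static bool NeedsComma(string path, IEnumerable<string> emittedPaths)
    {
        string parent = ParentPath(path);
        return emittedPaths.Any(p => p.Length > 0 && p != path && ParentPath(p) == parent);
    }
}
```

With only the containers of days[0].activities[0] emitted, NeedsComma for its type property is false; once type has been emitted, NeedsComma for title is true.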
We flatten JSON into paths:
- Object properties: parent.child
- Array items: parent[0], parent[1]
- Nested: days[0].activities[1].title
Example:
{"days": [{"subtitle": "Day", "activities": [{"title": "Game"}]}]}
Flattens to:
days[0].subtitle = "Day"
days[0].activities[0].title = "Game"
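One way to produce such a path → value dictionary with System.Text.Json is sketched here. The JsonFlattener class is a hypothetical name, and recording every container (not only empty ones) as its own entry is an assumption made for simplicity. The JsonValue record mirrors the shape described for _prevState.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

public readonly record struct JsonValue(JsonValueKind Kind, string? StringValue, string? RawValue);

// Hypothetical helper, for illustration only.
public static class JsonFlattener
{
    public static Dictionary<string, JsonValue> Flatten(string json)
    {
        var result = new Dictionary<string, JsonValue>();
        using var doc = JsonDocument.Parse(json);
        Visit(doc.RootElement, "", result);
        return result;
    }

    private static void Visit(JsonElement el, string path, Dictionary<string, JsonValue> result)
    {
        switch (el.ValueKind)
        {
            case JsonValueKind.Object:
                // Assumption: every non-root container gets an entry, so empty ones stay visible.
                if (path.Length > 0) result[path] = new JsonValue(el.ValueKind, null, null);
                foreach (var prop in el.EnumerateObject())
                    Visit(prop.Value, path.Length == 0 ? prop.Name : path + "." + prop.Name, result);
                break;
            case JsonValueKind.Array:
                if (path.Length > 0) result[path] = new JsonValue(el.ValueKind, null, null);
                int index = 0;
                foreach (var item in el.EnumerateArray())
                    Visit(item, $"{path}[{index++}]", result);
                break;
            case JsonValueKind.String:
                result[path] = new JsonValue(el.ValueKind, el.GetString(), null);
                break;
            default:
                // Numbers, bools, and null keep their raw JSON text.
                result[path] = new JsonValue(el.ValueKind, null, el.GetRawText());
                break;
        }
    }
}
```

Flattening {"days": [{"subtitle": "Day", "activities": []}]} with this sketch yields four entries: days, days[0], days[0].subtitle, and days[0].activities.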
Parent path calculation:
- days[0].subtitle → parent is days[0]
- days[0].activities[0].title → parent is days[0].activities[0]
- days[0] → parent is days
For each chunk:
1. FIRST CHUNK (special path - no previous state):
- Parse and flatten JSON
- Emit root structure opening "{"
- Process all containers depth-first via EmitStructure()
- For each container, count its direct GROWABLE children (strings, arrays, objects):
- If 0 growable values at this level: emit all non-growables (numbers, bools, null), continue to nested containers
- If 1 growable value at this level:
- If string: emit property open (no closing quote), set _openStringPath
- If container (array/object): emit opening bracket, push to _openStructures, recurse into children
- If 2+ growable values at this level:
- Add strings to _pendingStrings
- Add containers to _pendingContainers
- Emit NOTHING for these until next chunk resolves which is active
- Result: at most 1 open string, rest in pending or fully emitted
2. SUBSEQUENT CHUNKS (compare with previous state):
a. Step A - Handle open string (if _openStringPath is set):
- Check for new siblings at same level
- Check for new content at parent level
- If new sibling OR parent-level change:
→ Emit extension (if value changed)
→ Emit closing quote
→ Set _openStringPath = null
- Else if value changed:
→ Emit extension
→ Keep open (might still grow)
- Else (value same):
→ Emit closing quote
→ Set _openStringPath = null
b. Step B - Resolve pending (if _pendingStrings or _pendingContainers not empty):
- For strings: Categorize as COMPLETE (unchanged) vs CHANGED
- For containers: Categorize as COMPLETE (no new children) vs CHANGED (got children)
- Detection: count paths with prefix in prev vs curr state
- Emit all COMPLETE items first (sorted by path for consistency):
- Strings: emit with closing quote
- Containers: emit opening AND closing brackets AND all content (they're complete)
- Emit the CHANGED one as open (if any):
- String: becomes _openStringPath
- Container: emit opening bracket, push to _openStructures, recurse into children
- After setting active string, check for new siblings → if found, close immediately
c. Step C - Process new content:
- For objects: iterate properties, categorize as existing vs new
- Existing non-strings: recurse into them
- New properties: categorize by growable type
- For arrays: iterate items, compare index to previous count
- Existing items: recurse into non-strings
- New items: emit via EmitNewArrayItem
- For new growable items:
- Count growables at same parent level
- If 1 growable AND no open value: emit open, set as active
- If 1 growable AND have open value: add to pending
- If 2+ growable: add ALL to pending, emit nothing
3. FINALIZE (after last chunk):
- Emit any remaining pending items (all complete - sorted by path)
- Close _openStringPath if set (emit closing ")
- Close all open structures (emit } or ] for each, in reverse order from stack)
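The last two finalize steps (close the open string, then unwind the structure stack) can be sketched as follows. FinalizeSketch and CloseAll are hypothetical names, and emitting leftover pending items is deliberately left out of this sketch.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only; pending items are not handled here.
public static class FinalizeSketch
{
    public static string CloseAll(bool hasOpenString, Stack<(string Path, bool IsArray)> openStructures)
    {
        var sb = new StringBuilder();

        // Close the open string first, if there is one.
        if (hasOpenString) sb.Append('"');

        // Pop innermost-first so brackets close in reverse order of opening.
        while (openStructures.Count > 0)
        {
            var (_, isArray) = openStructures.Pop();
            sb.Append(isArray ? ']' : '}');
        }
        return sb.ToString();
    }
}
```

For the state after Line 1 (stack: root, days, days[0]; subtitle open), this returns "}]} preceded by the closing quote, which completes {"days":[{"subtitle":"Day into valid JSON.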
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP A: HANDLE OPEN STRING │
└─────────────────────────────────────────────────────────────────────────────┘
IF _openStringPath is set:
│
├─► Get current value at _openStringPath
│ Get emitted value from _emittedStrings[_openStringPath]
│
├─► Check for NEW siblings at same level
│ (properties in current that weren't in previous, at same parent path)
│
├─► Check for NEW content at PARENT level
│ (e.g., new array item in parent array)
│
├─► IF NEW SIBLING or PARENT-LEVEL CHANGE:
│ │
│ │ String is COMPLETE (AI moved on)
│ │
│ ├─► IF value changed:
│ │ extension = current[emitted.Length..]
│ │ Emit: extension + closing quote "
│ │
│ └─► ELSE:
│ Emit: closing quote "
│
│ Close any containers as needed:
│ Pop from _openStructures until we reach the level where new content will be emitted
│ Emit } or ] for each popped container
│ Set _openStringPath = null
│
└─► ELSE (no new siblings, no parent changes):
│
├─► IF value CHANGED:
│ extension = current[emitted.Length..]
│ Emit: extension (no closing quote)
│ Update _emittedStrings[path] = current
│ Keep _openStringPath set (might still grow)
│
└─► ELSE (value SAME):
Emit: closing quote "
Set _openStringPath = null
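The three outcomes of Step A reduce to a small decision function. This is a sketch under stated assumptions: OpenStringStep and Advance are hypothetical names, the caller is assumed to have already computed whether a new sibling or parent-level change exists (movedOn), and JSON escaping of the extension is omitted.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class OpenStringStep
{
    // Returns the text to emit for the open string and whether it stays open.
    // Assumption: the current value normally extends the emitted value; if it
    // does not, the extension is treated as empty.
    public static (string Emit, bool StillOpen) Advance(string emitted, string current, bool movedOn)
    {
        string extension = current.StartsWith(emitted, StringComparison.Ordinal)
            ? current[emitted.Length..]
            : "";

        if (movedOn) return (extension + "\"", false);      // sibling or parent-level rule: finish and close
        if (extension.Length > 0) return (extension, true); // value grew: keep the string open
        return ("\"", false);                               // unchanged: the string is complete
    }
}
```

For the Line 2 case, Advance("Day", "Day 1: Arrival and Wildlife Safari", true) yields the extension followed by a closing quote and reports the string as closed.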
Real example - Sibling Rule (Line 2):
Previous: {"days": [{"subtitle": "Day"}]}
Current: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": []}]}
_openStringPath = "days[0].subtitle"
emitted = "Day"
current = "Day 1: Arrival and Wildlife Safari"
Check siblings at days[0]:
Previous had: subtitle
Current has: subtitle, activities
→ "activities" is NEW SIBLING!
Action:
extension = "Day 1: Arrival and Wildlife Safari"["Day".Length..] = " 1: Arrival and Wildlife Safari"
Emit: 1: Arrival and Wildlife Safari"
Set _openStringPath = null
Then emit new sibling: ,"activities":[
Output: 1: Arrival and Wildlife Safari","activities":[
Real example - Parent-Level Rule (Line 6):
Previous: {"days": [{"activities": [{"description": "Embark on a thrilling..."}]}]}
Current: {"days": [{"activities": [{"description": "...full text..."}, {"type": ""}]}]}
_openStringPath = "days[0].activities[0].description"
Parent of description = days[0].activities[0]
Parent's parent = days[0].activities (array)
Check: days[0].activities[1] is NEW
→ New content at parent level!
Action:
Emit extension for description
Close description: "
Close activities[0] object: }
Emit new array item: ,{
Handle strings in new item...
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP B: RESOLVE PENDING │
└─────────────────────────────────────────────────────────────────────────────┘
IF _pendingStrings OR _pendingContainers is not empty:
│
├─► RESOLVE PENDING STRINGS:
│ │
│ ├─► Categorize all pending strings:
│ │ For each (path, storedValue) in _pendingStrings:
│ │ currentValue = current[path]
│ │ IF currentValue == storedValue:
│ │ Add to COMPLETE_STRINGS list
│ │ ELSE:
│ │ Add to CHANGED_STRINGS list (should be exactly 0 or 1)
│ │
│ ├─► FIRST: Emit all COMPLETE strings (with closing quotes):
│ │ For each in COMPLETE_STRINGS:
│ │ needsComma = any sibling already in _emittedPaths
│ │ Emit: [,]"path":"value"
│ │ Add path to _emittedPaths
│ │
│ └─► Clear _pendingStrings
│
├─► RESOLVE PENDING CONTAINERS:
│ │
│ ├─► Categorize all pending containers:
│ │ For each (path, isArray) in _pendingContainers:
│ │ previousChildCount = count of previous paths starting with this container
│ │ currentChildCount = count of current paths starting with this container
│ │ IF currentChildCount == previousChildCount:
│ │ Add to COMPLETE_CONTAINERS list (container didn't grow)
│ │ ELSE:
│ │ Add to CHANGED_CONTAINERS list (container got children)
│ │
│ ├─► Emit all COMPLETE containers (with opening AND closing brackets):
│ │ For each in COMPLETE_CONTAINERS:
│ │ needsComma = any sibling already in _emittedPaths
│ │ IF isArray: Emit: [,]"path":[]
│ │ ELSE: Emit: [,]"path":{}
│ │ Add path to _emittedPaths
│ │
│ └─► Clear _pendingContainers
│
├─► EMIT THE CHANGED ITEM (at most 1 across strings and containers):
│ IF CHANGED_STRINGS has exactly 1:
│ needsComma = any sibling already in _emittedPaths
│ Emit: [,]"path":"value (no closing quote)
│ Set _openStringPath = path
│ Add path to _emittedPaths
│ Update _emittedStrings[path] = currentValue
│
│ IF CHANGED_CONTAINERS has exactly 1:
│ needsComma = any sibling already in _emittedPaths
│ IF isArray: Emit: [,]"path":[
│ ELSE: Emit: [,]"path":{
│ Push to _openStructures
│ Add path to _emittedPaths
│ → Recursively process children of this container
│
│ IF total CHANGED has 2+:
│ This should never happen per "only one value changes per chunk" invariant
│ Log warning and treat all as complete
│
└─► IF _openStringPath was just set:
Check for new siblings at same level (in current, not in previous)
IF new sibling exists:
→ Close immediately (Sibling Rule)
Emit: "
Set _openStringPath = null
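The categorization of pending strings can be sketched as below. PendingResolver and Categorize are hypothetical names, and the sketch assumes every pending path still exists in the current state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class PendingResolver
{
    // Splits pending string paths into unchanged (complete) and changed ones.
    // Paths are visited in sorted order so the emitted output is deterministic.
    public static (List<string> Complete, List<string> Changed) Categorize(
        IReadOnlyDictionary<string, string> pendingStrings,
        IReadOnlyDictionary<string, string> currentValues)
    {
        var complete = new List<string>();
        var changed = new List<string>();
        foreach (var path in pendingStrings.Keys.OrderBy(p => p, StringComparer.Ordinal))
        {
            // Assumption: the path is still present in the current state.
            bool same = currentValues[path] == pendingStrings[path];
            (same ? complete : changed).Add(path);
        }
        return (complete, changed);
    }
}
```

The caller would then emit everything in Complete with closing quotes and open the single entry in Changed, if any.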
Real example (Line 4):
_pendingStrings = {
"days[0].activities[0].title": "",
"days[0].activities[0].type": "Sightseeing"
}
Current values:
title = "Morning Game Drive" (was "")
type = "Sightseeing" (unchanged)
Categorize:
COMPLETE = [type]
CHANGED = [title]
Emit COMPLETE first:
type: no siblings emitted yet → no comma
Emit: "type":"Sightseeing"
Add to _emittedPaths
Emit CHANGED:
title: type already emitted (sibling) → needs comma
Emit: ,"title":"Morning Game Drive
Set _openStringPath = "days[0].activities[0].title"
Add to _emittedPaths
Check for new siblings of title:
"description" is NEW at same level!
→ Sibling Rule: close title immediately
Emit: "
→ Then handle description in Step C
Output: "type":"Sightseeing","title":"Morning Game Drive"
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP C: PROCESS NEW CONTENT │
└─────────────────────────────────────────────────────────────────────────────┘
For each new path in current that wasn't in previous:
│
├─► IF value is NON-GROWABLE (number, bool, null):
│ needsComma = any sibling already in _emittedPaths
│ Emit complete: [,]"path":value
│ Add path to _emittedPaths
│
└─► IF value is GROWABLE (string, object, array):
Group all new GROWABLE items BY PARENT (siblings only):
│
├─► For each parent with new growable items:
│ Count new growable items at this parent
│ │
│ ├─► IF 1 new growable item at this parent:
│ │ IF no open value (_openStringPath is null AND no pending):
│ │ │
│ │ ├─► IF string:
│ │ │ needsComma = any sibling in _emittedPaths
│ │ │ Emit open: [,]"path":"value (no closing quote)
│ │ │ Set _openStringPath = path
│ │ │ Add to _emittedPaths
│ │ │ Update _emittedStrings[path] = value
│ │ │
│ │ └─► IF object or array:
│ │ needsComma = any sibling in _emittedPaths
│ │ Emit opening: [,]"path":{ or [,]"path":[
│ │ Push to _openStructures
│ │ Add path to _emittedPaths
│ │ Recursively process children
│ │
│ │ ELSE (already have open value):
│ │ IF string: Add to _pendingStrings (wait for next chunk)
│ │ IF container: Add to _pendingContainers (wait for next chunk)
│ │
│ └─► IF 2+ new growable items at this parent:
│ For each growable item:
│ IF string: Add to _pendingStrings
│ IF container: Add to _pendingContainers
│ Do NOT emit values for these
│ (Will resolve in next chunk to see which one changes)
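The per-parent grouping in Step C can be sketched as a planning function that decides which new growable path (if any) is opened and which go to pending. NewContentPlanner and Plan are hypothetical names; the sketch does not distinguish strings from containers and assumes property names contain neither '.' nor '['.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class NewContentPlanner
{
    private static string ParentPath(string path)
    {
        int cut = Math.Max(path.LastIndexOf('.'), path.LastIndexOf('['));
        return cut < 0 ? "" : path[..cut];
    }

    // hasOpenValue: true if a string is already open or anything is already pending.
    public static (string? Open, List<string> Pending) Plan(
        IEnumerable<string> newGrowablePaths, bool hasOpenValue)
    {
        string? open = null;
        var pending = new List<string>();
        bool blocked = hasOpenValue;

        // GroupBy keeps groups in order of first appearance.
        foreach (var group in newGrowablePaths.GroupBy(ParentPath))
        {
            var items = group.ToList();
            if (items.Count == 1 && !blocked)
                open = items[0];          // lone growable, nothing open: emit it open
            else
                pending.AddRange(items);  // 2+ siblings, or something is already open
            blocked = true;               // anything later in this chunk must wait
        }
        return (open, pending);
    }
}
```

Two new siblings under the same parent both land in Pending, while a single new path with nothing open is returned as Open.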
Note: We count new growable items per parent level, not globally. If we get:
{"a": {"x": "hello"}, "b": "world"}
- a.x has 1 new growable item at parent a
- b has 1 new growable item at parent root
Example - 1 new string (Line 7):
Previous: days[0].activities[1] = {type: ""}
Current: days[0].activities[1] = {type: "FoodAndDining", title: "Lunch"}
After Step A closes type (sibling rule):
_openStringPath = null
_emittedPaths contains "days[0].activities[1].type"
Step C - New content:
"days[0].activities[1].title" is new
Parent = "days[0].activities[1]"
1 new string at this parent
"type" already in _emittedPaths (sibling) → needs comma
Action:
Emit: ,"title":"Lunch
Set _openStringPath = "days[0].activities[1].title"
Add to _emittedPaths
Output: ,"title":"Lunch
Example - 2+ new strings (Line 3):
Previous: days[0].activities = []
Current: days[0].activities = [{title: "", type: "Sightseeing"}]
Step C - New content:
days[0].activities[0] is new object
Parent of this object = days[0].activities
No siblings emitted → no comma for object
Emit: {
Push to _openStructures
Add "days[0].activities[0]" to _emittedPaths
Inside it: title and type (2 strings!)
Parent = days[0].activities[0]
2 new strings at this parent → pending
Add both to pending:
_pendingStrings = {
"days[0].activities[0].title": "",
"days[0].activities[0].type": "Sightseeing"
}
Do NOT emit string values
Output: {
Line 1: {"days": [{"subtitle": "Day"}]}
First chunk processing:
- Flatten: days = array, days[0] = object, days[0].subtitle = "Day"
- Root: emit {, push to _openStructures
- days: emit "days":[, push to _openStructures
- days[0]: emit {, push to _openStructures
- days[0]: 1 growable (subtitle), no open string yet → emit "subtitle":"Day (no closing quote)
Output: {"days":[{"subtitle":"Day
↑ no closing quote
State:
_openStringPath = "days[0].subtitle"
_emittedStrings = {"days[0].subtitle": "Day"}
_emittedPaths = {"days", "days[0]", "days[0].subtitle"}
_openStructures = [root, days, days[0]]
Line 2: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": []}]}
Step A - Handle open string:
activities is NEW!
extension = " 1: Arrival and Wildlife Safari"
Emit: 1: Arrival and Wildlife Safari"
_openStringPath = null
Step C - New content:
days[0].activities is new (empty array)
Emit: ,"activities":[
Output for Line 2: 1: Arrival and Wildlife Safari","activities":[
State:
_openStringPath = null
_openStructures = [root, days, days[0], days[0].activities]
Line 3: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": [{"title": "", "type": "Sightseeing"}]}]}
Step A - No open string (was closed in Line 2)
Step C - New content:
days[0].activities[0] is new (object)
Emit: {
_pendingStrings = {
"days[0].activities[0].title": "",
"days[0].activities[0].type": "Sightseeing"
}
Output for Line 3: {
State:
_openStructures = [..., days[0].activities[0]]
_emittedPaths += {"days[0].activities[0]"}
_pendingStrings = 2 entries
Line 4: {"days": [{"activities": [{"type": "Sightseeing", "description": "Embark", "title": "Morning Game Drive"}], "subtitle": "Day 1: Arrival and Wildlife Safari"}]}
Step A - No open string
Step B - Resolve pending:
Emit for type: "type":"Sightseeing"
Emit for title: ,"title":"Morning Game Drive
_openStringPath = "days[0].activities[0].title"
Check for siblings of title:
description is NEW at same level!
Emit: "
_openStringPath = null
Step C - New content:
description = "Embark" is new
Emit: ,"description":"Embark
_openStringPath = "days[0].activities[0].description"
Output for Line 4: "type":"Sightseeing","title":"Morning Game Drive","description":"Embark
State:
_openStringPath = "days[0].activities[0].description"
_emittedStrings[...description] = "Embark"
Line 5: {"days": [{"activities": [{"description": "Embark on a thrilling morning game drive to witness the Great Migration in all its glory.", "title": "Morning Game Drive", "type": "Sightseeing"}], "subtitle": "Day 1: Arrival and Wildlife Safari"}]}
Step A - Handle open string:
extension = " on a thrilling..."
Emit: on a thrilling...
_openStringPath still set
Output for Line 5: on a thrilling...
Line 6: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": [{"description": "Embark on a thrilling morning game drive to witness the Great Migration in all its glory.", "type": "Sightseeing", "title": "Morning Game Drive"}, {"type": ""}]}]}
Step A - Handle open string:
days[0].activities[1] is NEW!
extension = " morning game drive to witness the Great Migration in all its glory."
Emit extension: ...
Close description: "
Close activities[0] object: }
_openStringPath = null
Step C - New content:
days[0].activities[1] is new object
Emit: ,{
Emit: "type":"
_openStringPath = "days[0].activities[1].type"
_emittedStrings[...type] = ""
Output for Line 6: morning game drive to witness the Great Migration in all its glory."},{"type":"
Line 7: {"days": [{"subtitle": "Day 1: Arrival and Wildlife Safari", "activities": [{"title": "Morning Game Drive", "description": "...", "type": "Sightseeing"}, {"type": "FoodAndDining", "title": "Lunch"}]}]}
Step A - Handle open string:
title is NEW!
extension = "FoodAndDining"
Emit: FoodAndDining"
_openStringPath = null
Step C - New content:
title = "Lunch" is new
Emit: ,"title":"Lunch
_openStringPath = "days[0].activities[1].title"
Output for Line 7: FoodAndDining","title":"Lunch
Line 8: {"days": [{"activities": [{"type": "Sightseeing", "description": "...", "title": "Morning Game Drive"}, {"description": "Enjoy", "title": "Lunch at Restaurant 1", "type": "FoodAndDining"}], "subtitle": "Day 1: Arrival and Wildlife Safari"}]}
Step A - Handle open string:
description is NEW!
extension = " at Restaurant 1"
Emit: at Restaurant 1"
_openStringPath = null
Step C - New content:
description = "Enjoy" is new
Emit: ,"description":"Enjoy
_openStringPath = "days[0].activities[1].description"
Output for Line 8: at Restaurant 1","description":"Enjoy
Line 3: "title": ""
Line 4: "title": "Morning Game Drive"
Treat "" like any other string.
Line 2: "activities": []
Line 3: "activities": [{"title": ""}]
[] → emit [ only, push to _openStructures
Note on flattening: Empty containers ARE stored in the flattened state dictionary with their JsonValueKind (Array or Object), so we know they exist even with no children.
Containers (arrays and objects) can have nested children that grow. When determining if a container is "changing", we must check ALL descendants, not just direct children.
The Rule: A container is "still active/growing" if ANY descendant path changes.
Example - Nested array with growing string:
Previous: {"items": [{"name": "Jo"}]}
Current: {"items": [{"name": "John"}]}
- items is an array containing an object
- The object has a string name that grew ("Jo" → "John")
- items[0] exists in both, the string inside changed
- items (the array) is still "active" - don't close it
Example - Deep nesting:
Previous: {"root": {"level1": {"level2": {"value": "He"}}}}
Current: {"root": {"level1": {"level2": {"value": "Hello"}}}}
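- value is the actual string that changed
- level2, level1, and root are ALL still active because a descendant changed
Detection Algorithm: When checking if a container at path P is "complete" vs "still active", compare the paths that start with P in the previous and current state. If any descendant was added or changed, the container is still active.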
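A descendant check along these lines is sketched here. ContainerActivity, IsInside, and HasChanged are hypothetical names, and each path's value is simplified to a plain string (for example its raw JSON text) instead of the JsonValue record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ContainerActivity
{
    // True when path lies strictly inside container ("items[0].name" is inside "items",
    // but "items2.x" is not). The root container "" contains every path.
    public static bool IsInside(string path, string container) =>
        container.Length == 0 ||
        (path.Length > container.Length
            && path.StartsWith(container, StringComparison.Ordinal)
            && (path[container.Length] == '.' || path[container.Length] == '['));

    // A container is still active if any descendant path was added, removed, or changed.
    public static bool HasChanged(
        string container,
        IReadOnlyDictionary<string, string> prev,
        IReadOnlyDictionary<string, string> curr)
    {
        var before = prev.Where(kv => IsInside(kv.Key, container))
                         .ToDictionary(kv => kv.Key, kv => kv.Value);
        var after = curr.Where(kv => IsInside(kv.Key, container))
                        .ToDictionary(kv => kv.Key, kv => kv.Value);

        if (before.Count != after.Count) return true;
        return after.Any(kv => !before.TryGetValue(kv.Key, out var old) || old != kv.Value);
    }
}
```

Comparing keys and values, not just counts, also catches the case where the number of descendants stays the same but their paths differ.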
Why this matters for pending: When we have pending containers (e.g., a string and an array both appeared):
Previous: {"days": [{}]}
Current: {"days": [{"subtitle": "", "activities": []}]}
- subtitle (string) and activities (array) are both pending
Case 1 - Current: {"days": [{"subtitle": "Day 1", "activities": []}]}
- subtitle changed ("" → "Day 1") → subtitle is the active one
- activities has no new children → activities is complete
Case 2 - Current: {"days": [{"subtitle": "", "activities": [{"type": ""}]}]}
- subtitle unchanged ("" → "")
- activities now has children → activities is the active one
- activities itself didn't change - its DESCENDANTS did
Line 5: activities has 1 item
Line 6: activities has 2 items
→ Emit ,{, process new item
Example of a deeply nested path: days[0].activities[2].details.notes[0]
Previous: {"count": 5}
Current: {"count": 5, "a": {"x": "hello"}, "b": "world"}
- a.x is at parent a (1 string at this level)
- b is at parent root (1 string at this level)
- First string (a.x) becomes _openStringPath
- Second string (b) goes to pending even though it's alone at its level
Line N: Two strings added to pending
Line N+1: Both strings have same values (unchanged)
→ Both are complete; emit both with closing quotes
Two paths are at the same level (siblings) if they have the same parent:
- days[0].title and days[0].subtitle → same parent days[0] ✓
- days[0].title and days[1].title → different parents ✗
- days[0].activities[0].title and days[0].activities[0].type → same parent ✓
For open string at path P, check if any new content appeared at:
- The parent of P
- The parent's parent
Example: If open string is at days[0].activities[0].description:
- Parent: days[0].activities[0]
- Parent's parent: days[0].activities
- If days[0].activities[1] appears → parent-level change!
Key points:
- Emit containers with [ or { only (no closing bracket)
- {} and [] can get populated later
- Use CloseStructuresDownTo to close structures when emitting at a different tree level
- {} becoming a nested object changes the keys even if the count stays the same.
When we need to emit content at a different level in the JSON tree, we must first close any open structures that are not ancestors of the target path.
CloseStructuresDownTo(targetPath):
while _openStructures is not empty:
(topPath, isArray) = peek top of stack
// Check if targetPath is at or inside topPath
isPrefix = targetPath.startsWith(topPath) AND
(lengths equal OR next char is '.' or '[')
if topPath is root ("") OR isPrefix:
break // Don't close - we're inside this structure
pop from stack
emit ']' if isArray else '}'
Example:
_openStructures = [("", false), ("days", true), ("days[0]", false), ("days[0].activities", true)]
We want to emit at "days[1]" (new array item in days)
1. Check "days[0].activities" - "days[1]" doesn't start with this → close with ']'
2. Check "days[0]" - "days[1]" doesn't start with this → close with '}'
3. Check "days" - "days[1]" DOES start with "days" → stop
Result: emitted "]}" and stack is now [("", false), ("days", true)]
- EmitNewProperty calls CloseStructuresDownTo(parentPath)
- EmitPendingString calls CloseStructuresDownTo(parentPath)
- EmitNewArrayItem calls CloseStructuresDownTo(arrayPath)
This ensures we're always at the correct nesting level before emitting new content.
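The CloseStructuresDownTo pseudocode translates to C# fairly directly. In this sketch the StructureCloser wrapper class is a hypothetical name, and the method returns the closing brackets as a string instead of writing to an internal buffer, which is an assumption about how output is collected.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical wrapper, for illustration only.
public static class StructureCloser
{
    public static string CloseStructuresDownTo(
        Stack<(string Path, bool IsArray)> openStructures, string targetPath)
    {
        var sb = new StringBuilder();
        while (openStructures.Count > 0)
        {
            var (topPath, isArray) = openStructures.Peek();

            // targetPath is at or inside topPath only if the prefix ends on a path boundary.
            bool isPrefix = targetPath.StartsWith(topPath, StringComparison.Ordinal)
                && (targetPath.Length == topPath.Length
                    || targetPath[topPath.Length] == '.'
                    || targetPath[topPath.Length] == '[');

            // Never close the root, and stop at the first ancestor of the target.
            if (topPath.Length == 0 || isPrefix) break;

            openStructures.Pop();
            sb.Append(isArray ? ']' : '}');
        }
        return sb.ToString();
    }
}
```

Run against the example stack with target days[1], this pops days[0].activities and days[0], returns ]}, and leaves root and days on the stack.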
The implementation handles the first chunk differently from subsequent chunks:
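- First chunk: uses EmitStructure to recursively process the JSON tree
- Subsequent chunks: use ProcessNewContent to find and emit new properties/items
- State is compared as flattened path → value dictionaries
When processing a chunk where a new property appears (signaling completion), emit the open string's extension first, then its closing quote, and only then the new property.
This prevents truncation of the final string extension.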
A naive approach of "serialize with open strings, diff, emit diff" fails because:
Line 2: We serialize with subtitle open (no closing quote):
...Safari,"activities":[ (note: no quote after Safari)
Line 3: We serialize with type open (different string):
...Safari","activities":[{"title":"","type":"Sightseeing (quote after Safari)
When we diff these, they diverge at position 55 where one has , and the other has ".
The diff produces content that DUPLICATES properties!
The closing quote for a string is NOT at the end of our emitted output. It's embedded in the middle, before subsequent content. When we change which string is "open", the quote position moves, causing serializations to diverge unexpectedly.
Track what we've ACTUALLY EMITTED (_emittedStrings, _emittedPaths, _openStructures) separately from any serialization.
Files:
- src/AI/src/Essentials.AI/JsonStreamChunker.cs
- src/AI/tests/Essentials.AI.UnitTests/JsonStreamChunkerTests.cs
- src/AI/tests/Essentials.AI.UnitTests/TestData/ObjectStreams/*.jsonl