docs/solutions/performance-issues/2026-03-31-slate-applybatch-should-own-the-exact-path-set-node-fast-path.md
Slate needed exact-path set_node batching in the old setNodesBatch performance class, but the public surface had to stay:
- `editor.apply(op)`
- `Editor.withBatch(editor, fn)`
- `Transforms.applyBatch(editor, ops)`

The wrong answers were obvious:

- `editor.apply`

The engine also needed to preserve two real semantics:

- `editor.apply` wrappers still see each op in order

Keep batching behind `editor.apply` and move exact-path `set_node` speed into a private draft plus snapshot model:

- `Editor.withBatch(...)` owns the lifecycle boundary
- `Transforms.applyBatch(...)` just runs through that boundary
- `applyOperationInBatch(...)` stages exact-path `set_node` ops into draft state
- `editor.children` is accessor-backed through `packages/slate/src/core/children.ts`
- `editor.apply` wrappers
- `set_node` tree ops now write to a generic private draft root instead of committed children
- `insert_text` and `remove_text` ops still apply through `applyTextBatchToChildren(...)` instead of falling back to generic `Transforms.transform(...)`
- `Transforms.mergeNodes(editor, { at })` now fast-paths direct previous-sibling merges, which trims normalize-heavy text observation without widening the public seam
- `merge_node` ops generated during live normalize no longer requeue live merge dirty paths after the parent normalizer already handled that paragraph

The exact-set-node tree rewrite lives in:

- `packages/slate/src/core/batching/exact-set-node-children.ts`

The important architectural point is that batching stays an engine concern. Plugin authors still only deal with `editor.apply(op)`.
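To make the one-seam idea concrete, here is a minimal self-contained sketch. It does not use Slate itself: `createMiniEditor`, `withLogging`, and the `commits` log are hypothetical stand-ins, and the real engine stages into a draft tree rather than a plain op list. It only illustrates that a wrapper installed on `apply` still observes every op in order, while the batch boundary decides when work is committed.

```typescript
type Op = { type: string; path: number[] };

interface MiniEditor {
  apply: (op: Op) => void;
}

// Hypothetical stand-in for the batch engine: ops are staged privately
// while a batch is open and committed as one group when it closes.
function createMiniEditor() {
  let staged: Op[] | null = null;
  const commits: Op[][] = [];

  const editor: MiniEditor = {
    apply(op) {
      if (staged) {
        staged.push(op); // inside a batch: stage only
      } else {
        commits.push([op]); // outside a batch: commit immediately
      }
    },
  };

  // Owns the lifecycle boundary; nested calls join the outer batch.
  function withBatch(fn: () => void): void {
    if (staged) {
      fn();
      return;
    }
    const ops: Op[] = [];
    staged = ops;
    try {
      fn();
    } finally {
      staged = null;
      if (ops.length > 0) commits.push(ops);
    }
  }

  // Sugar over the same boundary: every op still goes through editor.apply.
  function applyBatch(ops: Op[]): void {
    withBatch(() => ops.forEach((op) => editor.apply(op)));
  }

  return { editor, withBatch, applyBatch, commits };
}

// A plugin-style wrapper: it only knows about editor.apply.
function withLogging(editor: MiniEditor, seen: string[]): MiniEditor {
  const { apply } = editor;
  editor.apply = (op) => {
    seen.push(`${op.type}:${op.path.join(",")}`);
    apply(op);
  };
  return editor;
}
```

In this toy model the wrapper never learns that batching exists, which is the property the design above is protecting.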
Two details made the difference.
The first version of packages/slate/test/perf/set-nodes-bench.js wrapped editor.apply with a copied implementation of the old single-op phases. That completely broke the new one-seam architecture and made applyBatch(...) look much slower than it really was.
The fix was simple:
- `apply`

The first draft implementation appended staged `set_node` ops like this:

```ts
BATCH_EXACT_SET_NODE_OPS.set(editor, [...ops, op]);
```
That is brutal at 5,000 ops because it turns staging into O(n²) array copying.
The fix is to mutate the private draft array in place:
```ts
const ops = BATCH_EXACT_SET_NODE_OPS.get(editor);
if (ops) {
  ops.push(op);
} else {
  BATCH_EXACT_SET_NODE_OPS.set(editor, [op]);
}
```
Because the array is internal draft state, mutating it is correct. The published immutable contract applies to nodes and snapshots, not to hidden batch bookkeeping.
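The difference between the two staging strategies can be reproduced outside Slate. The sketch below is an illustration only: `STAGED_OPS`, `stageByCopy`, and `stageInPlace` are hypothetical names, and the editor is any object used as a `WeakMap` key.

```typescript
type SetNodeOp = {
  type: "set_node";
  path: number[];
  newProperties: Record<string, unknown>;
};

// Hypothetical private draft store, keyed by the editor object.
const STAGED_OPS = new WeakMap<object, SetNodeOp[]>();

// Copy-on-append: every call copies all previously staged ops,
// so staging n ops costs O(n^2) element copies in total.
function stageByCopy(editor: object, op: SetNodeOp): void {
  const ops = STAGED_OPS.get(editor) ?? [];
  STAGED_OPS.set(editor, [...ops, op]);
}

// In-place append: amortized O(1) per op, O(n) in total.
function stageInPlace(editor: object, op: SetNodeOp): void {
  const ops = STAGED_OPS.get(editor);
  if (ops) {
    ops.push(op);
  } else {
    STAGED_OPS.set(editor, [op]);
  }
}
```

Both functions leave the same ops in the same order; only the amount of copying differs, which is why the in-place version is safe as long as the array never escapes the batch bookkeeping.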
Verified on packages/slate/test/perf/set-nodes-bench.js at 5,000 blocks:
| Lane | Flat | Grouped |
|---|---|---|
Transforms.setNodes(...) inside Editor.withoutNormalizing(...) | 63.26 ms | 36.35 ms |
editor.apply(set_node) loop inside Editor.withoutNormalizing(...) | 55.29 ms | 6.73 ms |
Transforms.applyBatch([...set_node]) | 4.17 ms | 4.38 ms |
Helper breakdown for the exact-path batch lane:
- refs: 0.24 ms, dirtyPaths: 0.72 ms, stage: 0.14 ms, finalize: 0.26 ms, materialize: 0.39 ms, commit: 0.39 ms
- refs: 0.13 ms, dirtyPaths: 1.01 ms, stage: 0.03 ms, finalize: 0.23 ms, materialize: 0.50 ms, commit: 0.38 ms

That puts the one-seam design back in the right performance class without bringing back a public `setNodesBatch(...)`.
The newer mixed and generic lanes make the remaining gap obvious too:
| Lane | Flat / Empty |
|---|---|
Transforms.applyBatch([...set_node, insert_node]) on flat 5,000 blocks | 6.49 ms |
Transforms.applyBatch([...set_node, ...move_node]) on flat 5,000 blocks | 123.63 ms |
Transforms.applyBatch([...insert_node]) on an empty 5,000-node build-up | 394.97 ms |
Transforms.applyBatch([...prepend insert_node]) on an empty 5,000-node build-up | 380.33 ms |
editor.apply(insert_node) loop inside Editor.withoutNormalizing(...) on an empty 5,000-node build-up | 2551.51 ms |
editor.apply(prepend insert_node) loop inside Editor.withoutNormalizing(...) on an empty 5,000-node build-up | 2617.48 ms |
That made the next target clearer:
- `insert_node` cut is worth keeping
- 6x to 7x faster than replay
- `set_node`, but they are no longer the kind of replay-class cliff that forces another public seam

The mixed-batch planner also earned its keep:

- `Transforms.applyBatch([...set_node, ...move_node])` on flat 5,000 blocks now lands at 123.63 ms
- 3129 ms
- 1603.58 ms

The widened structural lanes made the real priority order obvious too:
| Lane | Replay inside Editor.withoutNormalizing(...) | Transforms.applyBatch(...) |
|---|---|---|
split_node on flat 5,000 blocks | 9299.46 ms | 136.67 ms |
merge_node on flat 5,000 blocks | 2820.19 ms | 77.89 ms |
move_node on flat 5,000 blocks | 1692.25 ms | 126.46 ms |
remove_node on flat 5,000 blocks | 8.79 ms | 8.09 ms |
That changed the roadmap again:
- `move_node` is no longer replay-class
- `editor.apply`
- `split_node` and `merge_node` stay expensive enough to revisit, but they are no longer the kind of fire that forces an immediate seam rewrite
- `insert_node` on empty-document build-up is still the slowest specialized single-family benchmark lane, but not enough to justify another public API
- `set_node` plus `move_node` is no longer the planner-era hotspot once carried dirty paths are remapped directly
- `remove_node` still does not justify a dedicated optimizer

The next useful asymmetry was merge-specific:

- `split_node` already had a private draft tree rewrite
- `merge_node` did not

The winning cut was to give distinct-parent text merges the same private draft shape as distinct-parent text splits:

- 2

That matters because it improves the real work instead of adding more planner bookkeeping.
Verified on the same 5,000-block harness:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...merge_node]) on flat merged-text blocks | 65.8 ms | 12.74 ms |
manual Editor.withBatch([...merge_node]) on flat merged-text blocks | 66.16 ms | 13.25 ms |
Transforms.applyBatch([...set_node, ...merge_node]) on flat merged-text blocks | 80.06 ms | 24.75 ms |
manual Editor.withBatch([...set_node, ...merge_node]) on flat merged-text blocks | 78.03 ms | 23.15 ms |
Transforms.applyBatch([...move_node, ...merge_node]) on flat merged-text blocks | 103.68 ms | 44.78 ms |
manual Editor.withBatch([...move_node, ...merge_node]) on flat merged-text blocks | 103.51 ms | 44.56 ms |
Transforms.applyBatch([...set_node, ...move_node, ...merge_node]) on flat merged-text blocks | 98.02 ms | 48.51 ms |
manual Editor.withBatch([...set_node, ...move_node, ...merge_node]) on flat merged-text blocks | 101.82 ms | 46.1 ms |
That leaves merge-family batching where it should be:
One more semantic trap showed up while widening the wrapper-stack matrix:
- `merge_node` in history after a `split_node`
- `batchEditor.history === replayEditor.history`, not a hand-written expected op list that ignores normalize work

React had one separate seam too:

- `withReact` is not just `withDOM` + nicer exports
- `move_node` coverage needs to prove chunk-tree reconcile equivalence with replay, because `withReact` mutates `movedNodeKeys` before downstream `apply`
- `editor.children`; it is reconcile output and reconcile callbacks matching replay for both `Transforms.applyBatch(...)` and manual `Editor.withBatch(...)`

One more benchmarking trap showed up on the split lane:

- 5,000 blocks

`core/batch.ts`, the answer got blunt:

- `Transforms.applyBatch([...split_node])`: 40.82 ms
- `Editor.withBatch([...split_node])`: 40.62 ms
- refs: 0.21 ms
- stageDraft: 0.33 ms
- stageDirtyPaths: 0.09 ms
- finalize: 0.51 ms
- materialize: 1.84 ms
- dirtyPathFlush: 2.6 ms
- flushBeforeNormalize: 2.46 ms
- normalize: 24.91 ms

One more benchmarking trap showed up after the merge cut:

- `applyBatch` and manual `withBatch` text lanes are effectively in the same class
- `applyBatch` look meaningfully slower on observed text batches

The harness fix is small and worth keeping:

- `applyBatch` vs `withBatch` gaps as meaningful unless they survive warmed runs

Verified on warmed merged-text `insert_text` lanes at 5,000 blocks:
| Lane | Duration |
|---|---|
Transforms.applyBatch([...insert_text]) | 31.66 ms |
manual Editor.withBatch([...insert_text]) | 29.7 ms |
Transforms.applyBatch([...insert_text]) with read-after-each observation | 44.41 ms |
manual Editor.withBatch([...insert_text]) with read-after-each observation | 48.15 ms |
That is the right conclusion:
- `applyBatch(...)` entrypoint anymore

The warmed split lanes tell the same story:
| Lane | Duration |
|---|---|
Transforms.applyBatch([...split_node]) | 35.2 ms |
manual Editor.withBatch([...split_node]) | 40.71 ms |
Transforms.applyBatch([...set_node, ...split_node]) | 40.38 ms |
manual Editor.withBatch([...set_node, ...split_node]) | 38.79 ms |
Transforms.applyBatch([...move_node, ...split_node]) | 65.14 ms |
manual Editor.withBatch([...move_node, ...split_node]) | 61.87 ms |
Transforms.applyBatch([...set_node, ...move_node, ...split_node]) | 65.08 ms |
manual Editor.withBatch([...set_node, ...move_node, ...split_node]) | 69.01 ms |
Same conclusion:
- `applyBatch(...)` is not the problem anymore
- `withBatch(...)` is not hiding a secret faster engine

That should change how future work gets judged:
One more ugly bottleneck showed up after that: same-parent insert_node batching still looked too slow on empty-document build-up even though the private insert draft itself was cheap.
The actual cause was dumb:
- `canStageInsertNodeOperation(...)` cloned the full staged insert op list on every insert
- `canApplyInsertNodeBatchToChildren(...)` rescanned that growing list just to confirm the new insert still targeted the same parent path

That meant append and prepend-heavy insert batches were paying accidental O(n²) validation cost before the real draft logic even had a chance to help.
The fix was to store the staged insert parent path directly in batch state and compare the next op against that path:
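A minimal sketch of that idea, under stated assumptions: `InsertBatchState`, `tryStageInsert`, and the path helpers are hypothetical names written for this note, not the actual Slate internals. The check compares one stored parent path against the incoming op instead of cloning or rescanning the staged list.

```typescript
type Path = number[];
type InsertNodeOp = { type: "insert_node"; path: Path };

// Hypothetical batch state: the parent path is remembered once,
// so validating the next insert never walks the staged list.
interface InsertBatchState {
  parentPath: Path | null;
  ops: InsertNodeOp[];
}

const parentOf = (path: Path): Path => path.slice(0, -1);

const pathEquals = (a: Path, b: Path): boolean =>
  a.length === b.length && a.every((n, i) => n === b[i]);

// Returns true when the op was staged, false when the caller should
// fall back to a more general path. Cost depends on path depth only.
function tryStageInsert(state: InsertBatchState, op: InsertNodeOp): boolean {
  if (op.path.length === 0) return false; // the root path has no parent
  const parent = parentOf(op.path);
  if (state.parentPath === null) {
    state.parentPath = parent; // first insert fixes the batch parent
  } else if (!pathEquals(state.parentPath, parent)) {
    return false; // different parent: not a same-parent batch
  }
  state.ops.push(op);
  return true;
}
```

Appends and prepends both pass through the same constant-size comparison, so validation no longer grows with the number of staged inserts.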
Verified on the same 5,000-block harness:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...insert_node]) on an empty document | 394.97 ms | 13.22 ms |
Transforms.applyBatch([...prepend insert_node]) on an empty document | 380.33 ms | 5.5 ms |
editor.apply(insert_node) loop inside Editor.withoutNormalizing(...) on an empty document | 2551.51 ms | 2468.22 ms |
editor.apply(prepend insert_node) loop inside Editor.withoutNormalizing(...) on an empty document | 2617.48 ms | 2603.8 ms |
That is the clean result you want:
Observation-heavy lanes turned out to matter enough to measure directly:
| Lane | Duration |
|---|---|
Transforms.applyBatch([...set_node]) on flat 5,000 blocks | 9.1 ms |
manual Editor.withBatch(...) exact set_node loop on flat 5,000 blocks | 5.95 ms |
Transforms.applyBatch([...set_node]) with read-after-each observation on flat 5,000 blocks | 32.1 ms |
manual Editor.withBatch(...) exact set_node loop with read-after-each observation on flat 5,000 blocks | 19.05 ms |
Transforms.applyBatch([...set_node, ...move_node]) on flat 5,000 blocks | 166.53 ms |
manual Editor.withBatch(...) exact set_node + move_node loop on flat 5,000 blocks | 179.29 ms |
Transforms.applyBatch([...set_node, ...move_node]) with read-after-each observation on flat 5,000 blocks | 187.94 ms |
manual Editor.withBatch(...) exact set_node + move_node loop with read-after-each observation on flat 5,000 blocks | 178.92 ms |
Transforms.applyBatch([...interleaved insert_node, move_node]) on empty 5,000-node build-up | 79.26 ms |
manual Editor.withBatch(...) interleaved insert_node + move_node loop on empty 5,000-node build-up | 79.08 ms |
Transforms.applyBatch([...set_node]) with duplicate target paths on flat 5,000 blocks | 8.54 ms |
manual Editor.withBatch(...) duplicate exact set_node loop on flat 5,000 blocks | 7.12 ms |
Transforms.applyBatch([...set_node]) with wrapper read-after-each observation on flat 5,000 blocks | 16.97 ms |
manual Editor.withBatch(...) exact set_node loop with wrapper read-after-each observation on flat 5,000 blocks | 20.96 ms |
Transforms.applyBatch([...insert_text]) on merged-text flat 5,000 blocks | 85.08 ms |
Transforms.applyBatch([...insert_text]) with read-after-each observation on merged-text flat 5,000 blocks | 82.06 ms |
That split is useful, not weird:
- `Transforms.applyBatch(...)` and manual `Editor.withBatch(...)` stay in the same performance class, so the public sugar is not hiding a second-class executor
- `set_node` no longer falls off a cliff under read-after-each observation on the hot root/block-only lanes
- `set_node` + `move_node` lane stays in the same class for both entrypoints once live move dirty-path batching is in place, so `Editor.withBatch(...)` is no longer hiding a replay-class cliff there
- `insert_node` + `move_node` lane stays in the same class for both entrypoints once the insert prefix can seed the same live dirty-path batch, so `Editor.withBatch(...)` is no longer hiding a replay-class cliff there either
- `insert_text` hot lane is fixed, and the read-after-each observed lane is back in the same performance class instead of exploding under wrapper reads

The final observation-heavy move win came from two narrow fixes, not a new abstraction:

- `set_node` / `insert_node` / `move_node` / `remove_node` work
- `move_node` flushes now remap child indexes directly instead of rebuilding a full sibling-order array on every read

Another planner-specific cliff showed up once the obvious hot lanes were fixed:

- `set_node` prefixes followed by independent-parent `merge_node` or `split_node` segments

That is needless work for independent-parent structural batches. Each carried dirty path can only be affected by:
The winning cut was to stop treating those segments like generic path-transform loops:
One more split-specific tax was still hanging around after that:
- `editor.getDirtyPaths(op)` semantics
- `splitNodeThenSetSelection`
- `editor.getDirtyPaths(op)`

Verified on the flat 5,000-block harness with `repeats=5`:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...split_node]) | 110.96 ms | 91.12 ms |
manual Editor.withBatch([...split_node]) | 176.32 ms | 100 ms |
Transforms.applyBatch([...set_node, ...split_node]) | 129.82 ms | 109.93 ms |
manual Editor.withBatch([...set_node, ...split_node]) | 125.07 ms | 98.43 ms |
The same principle applied one step later to merge:
- `merge_node` ops do not need dirty-path staging just because they happen outside observed normalize

Verified on the flat merged-text 5,000-block harness with `repeats=5`:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...merge_node]) | 82.83 ms | 76.19 ms |
manual Editor.withBatch([...merge_node]) | 79.17 ms | 64.78 ms |
Transforms.applyBatch([...insert_text]) with read-after-each observation | 101.82 ms | 84.41 ms |
One more large gap showed up only on triple mixed lanes:
- `Transforms.applyBatch([...set_node, ...move_node, ...merge_node])`
- `Transforms.applyBatch([...set_node, ...move_node, ...split_node])`

The problem was not the live engine. It was planner overhead:

- `Editor.withBatch(...)` was already faster on the whole batch

The right fix was not a new planner trick. It was a deopt:

- `generic`, `move`, or `same-parent-move`

Verified on the 5,000-block harness with `repeats=5`:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...set_node, ...move_node, ...merge_node]) | 338.31 ms | 99.75 ms |
manual Editor.withBatch([...set_node, ...move_node, ...merge_node]) | 203.87 ms | 89.69 ms |
Transforms.applyBatch([...set_node, ...move_node, ...split_node]) | 199.49 ms | 117.62 ms |
manual Editor.withBatch([...set_node, ...move_node, ...split_node]) | 140.68 ms | 110.28 ms |
That cut worked, but it also exposed something uglier:
- `split_node` and `merge_node` batching was strong enough, the planner-owned independent-parent segment path had become dead weight

Hard-cutting that planner path was the right move. Verified on the same 5,000-block harness:
| Lane | Transforms.applyBatch(...) | manual Editor.withBatch(...) |
|---|---|---|
merge_node | 70.4 ms | 70.68 ms |
split_node | 145.36 ms | 158.33 ms |
set_node + merge_node | 83.92 ms | 73.35 ms |
set_node + split_node | 165.49 ms | 141.54 ms |
That is the cleaner architecture lesson:
- `Transforms.applyBatch(...)` should stay a thin planner over the same batch engine, not a second implementation playground

The next broad structural win came from the base `move_node` transform itself, not from another batch-only seam:

- `modifyChildren(...)` pass, then reinserted it with another
- `splice(...)` calls

That cut landed in `packages/slate/src/interfaces/transforms/general.ts`, and the numbers moved hard:
| Lane | Duration |
|---|---|
Transforms.applyBatch([...move_node]) on flat 5,000 blocks | 28.84 ms |
Transforms.applyBatch([...set_node, ...move_node]) on flat 5,000 blocks | 48.28 ms |
Transforms.applyBatch([...move_node, ...merge_node]) on flat 5,000 blocks | 83.3 ms |
manual Editor.withBatch([...move_node, ...merge_node]) on flat 5,000 blocks | 86.21 ms |
Transforms.applyBatch([...move_node, ...split_node]) on flat 5,000 blocks | 174.9 ms |
manual Editor.withBatch([...move_node, ...split_node]) on flat 5,000 blocks | 152.66 ms |
Transforms.applyBatch([...interleaved insert_node, move_node]) on empty 5,000-node build-up | 53.85 ms |
manual Editor.withBatch([...interleaved insert_node, move_node]) on empty 5,000-node build-up | 70.68 ms |
That changed the roadmap again:
The first real split-family cut was not another split-specific planner trick. It was smaller and better:
- `Transforms.applyBatch(...)` was still paying segment-planning overhead on batches that contained no insert or move families at all
- `set_node` + `split_node`, the planner could not specialize anything, so the work was pure tax
- `Editor.withBatch(...)` drive the live engine directly

Verified on the same 5,000-block harness:
| Lane | Transforms.applyBatch(...) | manual Editor.withBatch(...) |
|---|---|---|
set_node + split_node | 143.28 ms | 145.67 ms |
move_node + split_node | 172.75 ms | 176.65 ms |
split_node | 126.54 ms | 120 ms |
That is the maintainable lesson:
- `split_node` is a real counterexample because later split paths only exist after earlier transforms
- `split_node` + `set_selection` is another one because the draft must normalize against the live transformed tree, not a simulated dirty-path replay

One more split cut was worth keeping too, but for a different reason:
That did not produce some dramatic benchmark spike. It produced something better:
- `Transforms.applyBatch(...)` and manual `Editor.withBatch(...)`

That keeps the public seam boring and collapses the mixed planner cliff:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...set_node, ...merge_node]) on flat 5,000 blocks | 1083.61 ms | 186.9 ms |
Transforms.applyBatch([...set_node, ...split_node]) on flat 5,000 blocks | 1284.46 ms | 257.59 ms |
The next split-specific win came from staging live split_node dirty paths for manual Editor.withBatch(...) loops instead of letting them fall back to the generic tail:
| Lane | Result |
|---|---|
Transforms.applyBatch([...split_node]) on flat 5,000 blocks | 184.3 ms |
Editor.withBatch([...split_node]) on flat 5,000 blocks | 171.12 ms |
Transforms.applyBatch([...set_node, ...split_node]) on flat 5,000 blocks | 295.95 ms |
Editor.withBatch([...set_node, ...split_node]) on flat 5,000 blocks | 150.81 ms |
One subtle semantic bug mattered there:
- `[0]` has parent path `[]`
- `[1]` never remaps to `[2]`

That is not optional bookkeeping. It is the difference between manual batched `split_node` staying in the right performance class and silently diverging from replay semantics on mixed root-level batches.
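The root-level case is easy to get wrong, so a small illustration may help. This is a simplified sketch, not Slate's `Path.transform`: `remapAfterSplit` is a hypothetical helper that only handles later siblings of a split node and their descendants. It assumes the failure mode was treating the empty parent path `[]` as "no parent", which is an inference from the notes above rather than something they state outright.

```typescript
type Path = number[];

// After a split at `splitPath`, the new node lands at the next sibling
// index, so every later sibling (and its descendants) shifts right by one.
// The split node itself and paths under other parents are left unchanged.
function remapAfterSplit(path: Path, splitPath: Path): Path {
  if (splitPath.length === 0) return path; // the root itself cannot be split
  const depth = splitPath.length - 1; // position of the sibling index
  // For a root-level split, depth is 0 and the parent path is [].
  // An empty parent is still a valid parent, so no early return here.
  if (path.length <= depth) return path;
  for (let i = 0; i < depth; i++) {
    if (path[i] !== splitPath[i]) return path; // different parent
  }
  const index = path[depth];
  const splitIndex = splitPath[depth];
  if (index === undefined || splitIndex === undefined) return path;
  if (index <= splitIndex) return path;
  const next = path.slice();
  next[depth] = index + 1;
  return next;
}
```

With a split at `[0]`, the sibling path `[1]` becomes `[2]` and `[1, 0]` becomes `[2, 0]`, while paths under unrelated parents are returned as-is.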
The merge-side equivalent was lower drama:
- `merge_node` batches are independent-parent work
- `editor.getDirtyPaths(op)` for every merge during live flush

Building dirty-path state directly for independent-parent merge ops and reusing the same parent-prefix transform keeps the flat merge lane in the same class as manual batching:
| Lane | Transforms.applyBatch(...) | manual Editor.withBatch(...) |
|---|---|---|
merge_node | 71.37 ms | 67.62 ms |
split_node | 129.1 ms | 131.07 ms |
set_node + split_node | 139.11 ms | 128.67 ms |
That also exposed a useful cleanup rule:
- `split_node` and live `merge_node` batching were good enough, the old planner-owned independent-parent split / merge fast path stopped earning its keep
- `Transforms.applyBatch(...)` through the same live engine that manual `Editor.withBatch(...)` already used

Current 5,000-block checkpoint after that cut:
| Lane | Result |
|---|---|
Transforms.applyBatch([...merge_node]) | 70.4 ms |
Editor.withBatch([...merge_node]) | 70.68 ms |
Transforms.applyBatch([...split_node]) | 145.36 ms |
Editor.withBatch([...split_node]) | 158.33 ms |
Transforms.applyBatch([...set_node, ...merge_node]) | 83.92 ms |
Editor.withBatch([...set_node, ...merge_node]) | 73.35 ms |
Transforms.applyBatch([...set_node, ...split_node]) | 165.49 ms |
Editor.withBatch([...set_node, ...split_node]) | 141.54 ms |
The split-family cut that actually survived after that was narrower than a new split engine:
Current 5,000-block checkpoint after that cut:
| Lane | Result |
|---|---|
Transforms.applyBatch([...split_node]) | 114.68 ms |
Editor.withBatch([...split_node]) | 108.06 ms |
Transforms.applyBatch([...set_node, ...split_node]) | 126.19 ms |
Editor.withBatch([...set_node, ...split_node]) | 112.99 ms |
Transforms.applyBatch([...move_node, ...split_node]) | 158.83 ms |
Editor.withBatch([...move_node, ...split_node]) | 140.92 ms |
That is the important lesson:
The next useful checkpoint was not another optimization. It was a composability check:
| Lane | Result |
|---|---|
Transforms.applyBatch([...set_node, ...move_node, ...merge_node]) | 120 ms |
Editor.withBatch([...set_node, ...move_node, ...merge_node]) | 117.53 ms |
Transforms.applyBatch([...set_node, ...move_node, ...split_node]) | 153.22 ms |
Editor.withBatch([...set_node, ...move_node, ...split_node]) | 149.09 ms |
That matters because it says something simple:
One semantic wrinkle matters here too:
- `set_node` + `merge_node` on merged-text paragraphs is not a plain replay-oracle scenario
- `editor.apply(set_node)` eagerly normalizes and can collapse adjacent text siblings before the later `merge_node`
- `Transforms.applyBatch(...)` equivalence with manual `Editor.withBatch(...)`, not equivalence with plain per-op replay

The last text-op gap was not the leaf rewrite anymore. It was the normalize tail:

- `insert_text` and `remove_text` ops need their own private draft
- `editor.selection`, or mixed text-selection batches freeze the cursor at stale offsets

One smaller cut was still worth taking after that:

- `insert_text` and `remove_text` were still reporting `Path.levels(path)` as dirty
- `Path.parent(path)` cut that dead normalize work without changing replay semantics

Targeted direct instrumentation on the merged-text read-after-each lane showed the result:
| Lane | Before | After |
|---|---|---|
Transforms.applyBatch([...insert_text]) with read-after-each observation on merged-text flat 5,000 blocks | 3234.12 ms total / 3169.85 ms in editor.normalize | 2510.37 ms total / 2387.83 ms in editor.normalize |
That is not the final answer for observation-heavy text workloads. It is just the honest next cut:
The next two small cuts were to stop throwing away the text fast path and to stop paying for fake second passes:
- `Transforms.transform(...)`, the lane bleeds time for no semantic reason
- `insert_text` and `remove_text` ops through `applyTextBatchToChildren(...)` on that generic draft root keeps the same public behavior and avoids the generic tree-transform tax
- `merge_node` normalization also needed its own cheap path, because every observed text edit on a merged-text paragraph can record one of those merges
- `insert_text` still paid a second observed normalize pass with `dirty:3`
- `merge_node` generated during the first pass

Verified on the same merged-text 5,000-block harness:
| Lane | Duration |
|---|---|
Transforms.applyBatch([...insert_text]) on merged-text flat 5,000 blocks | 85.08 ms |
Transforms.applyBatch([...insert_text]) with read-after-each observation on merged-text flat 5,000 blocks | 82.06 ms |
Two follow-up findings are worth keeping:
- `repeats=3` can tell ghost stories; use at least `repeats=5` before treating merged-text microbench numbers as signal

One more observation-barrier bug surfaced after the matrix got wide enough:

- `Transforms.setNodes(...)` wraps its work in `Editor.withoutNormalizing(...)`
- `Transforms.setNodes(...)` inside the same outer batch reads `editor.children` while queued batch normalize debt still exists
- `Editor.isNormalizing(editor)` was `false`, so `Editor.normalize(...)` returned immediately and the barrier kept looping on the same dirty paths forever

The fix is simple and correct:

- `editor.children` read happens while normalization is currently suspended, return the current draft as-is
- `withoutNormalizing(...)` boundary flush it later

That keeps replay semantics for explicit observed reads, but it stops transform-internal reads from eating their own queued normalize debt when the caller already asked Slate not to normalize yet.
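The shape of that barrier can be sketched in isolation. Everything here is a hypothetical model: `BarrierEditor`, `readChildren`, and the iteration guard are illustrative and are not the code in `packages/slate/src/core/children.ts`.

```typescript
interface BarrierEditor {
  draft: unknown[]; // private draft children
  dirtyPaths: number[][]; // queued normalize debt
  normalizingSuspended: boolean; // true inside a withoutNormalizing-style scope
  normalize(): void; // expected to drain dirtyPaths unless suspended
}

// Observation barrier for a children read.
function readChildren(editor: BarrierEditor): unknown[] {
  if (editor.normalizingSuspended) {
    // normalize() would return immediately here, so looping on the dirty
    // paths could never finish. Hand back the draft and leave the debt
    // for the outer boundary to flush.
    return editor.draft;
  }
  let guard = 0;
  while (editor.dirtyPaths.length > 0) {
    editor.normalize();
    guard += 1;
    // Defensive limit added for this sketch only.
    if (guard > 1000) throw new Error("normalize barrier did not converge");
  }
  return editor.draft;
}
```

Reads outside a suspended scope still pay the normalize debt before returning, which is what keeps explicit observed reads aligned with replay.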
That was the real replay drift behind the failing mixed and DOM matrices:
- `insert_text -> move_node` pair could read cleanly after the text op, then still merge an unrelated paragraph on the later move because the observation barrier had not finished the normalize work

The fix kept the same one-seam design:

- `editor.children` seam
- `editor.apply`

The next real mixed-op hotspot was interleaved same-parent `insert_node` + `move_node` on an empty document.
The first planner pass still left it ugly:
| Lane | Before targeted mixed-op planner cut |
|---|---|
Transforms.applyBatch([...interleaved insert_node, move_node]) on an empty document | 6063.77 ms |
Measuring the generic path showed the same old villain again:
The fix was not a second draft overlay. That would have been cute and wrong.
One testing wrinkle turned out to matter too:
- `insert_text` or `remove_text` on a paragraph with adjacent text leaves, Slate may normalize those leaves immediately by recording an internal `merge_node`
- `editor.operations` ordering is still worth asserting for the stable families, but not for every pair containing those text-first normalization cases

Error-path coverage mattered too:

- `set_node` staging must validate root-path and forbidden-property errors before mutating batch draft state
- `editor.children` read materializes the draft and explodes

Direct `children` assignment needed a stronger rule too.

- `editor.children = ...` inside `Editor.withBatch(...)` is a hard reset, not “just another write”
- `editor.operations` must be dropped

That also forced one architectural cleanup:

- `editor.children` writes cannot trigger the hard reset
- `withInternalBatchWrites(...)`

Without that distinction, normal structural transforms started clearing their own queued normalize and pending ops, which is obviously garbage.
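One way to picture the internal-versus-external distinction is a depth counter around engine writes. The sketch below is a hypothetical model: `createChildrenStore`, `withInternalWrites`, and the `batch` record are invented for illustration and only loosely mirror the `withInternalBatchWrites(...)` helper named above.

```typescript
interface BatchState {
  pendingOps: string[];
  dirtyPaths: number[][];
}

function createChildrenStore(initial: unknown[]) {
  let children = initial;
  let internalDepth = 0; // > 0 while the engine itself is writing
  const batch: BatchState = { pendingOps: [], dirtyPaths: [] };

  // Marks writes made inside `fn` as engine-internal.
  function withInternalWrites<T>(fn: () => T): T {
    internalDepth += 1;
    try {
      return fn();
    } finally {
      internalDepth -= 1;
    }
  }

  const editor = {
    get children(): unknown[] {
      return children;
    },
    set children(next: unknown[]) {
      if (internalDepth === 0) {
        // External assignment: hard reset of queued batch state.
        batch.pendingOps = [];
        batch.dirtyPaths = [];
      }
      children = next;
    },
  };

  return { editor, batch, withInternalWrites };
}
```

Engine writes wrapped in `withInternalWrites` keep the queued ops and dirty paths, while a plain assignment from user code clears them.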
- `onChange` still flushes once after the thrown error

The fix was:

- `insert_node` + `move_node` runs in the planner
- `editor.apply`

That simulation tracks:
Verified on the same 5,000-block harness:
| Lane | Replay / generic loop | Optimized Transforms.applyBatch(...) |
|---|---|---|
interleaved same-parent insert_node + move_node on empty document | 8898.41 ms | 77.33 ms |
That is the right kind of win:
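- `editor.apply` wrappers

One extra lesson from the failed follow-up slice:

- `split_node` overlay was not good enough

One more lesson from the observation bug:

- `editor.apply`, wrapper-visible normalize ops diverge from replay too

One more lesson from the move follow-up:

- `move_node` batches is semantically clean, but it was only a small trim
- `Path.levels(parent)` plus the final slots of the nodes that actually moved
- 5,000-block move lane from 1692.25 ms to 126.46 ms without changing the one-seam design

- `editor.apply` with a copied implementation. That measures the harness, not the engine.
- `editor.apply`, `Editor.withBatch(...)`, and `Transforms.applyBatch(...)`. A benchmark seam is not a public API.
- `Editor.withBatch(...)` loops fall behind `Transforms.applyBatch(...)`, check whether the gap is dirty-path batching before redesigning the executor. The `set_node` + `move_node` cliff collapsed once consecutive live `move_node` ops stopped paying dirty-path churn one op at a time.
- `insert_node` + `move_node` run, do not throw away the staged insert prefix and start over. Promote that prefix into the same live dirty-path batch, or manual `Editor.withBatch(...)` falls straight back into multi-second generic sludge.
- O(n²) list-clone in “cheap validation” code and then blame the tree rewrite.
- `editor.operations` order for mixed structural batches unless that order is a real public contract. `split_node` can synthesize internal ops, and batched execution may preserve behavior while reordering those internals.
- `withHistory` and `withReact` need their own combined matrix because rewrite wrappers, snapshot reads, and history recording all meet there.
- `withDOM` has its own batch-sensitive bookkeeping: pending selections, pending diffs, and node-key repair. That seam needs a dedicated matrix too, because tree equality alone will happily miss a broken DOM state cache.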
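The benchmarking advice in this write-up (warm the lane before timing, and use at least `repeats=5`) can be sketched as a small helper. This is not the harness in `packages/slate/test/perf/set-nodes-bench.js`: `bench`, `BenchOptions`, and the injectable `now` clock are assumptions made for illustration, and a higher-resolution clock can be passed in where one is available.

```typescript
interface BenchOptions {
  warmup?: number; // untimed runs before measuring
  repeats?: number; // timed runs
  now?: () => number; // clock in milliseconds, injectable for tests
}

// Runs `fn` a few times untimed, then reports the median of the timed runs.
function bench(fn: () => void, options: BenchOptions = {}) {
  const warmup = options.warmup ?? 2;
  const repeats = options.repeats ?? 5;
  const now = options.now ?? (() => Date.now());

  for (let i = 0; i < warmup; i++) fn();

  const samples: number[] = [];
  for (let i = 0; i < repeats; i++) {
    const start = now();
    fn();
    samples.push(now() - start);
  }

  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  let median = 0;
  if (sorted.length % 2 === 1) {
    median = sorted[mid] ?? 0;
  } else if (sorted.length > 0) {
    median = ((sorted[mid - 1] ?? 0) + (sorted[mid] ?? 0)) / 2;
  }
  return { median, samples };
}
```

Reporting a median over warmed runs makes a single cold or noisy sample less likely to be read as an `applyBatch` vs `withBatch` gap.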
move_node batches is semantically clean, but it was only a small trimPath.levels(parent) plus the final slots of the nodes that actually moved5,000-block move lane from 1692.25 ms to 126.46 ms without changing the one-seam designeditor.apply with a copied implementation. That measures the harness, not the engine.editor.apply, Editor.withBatch(...), and Transforms.applyBatch(...). A benchmark seam is not a public API.Editor.withBatch(...) loops fall behind Transforms.applyBatch(...), check whether the gap is dirty-path batching before redesigning the executor. The set_node + move_node cliff collapsed once consecutive live move_node ops stopped paying dirty-path churn one op at a time.insert_node + move_node run, do not throw away the staged insert prefix and start over. Promote that prefix into the same live dirty-path batch, or manual Editor.withBatch(...) falls straight back into multi-second generic sludge.O(n²) list-clone in “cheap validation” code and then blame the tree rewrite.editor.operations order for mixed structural batches unless that order is a real public contract. split_node can synthesize internal ops, and batched execution may preserve behavior while reordering those internals.withHistory and withReact need their own combined matrix because rewrite wrappers, snapshot reads, and history recording all meet there.withDOM has its own batch-sensitive bookkeeping: pending selections, pending diffs, and node-key repair. That seam needs a dedicated matrix too, because tree equality alone will happily miss a broken DOM state cache.