docs/performance/2026-04-04-standalone-benchmark-gap-analysis.md
The new standalone benchmark lab under `benchmarks/editor` did not invalidate the earlier public `apps/www` numbers.
It measured a different surface.
That distinction matters because the current public guide originally read too broadly, while the underlying data was only strong for the simpler docs harness.
The public numbers in performance.mdx come from:
That public harness uses:
- 1000
- `huge-mixed-block` fixture from `workloads.ts`

Headline result there:

- 10k mixed mount: 475.61 ms
- 10k mixed mount: 529.58 ms

That harness is still useful. It answers the narrower question:
Does Plate add a large tax on the simpler chunked large-document route we use in the docs perf surface?
Right now, no.
The standalone lab uses:
That mixed-markdown-10k fixture is much richer:
Headline result there:
- `03_mount-10k`: 736.30 ms
- `03_mount-10k`: 437.60 ms

That is not a contradiction. It is a different workload.
This is the main reason the gap appeared.
The old public harness mostly stressed:
The standalone rich-markdown fixture stresses:
So the standalone gap does not prove the public benchmark was fake. It proves the public benchmark was narrower.
We now have decomposition rows from the standalone lab:
| Lane | Plate | Slate | Read |
|---|---|---|---|
| `41_mount-10k-plain-core` | 215.20 ms | 197.90 ms | same class, Plate somewhat slower |
| `42_mount-10k-plain-basic` | 212.60 ms | 193.40 ms | same class, Plate somewhat slower |
| `43_mount-10k-blockquote-core` | 163.90 ms | 403.00 ms | Plate faster |
| `44_mount-10k-blockquote-basic` | 524.10 ms | 447.30 ms | Plate clearly slower |
| `45_mount-10k-code-core` | 102.20 ms | 387.70 ms | Plate faster |
| `46_mount-10k-code-basic` | 129.80 ms | 384.00 ms | Plate faster |
| `47_mount-10k-marks-core` | 1227.50 ms | 925.50 ms | both expensive, Plate worse |
| `48_mount-10k-marks-basic` | 1299.80 ms | 889.50 ms | Plate much worse |
| `49_mount-10k-list-markdown` | 890.40 ms | 630.10 ms | Plate much worse |
| `86_mount-10k-bold-basic` | 585.90 ms | 375.90 ms | Plate much worse |
| `87_mount-10k-italic-basic` | 622.10 ms | 345.30 ms | Plate much worse |
| `88_mount-10k-underline-basic` | 591.40 ms | 332.00 ms | Plate much worse |
| `89_mount-10k-strikethrough-basic` | 580.70 ms | 334.40 ms | Plate worst in this set |
| `90_mount-10k-bold-single` | 424.50 ms | 343.20 ms | single-plugin helps, still red |
| `91_mount-10k-italic-single` | 422.60 ms | 339.80 ms | single-plugin helps, still red |
| `92_mount-10k-underline-single` | 428.00 ms | 345.60 ms | single-plugin helps, still red |
| `93_mount-10k-strikethrough-single` | 450.10 ms | 384.80 ms | single-plugin helps, still worst |
The 90..93 rows are from targeted direct standalone probes, not yet from the
main frozen batch artifact.
That changes the diagnosis:
- `strikethrough` is the worst single mark, but not uniquely bad
- going from `BasicMarksPlugin` to a single mark plugin helps a lot, but it still leaves a real red gap

So the likely problem is not "Plate mount is just slow." The likely problem is:
working together on the richer markdown fixture.
The standalone fixture still feeds Slate true nested list containers:
- `bulleted-list`
- `numbered-list`
- `list-item`

But for Plate, the fixture adapter currently flattens list containers into paragraph blocks with:

- `indent: 1`
- `listStyleType: 'disc' | 'decimal'`

That still makes `ListPlugin` a real suspect.
It is just no longer the only suspect, because the new heavy-marks and
blockquote-basic rows also go red.
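The flattening step described above can be sketched roughly as follows. This is a minimal illustration, not the actual fixture adapter: the `NestedList`, `ListItem`, and `FlatParagraph` types, the `flattenList` name, the `"p"` block type, and the rule that nested lists increase `indent` by one are all assumptions. The source only states that Plate receives paragraphs with `indent: 1` and a `listStyleType` of `'disc'` or `'decimal'`.

```typescript
// Hypothetical node shapes; the real fixture types are not shown in the source.
type ListType = "bulleted-list" | "numbered-list";

interface ListItem {
  type: "list-item";
  text: string;
  children?: NestedList[];
}

interface NestedList {
  type: ListType;
  children: ListItem[];
}

interface FlatParagraph {
  type: "p";
  text: string;
  indent: number;
  listStyleType: "disc" | "decimal";
}

// Turn nested list containers into a flat run of paragraph blocks.
// Assumption: each nesting level adds one to `indent`, starting at 1.
function flattenList(list: NestedList, depth = 1): FlatParagraph[] {
  const listStyleType = list.type === "numbered-list" ? "decimal" : "disc";
  const out: FlatParagraph[] = [];
  for (const item of list.children) {
    out.push({ type: "p", text: item.text, indent: depth, listStyleType });
    for (const nested of item.children ?? []) {
      out.push(...flattenList(nested, depth + 1));
    }
  }
  return out;
}

const flat = flattenList({
  type: "bulleted-list",
  children: [
    { type: "list-item", text: "first" },
    { type: "list-item", text: "second" },
  ],
});
console.log(flat);
```

The point of the sketch is that the two editors do not mount the same input shape: Slate sees containers, while Plate sees styled paragraphs that `ListPlugin` then has to interpret.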
The live contract results show most edit lanes clustering around 50 ms:
- `10_type-middle`: Plate 50.00 ms, Slate 50.10 ms
- `29_paste-plain-text`: Plate 49.90 ms, Slate 50.00 ms
- `30_paste-html-rich-text`: Plate 49.80 ms, Slate 50.30 ms
- `34_undo-single-change`: Plate 50.00 ms, Slate 50.00 ms

That is not a believable story about perfect parity. It is a measurement artifact.
The current shell waits on a fixed requestAnimationFrame cadence, so the edit
lanes are effectively quantized by the browser/frame clock in headless preview.
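A small model shows why a fixed-frame settle produces this kind of clustering. It is a sketch under stated assumptions, not the lab's actual shell code: the 60 Hz frame clock, the `measuredMs` helper, and the three-frame wait are all hypothetical. Three frames at 60 Hz happen to equal 50 ms, which would match the cluster, but the source does not state how many frames the shell waits.

```typescript
// Assumption: a ~60 Hz frame clock. The real headless-preview cadence may differ.
const FRAME_MS = 1000 / 60;

// Hypothetical model of a fixed-frame settle: the timer stops on the first
// frame boundary that is at least `framesToWait` frames after the start and
// not earlier than the end of the actual work.
function measuredMs(workMs: number, framesToWait: number): number {
  const framesForWork = Math.ceil(workMs / FRAME_MS);
  return Math.max(framesToWait, framesForWork) * FRAME_MS;
}

// Very different amounts of real work collapse onto the same reading
// as long as the work fits inside the fixed wait window.
for (const workMs of [2, 20, 45, 60]) {
  console.log(`${workMs} ms of work -> ${measuredMs(workMs, 3).toFixed(2)} ms measured`);
}
```

In this model, 2 ms, 20 ms, and 45 ms of work all report the same value, and only work longer than the wait window moves the number, in whole-frame steps.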
Conclusion:
The real live story today is:
- 10k markdown mount lane

This is now the clearest red lane.
47_mount-10k-marks-core is already worse than Slate.
48_mount-10k-marks-basic is still dramatically worse.
The single-mark rows make the next step clearer:
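- `86_mount-10k-bold-basic`: 585.90 ms vs 375.90 ms
- `87_mount-10k-italic-basic`: 622.10 ms vs 345.30 ms
- `88_mount-10k-underline-basic`: 591.40 ms vs 332.00 ms
- `89_mount-10k-strikethrough-basic`: 580.70 ms vs 334.40 ms
- `90_mount-10k-bold-single`: 424.50 ms vs 343.20 ms
- `91_mount-10k-italic-single`: 422.60 ms vs 339.80 ms
- `92_mount-10k-underline-single`: 428.00 ms vs 345.60 ms
- `93_mount-10k-strikethrough-single`: 450.10 ms vs 384.80 ms

That means:

- `BasicMarksPlugin` bundle fan-out adds meaningful extra cost
- `strikethrough` is the worst measured mark

That points at mark-heavy render composition as a real mount tax in the richer standalone surface.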
The earlier trace was useful, but it hid one important thing:
The next direct lower-bound rows made that clearer:
| Lane | Plate | Slate | Read |
|---|---|---|---|
| `86_mount-10k-bold-basic` | 678.10 ms | 458.20 ms | full basic-marks bundle still clearly red |
| `90_mount-10k-bold-single` | 496.40 ms | 463.90 ms | single bold plugin is much better, but still slower |
| `94_mount-10k-bold-direct` | 436.10 ms | 468.40 ms | direct `renderLeaf` lower bound is already fine |
That split is the real answer:
- `BasicMarksPlugin` is still paying large bundle-side fan-out above the direct bold lower bound
- the single bold plugin still sits about 60 ms above the direct lower bound

So the red lane is not "Plate cannot mount bold leaves fast enough." It is the work Plate chooses to do around that mount.
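The split can be made explicit by subtracting the Plate column of the three bold rows from each other. The sketch below only does that arithmetic; the `BoldRow` type and the variable names are illustrative, and the numbers are the Plate values from the lower-bound table.

```typescript
interface BoldRow {
  lane: string;
  plateMs: number;
}

// Plate mount times from the three bold lower-bound rows.
const basic: BoldRow = { lane: "86_mount-10k-bold-basic", plateMs: 678.1 };
const single: BoldRow = { lane: "90_mount-10k-bold-single", plateMs: 496.4 };
const direct: BoldRow = { lane: "94_mount-10k-bold-direct", plateMs: 436.1 };

// Round to one decimal place to hide floating-point noise.
const round1 = (ms: number): number => Math.round(ms * 10) / 10;

// Cost attributable to the full bundle beyond a single mark plugin.
const bundleFanOutMs = round1(basic.plateMs - single.plateMs);

// Cost attributable to the plugin pipe beyond a direct renderLeaf.
const pluginMachineryMs = round1(single.plateMs - direct.plateMs);

console.log({ bundleFanOutMs, pluginMachineryMs }); // { bundleFanOutMs: 181.7, pluginMachineryMs: 60.3 }
```

On these rows, roughly three quarters of the distance between the bundle lane and the direct lower bound comes from the bundle itself, and the rest from the single-plugin path.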
The DOM probe on 90_mount-10k-bold-single makes that even more explicit.
For the whole 10k bold fixture, both editors mount the same leaf topology:
- 10,000 `<strong>` nodes
- 30,000 `[data-slate-leaf]` nodes
- 30,000 `[data-slate-string]` nodes
- 90,000 `<span>` nodes

The first paragraph is also almost the same shape:

- `<div data-slate-node="element" data-block-id="..." class="slate-p">...`
- `<p data-slate-node="element" style="content-visibility:auto">...`

So the bold gap is not caused by extra leaf DOM nodes.
It is caused by runtime work before the commit:
- `BasicMarksPlugin` bundle fan-out
  - `pipeRenderLeaf(...)` and `pipeRenderText(...)` were still visiting inactive mark renderers on every leaf/text node
- `pipeRenderLeaf(...)` / `pluginRenderLeaf(...)` machinery above the direct lower bound
- `useEditableProps(...)` assembly are small

That is the current exact cause statement.
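The cost of visiting inactive mark renderers on every leaf can be illustrated with a toy pipe. This is not Plate's `pipeRenderLeaf(...)` implementation: the `Leaf` shape, the string-based renderers, and both function names are hypothetical stand-ins that only count how many renderers get visited per leaf.

```typescript
// Hypothetical leaf shape: text plus optional boolean mark flags.
type Leaf = { text: string; [mark: string]: unknown };
type MarkRenderer = (children: string) => string;

// Toy renderers standing in for the mark plugins in a bundle.
const markRenderers: Record<string, MarkRenderer> = {
  bold: (c) => `<strong>${c}</strong>`,
  italic: (c) => `<em>${c}</em>`,
  underline: (c) => `<u>${c}</u>`,
  strikethrough: (c) => `<s>${c}</s>`,
};

// Visit-all pipe: every registered renderer is visited for every leaf,
// even when the leaf does not carry that mark.
function renderLeafVisitAll(leaf: Leaf): { html: string; visits: number } {
  let html = leaf.text;
  let visits = 0;
  for (const [mark, render] of Object.entries(markRenderers)) {
    visits++;
    if (leaf[mark]) html = render(html);
  }
  return { html, visits };
}

// Active-only pipe: start from the keys on the leaf and look up renderers,
// so inactive sibling marks are never visited. Note that wrapping order
// now follows the leaf's key order rather than registration order.
function renderLeafActiveOnly(leaf: Leaf): { html: string; visits: number } {
  let html = leaf.text;
  let visits = 0;
  for (const key of Object.keys(leaf)) {
    if (key === "text" || !leaf[key]) continue;
    const render = markRenderers[key];
    if (!render) continue;
    visits++;
    html = render(html);
  }
  return { html, visits };
}

const boldLeaf: Leaf = { text: "hi", bold: true };
console.log(renderLeafVisitAll(boldLeaf).visits, renderLeafActiveOnly(boldLeaf).visits);
```

For a bold-only leaf, both pipes produce the same markup, but the first visits every renderer in the bundle while the second visits one. Multiplied across tens of thousands of leaves, that difference is the kind of fan-out the bold rows point at.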
The optimization order is now justified, not guessed:
- `pipeRenderLeaf(...)`
- `pipeRenderText(...)`

The kept current package cut did exactly what it was supposed to do:

- `pipeRenderLeaf(...)` now skips inactive sibling mark renderers
- `pipeRenderText(...)` now skips inactive sibling text renderers

After that cut, the next direct standalone probes on the Plate target changed the conclusion again.
Repeated local-preview runs for the bold rows were noisy, but the stable read was:
- `BasicMarksPlugin` bundle lane
- `renderLeaf` probes were not a trustworthy permanent benchmark row for this standalone shell; they were useful diagnostics, not a clean new headline lane

The important consequence is:
One fresh standalone rerun on the kept final state landed here:
- `86_mount-10k-bold-basic`: Plate 639.10 ms, Slate 582.00 ms
- `90_mount-10k-bold-single`: Plate 434.60 ms, Slate 578.50 ms
- `94_mount-10k-bold-direct`: Plate 484.50 ms, Slate 544.30 ms

That is noisy in the absolute numbers, but the shape is the point:
The next kept mark cut stayed inside the leaf pipe itself:
- `pipeRenderLeaf(...)` now handles simple active leaf marks directly instead of routing them back through `pluginRenderLeaf(...)`

Fresh full-batch rows on the kept final state:

- `47_mount-10k-marks-core`: Plate 1227.50 ms, Slate 925.50 ms
- `48_mount-10k-marks-basic`: Plate 1299.80 ms, Slate 889.50 ms
- `86_mount-10k-bold-basic`: Plate 585.90 ms, Slate 375.90 ms
- `87_mount-10k-italic-basic`: Plate 622.10 ms, Slate 345.30 ms
- `88_mount-10k-underline-basic`: Plate 591.40 ms, Slate 332.00 ms
- `89_mount-10k-strikethrough-basic`: Plate 580.70 ms, Slate 334.40 ms
- `90_mount-10k-bold-single`: Plate 424.50 ms, Slate 343.20 ms

Compared to the pre-batch standalone snapshot, that means:

- `48_mount-10k-marks-basic`: about 1387 ms -> 1310 ms
- `86_mount-10k-bold-basic`: about 673 ms -> 597 ms
- `90_mount-10k-bold-single`: about 439 ms -> 428 ms

That is a real win.
The lane is still red, but the cut moved both the heavy bundle lane and the single-mark lane in the right direction, which the earlier hook-elision batch failed to do.
The next widened decomposition also ruled out an easy special-mark scapegoat.
Fresh rows for the remaining unmeasured marks:
- `98_mount-10k-code-basic`: Plate 418.70 ms, Slate 387.00 ms
- `99_mount-10k-code-single`: Plate 450.30 ms, Slate 365.80 ms
- `100_mount-10k-subscript-basic`: Plate 410.90 ms, Slate 382.80 ms
- `101_mount-10k-subscript-single`: Plate 395.00 ms, Slate 340.10 ms
- `102_mount-10k-superscript-basic`: Plate 395.30 ms, Slate 384.20 ms
- `103_mount-10k-superscript-single`: Plate 400.00 ms, Slate 359.10 ms

Take:

- `code` is not secretly the whole problem
- the special marks (`sub`, `sup`, `strikethrough`) are red, but not enough on their own to explain the full `marks-basic` wall
- the `marks-basic` bill is mostly aggregate mark composition across many marked leaves, not one hidden monster plugin

So the current optimization order becomes:

- `Object.keys(...).flatMap(...).sort(...)` churn
- `BasicMarksPlugin` bundle composition as the next real seam only after those shared-pipe wins are frozen

The latest kept mark cut is exactly that hybrid scan.
What changed:
That keeps the two good properties together:
Focused reruns on the kept hybrid path landed here:
- `48_mount-10k-marks-basic`: Plate 1244.70 ms, Slate 903.00 ms
- `86_mount-10k-bold-basic`: Plate 557.20 ms, Slate 335.60 ms
- `87_mount-10k-italic-basic`: Plate 547.90 ms, Slate 339.40 ms
- `89_mount-10k-strikethrough-basic`: Plate 555.50 ms, Slate 344.50 ms
- `90_mount-10k-bold-single`: Plate 399.90 ms, Slate 342.50 ms
- `91_mount-10k-italic-single`: Plate 388.30 ms, Slate 349.90 ms
- `93_mount-10k-strikethrough-single`: Plate 439.80 ms, Slate 339.60 ms

That is the cleanest remaining shared-pipe win so far.
Specifically check whether the Plate fixture adapter plus ListPlugin mount
path is doing expensive reshaping on initial value.
The likely question is:
How much of the `03_mount-10k` gap is really "rich markdown", and how much is just list normalization on mount?
The 49_mount-10k-list-markdown row says list work is still very red.
So this is still high priority.
That split is no longer hypothetical. The dedicated list rows say:
- `49_mount-10k-list-markdown`: Plate 890.40 ms, Slate 630.10 ms
- `96_mount-10k-list-core`: Plate 622.70 ms, Slate 679.20 ms
- `97_mount-10k-list-only`: Plate 848.70 ms, Slate 671.70 ms

That means:

- `ListPlugin` is the problem
- `ListPlugin` adds only a small extra tax on top of that

The DOM probe made the reason concrete:

- `list-core`: 0 `<ul>`, 0 `<li>`, 30,000 paragraph nodes
- `list-only`: 30,000 `<ul>`, 30,000 `<li>`, 30,000 paragraph nodes
- `list-only`: 0 `<ul>`, 0 `<li>`, 30,000 `[role="listitem"]` paragraphs
- 10,000 `<ul>`, 30,000 `<li>`

So the original list cliff was not mysterious.
Plate's original flattened-list render model created one list container per item instead of styling the paragraph element itself.
That was the exact cause.
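The difference between the two render models can be sketched with plain strings. This is an illustration only: the `renderWrapped`, `renderStyled`, and `countTag` helpers are hypothetical, and the exact wrapper markup is an assumption inferred from the DOM probe counts (one `<ul>` and one `<li>` per paragraph in the original model, none in the styled-paragraph model).

```typescript
// Assumed shape of the original model: one list container per item.
const renderWrapped = (text: string): string =>
  `<ul><li><p>${text}</p></li></ul>`;

// Assumed shape of the styled-paragraph model: the paragraph itself
// carries the list semantics and styling.
const renderStyled = (text: string, listStyleType: "disc" | "decimal"): string =>
  `<p role="listitem" style="display:list-item;list-style-type:${listStyleType}">${text}</p>`;

// Count opening tags of a given name in an HTML string.
function countTag(html: string, tag: string): number {
  return (html.match(new RegExp(`<${tag}[ >]`, "g")) ?? []).length;
}

const items = ["one", "two", "three"];
const wrapped = items.map(renderWrapped).join("");
const styled = items.map((t) => renderStyled(t, "disc")).join("");

for (const [name, html] of [["wrapped", wrapped], ["styled", styled]] as const) {
  console.log(name, {
    ul: countTag(html, "ul"),
    li: countTag(html, "li"),
    p: countTag(html, "p"),
  });
}
```

With three items, the wrapped model emits three `<ul>`, three `<li>`, and three `<p>` elements, while the styled model emits only the three paragraphs. Scaled to 30,000 list paragraphs, that is the 60,000 extra elements the DOM probe reported.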
The kept fix has three parts:
- `belowNodes` wrappers
- `role="listitem"`
- `display: list-item`
- `listStyleType`
- `pipeRenderElement(...)` keeps the plain fast path when:
  - `belowNodes` wrappers for the current element

That is why the lane moved:

- `97_mount-10k-list-only`: about 1564 ms -> 849 ms
- `49_mount-10k-list-markdown`: about 1452 ms -> 890 ms

The remaining list work is no longer the main embarrassment.
43_mount-10k-blockquote-core is good.
44_mount-10k-blockquote-basic is not.
That means the red tax is not blockquote itself. It is what the richer basic surface adds around it.
That makes blockquote a good focused seam for tracing element/render overhead without list complexity muddying the picture.
Before reading too much into the ~50 ms edit lanes, stop using the current
fixed-frame settle as the main contract.
Replace it with something closer to:
Until that is fixed, mount is the honest lane. Typing/paste/undo are not yet good enough to drive major optimization decisions.
The public docs should keep saying exactly what the public harness proves, not what the richer lab might eventually prove.
That means:
The surprising part was not that Plate can lose on a richer workload. The surprising part was that the simpler public docs harness sounded broad enough to hide that distinction.
The fix is not to throw out the old benchmark. The fix is:
- `BasicMarksPlugin` fan-out
- `strikethrough`