docs/plans/2026-04-03-performance-benchmark-spec-rewrite-plan.md
Rewrite performance-benchmark-spec.md from scratch so it stops reading like a polite public-doc contract and starts reading like a real benchmark program spec.
The target is the rigor and control surface of
js-framework-benchmark, translated into rich-text editor terms.
Not a toy Plate vs Slate note.
Not a marketing page.
A production-grade editor benchmark spec.
Use js-framework-benchmark as the primary benchmark-model reference.
Copy these traits:
Do not copy its domain assumptions.
This benchmark is for rich-text editors, not DOM row-table libraries.
Primary references:
The current spec is too high-level and too timid.
It has:
It does not have:
That is the real gap.
The rewritten spec should define five layers:
Rich-text standard plus the parts of the Slate example surface people actually care about:
Explicitly out:
The spec must be written for a future multi-editor table, not a permanent 1v1.
Near term:
Future:
That means the spec should avoid hardcoding Plate vs Slate assumptions into
the benchmark model.
The spec should mirror js-framework-benchmark by defining stable benchmark
ids and families.
These replace the row-table operations with editor operations.
They should report:
- `total`
- `script`
- `layout`
- `paint`
- `other`

If `layout` and `other` are not yet reliable in the runner, the spec should
still reserve them. Do not design a baby benchmark just because the runner is
not there yet.
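A minimal sketch of what "stable benchmark ids and families" could look like in code. The family names and the ordinal ranges below are assumptions inferred from the id prefixes used in this plan (`01`-`40`, `51`-`58`, `61`-`67`, `71`-`75`, `81`-`85`); the rewritten spec would be the place to pin them down.

```typescript
// Family names and ordinal ranges are assumptions, not part of the spec yet.
type Family = "duration" | "memory" | "startup" | "size" | "correctness";

interface BenchmarkId {
  ordinal: number;
  slug: string;
  family: Family;
}

// Hypothetical reservation: each family owns a block of ordinals, so new
// benchmarks can be added without renumbering existing ids.
const FAMILY_RANGES: ReadonlyArray<{ family: Family; from: number; to: number }> = [
  { family: "duration", from: 1, to: 50 },
  { family: "memory", from: 51, to: 60 },
  { family: "startup", from: 61, to: 70 },
  { family: "size", from: 71, to: 80 },
  { family: "correctness", from: 81, to: 90 },
];

// Parses ids shaped like "02_mount-1k" and rejects anything else.
function parseBenchmarkId(id: string): BenchmarkId {
  const match = /^(\d{2})_([a-z0-9]+(?:-[a-z0-9]+)*)$/.exec(id);
  if (match === null) {
    throw new Error(`Malformed benchmark id: ${id}`);
  }
  const [, ordinalText = "", slug = ""] = match;
  const ordinal = Number(ordinalText);
  const range = FAMILY_RANGES.find((r) => ordinal >= r.from && ordinal <= r.to);
  if (range === undefined) {
    throw new Error(`No family reserved for ordinal ${ordinal} in id: ${id}`);
  }
  return { ordinal, slug, family: range.family };
}
```

Deriving the family from the numeric prefix keeps ids sortable and lets the results table group rows without a second lookup table.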
- `01_ready-empty`
- `02_mount-1k`: 1k doc
- `03_mount-10k`: 10k doc
- `04_mount-50k`: 50k doc
- `05_replace-same-size`
- `06_append-1k-to-1k`
- `07_append-5k-to-10k`
- `08_remove-single-block`
- `09_clear-document`
- `10_type-middle`
- `11_type-start`
- `12_type-end`
- `13_type-inside-marked-text`
- `14_partial-update-every-10th-block`
- `15_partial-update-every-10th-leaf`
- `16_enter-split-paragraph`
- `17_backspace-merge-block`
- `18_delete-forward-merge`
- `19_tab-indent-list-item`
- `20_shift-tab-outdent-list-item`
- `21_toggle-mark-selection`
- `22_toggle-block-format-selection`
- `23_select-single-caret`
- `24_shift-arrow-expand-inline`
- `25_shift-arrow-expand-cross-block`
- `26_mouse-drag-range`
- `27_arrow-nav-cross-block`
- `28_select-table-range`
- `29_paste-plain-text`
- `30_paste-html-rich-text`
- `31_paste-markdown-ish`
- `32_paste-large-fragment`
- `33_paste-duplicate-id-fragment`
- `34_undo-single-change`
- `35_redo-single-change`
- `36_undo-after-large-paste`
- `37_redo-after-structural-edit`
- `38_move-block-up`
- `39_move-list-item`
- `40_swap-adjacent-blocks`

This should be as serious as js-framework-benchmark, not one hand-wavy
“memory after mount” bullet.
- `51_ready-memory`
- `52_mount-1k-memory`
- `53_mount-10k-memory`
- `54_mount-50k-memory`
- `55_typing-churn-memory`
- `56_paste-clear-memory`
- `57_history-churn-memory`
- `58_table-selection-memory`
Directly analogous to the Lighthouse/startup family in
js-framework-benchmark.
- `61_startup-time`
- `62_consistently-interactive`
- `63_script-bootup`
- `64_main-thread-work`
- `65_first-paint`
- `66_first-contentful-paint`
- `67_editor-first-interaction-ready`
- `71_size-uncompressed`
- `72_size-compressed`
- `73_editor-route-js`
- `74_editor-route-css`
- `75_total-byte-weight`

Do not reduce this to generic app bundle trivia. The spec should define which code counts as “editor route payload” and which shared shell assets are counted separately.
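One way to keep "editor route payload" separate from shared shell assets is to tag every built asset with a scope and sum per scope. The sketch below is only illustrative: the `Asset` shape, the scope names, and the choice to include shell bytes in the total are all assumptions the spec would need to settle.

```typescript
// Hypothetical asset record; field names and scopes are illustrative only.
interface Asset {
  path: string;
  kind: "js" | "css" | "other";
  scope: "editor-route" | "shared-shell";
  bytes: number;
}

interface PayloadSummary {
  editorRouteJs: number;   // would feed 73_editor-route-js
  editorRouteCss: number;  // would feed 74_editor-route-css
  sharedShell: number;     // reported separately, never folded into 73/74
  totalByteWeight: number; // assumed here to include shell assets (75)
}

function summarizePayload(assets: readonly Asset[]): PayloadSummary {
  const summary: PayloadSummary = {
    editorRouteJs: 0,
    editorRouteCss: 0,
    sharedShell: 0,
    totalByteWeight: 0,
  };
  for (const asset of assets) {
    summary.totalByteWeight += asset.bytes;
    if (asset.scope === "shared-shell") {
      summary.sharedShell += asset.bytes;
      continue;
    }
    if (asset.kind === "js") {
      summary.editorRouteJs += asset.bytes;
    } else if (asset.kind === "css") {
      summary.editorRouteCss += asset.bytes;
    }
  }
  return summary;
}
```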
This is where the current work is still too soft.
The protocol matrix is not optional supporting documentation anymore. It must become a benchmark gate.
- `81_protocol-coverage`
- `82_protocol-pass-rate`
- `83_open-critical-regressions`
- `84_open-major-regressions`
- `85_family-completeness`
Correctness metrics are not rolled into speed. They are a separate gate and a separate visible family.
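To make "a separate gate" concrete, here is a rough sketch of a gate evaluated independently of any timing result. The metric units (fractions between 0 and 1, plain counts) and every threshold value are placeholders invented for illustration; the plan does not specify them.

```typescript
// Units are assumptions: coverage, pass rate, and completeness as 0..1
// fractions, regressions as open-issue counts.
interface CorrectnessMetrics {
  protocolCoverage: number;        // 81_protocol-coverage
  protocolPassRate: number;        // 82_protocol-pass-rate
  openCriticalRegressions: number; // 83_open-critical-regressions
  openMajorRegressions: number;    // 84_open-major-regressions
  familyCompleteness: number;      // 85_family-completeness
}

interface GateThresholds {
  minCoverage: number;
  minPassRate: number;
  maxCritical: number;
  maxMajor: number;
  minFamilyCompleteness: number;
}

// Placeholder thresholds, not values taken from the spec.
const EXAMPLE_THRESHOLDS: GateThresholds = {
  minCoverage: 0.9,
  minPassRate: 0.95,
  maxCritical: 0,
  maxMajor: 2,
  minFamilyCompleteness: 1,
};

interface GateResult {
  clean: boolean;
  failures: string[]; // benchmark ids that failed the gate
}

// The gate never looks at durations, so speed cannot offset a failure.
function evaluateCorrectnessGate(
  m: CorrectnessMetrics,
  t: GateThresholds = EXAMPLE_THRESHOLDS,
): GateResult {
  const failures: string[] = [];
  if (m.protocolCoverage < t.minCoverage) failures.push("81_protocol-coverage");
  if (m.protocolPassRate < t.minPassRate) failures.push("82_protocol-pass-rate");
  if (m.openCriticalRegressions > t.maxCritical) failures.push("83_open-critical-regressions");
  if (m.openMajorRegressions > t.maxMajor) failures.push("84_open-major-regressions");
  if (m.familyCompleteness < t.minFamilyCompleteness) failures.push("85_family-completeness");
  return { clean: failures.length === 0, failures };
}
```

A result like this could back a "show only correctness-clean editors" filter in the results table without touching the timing columns.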
The spec should define benchmark axes instead of pretending one 10k mixed
lane tells the truth.
Required families:
- `plain-paragraphs`
- `mixed-blocks`
- `heavy-marks`
- `nested-lists`
- `code-heavy`
- `tables`
- `markdown-doc`

Required sizes:

- `1k`
- `5k`
- `10k`
- `50k`

Not every family must run at every size on day one, but the spec should define the full target matrix.
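The "full target matrix, partial day-one coverage" idea can be sketched as a cross product where every cell exists but only some are enabled. The `status` values and the `cellKey` helper are hypothetical names introduced for this example.

```typescript
const DOC_FAMILIES = [
  "plain-paragraphs",
  "mixed-blocks",
  "heavy-marks",
  "nested-lists",
  "code-heavy",
  "tables",
  "markdown-doc",
] as const;
const DOC_SIZES = ["1k", "5k", "10k", "50k"] as const;

type DocFamily = (typeof DOC_FAMILIES)[number];
type DocSize = (typeof DOC_SIZES)[number];

interface MatrixCell {
  family: DocFamily;
  size: DocSize;
  // "planned" cells are part of the target but not run yet.
  status: "enabled" | "planned";
}

// Hypothetical key format used to mark which cells run today.
function cellKey(family: DocFamily, size: DocSize): string {
  return `${family}@${size}`;
}

// Always emits every family x size cell; the enabled set only changes status.
function buildTargetMatrix(enabled: ReadonlySet<string>): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const family of DOC_FAMILIES) {
    for (const size of DOC_SIZES) {
      const status = enabled.has(cellKey(family, size)) ? "enabled" : "planned";
      cells.push({ family, size, status });
    }
  }
  return cells;
}
```

Because planned cells still appear in the output, a gap in coverage shows up as a visible empty slot rather than a silently missing row.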
- `collapsed`
- `expanded-inline`
- `expanded-cross-block`
- `backward-expanded`
- `table-range`

Copy the js-framework-benchmark control philosophy almost whole.
The spec should require:
- Which editors?
- Which benchmarks?
- Which document families?
- Which sizes?
- Which browsers / runs?
- Display mode
- Trace slice mode
- Compare with
- Copy/paste selection state
- Hide flagged editors
- Show only correctness-clean editors

Required:

- `median`
- `mean`
- `box-plot`

Optional later:

- `p95`
- `worst`

Required target model:

- `total`
- `script`
- `layout`
- `paint`
- `other`

If the runner can only support total/script/paint at first, fine. The spec
should still define the richer target and mark the missing slices as an
implementation gap, not silently lower the standard.
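A small sketch of how a result record could reserve all five slices while flagging the ones the runner cannot yet produce. The `SliceValue` shape and the `"gap"` status are assumptions made for this example, not an existing runner format.

```typescript
const TRACE_SLICES = ["total", "script", "layout", "paint", "other"] as const;
type TraceSlice = (typeof TRACE_SLICES)[number];

// A slice is either measured or an explicit implementation gap;
// it is never silently absent from the report.
type SliceValue = { status: "measured"; ms: number } | { status: "gap" };

type SliceReport = Record<TraceSlice, SliceValue>;

// Fills every reserved slice, marking unmeasured ones as gaps.
function buildSliceReport(measured: Partial<Record<TraceSlice, number>>): SliceReport {
  const report = {} as SliceReport;
  for (const slice of TRACE_SLICES) {
    const ms = measured[slice];
    report[slice] = ms === undefined ? { status: "gap" } : { status: "measured", ms };
  }
  return report;
}

// Lists the slices the runner still owes, in canonical order.
function listSliceGaps(report: SliceReport): TraceSlice[] {
  return TRACE_SLICES.filter((slice) => report[slice].status === "gap");
}
```

For a runner that only supports total/script/paint, `listSliceGaps(buildSliceReport({ total: 12, script: 7, paint: 3 }))` yields `["layout", "other"]`, which the results table can render as a visible gap marker.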
The main page should be a dense results table, not a dashboard cosplay.
The spec should require:
Charts are secondary, but they should be meaningful.
Required chart families:
- 1k/5k/10k/50k

Prune:
Those are generic-framework fluff for this repo.
This is one of the best js-framework-benchmark ideas and we should steal it.
The spec should define a benchmark issue registry:
Editors with unresolved benchmark-invalidating bugs must be:
The spec rewrite must stop being vague here.
Define:
Recommendation:
Keep the three-stage js-framework-benchmark pipeline shape:
The rewritten spec should define:
These do not map cleanly and should be removed:
Language / runtime can still appear as metadata, just not as the center of the UI.
Because editors are richer, the spec should add:
The rewritten file should use this order:
editor-benchmarks dashboard design target from dashboard-first to
table-first.

Do not half-copy js-framework-benchmark.
If we use it as the reference, use the real bones:
Anything softer will look like a benchmark site and still behave like a toy.