Back to Plate

Mark bundle tax comes from inactive leaf and text renderer fan-out

docs/solutions/performance-issues/2026-04-04-mark-bundle-tax-comes-from-inactive-leaf-and-text-renderer-fan-out.md

53.0.65.1 KB
Original Source

Mark bundle tax comes from inactive leaf and text renderer fan-out

Problem

The standalone editor benchmark exposed a red 10k bold lane even after the earlier cheap-mark fast paths.

The deceptive part was that the mounted DOM was already basically the same as Slate:

  • 10,000 <strong> nodes
  • 30,000 leaf nodes
  • 30,000 string nodes
  • 90,000 <span> nodes

So the extra runtime was not a "Plate mounts more DOM" story.

Symptoms

Current standalone rows on the rich benchmark lab:

  • 86_mount-10k-bold-basic: Plate 585.90 ms, Slate 375.90 ms
  • 90_mount-10k-bold-single: Plate 424.50 ms, Slate 343.20 ms
  • 94_mount-10k-bold-direct: Plate 406.00 ms, Slate 333.90 ms

That split says:

  • the direct lower bound is already fine
  • the single mark path still pays extra runtime above that floor
  • the full basic bundle pays a much larger extra tax above the single mark path

What Didn't Work

  • Treating the earlier instrumentation as if it had proven inactive plugin work was gone. It only counted the active keyed work we instrumented.
  • Focusing on DOM size. The DOM shape was already basically equivalent.
  • Chasing strikethrough alone before proving the shared mark path.

Solution

Key the shared mark render pipes by active mark instead of visiting every mark renderer on every leaf/text node.

Kept cuts:

  • in pipeRenderLeaf.tsx, store leaf renderers by mark key and only call the renderer when that key is present on the current leaf
  • in pipeRenderText.tsx, do the same for text-side mark renderers
  • still in pipeRenderLeaf.tsx, handle simple active leaf marks directly inside the pipe instead of routing them back through pluginRenderLeaf(...)

Targeted regression coverage:

The new tests prove inactive leaf/text renderers are skipped instead of being invoked and returning the original children.

Why This Works

Before the cut:

  • pipeRenderLeaf(...) still walked every leaf renderer for every leaf
  • pipeRenderText(...) still walked every text renderer for every text node

On a 10k bold-only fixture, that means the basic mark bundle was paying for a pile of irrelevant sibling mark work on every node even though only bold was active.

After the cut, the cost splits cleanly:

  • bundle tax shrinks to the real active mark work
  • the remaining single-mark gap is the shared active mark runtime itself

The second kept cut moved those rows again:

  • 48_mount-10k-marks-basic: about 1387 ms -> 1310 ms
  • 86_mount-10k-bold-basic: about 673 ms -> 597 ms
  • 90_mount-10k-bold-single: about 439 ms -> 428 ms

The next kept cut stayed in the same shared pipes, but fixed a subtler bill:

  • the naive fast path that walked every simple mark entry on every leaf helped marked leaves and hurt plain leaves inside rich paragraphs
  • the kept version first checks whether the current leaf/text node owns any relevant mark keys
  • only then does it walk the already ordered simple or complex mark arrays

That keeps plain leaves cheap while still removing the per-render Object.keys(...).flatMap(...).sort(...) churn on marked leaves.

Focused reruns on the kept hybrid path landed here:

  • 48_mount-10k-marks-basic: Plate 1244.70 ms, Slate 903.00 ms
  • 86_mount-10k-bold-basic: Plate 557.20 ms, Slate 335.60 ms
  • 90_mount-10k-bold-single: Plate 399.90 ms, Slate 342.50 ms
  • 91_mount-10k-italic-single: Plate 388.30 ms, Slate 349.90 ms
  • 93_mount-10k-strikethrough-single: Plate 439.80 ms, Slate 339.60 ms

Widening the decomposition to the remaining special marks showed that no single plugin was secretly causing the whole remaining wall:

  • 98_mount-10k-code-basic: Plate 418.70 ms, Slate 387.00 ms
  • 100_mount-10k-subscript-basic: Plate 410.90 ms, Slate 382.80 ms
  • 102_mount-10k-superscript-basic: Plate 395.30 ms, Slate 384.20 ms

So the remaining marks-basic lane is mostly the aggregate cost of many marked leaves in the same workload, not one hidden special-mark disaster.

That gives a clean optimization order:

  1. remove inactive bundle fan-out
  2. then attack the active shared mark path if the single-mark lane is still red

Prevention

  • When a benchmark says "bundle is much slower than single", check whether the runtime still visits inactive sibling renderers.
  • Do not use DOM-size parity as proof that runtime overhead is gone.
  • For rich editor performance work, always keep three rows:
    • bundle
    • single plugin
    • direct lower bound