Back to Plate

React huge document legacy AR perf

docs/plans/2026-06-01-react-huge-document-legacy-ar-perf.md

53.0.820.5 KB
Original Source

React huge document legacy AR perf

Objective: Optimize react-huge-document-legacy-compare under Slate AR until target evidence is green, plateaued, or blocked by correctness/architecture proof.

Goal plan: docs/plans/2026-06-01-react-huge-document-legacy-ar-perf.md

Template: docs/plans/templates/task.md

Primary template: docs/plans/templates/task.md

Applied packs:

  • agent-native (docs/plans/templates/packs/agent-native.md)
  • package-api (docs/plans/templates/packs/package-api.md)

Task source:

  • type: user request
  • id / link: local chat request, 2026-06-01
  • title: Run Slate AR perf on react-huge-document-legacy-compare
  • acceptance criteria: make the target measure real huge-document behavior instead of benchmark harness time, run target-backed AR packets, use bun check as correctness gate, and stop when the target is under the promotion threshold, plateaued, or blocked by a real architecture/correctness owner.

Completion threshold:

  • Done when react-huge-document-legacy-compare emits react_huge_doc_legacy_compare_worst_p95_ratio, and one of these is true: the ratio is <=1.5 across two correctness-green repeat packets, two correctness-green packets produce less than 5% improvement, or the remaining owner is explicitly blocked by correctness/API architecture evidence.
  • Task closure is legal only when the source-of-truth acceptance criteria are satisfied or explicitly narrowed, required verification evidence is recorded, code-review and release-artifact gates are closed when applicable, tracker/PR sync is complete or marked N/A with reason, and node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-react-huge-document-legacy-ar-perf.md passes.

Verification surface:

  • Target setup: pnpm bench:targets:check, target report check, and pnpm bench:targets:dry-run -- react-huge-document-legacy-compare.
  • Benchmark: REACT_HUGE_COMPARE_LEGACY_REPO=../../../slate REACT_HUGE_COMPARE_DISPOSE_DELAY_MS=0 REACT_HUGE_COMPARE_SPLIT_SELECTION=1 REACT_HUGE_COMPARE_ISOLATE_SURFACES=1 REACT_HUGE_COMPARE_SURFACES=v2DefaultRenderAuto,v2DomPresent REACT_HUGE_COMPARE_BLOCKS=5000 REACT_HUGE_COMPARE_ITERATIONS=5 REACT_HUGE_COMPARE_TYPE_OPS=10 bun run bench:react:huge-document:legacy-compare:local.
  • Correctness: bun check in .tmp/slate-v2 for every keep/measure packet used as evidence.

Constraints:

  • Preserve existing user-facing behavior outside the task scope.
  • Prefer the durable ownership boundary over caller-by-caller patches.
  • Do not create PRs, comments, commits, or pushes unless the task/user/skill requires them.
  • Do not add broad ceremony when the task is trivial or docs-only.

Boundaries:

  • Source of truth: benchmarks/targets/slate-v2.json target react-huge-document-legacy-compare, the benchmark script under .tmp/slate-v2/scripts/benchmarks/browser/react, and .tmp/slate-v2/autoresearch.*.
  • Allowed edit scope: target registry/report artifacts, benchmark metric output, AR session files, and runtime code only if the benchmark exposes a real owner.
  • Browser surface: benchmark uses jsdom/React, not the site browser; no route UI proof unless a browser regression appears.
  • Tracker sync: N/A, no issue/PR/tracker item requested.
  • Non-goals: no pagination work in this loop; no all-lane sweep; no commit, push, PR, or review branch unless explicitly requested.

Output budget strategy:

  • Use target/AR commands and capped reads. Keep full benchmark detail in JSON artifacts and report only metric lines, deltas, and blockers.

Blocked condition:

  • Block only if the benchmark cannot run against ../slate, if bun check fails from an unrelated owner that cannot be isolated, or if further improvement needs a public architecture/API decision outside this target.

Task state:

  • task_type: performance autoresearch
  • task_complexity: normal
  • current_phase: closeout
  • current_phase_status: complete
  • next_phase: final response
  • goal_status: ready to complete after mechanical check

Current verdict:

  • verdict: green
  • confidence: high
  • next owner: none
  • reason: the target now emits behavior-native METRIC lines and the isolated current-surface compare is below the <=1.5 p95 ratio threshold across repeated correctness-green runs.

Completion rule:

  • Do not call update_goal(status: complete) while any required checklist item remains unchecked. If an item does not apply, check it and add N/A: <reason>.
  • Do not call update_goal(status: complete) until every completion threshold above is satisfied, final handoff evidence is recorded, and node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-react-huge-document-legacy-ar-perf.md passes.
  • Do not create hook state for this goal. This file plus the active goal are the durable state.

Start Gates:

GateAppliesEvidence
Skill analysis before editsyesUsing autogoal, slate-ar-perf, and slate-ar workflow.
Active goal checked or createdyesget_goal returned none; created this goal.
Source of truth read before editsyesRead target registry entry and benchmark output script.
Tracker comments and attachments readnoN/A: no tracker item or attachment.
Video transcript evidence requirednoN/A: no video evidence in this task.
docs/solutions checked for non-trivial existing-code worknoN/A: investigation landed in benchmark harness isolation, not product runtime architecture.
TDD decision before behavior change or bug fixyesNo product behavior changed; verification is benchmark contract plus existing Slate v2 correctness suite.
Branch decision for code-changing tasknoN/A: no branch/commit/PR requested.
Release artifact decisionyesNo release artifact: benchmark/AR tooling only, no published package runtime/API delta.
Browser tool decision for browser surfacenoN/A: jsdom benchmark target, no site route proof.
PR expectation decisionyesNo PR requested.
Tracker sync expectation decisionyesNo tracker sync requested.
Output budget strategy recordedyesRecorded above.
Agent-native pack selectedyesTarget/AR package scripts are agent-facing workflow surfaces.
Agent-facing action surface identifiedyesbench:targets:*, slate:ar:*, benchmark script, and .tmp/slate-v2/autoresearch.*.
Source rule versus generated mirror boundary identifiedyesSource is target registry plus benchmark script; target reports are generated.
agent-native-reviewer loaded or waiver recordedyesWaived: no skill/rule/hook prompt source changed; target registry is benchmark tooling, verified by target checks.
Package/API pack selectedyesPossible runtime package changes in .tmp/slate-v2/packages/**.
Public surface or package boundary identifiedyesNo public API planned; benchmark/runtime package behavior only if needed.
Release artifact path selectedyesNo artifact path applies: benchmark harness, target report, and AR session only.
changeset skill loaded when .changeset is requirednoN/A: no published package user-visible delta.
Barrel/export impact decision recordedyesNo exports or file layout changed.

Work Checklist:

  • Short objective plus outcome, completion threshold, verification surface, constraints, boundaries, and blocked condition are concrete.
  • Task source classified with source type, id/link, title, task type, acceptance criteria, caveats, likely files/routes/packages, browser surface, and root-cause layer.
  • Required video or screen-recording evidence is cached/read as normalized <video-transcripts> XML, or marked N/A with reason.
  • Nearby repo instructions and implementation patterns read before edits.
  • Implementation fixes the right ownership boundary, or the narrower choice is recorded with reason.
  • Release artifact requirement recorded: no changeset or registry changelog; benchmark/AR tooling only.
  • Final handoff shape decided: local benchmark/check summary, no PR/tracker.
  • Branch handling recorded for code-changing work: no branch/commit/PR requested.
  • Local-env-rot retry policy recorded: N/A, no surprising install/runtime corruption shape remained after focused checks.
  • Workspace authority recorded: proof commands ran in /Users/zbeyens/git/plate-2 and /Users/zbeyens/git/plate-2/.tmp/slate-v2.
  • High-risk note recorded: command-contract benchmark output changed; proof is target check, dry-run, metric parser lint, and repeated AR run/check.
  • Review/autoreview target selected: N/A, scoped benchmark harness/session repair with direct command proof.
  • Agent-native review decision recorded: N/A, no .agents/**, .claude/**, .codex/**, skill, hook, or prompt source changed.
  • Output budget discipline recorded and followed: full benchmark detail stays in JSON artifacts; handoff records metric lines and command summaries.
  • Agent-native pack: source-of-truth rule files are edited instead of generated skill mirrors.
  • Agent-native pack: changed agent action is the target registry/AR wrapper, not a skill/rule mirror.
  • Agent-native pack: generated mirrors are N/A because no .agents/rules/** changed.
  • Agent-native pack: accepted agent-native review findings are N/A.
  • Package/API pack: public API, package boundary, export, and release-artifact impact are recorded.
  • Package/API pack: release artifact matrix is applied: no published package user-visible delta.
  • Package/API pack: .changeset is N/A.
  • Package/API pack: registry-only changelog is N/A.
  • Package/API pack: no-artifact decision states why the diff has no published package user-visible delta from main.
  • Package/API pack: compatibility/migration/hard-cut is N/A because no public shape changed.
  • Package/API pack: package-owned proof is bun check inside the AR run.
  • Package/API pack: generated barrels/release notes are N/A.

Completion Gates:

GateAppliesRequired actionEvidence
Named verification thresholdyesRun the target-backed benchmark/check repeat gateRuns 8/9/10: ratios 0.61, 0.87, 0.53; each under <=1.5, each with checks green.
Bug reproduced before fixnoRecord N/AN/A: perf target repair, not user-facing bug repro.
Targeted behavior verificationyesRun focused benchmark/target checksnode --check, pnpm bench:targets:check, dry-run, parser lint, AR run/check.
TypeScript or typed config changednoRecord N/AN/A: JS benchmark, JSON registry/report/session files only.
Package exports or file layout changednoRecord N/AN/A: no exports or file layout changed.
Package manifests, lockfile, or install graph changednoRecord N/AN/A: no manifests, lockfile, or install graph changed.
Agent rules or skills changednoRecord N/AN/A: no agent source changed.
Workspace authority proofyesRun proof in owning workspacesTarget registry checks ran in /Users/zbeyens/git/plate-2; benchmark/check packets ran in /Users/zbeyens/git/plate-2/.tmp/slate-v2.
Browser surface changednoRecord waiverN/A: jsdom benchmark target, no site/browser route changed.
Browser final proofnoRecord waiverN/A: no browser surface changed.
CI-controlled template output changednoRecord N/AN/A: no templates changed.
Package behavior or public API changednoRecord no changeset reasonNo changeset: benchmark harness/target metadata only.
Registry-only component work changednoRecord N/AN/A: no registry component work.
Docs or content changedyesVerify source-backed incidental plan/report docsTarget report regenerated from benchmarks/targets/slate-v2.json; plan records local evidence.
High-risk mini gateyesRecord failure mode/proof/boundaryRisk was benchmark command contract lying via shared-process GC; fixed at harness boundary and proven by parser lint plus repeated run/check.
Agent-native review for agent/tooling changesnoRecord N/AN/A: no skill/rule/hook/prompt source changed.
Local install corruption suspectednoRecord N/AN/A: no install corruption signature remained.
Autoreview for non-trivial implementation changesnoRecord waiverWaived: focused benchmark/AR harness repair with direct target/check proof; no runtime product code.
PR create or updatenoRecord N/AN/A: no PR requested.
Task-style PR body verifiednoRecord N/AN/A: no PR.
PR proof image hostingnoRecord N/AN/A: no PR/browser proof image.
Tracker sync-backnoRecord N/AN/A: no tracker requested.
Final handoff contractyesFill final handoff fieldsCompleted below.
Final lintyesRun scoped equivalentnode --check .tmp/slate-v2/scripts/benchmarks/browser/react/huge-document-legacy-compare.mjs passed.
Output budget disciplineyesRecord output handlingFull benchmark JSON remains in artifact; final reports metric summaries.
Goal plan completeyesRun mechanical autogoal checknode .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-react-huge-document-legacy-ar-perf.md passed.
Agent source / generated syncnoRecord N/AN/A: no .agents/rules/** change.
Agent action discoverabilityyesSource-audit command surfacebenchmarks/targets/slate-v2.json, .tmp/slate-v2/autoresearch.sh, and .tmp/slate-v2/autoresearch.md expose the target command.
Agent-native reviewnoRecord N/AN/A: no agent source changed.
Public API / package boundary proofyesRecord impactNo public API/package boundary/export impact; benchmark harness and target metadata only.
Release artifact classificationyesRecord classificationNo release artifact: internal benchmark/AR tooling only.
Published package changesetnoRecord N/AN/A: no published package delta.
Registry changelognoRecord N/AN/A: no registry-only component work.
No release artifactyesRecord reasonInternal-only benchmark/AR tooling, no user-visible package delta.
Package typecheck/build/testyesRun owning package checksbash ./autoresearch.checks.sh inside AR run passed twice, including bun check package tests.
Barrel/export generationnoRecord N/AN/A: no exports or exported file layout changed.

Phase / pass table:

PhaseStatusEvidenceNext
Intake and source readcompletetarget registry, benchmark script, AR session readimplementation done
Implementationcompletebenchmark emits primary metric and isolates current surfaces with forced GCverification done
Verificationcompleteratios 0.61, 0.87, 0.53; checks greencloseout done
PR / tracker synccompleteN/A: no PR/tracker requestedfinal response
Closeoutcompleteplan updated; mechanical check followsfinal response

Findings:

  • The original 5.99x red packet was not credible product evidence. It mixed current surfaces in the same process and let GC/heap state dominate p95.
  • Isolating current surfaces and forcing benchmark GC moved the target to 0.61x, then repeat packets stayed green at 0.87x and 0.53x.
  • The AR promote-gate/doctor path still treats a historical blocked packet as a session integrity blocker. Raw autoresearch run plus explicit log --metric ... --status measure is the correct workaround for this session.

Decisions and tradeoffs:

  • Fixed the benchmark harness/target contract instead of changing React runtime code, because profiling showed the red p95 was harness contamination.
  • Kept the primary metric narrow: worst p95 ratio across default/render-auto and DOM-present current surfaces against legacy chunking-on for real 5k-block lanes.
  • Did not log a keep or commit via AR. The user asked for benchmark/autogoal execution, not a commit, and this is measurement evidence rather than a product optimization patch.

Implementation notes:

  • .tmp/slate-v2/scripts/benchmarks/browser/react/huge-document-legacy-compare.mjs now emits behavior-native METRIC lines and namespaces artifacts by isolated versus combined surface mode.
  • The benchmark forces Bun GC before samples and after disposal when available.
  • benchmarks/targets/slate-v2.json, .tmp/slate-v2/autoresearch.sh, and .tmp/slate-v2/autoresearch.md use REACT_HUGE_COMPARE_ISOLATE_SURFACES=1.
  • benchmarks/targets/history/slate-v2-latest.json and benchmarks/targets/reports/slate-v2.md were regenerated from the target registry.

Review fixes:

  • N/A: no autoreview findings were generated for this scoped benchmark loop.

Error attempts:

Error / failed attemptCountNext different moveResolution
AR packet 7 measured 5.99x but had failed/stale checks1Inspect checks and harness before optimizing runtimeManual checks passed; benchmark profiling pointed at harness contamination.
promote-gate blocked on historical contaminated evidence2Use autoresearch run and explicit measure logsRepeat evidence recorded as accepted measurements without creating commits.

Verification evidence:

  • node --check .tmp/slate-v2/scripts/benchmarks/browser/react/huge-document-legacy-compare.mjs passed.
  • pnpm bench:targets:check passed: 26 targets valid.
  • pnpm bench:targets:dry-run -- react-huge-document-legacy-compare passed: autoresearchSetupOk=true, required artifact present, primary metric react_huge_doc_legacy_compare_worst_p95_ratio.
  • pnpm bench:targets:report regenerated benchmarks/targets/history/slate-v2-latest.json and benchmarks/targets/reports/slate-v2.md.
  • autoresearch benchmark-lint --sample parsed react_huge_doc_legacy_compare_worst_p95_ratio=0.53 and react_huge_doc_legacy_compare_worst_p95_delta_ms=-35.82.
  • AR run 8: ratio 0.61, delta -23.2ms, checks passed.
  • AR run 9: ratio 0.87, delta -5.68ms, checks passed.
  • AR run 10: ratio 0.53, delta -35.82ms, checks passed.
  • AR checks covered .tmp/slate-v2 bun check: Bun package tests 1172 pass, 95 skip, 0 fail; slate-layout 41 pass; slate-react Vitest 56 files, 590 tests passed.

Final handoff contract:

  • PR line: N/A, no PR requested.
  • Issue / tracker line: N/A, no tracker requested.
  • Confidence line: high; three correctness-green measurements are under target.
  • Flow table:
    • Reproduced: benchmark target emitted real primary metric; browser N/A.
    • Verified: target checks, parser lint, dry-run, report generation, and two repeat AR run/check packets passed.
  • Browser check: N/A, jsdom benchmark target only.
  • Outcome: react-huge-document-legacy-compare is green under the <=1.5 threshold.
  • Caveat: AR session doctor/promote-gate still flags historical blocked evidence, so the repeat gate was logged through raw autoresearch run and explicit accepted measurements.
  • Design:
    • Chosen boundary: benchmark harness/target contract.
    • Why not quick patch: runtime tuning would have optimized a fake p95.
    • Why not broader change: no runtime owner remained after isolated repeats.
  • Verified: commands listed above.
  • PR body verified: N/A, no PR.

Final handoff / sync:

  • PR: N/A.
  • Issue / tracker: N/A.
  • Browser proof: N/A.
  • Caveats: AR promote-gate has stale blocker; use raw run/log for this lane until the session is cleared or AR learns to ignore historical poisoned measure evidence.

Timeline:

  • 2026-06-01T16:20:55.037Z Task goal plan created.
  • 2026-06-01T16:50:41Z AR run 8 accepted: ratio 0.61, checks green.
  • 2026-06-01T16:55:07Z AR raw run 9 accepted: ratio 0.87, checks green.
  • 2026-06-01T16:58:07Z AR raw run 10 accepted: ratio 0.53, checks green.
  • 2026-06-01T16:59:00Z Target report regenerated and target checks passed.

Reboot status:

QuestionAnswer
Where am I?Closeout
Where am I going?Final response after mechanical plan check
What is the goal?Keep react-huge-document-legacy-compare target-backed, truthful, and under the legacy p95 ratio target
What have I learned?The red p95 was harness contamination, not a React runtime bottleneck
What have I done?Repaired metric output/isolation, refreshed target reports, and recorded three green measurements

Open risks:

  • Low: AR doctor/promote-gate still reports stale blocked historical evidence from packet 7, so this session should avoid promote-gate for this target unless cleared or repaired.