docs/plans/2026-06-01-core-rich-text-operations-ar-perf.md
Objective:
Optimize core-rich-text-operations-compare under Slate AR until target
evidence is green, plateaued, or blocked by correctness/architecture proof.
Goal plan: docs/plans/2026-06-01-core-rich-text-operations-ar-perf.md
Template: docs/plans/templates/task.md
Primary template: docs/plans/templates/task.md
Applied packs:
Task source:
core-rich-text-operations-compare
Completion threshold:
- core-rich-text-operations-compare has fresh target-backed AR
  evidence for rich_text_structural_ops_p95_ms, and one of these is true:
  metric is below the promotion target, at least two correctness-green packets
  produce less than 5% further gain, or the remaining owner is explicitly
  blocked by correctness/API architecture evidence. Promotion target from the
  registry is below 3x legacy on the structural composite.
- node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-core-rich-text-operations-ar-perf.md passes.

Verification surface:
- pnpm bench:targets:check, pnpm bench:targets:dry-run -- core-rich-text-operations-compare,
  and pnpm slate:ar:setup-target -- core-rich-text-operations-compare.
- RICH_TEXT_OPS_COMPARE_ITERATIONS=51 bun run bench:core:rich-text-operations:compare:local
  through the target or AR runner.
- bun check in .tmp/slate-v2 when a packet is considered keep.

Constraints:
Boundaries:
- benchmarks/targets/slate-v2.json target
  core-rich-text-operations-compare plus .tmp/slate-v2/autoresearch.*.
- .tmp/slate-v2 core/runtime files if profiling points to
  a safe owner, benchmark target/session artifacts, and this plan.

Output budget strategy:
rg/sed reads. Keep large benchmark
detail in JSON artifacts and report only metrics, deltas, and blockers.
Blocked condition:
bun check fails from unrelated existing work and the
failing owner cannot be isolated, or if the next improvement requires a public
API/architecture decision outside the current target.
Task state:
Current verdict:
bun check passing.
Completion rule:
- Do not call update_goal(status: complete) while any required checklist item
  remains unchecked. If an item does not apply, check it and add N/A: <reason>.
- Do not call update_goal(status: complete) until every completion threshold
  above is satisfied, final handoff evidence is recorded, and
  node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-core-rich-text-operations-ar-perf.md passes.

Start Gates:
| Gate | Applies | Evidence |
|---|---|---|
| Skill analysis before edits | yes | Loaded autogoal, slate-ar-perf, and slate-ar. |
| Active goal checked or created | yes | get_goal returned none; created this goal. |
| Source of truth read before edits | yes | Read target core-rich-text-operations-compare from benchmarks/targets/slate-v2.json. |
| Tracker comments and attachments read | no | N/A: no tracker item or attachment. |
| Video transcript evidence required | no | N/A: no video evidence in this task. |
| docs/solutions checked for non-trivial existing-code work | no | N/A: no code owner was changed; target was already green. |
| TDD decision before behavior change or bug fix | no | N/A: no behavior patch was needed. |
| Branch decision for code-changing task | no | N/A: no branch/commit/PR requested. |
| Release artifact decision | yes | No release artifact: no runtime/package source change was kept. |
| Browser tool decision for browser surface | no | N/A unless a browser/React regression appears. |
| PR expectation decision | yes | No PR requested. |
| Tracker sync expectation decision | yes | No tracker sync requested. |
| Output budget strategy recorded | yes | Recorded above. |
| Agent-native pack selected | yes | Target/AR package scripts are agent-facing workflow surfaces. |
| Agent-facing action surface identified | yes | bench:targets:*, slate:ar:*, and .tmp/slate-v2 benchmark scripts. |
| Source rule versus generated mirror boundary identified | yes | Target registry/session files are source for this loop; no generated skill mirror touched. |
| agent-native-reviewer loaded or waiver recorded | no | N/A: no agent rule/skill/tool source changed; only existing AR session files were pointed at the selected target. |
| Package/API pack selected | yes | Possible package runtime perf changes in .tmp/slate-v2/packages/**. |
| Public surface or package boundary identified | yes | Potential package runtime only; no public API planned. |
| Release artifact path selected | yes | N/A: no published user-visible delta. |
| changeset skill loaded when .changeset is required | no | N/A: no changeset required. |
| Barrel/export impact decision recorded | no | N/A: no exports or file layout changed. |
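The gate rows above follow one convention: a gate marked `no` under Applies carries an `N/A` reason as its evidence. A minimal sketch of auditing that convention is shown here. The function name `auditGateRows` and the sample row `Example gate` are hypothetical, and this is not the logic of check-complete.mjs, which this plan does not show.

```javascript
// Hypothetical audit helper (not part of the repo): lists gates that are
// marked "no" under Applies but carry no N/A reason in their last column.
function auditGateRows(markdownTable) {
  const problems = [];
  for (const line of markdownTable.split("\n")) {
    // Drop the optional outer pipes, then split the row into trimmed cells.
    const cells = line
      .trim()
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());
    if (cells.length < 3) continue; // not a gate row
    const gate = cells[0];
    const applies = cells[1];
    const evidence = cells[cells.length - 1];
    // Skip the header row and the |---|---| separator row.
    if (gate === "Gate" || /^-+$/.test(gate)) continue;
    if (applies === "no" && !evidence.startsWith("N/A")) {
      problems.push(gate);
    }
  }
  return problems;
}

// Usage: the first data row is copied from the table above; the second is
// an invented failing row for illustration.
const sampleTable = [
  "| Gate | Applies | Evidence |",
  "|---|---|---|",
  "| Tracker comments and attachments read | no | N/A: no tracker item or attachment. |",
  "| Example gate | no | |",
].join("\n");
console.log(auditGateRows(sampleTable));
```

Because the evidence is read from the last cell, the same sketch also works on a four-column gate table where a Required action column sits between Applies and Evidence.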
Work Checklist:
- <video-transcripts> XML, or marked N/A with reason.
- .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling.
- .agents/rules/** changed, or N/A reason is recorded.
- .changeset, registry changelog, or explicit no-artifact reason.
- .changeset work loads changeset and follows its package/version/prose rules.
- docs/components/changelog.mdx instead of adding a package changeset.
- main.

Completion Gates:
| Gate | Applies | Required action | Evidence |
|---|---|---|---|
| Named verification threshold | yes | Run the target setup, benchmark, promotion repeat, checks, and state proof named in this plan. | Target setup OK; benchmark lint emitted primary metric; packet 4 was 1.03ms; promotion repeats were 1.6ms and 0.94ms; all packet checks passed. |
| Bug reproduced before fix | no | Record N/A with reason. | N/A: no product bug was fixed. |
| Targeted behavior verification | yes | Run target benchmark plus correctness gate. | pnpm slate:ar:next packets ran bash ./autoresearch.sh and bash ./autoresearch.checks.sh; checks passed each time. |
| TypeScript or typed config changed | no | Record N/A with reason. | N/A: no TS or typed config source changed. |
| Package exports or file layout changed | no | Record N/A with reason. | N/A: no exports or file layout changed. |
| Package manifests, lockfile, or install graph changed | no | Record N/A with reason. | N/A: no manifest, lockfile, or install graph changed. |
| Agent rules or skills changed | no | Record N/A with reason. | N/A: no agent source changed. |
| Workspace authority proof | yes | Run proof in the owning workspace. | plate-2 ran target commands; .tmp/slate-v2 ran benchmark/check wrappers. |
| Browser surface changed | no | Record N/A with reason. | N/A: no browser behavior changed. |
| Browser final proof | no | Record N/A with reason. | N/A: no browser proof required. |
| CI-controlled template output changed | no | Record N/A with reason. | N/A: no templates/** touched. |
| Package behavior or public API changed | no | Record N/A with reason. | N/A: no runtime patch was kept. |
| Registry-only component work changed | no | Record N/A with reason. | N/A: no registry component work. |
| Docs or content changed | yes | Verify source-backed plan claims. | This plan records exact command evidence; no docs claim depends on runtime rendering. |
| High-risk mini gate | yes | Record failure mode and proof plan. | Risk was optimizing the wrong target due to a stale AR wrapper; benchmark-lint caught it, wrapper files were corrected, a new segment was created, and promotion repeats passed. |
| Agent-native review for agent/tooling changes | no | Record N/A with reason. | N/A: no .agents/**, .claude/**, .codex/**, skill, hook, command, or prompt source changed. |
| Local install corruption suspected | no | Record N/A with reason. | N/A: no install corruption signal appeared. |
| Autoreview for non-trivial implementation changes | no | Record N/A with reason. | N/A: no product/runtime code patch was kept; session and plan changes are mechanically verified. |
| PR create or update | no | Record N/A with reason. | N/A: no PR requested. |
| Task-style PR body verified | no | Record N/A with reason. | N/A: no PR requested. |
| PR proof image hosting | no | Record N/A with reason. | N/A: no PR/browser proof image. |
| Tracker sync-back | no | Record N/A with reason. | N/A: no tracker item. |
| Final handoff contract | yes | Fill final handoff fields. | Final handoff fields below record outcome, caveat, design, and verification. |
| Final lint | no | Record N/A with reason. | N/A: no product code changed; benchmark/check wrappers executed successfully. |
| Output budget discipline | yes | Verify output stayed bounded. | Long benchmark output was summarized by AR tails and metrics; tool output was capped. |
| Goal plan complete | yes | Run node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-core-rich-text-operations-ar-perf.md. | Run after this plan is filled. |
| Agent source / generated sync | no | Record N/A with reason. | N/A: no .agents/rules/** changed. |
| Agent action discoverability | yes | Source-audit action surface. | bench:targets:*, slate:ar:*, .tmp/slate-v2/autoresearch.sh, and .tmp/slate-v2/autoresearch.checks.sh identify the target loop. |
| Agent-native review | no | Record N/A with reason. | N/A: no agent source changed. |
| Public API / package boundary proof | yes | Source-audit boundary. | No public package API/export changed; no code patch kept. |
| Release artifact classification | yes | Record classification. | Internal AR/session/plan evidence only. |
| Published package changeset | no | Record N/A with reason. | N/A: no published package delta. |
| Registry changelog | no | Record N/A with reason. | N/A: no registry component work. |
| No release artifact | yes | Record exact reason. | Internal-only measurement/session update; no user-visible package behavior/API/types/config/runtime delta. |
| Package typecheck/build/test | yes | Run owning package check or N/A. | bun check ran as the AR correctness command for packets 4, 5, and 6 and passed. |
| Barrel/export generation | no | Record N/A with reason. | N/A: no exports changed. |
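The completion threshold stated earlier in this plan reduces to three stop conditions: green, plateaued, or blocked. The sketch below shows one way to express that decision. The function name, the input shape, and the reading of "at least two correctness-green packets" as the two most recent packets are assumptions for illustration, not the AR runner's API.

```javascript
// Thresholds taken from this plan's completion threshold.
const PROMOTION_RATIO = 3; // promotion target: below 3x legacy
const PLATEAU_GAIN = 0.05; // plateau: less than 5% further gain

// Hypothetical classifier. `recentGains` holds the fractional improvement
// of each correctness-green packet, oldest first (an assumed input shape).
function classifyTarget({ worstP95Ratio, recentGains, blockedReason }) {
  if (worstP95Ratio < PROMOTION_RATIO) return "green";
  const lastTwo = recentGains.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((gain) => gain < PLATEAU_GAIN)) {
    return "plateaued";
  }
  if (blockedReason) return "blocked";
  return "continue";
}

// Usage with the worst p95 ratios recorded for the two promotion repeats.
for (const worstP95Ratio of [2.36, 1.85]) {
  console.log(
    worstP95Ratio,
    classifyTarget({ worstP95Ratio, recentGains: [], blockedReason: null })
  );
}
```

Checking the ratio first matches how this plan closed: the target was already below the promotion ratio, so the plateau and blocked branches never applied.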
Phase / pass table:
| Phase | Status | Evidence | Next |
|---|---|---|---|
| Intake and source read | complete | Target contract and AR skills read. | implementation |
| Implementation | complete | Corrected stale AR session wrappers and started target/promotion segments. | verification |
| Verification | complete | Benchmark lint, packet 4, promotion repeats 5/6, and slate:ar:state passed. | closeout |
| PR / tracker sync | N/A | No PR or tracker sync requested. | final response |
| Closeout | complete | Plan filled; autogoal check runs last. | final response |
Findings:
- pnpm slate:ar:init-target updated config but kept stale pagination wrappers;
  benchmark-lint caught this when the run emitted pagination metrics instead of
  rich_text_structural_ops_p95_ms.
- 1.6ms / p95 ratio 2.36, then 0.94ms / p95 ratio 1.85; both passed bun check.

Decisions and tradeoffs:
measure, not keep, because there was no
product patch to commit. keep would have been fake here.
Implementation notes:
- .tmp/slate-v2/autoresearch.sh to run
  RICH_TEXT_OPS_COMPARE_ITERATIONS=51 bun run bench:core:rich-text-operations:compare:local.
- .tmp/slate-v2/autoresearch.checks.sh to run bun check.
- .tmp/slate-v2/autoresearch.md so the session text matches the core
  rich-text target.

Review fixes:
Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
|---|---|---|---|
| Initial benchmark-lint emitted pagination metrics | 1 | Patch stale AR wrapper files and start a new rich-text segment | Resolved; benchmark-lint then emitted rich_text_structural_ops_p95_ms=0.98 |
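The failure recorded above was a wrapper emitting metrics for the wrong target. A minimal sketch of that kind of lint follows. It assumes the wrapper prints one `METRIC <name>=<value>` line per metric; that output format, both function names, and the metric name `pagination_p95_ms` are assumptions for illustration, not the documented behavior of pnpm slate:ar:benchmark-lint.

```javascript
// Collect metrics from wrapper output. The "METRIC name=value" line format
// is an assumption; the real wrapper output is not shown in this plan.
function parseMetrics(output) {
  const metrics = new Map();
  for (const line of output.split("\n")) {
    const match = /^METRIC\s+([A-Za-z0-9_]+)=(-?\d+(?:\.\d+)?)$/.exec(line.trim());
    if (match) metrics.set(match[1], Number(match[2]));
  }
  return metrics;
}

// Fail when the expected primary metric is absent, and report what was seen.
function lintPrimaryMetric(output, primaryMetric) {
  const metrics = parseMetrics(output);
  if (!metrics.has(primaryMetric)) {
    const seen = [...metrics.keys()].join(", ") || "none";
    return { ok: false, reason: `missing ${primaryMetric}; saw: ${seen}` };
  }
  return { ok: true, value: metrics.get(primaryMetric) };
}

// Usage: a stale wrapper (invented pagination metric name) fails, while
// output carrying the target's primary metric passes.
const primary = "rich_text_structural_ops_p95_ms";
console.log(lintPrimaryMetric("METRIC pagination_p95_ms=4.2", primary));
console.log(lintPrimaryMetric(`METRIC ${primary}=0.98`, primary));
```

Reporting the metric names that were seen makes a stale wrapper obvious at a glance, which is how the pagination leftovers were spotted in this run.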
Verification evidence:
- pnpm bench:targets:check: benchmark-targets ok: 26 targets.
- pnpm bench:targets:dry-run -- core-rich-text-operations-compare: setup OK
  for rich_text_structural_ops_p95_ms.
- pnpm slate:ar:setup-target -- core-rich-text-operations-compare: detected
  current metric was still pagination and recommended setup.
- pnpm slate:ar:init-target -- core-rich-text-operations-compare: initialized
  rich-text config.
- pnpm slate:ar:benchmark-lint: failed because stale wrapper emitted
  pagination metrics; fixed wrapper/session files.
- node ... autoresearch.mjs new-segment --cwd .tmp/slate-v2 --reason "Switch active target from pagination to core-rich-text-operations-compare" --yes: segment 2 created.
- pnpm slate:ar:benchmark-lint: OK,
  rich_text_structural_ops_p95_ms=0.98, worst p95 ratio 2.46, worst mean
  ratio 2.27.
- pnpm slate:ar:next packet 4: 1.03ms, worst p95 ratio 2.35, worst mean
  ratio 3.17, bun check passed.
- measure; no commit created.
- node ... promote-gate --cwd .tmp/slate-v2 --reason "Core rich-text baseline is below the <3x legacy p95 promotion target" --gate-name "rich-text promotion repeat" --query-count 2 --yes: promotion segment 3 created.
- 1.6ms, worst p95 ratio 2.36, worst mean ratio
  3.75, bun check passed; logged accepted measure.
- 0.94ms, worst p95 ratio 1.85, worst mean ratio
  1.92, bun check passed; logged accepted measure.
- pnpm slate:ar:state: active target
  core-rich-text-operations-compare; segment 3 has 2 measured, 0 checks
  failed; researchIntegrity OK; next hint says close target as green.

Final handoff contract:
- bun check in
  each packet, and slate:ar:state.
- core-rich-text-operations-compare is green under the target's p95
  ratio threshold.

Task-style PR body contract:
- `<!-- auto-release:start -->` block. If a changeset is
  part of the diff and repo policy expects auto release, include that block.
- Fixes #123 or Fixes N/A, then
  an emoji confidence line like 🟢 95-100% confidence.
- `| Phase | 🧪 Tests | Browser |`.
- Reproduced and Verified rows. Mark passing proof with 🟢, repro or
  failing proof with 🔴, and non-applicable cells with N/A.
- `**✅ Outcome**`, `**⚠️ Caveat**`, `**Design**`, and `**🧪 Verified**`.
- Summary / Verification PR body, an
  adaptive prose body from a git helper skill, plain `## Outcome` sections, or
  an unrelated generated badge footer unless the caller or repo template
  explicitly asks for it.
- gh pr view --json body output or a concise source-backed summary
  of that output.

Final handoff / sync:
Timeline:
- 1.03ms with bun check green.
- 1.6ms with bun check green.
- 0.94ms with bun check green.

Reboot status:
| Question | Answer |
|---|---|
| Where am I? | Closeout |
| Where am I going? | Final response after completion check |
| What is the goal? | Optimize or prove green core-rich-text-operations-compare under Slate AR |
| What have I learned? | It is already green under p95 target; stale AR wrappers were the only setup bug |
| What have I done? | Fixed target wrappers, ran baseline and promotion repeats, logged accepted measurements |
Open risks: