.agents/skills/slate-ar-perf/SKILL.md
Handle $ARGUMENTS.
Use this for Slate v2 performance work where "try another optimization" needs a
measured loop, not another plan essay. This is the performance lane on top of
slate-ar, which wraps codex-autoresearch:codex-autoresearch.
Do not duplicate generic packet/dashboard/finalization mechanics here. Load
slate-ar for Slate wrapper behavior and codex-autoresearch:codex-autoresearch
for the underlying Autoresearch state machine.
slate-ar-perf.fast, fastest, max perf, pagination,
virtualization, benchmark, or asks to make a Slate v2 surface faster.METRIC name=value.slate-patch.slate-ar-recipe for target discovery or slate-plan for architecture
framing.slate-plan.fast, fastest, max perf, make it fastest: fastest-safe mode. Pick or
resume the matching target and keep running packets until target parity,
plateau, correctness blocker, architecture blocker, unsafe
finalization/dirty-tree boundary, or user interruption.pagination, virtualization: use the pagination default contract unless the
user gives a sharper target.continue, resume, status, dashboard, finalize: delegate to
slate-ar operator modes, then apply perf policy to any next packet.Plateau means three consecutive valid correctness-green packets improve the primary metric by less than 5% and no safe P0/P1 profiler hypothesis remains. Do not stop at the first win.
Use benchmarks/targets/slate-v2.json as the migration spine when it exists.
It is the source of truth for benchmark questions, cohorts, commands, metrics,
correctness checks, artifacts, and supporting docs.
Default path:
pnpm bench:targets:list;pnpm bench:targets:check;pnpm bench:targets:report;pnpm bench:targets:dry-run -- <target-id>;.tmp/slate-v2/autoresearch.* session only when needed:
node tooling/scripts/bench-targets.mjs autoresearch-init <target-id>;slate-ar / Codex Autoresearch for setup inspection, benchmark lint,
checks inspection, packets, stale-run detection,
ASI, dashboard, keep/discard decisions, and final evidence.A specific target id is enough instruction. Do not force the user to write a long prompt like "create target if missing".
If <target-id> is missing from benchmarks/targets/slate-v2.json and the name
is specific enough to infer the surface, create the first-class target contract
in the same pass. Examples of specific-enough names:
react-huge-document-select-allcore-observation-comparepagination-virtualized-fast-scrollhistory-fragment-undo-redoWhen creating a target:
METRIC lines from the start;benchmark_seconds;../slate/../../../slate when legacy parity is the claim;pnpm bench:targets:check before AR init.If the target name is ambiguous, stop and recommend one or two concrete target
ids. Do not create another slate-ar-* wrapper skill for a missing benchmark
target. slate-ar-perf owns perf-target bootstrapping.
The old Slate v2 bench:* package scripts remain workload owners during
migration. The clean split is: target registry owns the decision contract,
benchmark scripts own runtime workload, Autoresearch owns active optimization
state, and target reports/history own historical status.
Performance wins do not count when the editor is less correct.
Before pagination, virtualization, hidden DOM, model-backed selection, or staged-render optimization:
slate-patch or tdd;checks_failed or discard,
never keep.For pagination or page-level virtualization, start from this contract unless the user gives a sharper one.
Target route:
http://localhost:3100/examples/pagination?page_layout=single&strategy=virtualized&rows=800
Required cohorts:
rows=8, staged and virtualizedrows=500, staged and virtualizedrows=800 or rows=1000, virtualizedPrimary metrics, lower is better:
Secondary metrics:
Correctness checks:
Benchmark output must print METRIC lines. Example:
METRIC typing_p95_ms=42
METRIC dropped_chars=0
METRIC dom_nodes=1840
For react-huge-document-select-all, create or use a target with this contract.
Required cohorts:
Mod+A), not only programmatic model selection.Primary metrics, lower is better:
react_huge_doc_select_all_p95_ms;react_huge_doc_select_all_worst_p95_ratio versus legacy;react_huge_doc_select_all_failure_count.Secondary metrics:
Correctness checks:
Mod+A selects the full editor document;Promotion target:
<=1.5 with failure count 0;<=1.0 or plateau after three
correctness-green packets with less than 5% gain and no safe P0/P1 profiler
hypothesis left.Report: