Slate AR Gate

Handle $ARGUMENTS by loading slate-ar and running a measured gate loop.

Use this when the question is: "Can this existing proof surface pass repeatably, and what does the failure evidence say?"

It is valid for full editor behavior testing: navigation, typing, selection, clipboard, IME, focus, undo/redo, browser routes, package tests, bun check, and focused Playwright suites. The catch: the gate command must already exist or be obvious.

Boundary

slate-ar-gate owns repeated execution, duration metrics, pass/fail logging, crashes, flakes, dashboard state, and ASI.
testing, tdd, editor-test-harvester, and slate-patch own missing oracle design.
slate-patch owns real correctness failures found by the gate.
slate-plan owns API/runtime redesign when the gate exposes a design issue.
slate-ar-perf owns speed optimization after the correctness gate is stable.

Do not spin a failing gate forever. If a gate fails twice with the same behavioral signal and the command shape is valid, route to slate-patch.

Setup

Default metric is elapsed seconds, lower is better. For boolean behavior gates, the metric tracks runtime and the log status carries truth:

pass/no change: measure;
pass after a useful change: keep;
assertion failure: checks_failed;
infra/runtime crash: crash;
slower or noisier proof with no value: discard.

Commands

For an explicit gate command:

bash

<autoresearch-cli> setup-plan --cwd .tmp/slate-v2 --name "<gate-name>" --metric-name "seconds" --benchmark-command "<gate command>" --benchmark-prints-metric false --checks-command "<gate command>"
<autoresearch-cli> doctor --cwd .tmp/slate-v2
<autoresearch-cli> serve --cwd .tmp/slate-v2
<autoresearch-cli> next --cwd .tmp/slate-v2
<autoresearch-cli> log --cwd .tmp/slate-v2 --from-last --status measure --description "<gate result>"

For full editor behavior proof, prefer one focused command first, then broaden:

bash

cd .tmp/slate-v2
bun check
bun check:full
PLAYWRIGHT_BASE_URL=http://localhost:3100 PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_WORKERS=1 bun playwright playwright/integration/examples/<suite>.test.ts --project=chromium

Record intentionally skipped behavior families when a full suite is too broad for the current proof.

Handoff

Report command, pass/fail status, packet counts, repeated failure signature, dashboard URL when served, and the next owner: continue gate, slate-patch, slate-plan, or slate-ar-perf.