packages/builtin-skills/src/verify/SKILL.md
You are the builder for a task. A separate review step will judge your
delivery against a verify plan — a list of criteria, some of which demand
evidence (a screenshot, a DOM snapshot, CLI output…). A criterion that
declares requiredEvidence cannot pass on your text alone: if the artifact
is missing, the structural gate marks it uncertain and the delivery is held.
So while you do the work, capture the proof and submit it. The loop:
discover plan → pick the surface → capture evidence per criterion → submit each → self-check coverage
Everything here is portable: the hard dependencies are the lh CLI (already
authed in your environment) and, for UI proof, agent-browser. No repo scripts,
no local report directory.
lh is authed. Confirm with lh verify run list --json (an empty []
means authed; an auth error means stop and surface it).$LOBE_OPERATION_ID in the
environment (or named in your task prompt). Every command below keys off it.
If it is unset, there is no plan to satisfy — skip this skill.agent-browser is installed. npm i -g agent-browser
then agent-browser install (downloads Chrome). Full reference:
references/agent-browser.md.One read tells you what to prove:
lh verify plan state "$LOBE_OPERATION_ID" --json
Each verifyPlan[] item carries id (the checkItemId), title, required,
and verifierConfig.requiredEvidence ([{ type, hint }] — the artifacts you MUST
capture). The checkItemId is the only handle you need: lh verify submit (Step 3)
keys off it plus your operation id and creates the result row for you, so you do
not need a checkResultId up front. (Result rows generally don't exist yet at
this point — that's expected.) Exact shapes:
references/plan-format.md.
Only items with a non-empty
requiredEvidenceneed an artifact. Items without it are judged on the deliverable text alone — don't fabricate evidence.
The criterion's hint usually implies the surface. Match the change you made to
the cheapest surface that can actually prove it, and escalate only if needed:
| What your task changed | Surface | Why | Guide |
|---|---|---|---|
| Backend / CLI / library / data logic | CLI 端 | Fastest, text-assertable, zero UI flakiness — upload stdout as text | surfaces/cli.md |
| Web app frontend / styles / interactions | Web 端 (agent-browser → running web app) | The product shape users see; screenshot/DOM the rendered result | surfaces/web.md |
| New/changed API plus the UI consuming it | Web 端,full-stack (agent-browser + network capture) | One surface where request/response and rendered result are both observable | surfaces/web.md |
| Desktop (Electron) app behavior | Electron 端 (agent-browser --cdp) | Only the real desktop shell exercises desktop-only code paths | surfaces/electron.md |
| Native macOS app / OS-level behavior agent-browser can't reach | Native 端 (Computer Use: osascript + screencapture) | The only way to drive non-Chromium apps and OS chrome (local macOS only) | surfaces/native.md |
Rules of thumb:
text — it's the strongest,
cheapest proof.Capture each required type (recipes per surface in
references/evidence.md), then submit one artifact per
call with the criterion's checkItemId. lh verify submit resolves your session
from the operation id, lazily creates/updates the result row, and attaches the
evidence — one call, no checkResultId needed:
# CHECK_ITEM_ID is the plan item id for this criterion (from Step 1).
# file artifact (screenshot / dom / video)
lh verify submit --operation "$LOBE_OPERATION_ID" --item "$CHECK_ITEM_ID" \
--type screenshot --file ./proof/login.png --by agent-browser \
--desc "Logged-in home renders the workspace switcher"
# inline text artifact (stdout / computed value) — no file
lh verify submit --operation "$LOBE_OPERATION_ID" --item "$CHECK_ITEM_ID" \
--type text --content "$(your-cli command --json)" --by cli \
--desc "command reports success after the change"
--by records provenance: agent-browser | cdp | cli | program. Use
--file for binaries, --content for text — exactly one. Submit one artifact per
call; call again for each additional one (same --item reuses the row). Leave the
pass/fail verdict to the review step — only add --verdict if your task
explicitly asks you to self-assert the outcome.
Before you declare the task done, prove every required artifact landed. For each
criterion with requiredEvidence, list what you submitted and confirm each type
is present. After submitting, the result rows exist, so map each checkItemId to
its checkResultId and list that row's evidence:
lh verify result list --operation "$LOBE_OPERATION_ID" --json # checkItemId → checkResultId
lh verify evidence list "$CHECK_RESULT_ID" --json
Coverage rule: for each required criterion, every requiredEvidence[].type
must appear at least once in its evidence list. Report it explicitly, e.g.
coverage: 2/2 criteria, all required evidence uploaded. If a type is missing, go
back to Step 3 — a missing artifact holds the delivery at uncertain no matter
how good the work is.
Plan item: "Settings page shows the new 'Beta features' toggle",
requiredEvidence: [{ type: screenshot }], required: true.
OP="$LOBE_OPERATION_ID"
# 1. discover: find this item's checkItemId + required evidence
lh verify plan state "$OP" --json # → item id vci_settings, requires screenshot
# 2. 端 = web (frontend change). Auth the session if needed (see references/auth.md).
agent-browser --session app open "http://localhost:3000/settings"
agent-browser --session app wait --text "Beta features"
agent-browser --session app screenshot ./proof/settings-beta.png
# 3. submit: creates the result row + attaches the screenshot in one call
lh verify submit --operation "$OP" --item vci_settings --type screenshot \
--file ./proof/settings-beta.png --by agent-browser \
--desc "Settings page renders the new Beta features toggle"
# 4. self-check
lh verify result list --operation "$OP" --json # → { checkItemId: vci_settings, id: vcr_77 }
lh verify evidence list vcr_77 --json # → one screenshot present → 1/1 covered
agent-browser screenshot /
dom / eval render from the browser engine and run headless; screencapture
/ osascript are macOS-only and break in the cloud.verify/
├── SKILL.md # this router: the loop + 端 decision + example
├── surfaces/ # 不同端的验收方案 — pick one per criterion
│ ├── cli.md # backend / CLI 端 → text evidence from command output
│ ├── web.md # web 端 (frontend / full-stack) via agent-browser
│ ├── electron.md # desktop 端 (Electron) via agent-browser --cdp
│ └── native.md # native macOS app / OS-level 端 via Computer Use
└── references/ # 跨端共享知识
├── plan-format.md # verify contract: plan shape + checkItemId↔checkResultId join
├── evidence.md # evidence type → capture recipe; upload; coverage; portability
├── agent-browser.md # full agent-browser CLI reference (any Chromium app)
├── computer-use.md # macOS Computer Use toolkit (osascript + screencapture)
├── recording.md # GIF / MP4 recording for time-based evidence
└── auth.md # portable auth: session/state/vault/cookie injection + boundaries