start-work

packages/shared-skills/skills/start-work/SKILL.md

Codex Harness Tool Compatibility

This skill ports the OpenCode /start-work flow onto Codex. Any OpenCode-only tool name in an inherited example must be translated to its Codex equivalent:

| OpenCode example | Codex tool to use |
| --- | --- |
| `task(subagent_type="explore", ...)` | `spawn_agent(agent_type="explorer", task_name="...", message="...")` |
| `task(subagent_type="librarian", ...)` | `spawn_agent(agent_type="librarian", task_name="...", message="...")` |
| `task(subagent_type="plan", ...)` | `spawn_agent(agent_type="plan", task_name="...", message="...")` |
| `task(subagent_type="oracle", ...)` for final verification | `spawn_agent(agent_type="codex-ultrawork-reviewer", task_name="...", message="...")` |
| `task(category="...", ...)` for implementation or QA | `spawn_agent(agent_type="worker", task_name="...", message="...")` |
| `background_output(task_id="...")` | `wait_agent(...)` |
| `dispatchInternalPrompt(...)` | the Stop hook emits `{"decision":"block","reason":"<prompt>"}` automatically; see Continuation |
| `team_*(...)` | `spawn_agent` + `send_message` + `followup_task` + `wait_agent` + `close_agent` |

When translating load_skills=[...], name the skills inside the spawned agent's message. If a code block below conflicts with this section, this section wins.

start-work

Execute a Prometheus work plan until every top-level checkbox is complete. This skill pairs with the Codex Stop / SubagentStop continuation hook in components/start-work-continuation, which re-injects the next turn while .omo/boulder.json says the current codex:<session_id> still has unchecked plan work.
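The hook's decision can be pictured roughly as follows. This is a minimal sketch and not the code in components/start-work-continuation: the function name `continuation_decision`, the wording of the reason, and the simplification that any unchecked column-0 checkbox counts as remaining work are all assumptions. Only the `{"decision":"block","reason":...}` shape and the `codex:<session_id>` prefix come from this document.

```python
import re

# Column-0 unchecked checkbox; indented (nested) checkboxes do not match.
UNCHECKED = re.compile(r"^- \[ \] ", re.MULTILINE)


def continuation_decision(boulder: dict, session_id: str, plan_text: str):
    """Return a block decision while this session still has unchecked plan work.

    Returns None when the turn may stop. Hypothetical helper for illustration.
    """
    key = f"codex:{session_id}"
    work = boulder.get("works", {}).get(boulder.get("active_work_id"))
    if not work or key not in work.get("session_ids", []):
        return None  # the active work belongs to another session
    if work.get("status") != "active":
        return None  # paused or completed work is not continued
    remaining = len(UNCHECKED.findall(plan_text))
    if remaining == 0:
        return None
    reason = f"{remaining} unchecked top-level checkbox(es) remain; continue the plan."
    return {"decision": "block", "reason": reason}
```

The caller is expected to read .omo/boulder.json and the plan file itself; keeping the function pure makes the rule easy to test.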

Usage

```text
$start-work [plan-name] [--worktree <absolute-path>]
```
  • plan-name is optional. It may be a full or partial file stem under .omo/plans/.
  • --worktree is optional. Use it only when the user explicitly asks to work in a separate git worktree.

Phase 1: Select the plan

  1. Read .omo/boulder.json if it exists.
  2. List Prometheus plan files under .omo/plans/.
  3. If plan-name was provided, select the matching plan.
  4. If exactly one active or paused Boulder work exists for this session, resume it.
  5. If no active work exists and exactly one plan exists, select it.
  6. If multiple plans remain possible, ask one focused selection question.
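The selection order above can be sketched as a small pure function. The name `select_plan`, the `resumable` parameter (plan stems of the active or paused works already tied to this session), and the convention that `None` means "ask the user" are assumptions made for illustration.

```python
def select_plan(plan_stems, plan_name=None, resumable=()):
    """Return the chosen plan stem, or None when one selection question is needed."""
    if plan_name:
        # A full stem wins over partial matches.
        exact = [s for s in plan_stems if s == plan_name]
        partial = [s for s in plan_stems if plan_name in s]
        matches = exact or partial
        return matches[0] if len(matches) == 1 else None
    if len(resumable) == 1:
        return resumable[0]  # resume the single active or paused work
    if not resumable and len(plan_stems) == 1:
        return plan_stems[0]  # nothing to resume and only one plan exists
    return None  # ambiguous: ask one focused question
```

For example, `select_plan(["auth-refactor", "billing"], plan_name="auth")` resolves to `"auth-refactor"`, while two plans that both contain `auth` yield `None`.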

Phase 2: Create or update Boulder state

Write .omo/boulder.json before implementation starts. Session ids must be prefixed with codex: so the continuation hook can identify its own session.

```json
{
  "schema_version": 2,
  "active_work_id": "<work-id>",
  "works": {
    "<work-id>": {
      "work_id": "<work-id>",
      "active_plan": ".omo/plans/<plan-name>.md",
      "plan_name": "<plan-name>",
      "session_ids": ["codex:<session_id>"],
      "status": "active",
      "worktree_path": null
    }
  }
}
```
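A minimal sketch of building that state for a fresh work follows. The helper name `build_boulder_state` is hypothetical, and merging into an existing file with other works is left out; the field names mirror the JSON above.

```python
import json
import os


def build_boulder_state(work_id, plan_name, session_id, worktree_path=None):
    """Build the Boulder state dict for one new active work."""
    # Always store the session with the codex: prefix, exactly once.
    if not session_id.startswith("codex:"):
        session_id = f"codex:{session_id}"
    if worktree_path is not None and not os.path.isabs(worktree_path):
        raise ValueError("worktree_path must be an absolute path")
    return {
        "schema_version": 2,
        "active_work_id": work_id,
        "works": {
            work_id: {
                "work_id": work_id,
                "active_plan": f".omo/plans/{plan_name}.md",
                "plan_name": plan_name,
                "session_ids": [session_id],
                "status": "active",
                "worktree_path": worktree_path,
            }
        },
    }


def dump_boulder_state(state) -> str:
    """Serialize the state as it would be written to .omo/boulder.json."""
    return json.dumps(state, indent=2) + "\n"
```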

If --worktree is set, verify the path with git worktree list --porcelain or create it with git worktree add <path> <branch-or-HEAD>, then store the absolute path as worktree_path. All edits, commands, tests, and evidence capture must run inside that worktree.
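One way to verify the path is to parse the porcelain output, in which each worktree record begins with a `worktree <path>` line. The sketch below assumes that format; the helper names are hypothetical, and `porcelain_output` simply shells out to the git command named above.

```python
import os
import subprocess


def porcelain_output(repo_dir: str) -> str:
    """Run `git worktree list --porcelain` in repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def registered_worktrees(porcelain: str) -> list[str]:
    """Extract worktree paths from porcelain text."""
    prefix = "worktree "
    return [line[len(prefix):] for line in porcelain.splitlines()
            if line.startswith(prefix)]


def is_registered(path: str, porcelain: str) -> bool:
    """True when path is one of the worktrees git already knows about."""
    wanted = os.path.normpath(path)
    return any(os.path.normpath(p) == wanted
               for p in registered_worktrees(porcelain))
```

If `is_registered` returns False, the path still needs `git worktree add` before it is stored as worktree_path.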

Phase 3: Execute the next checkbox

  1. Read the full selected plan.
  2. Find the first unchecked column-0 checkbox in ## TODOs or ## Final Verification Wave.
  3. Ignore nested checkboxes under acceptance criteria, evidence, and definition-of-done sections.
  4. Decompose that checkbox into atomic sub-tasks.
  5. Dispatch independent sub-tasks in parallel with spawn_agent; serialize only when one sub-task has a named dependency on another.
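Steps 2 and 3 can be sketched as a line scan. This assumes checkboxes use the `- [ ]` / `- [x]` form shown in Phase 5 and that "column-0" means no leading whitespace; the function name `first_unchecked` is hypothetical.

```python
import re

SECTIONS = {"## TODOs", "## Final Verification Wave"}
# Anchored at column 0, so indented (nested) checkboxes never match.
CHECKBOX = re.compile(r"^- \[( |x|X)\] (.*)$")


def first_unchecked(plan_text: str):
    """Return (line_number, title) of the first unchecked top-level checkbox, or None."""
    in_scope = False
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        if line.startswith("## "):
            # A new level-2 heading opens or closes the sections we care about.
            in_scope = line.strip() in SECTIONS
            continue
        if not in_scope:
            continue
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return lineno, match.group(2)
    return None
```

Deeper headings such as `### Acceptance` do not change the scope, so indented checkboxes beneath them are skipped only because of the column-0 anchor.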

Each sub-task message must include:

  1. Goal and exact files or directories in scope.
  2. Test order. When the task touches existing behavior, first write a baseline characterization test that asserts the current observable behavior and passes on the unchanged code. Then write the red test or failing reproduction for the new behavior, before any production change. Pin the baseline as rigorously as the new test: exact inputs, exact observable, exact assertion.
  3. Implementation constraints from the plan and project rules.
  4. Automated verification commands to run.
  5. One Manual-QA channel, named with the exact tool and exact invocation (the literal curl, send-keys, page.click, payload, selectors, and the binary observable that decides PASS/FAIL), not "verify it works":
    • HTTP call: curl -i against the live endpoint.
    • tmux: a tmux session driven with send-keys, dumped via capture-pane.
    • Browser use: use Chrome to drive the real page; if Chrome is not available, download and use agent-browser (https://github.com/vercel-labs/agent-browser).
    • Computer use: OS-level GUI automation against the running desktop app when the surface is not a page.
  6. The adversarial classes that apply to this sub-task (from the 9 ultraqa classes) and how each is probed.
  7. Required artifact path and cleanup receipt.

Apply ultraqa's 9 adversarial classes where relevant to each checkbox: malformed input, prompt injection, cancel/resume, stale state, dirty worktree, hung or long commands, flaky tests, misleading success output, repeated interruptions. A checkbox whose behavior is user-visible MUST probe every class that plausibly applies; record which classes were exercised and which were ruled not-applicable with a one-line reason.
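The bookkeeping this implies can be sketched as a check that every class is either probed or explicitly ruled out. The class names are taken from the list above; the function name and the two-dict shape (class name mapped to an observable result, or to a not-applicable reason) are assumptions.

```python
ULTRAQA_CLASSES = (
    "malformed input",
    "prompt injection",
    "cancel/resume",
    "stale state",
    "dirty worktree",
    "hung or long commands",
    "flaky tests",
    "misleading success output",
    "repeated interruptions",
)


def unaccounted_classes(probed: dict, ruled_out: dict) -> list:
    """Return classes that have neither an observable result nor a reason."""
    return [
        name for name in ULTRAQA_CLASSES
        if not probed.get(name, "").strip()
        and not ruled_out.get(name, "").strip()
    ]
```

An empty return value means every class is accounted for; anything else names the classes that still block the checkbox.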

Phase 4: Verify and record evidence

For each checkbox, complete all five gates before marking it done:

  1. Plan reread: confirm the checkbox and acceptance criteria.
  2. Automated verification: run tests, typecheck, lint, build, or the plan-specific equivalent.
  3. Manual-QA channel: capture a real artifact, not a dry-run claim.
  4. Adversarial QA: exercise every applicable ultraqa class (malformed input, prompt injection, cancel/resume, stale state, dirty worktree, hung or long commands, flaky tests, misleading success output, repeated interruptions) and capture the observable result for each. "Tests pass" and a clean happy-path artifact are NOT sufficient when an adversarial class applies and was not probed.
  5. Cleanup: the moment a QA resource is spawned (QA scripts, tmux assets, browser / agent-browser sessions, PIDs, ports, containers, temp dirs), register its teardown as its own todo; then execute each teardown and capture the receipt. No QA asset is left running.

Append evidence to .omo/start-work/ledger.jsonl using one JSON object per line. Include at least event, plan, task, session_id, commands, artifact, adversarial_classes, and cleanup fields. adversarial_classes lists each probed class with its observable result and each ruled-out class with a one-line reason.
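A minimal sketch of the append follows, assuming the ledger is plain JSON Lines. The helper name `append_ledger` and the decision to reject entries that lack a required field are assumptions; the field names are the ones listed above.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = (
    "event", "plan", "task", "session_id",
    "commands", "artifact", "adversarial_classes", "cleanup",
)


def append_ledger(entry: dict, ledger_path=".omo/start-work/ledger.jsonl") -> str:
    """Append one entry as a single JSON line and return the line written."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError(f"ledger entry is missing fields: {missing}")
    path = Path(ledger_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps without indent never emits raw newlines, so one entry is one line.
    line = json.dumps(entry, ensure_ascii=False)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Opening the file in append mode keeps earlier evidence intact, which matters because the ledger is treated as a durable record.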

Phase 5: Mark progress

Only after verification passes:

  1. Edit the plan checkbox from - [ ] to - [x].
  2. Re-read the plan and confirm the remaining count decreased.
  3. Append a task-completed ledger entry.
  4. Continue with the next checkbox. Do not ask whether to continue.
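Steps 1 and 2 can be sketched together: tick exactly one top-level checkbox, then confirm that the unchecked count dropped by one. The function name `mark_done` and matching the checkbox by its exact title are assumptions.

```python
import re

TOP_UNCHECKED = re.compile(r"^- \[ \] ", re.MULTILINE)


def mark_done(plan_text: str, task_title: str):
    """Tick one column-0 checkbox; return (updated_text, remaining_unchecked)."""
    before = len(TOP_UNCHECKED.findall(plan_text))
    target = f"- [ ] {task_title}"
    lines = plan_text.split("\n")
    for index, line in enumerate(lines):
        if line.rstrip() == target:
            lines[index] = f"- [x] {task_title}"
            break
    else:
        raise ValueError(f"no unchecked top-level checkbox titled {task_title!r}")
    updated = "\n".join(lines)
    after = len(TOP_UNCHECKED.findall(updated))
    if after != before - 1:
        raise RuntimeError("remaining count did not decrease by exactly one")
    return updated, after
```

Nested checkboxes are left alone because both the match and the count are anchored at column 0.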

Completion

When all top-level checkboxes in ## TODOs and ## Final Verification Wave are complete:

  1. Run the plan's final verification commands.
  2. If worktree mode was used, sync .omo/ state back to the main repo, merge or hand off exactly as requested, and remove the worktree only after successful merge or explicit handoff.
  3. Remove the Boulder work or mark it as completed.
  4. Print an ORCHESTRATION COMPLETE block with the plan path, verification commands, artifacts, and cleanup receipts.

Hard rules

  • No production change before a failing test or reproduction exists, and no change to existing behavior before a baseline characterization test pins the current behavior and passes on the unchanged code.
  • No --dry-run as completion evidence.
  • No tests-only completion claim. A Manual-QA artifact is required.
  • No completion claim while an applicable ultraqa adversarial class was never probed. Each applicable class needs a captured observable result; each skipped class needs a one-line not-applicable reason in the ledger.
  • No unprefixed session ids in Boulder state. Codex sessions are always codex:<session_id>.
  • No stale-memory execution. The plan and ledger are the durable source of truth.