packages/omo-codex/plugin/skills/start-work/SKILL.md
This skill may include examples copied from the OpenCode harness. In Codex, do not call OpenCode-only tools such as call_omo_agent(...), task(...), background_output(...), or team_*(...) literally. Translate those examples to Codex native tools:
| OpenCode example | Codex tool to use |
|---|---|
| `call_omo_agent(subagent_type="explore", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","agent_type":"explorer","fork_context":false})` |
| `call_omo_agent(subagent_type="librarian", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a librarian. ...","agent_type":"librarian","fork_context":false})` |
| `task(subagent_type="plan", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a planning agent. ...","agent_type":"plan","fork_context":false})` |
| `task(subagent_type="oracle", ...)` for final verification | `multi_agent_v1.spawn_agent({"message":"TASK: act as a rigorous reviewer. ...","agent_type":"codex-ultrawork-reviewer","fork_context":false})` |
| `task(category="...", ...)` for implementation or QA | `multi_agent_v1.spawn_agent({"message":"TASK: act as an implementation or QA worker. ...","fork_context":false})` |
| `background_output(task_id="...")` | `multi_agent_v1.wait_agent(...)` for mailbox signals |
| `team_*(...)` | Use Codex native subagents via `multi_agent_v1.spawn_agent`, `multi_agent_v1.send_input`, `multi_agent_v1.wait_agent`, and `multi_agent_v1.close_agent` |
Role-specific behavior must be described in a self-contained message. Use fork_context: false to start the child with only the initial prompt (no parent history); use fork_context: true only when full parent history is truly required. Include any required conversation context, files, diffs, constraints, and requested skill names directly in the spawned agent's message. OMO installs these selectable agent roles into ~/.codex/agents/: explorer, librarian, plan, momus, metis, and codex-ultrawork-reviewer — pass the matching name as agent_type so the child gets that role's model and instructions. On multi_agent_v2 sessions the same agent_type applies (the OMO installer exposes it) with fork_turns instead of fork_context. If the spawn tool exposes no agent_type parameter, omit it and describe the role inside message. If a code block below conflicts with this section, this section wins.
For work likely to exceed one wait cycle, require the child to send WORKING: <task> - <current phase> before long passes and BLOCKED: <reason> only when progress stops. A multi_agent_v1.wait_agent timeout only means no new mailbox update arrived. Treat a running child as alive. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly BLOCKED:, or no longer running.
YOU DO NOT WRITE CODE. YOU DO NOT EDIT PRODUCT FILES. YOU DO NOT RUN QA YOURSELF. EVERY unit of implementation, test, QA, and review work MUST be delegated to a spawned subagent. NO EXCEPTIONS. Your hands touch only plan selection, .omo/ state (Boulder, ledger, plan checkboxes), decomposition, dispatch, verdicts, and evidence records. About to edit a product file or run an implementation command yourself? STOP. SPAWN A WORKER INSTEAD. Orchestrate at MAXIMUM PARALLELISM: every independent unit runs concurrently; only named dependencies serialize.
Every multi_agent_v1.spawn_agent message is a self-contained executable assignment: TASK: <imperative assignment>, then DELIVERABLE, SCOPE, and VERIFY, with role instructions inside message. Use fork_context: false unless full history is truly required; paste only the context the child needs.
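As a rough illustration of the message shape described above, the sketch below assembles the arguments for one spawn call as a plain dictionary. The helper name `build_spawn_args` and its parameters are hypothetical; only the `message`, `agent_type`, and `fork_context` keys and the TASK / DELIVERABLE / SCOPE / VERIFY labels come from this document.

```python
def build_spawn_args(task, deliverable, scope, verify, agent_type=None):
    """Build a self-contained spawn payload (hypothetical helper, not a Codex API)."""
    # Role instructions belong inside the message itself, so the TASK line
    # carries the role ("act as an explorer. ...") plus the assignment.
    message = "\n".join(
        [
            f"TASK: {task}",
            f"DELIVERABLE: {deliverable}",
            f"SCOPE: {scope}",
            f"VERIFY: {verify}",
        ]
    )
    # fork_context stays False so the child starts from this prompt only.
    args = {"message": message, "fork_context": False}
    # Omit agent_type entirely when the spawn tool does not expose it.
    if agent_type is not None:
        args["agent_type"] = agent_type
    return args


if __name__ == "__main__":
    print(
        build_spawn_args(
            task="act as an explorer. List the HTTP handlers.",
            deliverable="a list of file paths with one-line summaries",
            scope="read-only, src/ only",
            verify="every listed path exists",
            agent_type="explorer",
        )
    )
```

The resulting dictionary is what would be passed to `multi_agent_v1.spawn_agent`; any conversation context, diffs, or skill names the child needs would be pasted into the relevant section text.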
Plan and reviewer agents may run for a long time: spawn them in the background, keep doing independent root work, and poll with short multi_agent_v1.wait_agent cycles — never a single long blocking wait. A timeout only means no new mailbox update arrived; treat a running child as alive. Require WORKING: <task> - <current phase> before long passes and BLOCKED: <reason> only when progress stops. Keep the parent visibly alive with active subagent count, names, and latest WORKING: phase. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly BLOCKED:, or no longer running — then record inconclusive (never a pass), close if safe, and respawn a smaller fork_context: false task with the missing deliverable.
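The fallback conditions above can be read as a small decision function. The following is a minimal sketch under assumptions: `ChildState` and its fields are hypothetical bookkeeping kept by the parent, not data returned by any Codex tool. A `wait_agent` timeout is deliberately not an input, because a timeout alone says nothing about the child.

```python
from dataclasses import dataclass


@dataclass
class ChildState:
    """Parent-side view of one spawned child (hypothetical structure)."""

    running: bool
    completed: bool = False
    has_deliverable: bool = False
    ack_only_after_followup: bool = False
    last_message: str = ""


def next_step(child):
    """Return 'collect', 'fallback', or 'keep-waiting' for one child."""
    if child.last_message.startswith("BLOCKED:"):
        return "fallback"  # explicitly blocked
    if child.completed and child.has_deliverable:
        return "collect"
    if child.completed:
        return "fallback"  # completed without the deliverable
    if child.ack_only_after_followup:
        return "fallback"  # only acknowledged, even after a follow-up
    if not child.running:
        return "fallback"  # no longer running
    return "keep-waiting"  # a running child is treated as alive


if __name__ == "__main__":
    working = ChildState(running=True, last_message="WORKING: task 3 - running tests")
    print(next_step(working))
```

On `fallback`, the parent would record the result as inconclusive, close the child if safe, and respawn a smaller `fork_context: false` task for the missing deliverable.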
Execute a Prometheus work plan until every top-level checkbox is complete. This skill pairs with the Codex Stop / SubagentStop continuation hook (components/start-work-continuation), which re-injects the next turn while .omo/boulder.json says this codex:<session_id> still has unchecked plan work.
$start-work [plan-name] [--worktree <absolute-path>]
- `plan-name` (optional): a full or partial file stem under `.omo/plans/`.
- `--worktree` (optional): only when the user explicitly asks for a separate git worktree.

1. Read `.omo/boulder.json` if it exists.
2. List the plans under `.omo/plans/`.
3. If `plan-name` was provided, select the matching plan.

When the user explicitly said start work / `$start-work` and no selectable plan exists, treat that phrase as approval: bootstrap `ulw-plan` to create the approved plan before execution and implementation, instead of stalling or asking for generic approval again. A brief or notes file without waves, checkboxes, and acceptance criteria is NOT decision-complete, so enter this bootstrap for it too.
1. Run the `ulw-plan` skill from the current request and require its dynamic adversarial workflow: collect, verify, design, adversarial plan-review, synthesize.
2. Save the approved plan to `.omo/plans/<slug>.md` before implementation or Boulder state writes that point at plan work.

Write `.omo/boulder.json` before implementation starts. Prefix session ids with `codex:` so the continuation hook can identify its own session.
```json
{
  "schema_version": 2,
  "active_work_id": "<work-id>",
  "works": {
    "<work-id>": {
      "work_id": "<work-id>",
      "active_plan": ".omo/plans/<plan-name>.md",
      "plan_name": "<plan-name>",
      "session_ids": ["codex:<session_id>"],
      "status": "active",
      "worktree_path": null
    }
  }
}
```
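A minimal sketch of producing that state file follows. The function names are hypothetical, and the sketch overwrites the file rather than merging with existing Boulder state, which a real implementation would need to handle. `match_plans` illustrates "full or partial file stem" as a substring match, which is an assumption; how to resolve several matches is not specified here.

```python
import json
from pathlib import Path


def match_plans(plans_dir, plan_name):
    """Stems of plan files whose stem contains plan_name (assumed matching rule)."""
    stems = sorted(p.stem for p in Path(plans_dir).glob("*.md"))
    return [stem for stem in stems if plan_name in stem]


def build_boulder_state(work_id, plan_name, session_id, worktree_path=None):
    """Mirror the schema_version 2 layout shown above."""
    return {
        "schema_version": 2,
        "active_work_id": work_id,
        "works": {
            work_id: {
                "work_id": work_id,
                "active_plan": f".omo/plans/{plan_name}.md",
                "plan_name": plan_name,
                # The codex: prefix lets the continuation hook find its session.
                "session_ids": [f"codex:{session_id}"],
                "status": "active",
                "worktree_path": worktree_path,
            }
        },
    }


def write_boulder_state(repo_root, state):
    """Write .omo/boulder.json under repo_root and return its path."""
    path = Path(repo_root) / ".omo" / "boulder.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return path
```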
If --worktree is set, verify the path with git worktree list --porcelain or create it with git worktree add <path> <branch-or-HEAD>, then store the absolute path as worktree_path. All edits, commands, tests, and evidence capture must run inside that worktree.
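One way to verify a worktree path is to parse the `git worktree list --porcelain` output, where each entry starts with a `worktree <path>` line. The sketch below is illustrative only; the helper names are hypothetical, and it assumes POSIX-style absolute paths.

```python
import os
import subprocess


def read_porcelain(repo_dir):
    """Run `git worktree list --porcelain` in repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def worktree_paths(porcelain_output):
    """Extract the path from every `worktree <path>` line."""
    prefix = "worktree "
    return [
        line[len(prefix):]
        for line in porcelain_output.splitlines()
        if line.startswith(prefix)
    ]


def is_registered_worktree(porcelain_output, candidate):
    """True when candidate is an absolute path that git lists as a worktree."""
    if not os.path.isabs(candidate):
        return False  # worktree_path must be stored as an absolute path
    target = os.path.normpath(candidate)
    return any(os.path.normpath(p) == target for p in worktree_paths(porcelain_output))
```

If the check fails, the path would be created with `git worktree add <path> <branch-or-HEAD>` before it is stored as `worktree_path`.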
- Work the top-level checkboxes under `## TODOs` or `## Final Verification Wave`.
- Dispatch independent checkboxes in one `multi_agent_v1.spawn_agent` burst; serialize only named dependencies. Verification and checkbox marking stay per-checkbox.

Each sub-task message must include:

- The exact QA procedure (`curl`, `send-keys`, `page.click`, payload, selectors, and the binary observable that decides PASS/FAIL), not "verify it works". A LIGHT checkbox needs one real-surface proof of its deliverable, and auxiliary surfaces (CLI stdout, DB state diff, parsed config dump) are first-class when the surface is CLI- or data-shaped:
  - `curl -i` against the live endpoint.
  - `tmux` session driven with `send-keys`, dumped via `capture-pane`.

The 9 ultraqa classes are trigger-mapped: new input parsing → malformed input; untrusted external text → prompt injection; resumable or long-running flows → cancel/resume; generated or cached artifacts → stale state; uncommitted user files in scope → dirty worktree; long external commands → hung or long commands; new or timing-sensitive tests → flaky tests; log-based success claims → misleading success output; mid-operation interrupts → repeated interruptions. A class applies when its trigger fact holds. Probe each applicable class; record the rest as not-applicable with a one-line reason.
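The trigger mapping above can be expressed as a lookup table. In this sketch the snake_case identifiers are assumptions, apart from `stale_state`, `dirty_worktree`, and `misleading_success_output`, which appear elsewhere in this document. The generated not-applicable reason is a placeholder; a real record would state why the trigger does not hold for that checkbox.

```python
# trigger fact -> ultraqa class (identifier spellings are partly assumed)
TRIGGERS = {
    "new_input_parsing": "malformed_input",
    "untrusted_external_text": "prompt_injection",
    "resumable_or_long_running_flows": "cancel_resume",
    "generated_or_cached_artifacts": "stale_state",
    "uncommitted_user_files_in_scope": "dirty_worktree",
    "long_external_commands": "hung_or_long_commands",
    "new_or_timing_sensitive_tests": "flaky_tests",
    "log_based_success_claims": "misleading_success_output",
    "mid_operation_interrupts": "repeated_interruptions",
}


def classify(trigger_facts):
    """Split the nine classes into those to probe and those ruled out."""
    facts = set(trigger_facts)
    unknown = facts - set(TRIGGERS)
    if unknown:
        raise ValueError(f"unknown trigger facts: {sorted(unknown)}")
    probe = [cls for trigger, cls in TRIGGERS.items() if trigger in facts]
    not_applicable = {
        cls: f"trigger '{trigger}' does not hold for this checkbox"
        for trigger, cls in TRIGGERS.items()
        if trigger not in facts
    }
    return probe, not_applicable


if __name__ == "__main__":
    to_probe, ruled_out = classify({"new_input_parsing", "log_based_success_claims"})
    print(to_probe)
    print(sorted(ruled_out))
```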
For each checkbox, complete all five gates before marking it done:
Append evidence to .omo/start-work/ledger.jsonl, one JSON object per line. Include at least event, plan, task, session_id, commands, artifact, adversarial_classes, and cleanup fields. adversarial_classes lists each probed class with its observable result and each ruled-out class with a one-line reason.
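A minimal sketch of appending one ledger line, assuming the field list above is the required minimum. The function name and the exact shape of each field value are hypothetical; the ledger format is only specified here as one JSON object per line.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = (
    "event",
    "plan",
    "task",
    "session_id",
    "commands",
    "artifact",
    "adversarial_classes",
    "cleanup",
)


def append_ledger_entry(ledger_path, entry):
    """Append entry as a single JSON line, refusing entries with missing fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError(f"ledger entry missing fields: {missing}")
    path = Path(ledger_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps escapes embedded newlines, so each entry stays on one line.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Opening the file in append mode keeps earlier evidence intact, which matters because the ledger is the record the final gate relies on.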
A worker done claim is never final: each implementation sub-task returns a DoneClaim, a different context runs AdversarialVerify probing or reproducing the claim, failures loop back to the executor, and only a confirmed verifier verdict becomes FullyDone.
```json
{
  "DoneClaim": {
    "task": "<task id/title>",
    "changed_files": ["path"],
    "tests": ["exact command + result"],
    "manual_qa": ["artifact path"],
    "cleanup": ["receipt"],
    "risks": ["known risk or none"]
  },
  "AdversarialVerify": {
    "verdict": "confirmed | false-positive | needs-fix | needs-human-review",
    "evidence": ["file path, command, log, artifact, or explicit not inspected"],
    "repro": "exact command or manual steps when available",
    "confidence": 0.0
  }
}
```
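The DoneClaim to FullyDone transition can be sketched as a single gate. This is an illustration under assumptions: the function name and the two context identifiers are hypothetical, and the sketch only checks that a verdict exists, came from a different context than the executor, and is `confirmed`.

```python
VERDICTS = {"confirmed", "false-positive", "needs-fix", "needs-human-review"}


def is_fully_done(record, verifier_context, executor_context):
    """True only for a confirmed verdict issued by a different context."""
    verify = record.get("AdversarialVerify")
    if "DoneClaim" not in record or verify is None:
        return False  # a bare worker claim is never final
    verdict = verify.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verifier_context == executor_context:
        return False  # self-verification does not count
    return verdict == "confirmed"


if __name__ == "__main__":
    example = {
        "DoneClaim": {"task": "1"},
        "AdversarialVerify": {"verdict": "needs-fix"},
    }
    print(is_fully_done(example, "reviewer-1", "worker-1"))
```

Any result other than `True` sends the task back to the executor rather than marking the checkbox.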
Rules:
- `confirmed` is the only pass verdict. `false-positive`, `needs-fix`, and `needs-human-review` all block checkbox completion.
- The verifier is `codex-ultrawork-reviewer`, a scoped worker reviewer, or root only when root did not implement or materially rewrite that task.
- Probe the applicable adversarial classes, including `stale_state`, `dirty_worktree`, and `misleading_success_output`, before allowing `FullyDone`.

Only after verification passes:
- Change the plan checkbox from `- [ ]` to `- [x]`.
- Append a `task-completed` ledger entry.

When all top-level checkboxes in `## TODOs` and `## Final Verification Wave` are complete:
1. Run the `review-work` skill with the final diff, changed files, user goal, constraints, run command, and verification evidence. All five review lanes must return PASS. A timeout, missing deliverable, ack-only child, `BLOCKED:`, or inconclusive lane is a gate failure, not approval.
2. Record the gate verdict in `.omo/start-work/ledger.jsonl`.
3. If the gate fails, use the `debugging` skill, confirm root cause with runtime evidence, add the minimal failing test or reproduction, fix it, rerun the affected verification, then rerun the Global Review and Debugging Gate.
4. Redact evidence before writing it to `.omo/start-work/ledger.jsonl`, a PR body, or a handoff. Never include raw tokens, credentials, auth headers, cookies, API keys, env dumps, private logs, or PII; use concise summaries, lengths, hashes, or short non-sensitive prefixes instead.
5. Check `git status` and the PR/branch state after the gate, and include only redacted review/debugging evidence in the PR body or handoff.
6. If a worktree was used, sync `.omo/` state back to the main repo, merge or hand off exactly as requested, and remove the worktree only after successful merge or explicit handoff.
7. Emit an `ORCHESTRATION COMPLETE` block with the plan path, verification commands, Global Review and Debugging Gate verdict, artifacts, and cleanup receipts.

- Do not accept `--dry-run` as completion evidence.
- No `ORCHESTRATION COMPLETE`, final response, PR creation, or PR handoff before the Global Review and Debugging Gate passes with recorded evidence.
- `codex:<session_id>`.
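The redaction rule above (summaries, lengths, and hashes instead of raw secrets) could look like the following sketch. The key names in `SENSITIVE_KEYS` and both function names are hypothetical; a real pass would also need to scan free-text fields such as command output, which this sketch does not do.

```python
import hashlib

# Field names treated as sensitive (an assumed, illustrative list).
SENSITIVE_KEYS = {"token", "authorization", "cookie", "api_key", "password"}


def summarize_secret(value):
    """Replace a secret with its length and a truncated SHA-256 digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"<redacted len={len(value)} sha256:{digest}>"


def redact_evidence(record):
    """Return a copy of record with sensitive fields summarized."""
    clean = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = summarize_secret(str(value))
        else:
            clean[key] = value
    return clean
```

The truncated digest lets two evidence entries be compared for "same credential" without either entry exposing the value.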