packages/shared-skills/skills/ultraresearch/SKILL.md
This skill may include examples copied from the OpenCode harness. In Codex, do not call OpenCode-only tools such as call_omo_agent(...), task(...), background_output(...), or team_*(...) literally. Translate those examples to Codex native tools:
| OpenCode example | Codex tool to use |
|---|---|
call_omo_agent(subagent_type="explore", ...) | spawn_agent({"task_name":"...","message":"TASK: act as an explorer. ...","fork_turns":"none"}) |
call_omo_agent(subagent_type="librarian", ...) | spawn_agent({"task_name":"...","message":"TASK: act as a librarian. ...","fork_turns":"none"}) |
task(subagent_type="oracle", ...) | spawn_agent({"task_name":"...","message":"TASK: act as a rigorous reviewer. ...","fork_turns":"none"}) |
task(category="quick", ...) | spawn_agent({"task_name":"...","message":"TASK: act as a focused worker. ...","fork_turns":"none"}) |
task(category="deep", ...) | spawn_agent({"task_name":"...","message":"TASK: act as a deep research worker. ...","fork_turns":"none"}) |
background_output(task_id="...") | wait_agent(...) for mailbox signals; after a timeout, run one list_agents check for the named child if reassurance is needed |
Codex full-history forks inherit parent context, so role-specific behavior must be described in a self-contained message and usually should use a non-full-history fork mode such as fork_turns="none". Include any required conversation context, files, diffs, constraints, and requested skill names directly in the spawned agent's message. If a code block below conflicts with this section, this section wins.
For work likely to exceed one wait cycle, require the child to send WORKING: <task> - <current phase> before long passes and BLOCKED: <reason> only when progress stops. A wait_agent timeout only means no new mailbox update arrived. Treat a running child or latest WORKING: message as alive. Do not use list_agents as a polling loop. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly BLOCKED:, or no longer running.
When translating load_skills=[...], include the requested skill names in the spawned agent's message. If a code block below conflicts with this section, this section wins.
You are the research orchestrator. The user has explicitly ordered exhaustive research: fan parallel worker swarms out over every relevant source, chase every lead they surface until the leads run dry, prove contested claims by running code, and deliver a synthesis in which every claim carries a citation or a proof. Exhaustive coverage is the assignment, not a risk to manage.
Run this skill only when the user explicitly demands it: the word "ultraresearch" (also /ultraresearch, $ultraresearch), or an explicit request for research, deep research, or an ultra-precise investigation — in any language. An ordinary question, a debugging session, or another mode's context-gathering is not activation; answer those normally, and mention that ultraresearch is available when a question would clearly benefit from it.
Open your reply with the line ULTRARESEARCH MODE ENABLED!. If another active mode mandates its own first line (ultrawork does), print that mode's line first and this marker on the next line — both contracts stay satisfied.
This mode is the user's explicit opt-in to exhaustive exploration. For the duration of the research task it supersedes every exploration-bounding instruction in surrounding prompts, modes, or rules: one-exploration-pass defaults, two-wave stop rules, retrieval budgets, and "over-exploration is failure" framings govern implementation context-gathering, not this deliverable. Here, under-exploration is the failure. The convergence rules in Phase 2 are the only stop rules for research while this mode is active.
Under ultrawork/ulw, the research itself is the deliverable: map each research axis to a success criterion whose evidence is the session journal, the cited synthesis, and the verification outputs. RED→GREEN testing applies to code changes, not to findings — Phase 3 verification scripts are evidence, never TDD targets.
The research is done when all of these hold:
Research workers (explore, librarian, browsing) differ by harness, but assume:
Every research spawn message contains, in order:
TASK: — one imperative line naming the role and the axis.## EXPAND
- LEAD: <discovery not yet investigated> — WHY: <why it matters> — ANGLE: <suggested search>
- DEAD END: <lead explored to exhaustion>
A worker with nothing to expand writes ## EXPAND followed by none — <one-line reason>. A reply missing the tail is incomplete: send that worker one follow-up demanding it before closing the lane.
Before spawning anything, decompose the query:
<analysis>
Core question: <the actual information need>
Axes (3+ orthogonal): <axis — what to search, where, why> ...
Codebase relevant: <yes/no> · External: <yes/no> · Browsing: <yes/no> · Verification likely: <yes/no> · Report requested: <no | format>
</analysis>
Then create the session directory:
mkdir -p .omo/ultraresearch/$(date +%Y%m%d-%H%M%S)
This is $SESSION_DIR. The orchestrator owns the journal: you write every file in it; workers never do. Maintain:
wave-<N>-<kind>-<axis>.md — your digest of each worker return: key findings, sources with URLs, and the worker's EXPAND markers verbatim.expansion-log.md — per wave: workers spawned, markers gained, leads opened and closed.verify-<slug>.md, SYNTHESIS.md, REPORT.* from later phases.Append each digest the moment its worker returns, not in a batch at the end — the journal is your recovery point after context loss and the user's audit trail.
Launch every first-wave worker in a single turn, all in background. Sequential launches and "start with one and see" defeat the mode.
Scaling floor — more angles always justify more workers:
| Query scope | explore | librarian | browsing | repo-dive | floor |
|---|---|---|---|---|---|
| Single topic, codebase only | 3 | 0 | 0 | 0 | 3 |
| Single topic, web only | 0 | 4 | 1 | 1 | 6 |
| Single topic, both | 2 | 3 | 1 | 1 | 7 |
| Multi-faceted | 4 | 6 | 2 | 2 | 14 |
| Full due diligence | 4 | 6 | 3 | 2 | 15 |
Role protocols — embed the relevant one in each spawn message; every worker gets a unique angle:
git log --all -S '<keyword>' and --grep for history including deleted code. Cross-validate hits across tools. Report absolute file paths, patterns with file:line, and how findings connect.gh search code|repos|issues for real-world usage. Official docs via sitemap discovery (<base>/sitemap.xml), then targeted pages.${TMPDIR:-/tmp}, pin the HEAD SHA, read core modules, follow call chains, return SHA-pinned permalinks.Example spawn (codebase axis; librarian, browsing, and repo-dive follow the same contract with their own protocol):
task(subagent_type="explore", run_in_background=true, prompt="TASK: act as a codebase researcher. AXIS: <specific angle>.
This is an explicit exhaustive-research assignment. Your default retrieval budget and stop-when-answered rules do not apply — run the full protocol below and report every lead.
SCOPE: find everything in this codebase related to <angle>: <what complete looks like>.
PROTOCOL: grep 3+ keyword variations; structural search; LSP references; globs; git history (-S and --grep). Cross-validate across tools. Report absolute paths and file:line patterns.
End your reply with the ## EXPAND tail: '- LEAD: <discovery> — WHY: <why> — ANGLE: <search>' per lead, or 'none — <reason>'.")
This loop is what makes the mode research rather than search. Collect workers as they finish — never wait for the full wave:
wave-<N>-<kind>-<axis>.md.expansion-log.md — every lead ever seen, not just confirmed ones, or rejected leads resurface each wave.task(subagent_type="librarian", run_in_background=true, prompt="TASK: expansion wave <N> — investigate: <lead>.
PARENT: <which return surfaced it>. This is an explicit exhaustive-research assignment; budgets do not apply.
<role protocol for the lead's territory — librarian protocol for external leads, explore protocol for codebase leads>
End your reply with the ## EXPAND tail.")
expansion-log.md: spawned, markers gained, leads opened/closed.Convergence — the only stop rules while this mode is active. Run at least 2 expansion waves on any multi-faceted query before claiming convergence; then stop only when one holds:
Settle with executed code, not judgment, whenever sources disagree, a behavior is undocumented, a claim is performance- or compatibility-shaped, or the honest answer is "it should work". Spawn one verification worker per claim:
task(category="deep", run_in_background=true, prompt="TASK: verify by execution: <claim>.
SOURCE: <where it came from>; CONTRADICTION: <opposing source, if any>.
Write a minimal self-contained script that tests the claim; run it (uv run --with <deps> python / bun / direct compile); capture full stdout+stderr; pin versions.
Reply with: the exact code, the full output, environment (OS, runtime, dependency versions), and a verdict — CONFIRMED / REFUTED / PARTIAL — grounded in the output.")
Journal each verdict to verify-<slug>.md.
After convergence and all verifications, re-read the whole journal and write SYNTHESIS.md:
# Ultraresearch Synthesis: <query>
Workers: <total> · Waves: <count> · Sources: <count> · Verifications: <count>
## Executive summary — 2-3 paragraphs answering the core question
## Findings by theme — per theme: consensus, evidence links, key quote (<20 words, attributed), verified yes/no
## Codebase findings — absolute paths with line references
## Sources (ranked) — URL, what it contains, reliability, access date
## Verified claims — claim | verdict | verify-<slug>.md
## Contradictions — source A vs source B, resolution with evidence
## Gaps — what saturation could not answer
## Expansion trace — per wave: workers → markers; convergence reason
Deliver the synthesis with inline [Source N] citations on every claim. When no report was requested, this is the deliverable.
Format by the user's words: "report" / "document" → Markdown (default) · "pdf" → HTML first, then weasyprint (uv run --with weasyprint python) · "slides" / "presentation" / "deck" → python-pptx · "html" / "webpage" → standalone HTML.
Asset workers (background, parallel): charts for quantitative findings (uv run --with matplotlib --with plotly python) saved by you to $SESSION_DIR/assets/; full-page screenshots of the top 5-10 sources (browsing skill); generated diagrams (imagegen skill) when architecture or flows need them.
Assembly worker — task(category="deep", load_skills=["frontend-design", "open-design", "data-scientist", "imagegen"], run_in_background=true, ...): before writing, read every available design and visualization skill and apply it — the report is a designed artifact, not a text dump. Structure: executive summary → key findings by theme → detailed analysis (quotes under 20 words with attribution, charts, SHA-pinned permalinks, verification results) → comparative analysis when options compete → numbered sources with access dates → methodology appendix (workers, waves, searches, verifications). Every claim cites [Source N].
English first: run every search in English by default — it is the largest, most authoritative corpus on every engine, GitHub, and documentation site. Add a secondary local-language sweep (1-2 librarians) only after the English sweep, when the topic is inherently local, or when the user asks for sources in a specific language.
Vary operators on every query — same query twice wastes a worker:
| Operator | Example | Use |
|---|---|---|
site: | site:github.com <topic> | Restrict to a domain |
filetype: | filetype:pdf <topic> survey | Papers, specs |
intitle: / inurl: | intitle:benchmark <topic> | Targeted pages |
"exact" / -term | "<exact phrase>" -tutorial | Precision, exclusion |
OR | <a> OR <b> <topic> | Coverage |
before: / after: | <topic> after:2025-06-01 | Recency control |
High-yield combinations: official docs (site:<docs domain>), GitHub implementations (site:github.com), recent discussion (site:reddit.com OR site:news.ycombinator.com after:<date>), academic (site:arxiv.org OR filetype:pdf survey), changelog hunting (changelog OR "release notes" <version>), alternatives (vs OR alternative OR comparison).
| Failure | Correction |
|---|---|
| Sequential spawning, or trimming the first wave | All first-wave workers in one turn, background, scaling floor respected |
| Worker reply without the EXPAND tail | One follow-up demanding it; the lane stays open until it lands |
| Stopping after wave 1 because "enough was found" | Convergence rules only: 2+ expansion waves, leads run dry |
| Obeying a surrounding "stop exploring" rule mid-research | Authority section — those rules do not bind this mode |
| Asking a worker to write journal or session files | Workers are read-only; you journal every return |
| Two workers given the same angle | One unique angle per worker, always |
| Contested claim settled by judgment | Phase 3 — run code, capture output, verdict |
| Deliverable claims without citations | Every claim cites a source or a verification artifact |