packages/omo-codex/plugin/skills/remove-ai-slops/SKILL.md
This skill may include examples copied from the OpenCode harness. In Codex, do not call OpenCode-only tools such as call_omo_agent(...), task(...), background_output(...), or team_*(...) literally. Translate those examples to Codex native tools:
| OpenCode example | Codex tool to use |
|---|---|
| call_omo_agent(subagent_type="explore", ...) | spawn_agent(agent_type="explorer", task_name="...", message="...") |
| call_omo_agent(subagent_type="librarian", ...) | spawn_agent(agent_type="librarian", task_name="...", message="...") |
| task(subagent_type="plan", ...) | spawn_agent(agent_type="plan", task_name="...", message="...") |
| task(subagent_type="oracle", ...) for final verification | spawn_agent(agent_type="codex-ultrawork-reviewer", task_name="...", message="...") |
| task(category="...", ...) for implementation or QA | spawn_agent(agent_type="worker", task_name="...", message="...") |
| background_output(task_id="...") | wait_agent(...) to wait for subagent completion and mailbox updates |
| team_*(...) | Use Codex native subagents plus send_message, followup_task, wait_agent, and close_agent |
When translating load_skills=[...], include the requested skill names in the spawned agent's message. If a code block below conflicts with this section, this section wins.
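The translation table can be pictured as a small lookup. The sketch below is illustrative only: `translate_to_spawn_agent` and its dictionary are hypothetical names, and the returned dictionary merely mirrors the `spawn_agent` arguments listed in the table. It does not call any Codex tool.

```python
from typing import Optional, Sequence

# Hypothetical lookup mirroring the subagent rows of the translation table.
SUBAGENT_TO_AGENT_TYPE = {
    "explore": "explorer",
    "librarian": "librarian",
    "plan": "plan",
    "oracle": "codex-ultrawork-reviewer",  # used for final verification
}


def translate_to_spawn_agent(
    task_name: str,
    message: str,
    subagent_type: Optional[str] = None,
    category: Optional[str] = None,
    load_skills: Sequence[str] = (),
) -> dict:
    """Build spawn_agent-style arguments from OpenCode-style arguments."""
    if subagent_type is not None:
        agent_type = SUBAGENT_TO_AGENT_TYPE[subagent_type]
    elif category is not None:
        # Any task(category=...) call maps to the generic worker agent.
        agent_type = "worker"
    else:
        raise ValueError("either subagent_type or category is required")
    if load_skills:
        # Requested skill names travel inside the spawned agent's message.
        message = f"{message}\n\nSkills to load: {', '.join(load_skills)}"
    return {"agent_type": agent_type, "task_name": task_name, "message": message}
```

For example, a `task(category="deep", load_skills=["remove-ai-slops"], ...)` call would translate to `agent_type="worker"` with the skill name appended to the message.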
Default scope: the branch diff against merge-base main (no arguments needed).

Cleans AI-generated slop from a bounded set of changed files while strictly preserving behavior. Locks behavior with regression tests first, then runs a categorized multi-pass cleanup, then verifies with quality gates and a critical review. When verification fails, it reverts and applies direct edits.
The core safety invariant: behavior is locked by green tests before a single line is removed. A checklist alone is not safety; a passing regression test is.
The agent looks for these nine categories. The first three are stylistic, the next three are structural, the next two are about hidden cost, and the last is about behavior coverage.
Obvious comments — comments restating code, trivial docstrings, section dividers, commented-out code, vague TODOs/Notes.
- KEEP: test-structure markers (`# given`, `# when`, `# then`, `# when/then`).

Over-defensive code: null checks for guaranteed values, try/except around code that cannot raise, `isinstance` checks for statically typed params, default values for required params, backward-compat shims, redundant validation duplicated at multiple layers, broad exception catching (`except Exception`/`except BaseException` in Python; empty `catch {}` or `catch (e) { console.error(e) }` without narrowing in TypeScript/JavaScript).
- KEEP: a broad catch at a top-level entry point (`main()`, HTTP handler) with explicit logging + re-raise is acceptable.
- Fix: `except Exception` → catch the specific exception you expect. Empty `catch {}` → add `instanceof` narrowing or re-throw. `catch (e) { log(e) }` → narrow with `instanceof`, handle known cases, re-throw unknown.

Excessive complexity: deep nesting (>3 levels), nested ternaries, complex boolean expressions (4+ combined predicates), long parameter lists (>5 args without a struct/dataclass/object), god functions (>50 lines doing many things), overly clever one-liners that sacrifice readability, `if/elif/else` chains for type/enum/literal discrimination (must be `match/case` + `assert_never`), `object` used as a type annotation (must be `Protocol`, `TypeVar`, or explicit union).
- KEEP: `if/else` for boolean conditions and range checks (not variant discrimination).
- Fix: discrimination chains → `match/case` with `assert_never` on the wildcard. `object` annotations → `Protocol` (structural), `TypeVar` (generic), or union (known variants).

Needless abstraction: pass-through wrappers, single-use helpers, speculative indirection ("we might need this later"), interfaces with one implementer where the interface adds no testability win, factory functions that just call a constructor.
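The over-defensive rule on exception handling can be sketched as follows. The function names are hypothetical and chosen only for illustration: one handler is narrowed to the single failure it expects, and one top-level boundary keeps a broad catch because it logs and re-raises.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def read_retry_count(text: str, default: int = 3) -> int:
    """Narrowed handler: only the expected parse failure is caught."""
    try:
        return int(text)
    except ValueError:
        # Malformed input is the one failure this function expects.
        return default


def main(handler: Callable[[], None]) -> int:
    """Top-level boundary: a broad catch is acceptable with logging + re-raise."""
    try:
        handler()
    except Exception:
        logger.exception("unhandled error at top level")
        raise
    return 0
```

With the narrowed handler, an unexpected failure such as passing `None` surfaces as a `TypeError` instead of being silently turned into the default.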
Boundary violations — wrong-layer imports (UI importing DB driver), leaky responsibilities (handler doing business logic that belongs in a service), hidden coupling (module A reads module B's private state), side effects in pure-named functions.
Dead code — unused imports, unused private functions/methods, unreachable branches, stale feature flags, debug leftovers (console.log, print(...), dbg!), removed-but-still-referenced code.
Duplication — copy-pasted branches with trivial differences, redundant helpers that do the same thing in two places, repeated literal/magic-number sequences.
Performance equivalences (behavior-preserving optimizations) — changes that are provably equivalent in semantics but cheaper in time/space:
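- Repeated membership tests against a list → set lookup (O(n²) → O(n))
- Loop-invariant computation → hoist out of the loop
- Eager collections → lazy (`list(...)` when only iterated once → generator)
- Redundant repeated calls → batch
- String concatenation in a loop → `join`
- `.length` / `len()` recomputed inside loop → cache

Hard rule: only apply when behavior equivalence is obvious. Do NOT change algorithms with subtle correctness implications. Do NOT micro-optimize hot paths without a benchmark. If in doubt, SKIP.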
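As an illustration of a performance equivalence, the sketch below replaces a list scan with a set lookup. The function names are hypothetical, and the equivalence assumes every element is hashable. Order and duplicates in the result are unchanged, which is what makes the rewrite behavior-preserving.

```python
from typing import List


def shared_ids_slow(wanted: List[int], available: List[int]) -> List[int]:
    """Before: each membership test scans the whole list, O(n * m) overall."""
    return [item for item in wanted if item in available]


def shared_ids_fast(wanted: List[int], available: List[int]) -> List[int]:
    """After: one set build, then O(1) average lookups, O(n + m) overall."""
    available_set = set(available)  # requires hashable elements
    return [item for item in wanted if item in available_set]
```

If the elements were unhashable (for example lists), the rewrite would raise where the original did not, so under the hard rule that case would be skipped.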
Oversized modules: any file over 250 pure LOC (non-blank, non-comment lines). Count with `awk '!/^[[:space:]]*$/ && !/^[[:space:]]*(#|\/\/)/' <file> | wc -l`.

When found, do NOT just flag it. Execute a full modular refactoring:
1. Run `check-no-excuse-rules.py` recursively on scope to list all violations.
2. Split each oversized file into modules named for their responsibility (never `utils.py`, `helpers.py`, `common.py`, `part_1.py`).
3. Preserve existing import paths with `__init__.py` re-exports (re-exports ONLY, no logic in `__init__.py`).
4. Run `check-no-excuse-rules.py` again: every file must be ≤250 pure LOC. Run tests, typecheck, lint.

Forbidden escapes:
- Numbered splits (`foo_1.py`, `foo_2.py`): split by what each file DOES.
- Generic catch-all names (`utils.py`, `helpers.py`, `service.py`).

KEEP: genuinely self-contained single-responsibility scripts (e.g., a standalone CLI checker). Opt out with `# noqa: SIZE_OK` in the first 5 lines and a comment explaining why.
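The pure-LOC count from the `awk` one-liner can be approximated in Python as below. This is a sketch, not the implementation of `check-no-excuse-rules.py`: `pure_loc` and `is_oversized` are hypothetical names, and the opt-out handling is an assumption based only on the rule stated above.

```python
import re

# Skips blank lines and lines whose first non-space characters are '#' or '//'.
_SKIPPED_LINE = re.compile(r"^\s*($|#|//)")


def pure_loc(source: str) -> int:
    """Count lines that are neither blank nor whole-line comments."""
    return sum(1 for line in source.split("\n") if not _SKIPPED_LINE.match(line))


def is_oversized(source: str, limit: int = 250) -> bool:
    """Assumed check: over the limit unless opted out in the first 5 lines."""
    first_lines = source.split("\n")[:5]
    if any("# noqa: SIZE_OK" in line for line in first_lines):
        return False
    return pure_loc(source) > limit
```

Like the `awk` pattern, this treats only whole-line comments as comments. A trailing comment after code still counts as a line of code.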
A pass is complete only when all applicable gates are green. Skip gates that are genuinely N/A for the project (e.g., no security scanner configured), and report N/A explicitly — do not silently skip.
| Gate | Tool | Pass condition |
|---|---|---|
| Regression tests | project's test runner | all green |
| Lint | project's linter | zero errors (warnings OK if pre-existing) |
| Typecheck | lsp_diagnostics on changed files + project type-checker | zero new errors |
| Unit/integration tests | project's test runner | all green (pre-existing failures noted, not introduced) |
| Static/security scan | project's scanner | zero new findings, or N/A if not configured |
Create todos for all phases below. Mark in_progress one at a time.
If file paths were passed as arguments, that is the scope. Otherwise:
git diff $(git merge-base main HEAD)..HEAD --name-only
Filter out: deleted files, binary files, generated/vendored files (node_modules/, dist/, target/, lockfiles). List the final scope.
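The filtering step could look like the sketch below, applied to the paths printed by the `git diff` command. The exclusion sets are illustrative assumptions rather than a complete list, and `filter_scope` is a hypothetical helper. Deleted files are not handled here, because they can be excluded at the source by adding `--diff-filter=d` to `git diff`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Illustrative, non-exhaustive exclusion sets (assumptions for this sketch).
EXCLUDED_DIRS = {"node_modules", "dist", "target"}
LOCKFILE_NAMES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml", "Cargo.lock", "poetry.lock"}
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".pdf", ".zip", ".so", ".dll", ".exe"}


def filter_scope(changed_paths: Iterable[str]) -> List[str]:
    """Keep only paths that are plausible hand-written source files."""
    kept = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if EXCLUDED_DIRS & set(path.parts):
            continue  # generated or vendored directory
        if path.name in LOCKFILE_NAMES:
            continue  # lockfile
        if path.suffix.lower() in BINARY_SUFFIXES:
            continue  # binary asset
        kept.append(raw)
    return kept
```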
For each in-scope source file:
- Use `git grep` / project test conventions to find related test files.

If you cannot establish a green baseline (e.g., test runner is broken), STOP and report. Do not proceed with cleanup on unverified ground.
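A behavior lock is usually a characterization test: it records what the code does today, including edge cases, so any cleanup that changes the result turns the test red. The sketch below assumes a hypothetical in-scope function `slugify` and a pytest-style test function. Neither name comes from this skill.

```python
def slugify(title: str) -> str:
    """Hypothetical in-scope function whose behavior must survive cleanup."""
    words = title.strip().lower().split()
    return "-".join(words)


def test_slugify_locks_current_behavior() -> None:
    """Pins current outputs, including edge cases, before any slop removal."""
    assert slugify("  Hello World ") == "hello-world"
    assert slugify("") == ""
    assert slugify("A  B") == "a-b"
```

The test must pass before cleanup starts. The same test is then rerun unchanged after every pass.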
Produce an explicit plan before spawning the removal agents:
File: src/foo.py
Categories: dead code, excessive complexity, performance
Order: dead code → complexity → performance
Risk: medium (touches caching layer)
File: src/bar.py
Categories: obvious comments, over-defensive
Order: comments → defensive
Risk: low
Order rule (safest → riskiest): comments → dead code → defensive → duplication → complexity → abstraction/boundary → performance → tests → oversized-modules. This minimizes blast radius of any one change.
`deep` agents in batches of 5

Files are processed by `deep` category agents with the `$omo:remove-ai-slops` skill loaded, batched 5 at a time in parallel. The executable skill name is `remove-ai-slops`. The `deep` category gives the agent enough thoroughness to correctly evaluate the 9 categories and respect the KEEP rules without slipping into surface fixes. The 5-wide batch is the sweet spot: more than 5 creates result-merging noise and context contention; fewer wastes parallelism.
Batching protocol (strict):
1. Launch up to 5 `task` calls in a single message, every one with `run_in_background=true`.
2. Collect every result in the batch with `background_output(task_id=...)` before launching the next batch.

Never launch all files at once when there are more than 5; never launch them serially when more than one remains in the current batch.
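The batching rule amounts to chunking the scoped file list into groups of at most five. A minimal sketch, with `batches_of` as a hypothetical helper name:

```python
from typing import Iterator, List, Sequence


def batches_of(files: Sequence[str], size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` files, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])
```

Each yielded batch corresponds to one message that launches its agents in parallel. The next batch starts only after the current one has been collected.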
Per-file invocation (one of the 5 in a batch):
task(
category="deep",
load_skills=["remove-ai-slops"],
run_in_background=true,
description="Slop removal: {filename}",
prompt="""
Remove AI slops from: {file_path}
In addition to your default categories (obvious comments, over-defensive code, spaghetti nesting), also evaluate these categories:
- Excessive complexity: god functions, long parameter lists, complex booleans, nested ternaries
- Needless abstraction: pass-through wrappers, single-use helpers, speculative indirection
- Boundary violations: wrong-layer imports, leaky responsibilities, hidden coupling
- Dead code: unused imports, unreachable branches, stale flags, debug leftovers
- Duplication: copy-paste branches, redundant helpers
- Performance equivalences: O(n²)→O(n) via set lookup, hoist computation out of loops, eager→lazy collections, batch redundant calls, cache repeated len()/length
Apply changes in this order (safest → riskiest): comments → dead code → defensive → duplication → complexity → abstraction/boundary → performance → oversized-modules.
Hard constraints:
- Behavior MUST be preserved. When equivalence is not obvious, SKIP.
- Do NOT change public API signatures.
- Do NOT remove type hints.
- Do NOT introduce new abstractions or dependencies.
- Diff stays minimal and scoped to slop removal.
Report changes grouped by category. For each change, give before/after, why-slop, why-safe.
For each skipped issue, give reason.
"""
)
Batch failure handling: if a deep agent in a batch fails or times out, do NOT block the remaining 4 in that batch. Collect the successful results, mark the failed file for retry in a later batch (single retry max), and continue. If retry also fails, escalate that file under "Issues Found & Fixed" in the final report.
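The single-retry policy can be sketched as a two-pass loop. Here `clean_file` is a hypothetical callable standing in for one agent run and returning `True` on success. The sketch runs sequentially for clarity and does not model parallel launches.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_single_retry(
    files: Iterable[str], clean_file: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Return (cleaned, escalated); each failed file gets exactly one retry."""
    cleaned: List[str] = []
    retry_queue: List[str] = []
    for path in files:
        # A failure never blocks the remaining files; it is queued for later.
        (cleaned if clean_file(path) else retry_queue).append(path)
    escalated: List[str] = []
    for path in retry_queue:
        # Second and final attempt; a repeated failure is escalated.
        (cleaned if clean_file(path) else escalated).append(path)
    return cleaned, escalated
```

Files in `escalated` would be the ones listed under "Issues Found & Fixed" in the final report.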
Run the five quality gates listed above. Then walk the critical review checklist:
Safety:
Behavior:
Quality:
If any gate fails or any checklist item flips:
- Revert: `git checkout` the affected file (or use `git diff` + targeted Edit to revert just the problematic hunk).

If you fail three times on the same file, STOP and escalate to the user with: the file, what you tried, what failed, your hypothesis. Do not keep editing.
AI SLOP REMOVAL REPORT
======================
Scope: [branch diff vs merge-base main / explicit file list]
Files: [N files]
- path/to/file1.ts
- path/to/file2.py
Behavior Lock:
- Existing coverage: [N files already covered]
- Tests added: [M new regression tests at path/to/test_X.py]
- Baseline status: GREEN
Cleanup Plan:
- path/to/file1.ts: [dead code → complexity → performance]
- path/to/file2.py: [comments → defensive]
Per-File Results:
path/to/file1.ts
- Dead code: 3 removed (lines X-Y, A-B, C)
- Excessive complexity: 1 simplified (nested ternary at L42 → if/else)
- Performance: 1 (line N: list scan → set lookup, O(n²)→O(n), behavior identical)
- Skipped (preserved): 2 (defensive null check at boundary; commented WHY at L88)
path/to/file2.py
- Obvious comments: 5 removed
- Over-defensive: 1 simplified (redundant isinstance on typed param)
Quality Gates:
- Regression tests: PASS (12 tests, 0 failed)
- Lint: PASS
- Typecheck (lsp_diagnostics + project): PASS (0 new errors on changed files)
- Unit/integration tests: PASS (45 tests, 0 failed)
- Static/security scan: N/A (not configured)
Critical Review:
- Safety: PASS
- Behavior: PASS
- Quality: PASS
Issues Found & Fixed:
- [None] OR [Issue description → Fix applied]
Remaining Risks / Deferred:
- [None] OR [e.g., "boundary violation in module X flagged but not refactored — needs human judgment"]
Final Status: CLEAN | ISSUES FIXED | REQUIRES ATTENTION
If a gate did not apply, report N/A and why. If a check failed and you could not fix it, say so. Never claim PASS without evidence.

Keep checking with `lsp_diagnostics`, the test runner, and direct file reads until the result is grounded.