packages/shared-skills/skills/remove-ai-slops/SKILL.md
Scope: explicit file paths, or the branch diff vs merge-base main (no arguments needed).
Cleans AI-generated slop from a bounded set of changed files while strictly preserving behavior. Locks behavior with regression tests first, then runs a categorized multi-pass cleanup, then verifies with quality gates and a critical review. Reverts, then fixes by direct edit, when verification fails.
The core safety invariant: behavior is locked by green tests before a single line is removed. A checklist alone is not safety; a passing regression test is.
The agent looks for these nine categories. The first three are stylistic, the next three are structural, the next two are about hidden cost, and the last is about behavior coverage.
Obvious comments — comments restating code, trivial docstrings, section dividers, commented-out code, vague TODOs/Notes.
… (# given, # when, # then, # when/then).
Over-defensive code — null checks for guaranteed values, try/except around code that cannot raise, isinstance checks for statically typed params, default values for required params, backward-compat shims, redundant validation duplicated at multiple layers, broad exception catching (except Exception/except BaseException in Python, empty catch {} or catch (e) { console.error(e) } without narrowing in TypeScript/JavaScript).
KEEP: … (e.g., main(), HTTP handler) with explicit logging + re-raise is acceptable.
REFACTOR: except Exception → catch the specific exception you expect. Empty catch {} → add instanceof narrowing or re-throw. catch (e) { log(e) } → narrow with instanceof, handle known cases, re-throw unknown.
Excessive complexity — deep nesting (>3 levels), nested ternaries, complex boolean expressions (combine 4+ predicates), long parameter lists (>5 args without a struct/dataclass/object), god functions (>50 lines doing many things), overly clever one-liners that sacrifice readability, if/elif/else chains for type/enum/literal discrimination (must be match/case + assert_never), object used as a type annotation (must be Protocol, TypeVar, or explicit union).
KEEP: … if/else for boolean conditions and range checks (not variant discrimination).
REFACTOR: variant-discriminating if/elif/else chains → match/case with assert_never on the wildcard. object annotations → Protocol (structural), TypeVar (generic), or union (known variants).
Needless abstraction — pass-through wrappers, single-use helpers, speculative indirection ("we might need this later"), interfaces with one implementer where the interface adds no testability win, factory functions that just call a constructor.
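A hypothetical before/after for two needless-abstraction patterns: a factory that only forwards to a constructor, and a wrapper that only forwards to the factory. Both layers disappear once call sites construct the object directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    attempts: int
    backoff_s: float = 0.5


# Slop: a factory that only forwards its arguments to the constructor adds a
# name to learn and a frame to step through, with no testability win.
def make_retry_policy(attempts: int, backoff_s: float = 0.5) -> RetryPolicy:
    return RetryPolicy(attempts=attempts, backoff_s=backoff_s)


# Slop: a pass-through wrapper that adds nothing over its callee.
def build_policy(attempts: int) -> RetryPolicy:
    return make_retry_policy(attempts)


# Cleanup: call sites construct the dataclass directly; both helpers go away.
policy = RetryPolicy(attempts=3)
```

Because the per-file constraints forbid changing public API signatures, this cut only applies when the helpers are private or internal; an exported factory stays.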
Boundary violations — wrong-layer imports (UI importing DB driver), leaky responsibilities (handler doing business logic that belongs in a service), hidden coupling (module A reads module B's private state), side effects in pure-named functions.
Dead code — unused imports, unused private functions/methods, unreachable branches, stale feature flags, debug leftovers (console.log, print(...), dbg!), removed-but-still-referenced code.
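For unused imports specifically, a rough Python sketch using the standard ast module: it reports imported names that never appear as a bare name. It is a candidate finder, not a verdict; names referenced only in strings (__all__, quoted annotations) and deliberate re-exports in __init__.py show up as false positives. The function name is hypothetical.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that never appear as an ast.Name in `source`."""
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds `os`; `import numpy as np` binds `np`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Attribute access such as `sys.argv` still starts with ast.Name("sys").
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(name for name in imported if name not in used)
```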
Duplication — copy-pasted branches with trivial differences, redundant helpers that do the same thing in two places, repeated literal/magic-number sequences.
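A hypothetical example of copy-pasted branches that differ only in a magic number, collapsed into a rate table. The cleaned version keeps the same arithmetic, the same exception type, and the same message, so results are identical for every input.

```python
# Slop: two branches that differ only in a per-kg rate.
def shipping_cost_slop(region: str, weight_kg: float) -> float:
    if region == "eu":
        return 5.0 + weight_kg * 1.2
    elif region == "us":
        return 5.0 + weight_kg * 1.5
    else:
        raise ValueError(f"unknown region: {region}")


BASE_FEE = 5.0
PER_KG_RATE = {"eu": 1.2, "us": 1.5}


def shipping_cost(region: str, weight_kg: float) -> float:
    try:
        rate = PER_KG_RATE[region]
    except KeyError:
        # Same exception type and message as the original else-branch.
        raise ValueError(f"unknown region: {region}") from None
    return BASE_FEE + weight_kg * rate
```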
Performance equivalences (behavior-preserving optimizations) — changes that are provably equivalent in semantics but cheaper in time/space:
- … list(...) when only iterated once → generator
- string concatenation in a loop → join
- .length / len() recomputed inside loop → cache

Hard rule: only apply when behavior equivalence is obvious. Do NOT change algorithms with subtle correctness implications. Do NOT micro-optimize hot paths without a benchmark. If in doubt, SKIP.
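A Python sketch of two behavior-preserving rewrites: list scan → set lookup (the example used in the report template) and string concatenation in a loop → join. Function names are hypothetical. The set rewrite is only equivalent when the elements are hashable with consistent __eq__/__hash__; unhashable elements would raise TypeError, so it must be skipped for them.

```python
def common_ids_slop(a: list[int], b: list[int]) -> list[int]:
    # O(len(a) * len(b)): `in` on a list is a linear scan.
    return [x for x in a if x in b]


def common_ids(a: list[int], b: list[int]) -> list[int]:
    # Same elements, same order, same duplicates; only b's lookup changes.
    b_set = set(b)
    return [x for x in a if x in b_set]


def render_slop(words: list[str]) -> str:
    out = ""
    for w in words:
        out += w + ","  # may copy the growing string on each iteration
    return out


def render(words: list[str]) -> str:
    # Identical output: each word followed by a comma, built in one join.
    return "".join(w + "," for w in words)
```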
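Oversized modules: … any file over 250 pure LOC, where pure LOC (lines that are neither blank nor start with # or //) is counted with awk '!/^[[:space:]]*$/ && !/^[[:space:]]*(#|\/\/)/' <file> | wc -l.
When found, do NOT just flag it. Execute a full modular refactoring: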
- Run check-no-excuse-rules.py recursively on scope to list all violations.
- … (no utils.py, helpers.py, common.py, part_1.py).
- … __init__.py re-exports (re-exports ONLY, no logic in __init__.py).
- Run check-no-excuse-rules.py again: every file must be ≤250 pure LOC. Run tests, typecheck, lint.

Forbidden escapes:
- Numbered splits (foo_1.py, foo_2.py): split by what each file DOES instead.
- Catch-all module names (utils.py, helpers.py, service.py).

KEEP: genuinely self-contained single-responsibility scripts (e.g., a standalone CLI checker). Opt out with # noqa: SIZE_OK in the first 5 lines and a comment explaining why.
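A hypothetical Python approximation of the size check, mirroring the awk pure-LOC count and the # noqa: SIZE_OK opt-out. It is not the actual check-no-excuse-rules.py, whose implementation is not shown here; Python's \s and str.splitlines() are slightly broader than POSIX [[:space:]] and awk's newline splitting, and the "comment explaining why" requirement is not checked.

```python
import re

BLANK = re.compile(r"^\s*$")
COMMENT = re.compile(r"^\s*(#|//)")
LIMIT = 250  # files must stay at or under this many pure LOC


def pure_loc(text: str) -> int:
    """Count lines that are neither blank nor start with # or //."""
    return sum(
        1
        for line in text.splitlines()
        if not BLANK.match(line) and not COMMENT.match(line)
    )


def is_oversized(text: str) -> bool:
    """True if over LIMIT pure LOC and not opted out in the first 5 lines."""
    head = text.splitlines()[:5]
    opted_out = any("# noqa: SIZE_OK" in line for line in head)
    return not opted_out and pure_loc(text) > LIMIT
```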
A pass is complete only when all applicable gates are green. Skip gates that are genuinely N/A for the project (e.g., no security scanner configured), and report N/A explicitly — do not silently skip.
| Gate | Tool | Pass condition |
|---|---|---|
| Regression tests | project's test runner | all green |
| Lint | project's linter | zero errors (warnings OK if pre-existing) |
| Typecheck | lsp_diagnostics on changed files + project type-checker | zero new errors |
| Unit/integration tests | project's test runner | all green (pre-existing failures noted, not introduced) |
| Static/security scan | project's scanner | zero new findings, or N/A if not configured |
Create todos for all phases below. Mark exactly one todo in_progress at a time.
If file paths were passed as arguments, that is the scope. Otherwise:
git diff $(git merge-base main HEAD)..HEAD --name-only
Filter out: deleted files, binary files, generated/vendored files (node_modules/, dist/, target/, lockfiles). List the final scope.
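A sketch of the scope computation in Python. It swaps --name-only for git's --numstat, which marks binary files as "-\t-\tpath", and uses --no-renames plus --diff-filter=d so deleted files (including the old side of a rename) drop out. The excluded directories come from the list above; the lockfile names are assumptions to extend per repo, and paths git quotes (unusual characters) are not handled.

```python
import subprocess
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"node_modules", "dist", "target"}
# Assumed lockfile names; extend for the project's package managers.
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml", "Cargo.lock", "poetry.lock"}


def in_scope(path: str) -> bool:
    p = PurePosixPath(path)
    if p.name in LOCKFILES:
        return False
    return not any(part in EXCLUDED_DIRS for part in p.parts)


def parse_numstat(numstat: str) -> list[str]:
    files = []
    for line in numstat.splitlines():
        added, deleted, path = line.split("\t", 2)
        if added == "-" and deleted == "-":
            continue  # git reports binary files as "-\t-\tpath"
        if in_scope(path):
            files.append(path)
    return files


def git_scope() -> list[str]:
    base = subprocess.run(
        ["git", "merge-base", "main", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    numstat = subprocess.run(
        ["git", "diff", "--numstat", "--no-renames", "--diff-filter=d", f"{base}..HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_numstat(numstat)
```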
For each in-scope source file:
- Use git grep / project test conventions to find related test files.

If you cannot establish a green baseline (e.g., the test runner is broken), STOP and report. Do not proceed with cleanup on unverified ground.
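A minimal behavior-lock sketch with Python's unittest. normalize_tag and its tests are hypothetical; the point is to pin current behavior, quirks included, and see it green before any cleanup. Run with python -m unittest.

```python
import unittest


def normalize_tag(raw: str) -> str:
    # Hypothetical function about to be cleaned up.
    if raw is None:  # over-defensive: the param is typed str
        return ""
    return raw.strip().lower().replace(" ", "-")


class NormalizeTagBehaviorLock(unittest.TestCase):
    def test_basic(self) -> None:
        self.assertEqual(normalize_tag("Hello World"), "hello-world")

    def test_surrounding_whitespace_is_stripped(self) -> None:
        self.assertEqual(normalize_tag("  Tag "), "tag")

    def test_inner_space_runs_become_dash_runs(self) -> None:
        # A quirk, not necessarily desired: pinned so cleanup cannot change it.
        self.assertEqual(normalize_tag("a  b"), "a--b")
```

With these green, deleting the None guard is safe for typed callers: the tests stay green, and any unintended change to stripping or dash handling fails loudly.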
The largest, safest deletion is code that should not have existed. Before categorizing smells, run the deletion ladder on each changed unit:
Ladder rungs, in order: delete entirely → reuse existing repo code → platform/stdlib/native → simplify in place. Examples: a custom date picker → <input type="date">, a custom query parser → URLSearchParams, a bespoke debounce → the util already imported.

Only code that lands on Simplify in place proceeds to the smell categories. This turns the pass from "find smells to trim" into "first decide whether the code should exist, then trim what survives." One function replaced by a platform call is a bigger, safer win than any in-place cleanup, and it needs no per-line smell analysis.
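A Python analogue of the platform/stdlib-native rung, with a hypothetical helper: a hand-rolled order-preserving dedupe replaced by dict.fromkeys. Dicts preserve insertion order (guaranteed since Python 3.7) and a repeated key keeps its first position, so for hashable items the results are identical.

```python
def unique_in_order_slop(items: list[str]) -> list[str]:
    # Bespoke helper re-implementing what the language already provides.
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def unique_in_order(items: list[str]) -> list[str]:
    # First occurrence of each item, in original order.
    return list(dict.fromkeys(items))
```

In a real pass the helper would usually be deleted outright and call sites would use list(dict.fromkeys(...)) directly.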
For a diff that fixes a bug, grep the callers of every shared function it touches. Prefer one root-cause fix at the shared seam over repeated guards at each caller — a per-caller patch that leaves a sibling caller broken is a partial fix, not a cleanup.
Then produce an explicit plan before spawning the removal agents:
File: src/foo.py
Ladder: 2 units simplify-in-place; 1 unit delete (native <input> replaces custom picker)
Categories: dead code, excessive complexity, performance
Order: dead code → complexity → performance
Risk: medium (touches caching layer)
File: src/bar.py
Ladder: all simplify-in-place
Categories: obvious comments, over-defensive
Order: comments → defensive
Risk: low
Intentional shortcuts: if the plan deliberately keeps a bounded simplification (a naive scan fine under N rows, a global lock, an O(n²) path), mark it in-code with a debt: comment naming the ceiling and the upgrade trigger (in omo, prefix with // @allow so the comment-checker treats it as intentional), and list it under "Remaining Risks / Deferred" in the report. That section is the debt ledger — a simplification with a known ceiling and no marker is indistinguishable from a bug.
Order rule (safest → riskiest): comments → dead code → defensive → duplication → complexity → abstraction/boundary → performance → tests → oversized-modules. This minimizes blast radius of any one change.
Files are processed by deep category agents with the $omo:remove-ai-slops skill loaded, batched 5 at a time in parallel. The executable skill name is remove-ai-slops. The deep category gives the agent enough thoroughness to evaluate the 9 categories correctly and respect the KEEP rules without slipping into surface fixes. The 5-wide batch is the sweet spot: more than 5 creates result-merging noise and context contention, while fewer wastes parallelism.
Batching protocol (strict):
- Launch up to 5 task calls in a single message, every one with run_in_background=true.
- Collect each result with background_output(task_id=...).

Never launch all files at once when there are more than 5; never launch them serially when more than one remains in the current batch.
Per-file invocation (one of the 5 in a batch):
task(
category="deep",
load_skills=["remove-ai-slops"],
run_in_background=true,
description="Slop removal: {filename}",
prompt="""
Remove AI slops from: {file_path}
First run the deletion ladder from Phase 3 on this file (delete entirely / reuse existing repo code / platform-stdlib-native / simplify in place); only code that must exist proceeds to smell removal.
Then evaluate EVERY category defined in this skill's "Categories (what counts as slop)" section, applying that section's KEEP and REFACTOR rules verbatim — the Categories section you have loaded is canonical, do not work from a restated subset.
Apply changes in this order (safest → riskiest): comments → dead code → defensive → duplication → complexity → abstraction/boundary → performance → oversized-modules.
Hard constraints:
- Behavior MUST be preserved. When equivalence is not obvious, SKIP.
- Do NOT change public API signatures.
- Do NOT remove type hints.
- Do NOT introduce new abstractions or dependencies.
- Diff stays minimal and scoped to slop removal.
Report changes grouped by category. For each change, give before/after, why-slop, why-safe.
For each skipped issue, give reason.
"""
)
Batch failure handling: a multi_agent_v1.wait_agent timeout only means no new mailbox update arrived, not that a deep agent failed. For long passes, require each child to send WORKING: <file> - <current phase> progress updates, and BLOCKED: <reason> only when it cannot progress. Treat a running child as alive. Mark a file for retry only when the child completed without the deliverable, replied ack-only after a follow-up, explicitly reported BLOCKED:, or is no longer running. Do NOT block the remaining 4 in that batch: collect the successful results and retry the failed file once later. If the retry also fails, escalate that file under "Issues Found & Fixed" in the final report.
Run the five quality gates listed above. Then walk the critical review checklist:
Safety:
Behavior:
Quality:
If any gate fails or any checklist item flips:
- git checkout the affected file (or use git diff + targeted Edit to revert just the problematic hunk).

If you fail three times on the same file, STOP and escalate to the user with: the file, what you tried, what failed, and your hypothesis. Do not keep editing.
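The revert-and-retry loop with its three-strike limit, as a sketch with hypothetical callbacks: attempt_fix applies a direct edit, verify reruns the gates, and revert restores the file (for example with git checkout -- <path>).

```python
from collections.abc import Callable

MAX_ATTEMPTS = 3


def fix_with_escalation(
    path: str,
    attempt_fix: Callable[[str], None],
    verify: Callable[[str], bool],
    revert: Callable[[str], None],
) -> bool:
    """Return True once verify passes; False means escalate to the user."""
    for _ in range(MAX_ATTEMPTS):
        attempt_fix(path)
        if verify(path):
            return True
        revert(path)  # never leave a failing edit in place
    # Caller escalates with: file, what was tried, what failed, hypothesis.
    return False
```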
AI SLOP REMOVAL REPORT
======================
Scope: [branch diff vs merge-base main / explicit file list]
Files: [N files]
- path/to/file1.ts
- path/to/file2.py
Behavior Lock:
- Existing coverage: [N files already covered]
- Tests added: [M new regression tests at path/to/test_X.py]
- Baseline status: GREEN
Cleanup Plan:
- path/to/file1.ts: [ladder: 1 delete (native) + simplify-in-place] → [dead code → complexity → performance]
- path/to/file2.py: [ladder: all simplify-in-place] → [comments → defensive]
Per-File Results (each cut shows what replaces it):
path/to/file1.ts
- Ladder/delete: custom DatePicker (48 lines) → <input type="date"> (native), flatpickr import removed
- Dead code: 3 removed (lines X-Y, A-B, C) → nothing (unreachable)
- Excessive complexity: 1 simplified (nested ternary at L42 → if/else)
- Performance: 1 (line N: list scan → set lookup, O(n²)→O(n), behavior identical)
- Skipped (preserved): 2 (defensive null check at boundary; commented WHY at L88)
path/to/file2.py
- Obvious comments: 5 removed → nothing
- Over-defensive: 1 simplified (redundant isinstance on typed param)
Quality Gates:
- Regression tests: PASS (12 tests, 0 failed)
- Lint: PASS
- Typecheck (lsp_diagnostics + project): PASS (0 new errors on changed files)
- Unit/integration tests: PASS (45 tests, 0 failed)
- Static/security scan: N/A (not configured)
Critical Review:
- Safety: PASS
- Behavior: PASS
- Quality: PASS
Issues Found & Fixed:
- [None] OR [Issue description → Fix applied]
Net Impact:
- LOC: -74 (removed 91, added 17)
- Dependencies: -1 (flatpickr removed; native <input type="date"> used)
- Files deleted: 1 (src/date-picker-wrapper.ts — platform-native replacement)
Remaining Risks / Deferred (this section is the debt ledger):
- [None] OR [e.g., "boundary violation in module X flagged but not refactored — needs human judgment"]
- `debt:` markers kept this pass: [None] OR [file:line — ceiling → upgrade trigger]
Final Status: CLEAN | ISSUES FIXED | REQUIRES ATTENTION
If a gate or check does not apply, mark it N/A and say why. If a check failed and you could not fix it, say so. Never claim PASS without evidence.
Re-check with lsp_diagnostics, the test runner, and direct file reads until the result is grounded.