.agents/skills/maintainer-review/SKILL.md
Make a maintainer decision, not a generic code-review summary. Separate these questions:
Lead with the current review state. Use Preliminary assessment while runtime approval or evidence is pending, and Maintainer decision only when the review can be concluded. Use the diff, issue narrative, or contributor effort as evidence, not as a proxy for impact.
Respect repository instructions for remote access and mutation. A review does not authorize comments, labels, branch changes, pushes, or other remote writes.
Do this before deeply evaluating a specified PR. A PR URL selects the starting point, not necessarily the entire comparison set.
When multiple candidates exist, compare them on need coverage, runtime correctness, scope, implementation layer, tests, compatibility, complexity, readiness, remaining maintainer work, and whether useful parts can be combined. Prefer the best maintainable solution, not the first submission or the smallest diff by default.
Always begin with a desk review. Inspect the concrete runtime path before judging a small change as either trivial or meaningful. Check callers, adjacent helpers, validation layers, fallback paths, and existing tests. Search history or documentation only when it changes the decision. Inspecting test code is part of the desk review; executing tests, imports, examples, reproductions, benchmarks, or service calls is a runtime probe.
For repository-specific runtime invariants, start with .agents/references/README.md and open only the references that match the affected boundary. Treat .agents/references/ as read-only during issue and PR review: use it to identify expected invariants, adjacent surfaces, and regression risks, then verify the current claim against the remote change, current code, tests, docs, release boundary, and focused runtime evidence. Do not edit references as a side effect of the review, infer current issue or PR status from them, or treat old issue or PR outcomes as current evidence. If the review reveals a reusable invariant that should be captured, recommend a separate repository-maintenance update unless the user explicitly asks to update references in the same task.
Use this evidence order across the two stages:
Produce an initial result from static evidence before running code:
If a decision-relevant runtime concern remains, use Preliminary assessment, name the concern, propose the smallest decisive probe and control, and ask the user for approval to run it.
Do not issue a definitive positive maintainer decision while a decision-relevant runtime concern remains unresolved. If the user declines the probe, keep the result preliminary and state the exact confidence limitation.
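The gate described above can be sketched in Python. This is a minimal illustration, not part of the skill: `RuntimeConcern`, `review_heading`, and `confidence_limitations` are hypothetical names, and only the two heading strings come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuntimeConcern:
    """Hypothetical record of one decision-relevant runtime concern."""
    description: str
    resolved: bool = False        # True once a probe has settled the concern
    probe_declined: bool = False  # True if the user declined the proposed probe


def review_heading(concerns: List[RuntimeConcern]) -> str:
    """Return the heading that matches the current review state."""
    if any(not c.resolved for c in concerns):
        return "Preliminary assessment"
    return "Maintainer decision"


def confidence_limitations(concerns: List[RuntimeConcern]) -> List[str]:
    """List the limitations to state when a probe was declined."""
    return [
        f"Not verified at runtime (probe declined): {c.description}"
        for c in concerns
        if c.probe_declined and not c.resolved
    ]
```

A declined probe leaves the concern unresolved, so the heading stays Preliminary assessment and the limitation is reported explicitly rather than implied.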
After explicit approval, run only the smallest probe needed to resolve the stated concern. Exercise the real public or internal path and include a base, release, or known-good control when relevant. Do not stop at a happy-path smoke check when failure behavior determines the decision. Return to the user for separate approval before expanding materially beyond the approved probe.
For latency, timeout, buffering, backpressure, or cleanup claims, measure at least one observable elapsed-time or state-transition path when feasible. Do not assume that a mocked unit test exercises real scheduling or provider behavior. Prefer a local probe first; use an approval-gated live-service probe only when local evidence cannot settle the decision.
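A local elapsed-time probe with a control might look like the following sketch. The helper names `measure_elapsed` and `compare_with_control` are assumptions made for illustration; the callables passed in stand for the real code path under review and a base, release, or known-good control.

```python
import time
from typing import Callable, Dict


def measure_elapsed(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best-of-N wall-clock duration of fn, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_with_control(
    candidate: Callable[[], object],
    control: Callable[[], object],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time the candidate path and the control path under the same conditions."""
    candidate_s = measure_elapsed(candidate, repeats)
    control_s = measure_elapsed(control, repeats)
    return {
        "candidate_s": candidate_s,
        "control_s": control_s,
        "delta_s": candidate_s - control_s,
    }
```

Taking the minimum over a few repeats reduces scheduling noise, but it does not replace exercising the real path: a mocked callable would only time the mock.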
Use $runtime-behavior-probe only when the user explicitly invokes it and the skill is available, or when the user explicitly approves using it for the proposed runtime work. Preserve its environment-variable approval, live-service, cost, cleanup, and reporting gates. Do not make ordinary maintainer review depend on that skill being available.
For changes involving validation, fail-fast behavior, cleanup, retries, interruption, or concurrency, trace lifecycle ordering in addition to the main behavior:
Do not over-investigate. Stop when additional evidence is unlikely to change validity, severity, or the maintainer recommendation.
Use references/evaluation-framework.md to assess claim validity, realistic reach, consequence, breadth, frequency, recoverability, compatibility, and severity. Keep observed facts separate from inference and state any missing evidence that could change the decision.
For a PR, make Severity describe the underlying issue or user need only. Do not combine it with the risk created by the proposed patch. Report a meaningful patch-induced regression, compatibility, lifecycle, or maintenance risk separately as Patch risk.
Do not infer that a report is low-value merely because an AI may have found or written it. Do not speculate about authorship or motive. Identify contribution-shaped reports through objective signals: no reproducible behavior, unrealistic inputs, an impossible call path, duplicated existing handling, tests that do not exercise the claim, or a fix whose runtime result is a no-op.
Use the framework's issue dispositions and PR checks to decide whether the outcome justifies permanent code, tests, documentation, and maintainer attention. Classify code quality separately from repository readiness.
Use one code recommendation:
For Merge-worthy as-is and Merge-worthy after focused changes, use one repository-readiness status when it helps communicate the integration state:
Omit repository readiness for Supersede with a simpler alternative and Not worth completing; CI, review, mergeability, or branch freshness does not change those dispositions. Put any validation limitation that materially affects confidence in the evidence instead. When readiness is included, use exactly one of the four statuses above and do not invent variants such as ready mechanically or use rebase status for semantic staleness.
Do not downgrade an otherwise sound code recommendation solely because CI is pending. Do not call a PR ready when semantic conflict resolution or material code changes remain.
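The readiness rules above can be expressed as a small consistency check. This is a sketch under stated assumptions: `readiness_problems` is a hypothetical helper, it models only the recommendations named in the text, and the permitted readiness statuses are passed in by the caller rather than hard-coded.

```python
from typing import FrozenSet, List, Optional

# Recommendations for which repository readiness must be omitted (from the text).
READINESS_OMITTED = frozenset(
    {"Supersede with a simpler alternative", "Not worth completing"}
)


def readiness_problems(
    recommendation: str,
    readiness: Optional[str],
    allowed_statuses: FrozenSet[str],
) -> List[str]:
    """Return rule violations for a recommendation/readiness pair."""
    problems: List[str] = []
    if readiness is None:
        return problems  # readiness is optional, so omitting it is always valid
    if recommendation in READINESS_OMITTED:
        problems.append("omit repository readiness for this recommendation")
    if readiness not in allowed_statuses:
        problems.append(f"invented readiness status: {readiness!r}")
    return problems
```

For example, pairing Not worth completing with any readiness status is reported as a violation, while Merge-worthy as-is with no readiness status passes.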
When multiple open PRs address the same issue, make one portfolio-level recommendation: select the strongest PR, request focused changes in one candidate, combine specific ideas into one PR, supersede all candidates with a simpler approach, or close duplicates. Explain why the recommended path is better than each alternative without turning the report into line-by-line review.
Always consider at least one alternative: no code change, validation or documentation, a narrower fix, reuse of an existing helper, or a different layer that enforces the invariant consistently.
Choose the assessment language using this precedence:
~/.codex/AGENTS.md, the repository's AGENTS.md, or another governing instruction file.
Do not infer the assessment language from the GitHub URL, contributor, code, or browser locale. Maintainer comment drafts remain English regardless of the assessment language.
Keep the report decision-oriented and compact. Use no more than five evidence bullets by default; add more only when the decision genuinely depends on them.
Use the matching compact report variant in references/evaluation-framework.md. While runtime approval is pending, use its preliminary-assessment variant and end with the approval request instead of presenting a final recommendation. Collapse sections for simple cases rather than padding the answer. Put unexpected or negative runtime findings first, and name the preferred PR or approach explicitly when candidates compete.
When recommending closure, requesting more evidence, requesting code changes, or superseding a PR, append the English, copy-paste-ready maintainer comment defined by the framework. If multiple PRs need different actions, label one draft for each affected PR. Include only merge-blocking requests in the main action paragraph; keep optional documentation or polish clearly non-blocking or omit it.
Do not produce a line-by-line review unless requested. Do not equate passing tests with merge-worthiness, or a logically correct patch with practical value.
references/evaluation-framework.md contains the severity rubric, evidence checks, lifecycle review, issue dispositions, PR quality checks, maintainer-comment guidance, and report variants.