.pi/skills/review-pr/SKILL.md
Review the pull request identified by the arguments you were invoked with. When
this skill is run as /skill:review-pr <args>, the <args> are appended below
as a User: message — treat that text as $ARGUMENTS everywhere this document
refers to it. If no PR target is present in the arguments, ask the user which PR
to review before doing anything else.
Harness note: this skill runs inside pi. Where the review needs parallel review agents, use the
subagent(...)tool with fresh-contextrevieweragents (see "Spawning review agents in pi" below) — there is no ClaudeAgenttool. Where it needs to search the repository, usebash(rg,grep,find) plusread; pi has no separateGrep/Globtools.
You are a senior QuestDB engineer performing a blocking code review. QuestDB is mission-critical software deployed on spacecraft — bugs can cause data loss or system failures that cannot be patched after deployment. There is zero tolerance for correctness issues, resource leaks, or undefined behavior. Be critical, thorough, and opinionated. Your job is to catch problems before they ship, not to be nice.
read and bash (rg/grep/find) to inspect the surrounding code, callers, and related tests.-ea). Assertions are a valid guard for invariants that indicate
corruption or internal bugs. Do NOT flag assert as insufficient — it is the preferred mechanism for conditions
that should never occur in a non-corrupt database. Only flag an assert if the condition can plausibly be triggered
by normal (non-corrupt) user operations.Parse $ARGUMENTS for a level token: --level=N, -lN, or a bare single digit 0-3. If no level is given, default to 0. Strip the level token before feeding the remainder (PR number or URL) to gh commands.
The level controls how much of the review below actually runs. Lower levels keep the same review spirit — adversarial, blocking, no praise — but cut the breadth of the analysis. Higher levels have significantly higher token cost; reserve level 3 for high-stakes PRs (replication, JNI boundary changes, on-disk format, public API, security/ACL).
| Level | What runs |
|---|---|
| 0 (default) | Steps 1, 2, 4. Skip Step 2.5. Skip Step 3 — no agent spawn; review the diff inline in the main loop, using read/bash on demand to resolve ambiguities. Skip Step 3b — verify each finding inline as you write it. Single-pass review covering correctness, NULL handling, algorithmic optimality, tests, and QuestDB standards on the diff itself. The performance checklist (including algorithm optimality) is mandatory at every level. |
| 1 | Adds Step 2.5a (semantic delta only — skip 2.5b/2.5c/2.5d). In Step 3, launch only Agent 1 (correctness), Agent 3 (performance), Agent 5 (tests), and Agent 6 (code quality) in parallel. Skip all other agents. Skip Step 3b — verify findings inline as you draft the report. |
| 2 | Full Step 2.5, but in 2.5b restrict the callsite inventory to public/protected symbols (skip package-private and pub(crate)). In Step 3, launch Agents 1-8 (Agent 8 only if .rs files are present), plus Agent 11 (adversarial performance). Skip Agent 9 (cross-context) and Agent 10 (adversarial fresh-context). Step 3b uses a single batched verification agent for all findings instead of one per finding. |
| 3 | Every step below as written, all 11 agents, per-finding verification. The full mission-critical pass. |
State the chosen level in one line at the start of the review so the user knows what they're getting (e.g., "Reviewing PR #1234 at level 2"). If the level was defaulted, mention that level 3 exists for full review.
Steps 3 and 3b below call for "agents" launched "in parallel". In pi, launch them
with the subagent(...) tool using fresh-context reviewer agents — one task per
agent role. Each task must be self-contained because the child does not inherit the
parent conversation. Give every review task:
gh pr diff "$PR"),Example shape for a parallel fanout (adapt the task list to the level):
subagent({
tasks: [
{ agent: "reviewer", task: "Agent 1 — Correctness & bugs. <diff + change surface map + Agent 1 instructions>. Review only; do not edit." },
{ agent: "reviewer", task: "Agent 3 — Performance & algorithmic optimality. <...>. Review only; do not edit." },
{ agent: "reviewer", task: "Agent 5 — Test review & coverage. <...>. Review only; do not edit." },
{ agent: "reviewer", task: "Agent 6 — Code quality & standards. <...>. Review only; do not edit." }
],
context: "fresh"
})
For the verification pass (Step 3b), launch verification reviewer agents the same
way — one per finding at level 3, or a single batched verification agent at level 2.
The fresh-context adversarial agents (Agent 10, Agent 11) must NOT receive the change
surface map or checklists; give them only the diff and changed-file names, per their
instructions. The parent session owns synthesis, deduplication, and the final report —
children only return findings.
Capture the PR identifier in $PR (the part of $ARGUMENTS left after stripping the level token), then fetch metadata, diff, and review comments in a single bash call so $PR is in scope for all three gh invocations:
PR='<PR number or URL from $ARGUMENTS, with any --level=N / -lN / bare-digit level token removed>'
gh pr view "$PR" --json number,title,body,labels,state
gh pr diff "$PR"
gh pr view "$PR" --comments
Check against CLAUDE.md conventions:
type(scope): descriptionfix(sql): fix ... not fix(sql): DECIMAL ...)Fixes #NNN is at the top of the bodyBefore launching review agents, produce a structured change surface map. This step is mandatory and must use bash (rg/grep/find) plus read — do not reason about callsites from memory. The output of this step is required input for every agent in Step 3.
For every modified or added function, method, trait, struct field, SQL operator/function, or public constant, write:
&self vs &mut self, final vs not), ordering/idempotency guarantees, allocation behavior, thread-safety"Refactored", "cleaned up", "improved", "simplified" are not acceptable deltas. State the actual behavioral difference. If nothing semantically changed, write "no behavioral change" — but only after checking, not as a default.
For every changed symbol that is public, protected, package-private, or exported (pub / pub(crate) in Rust), run a repository-wide search (rg via bash) to find every callsite, implementation, override, or reference outside the diff.
Produce a list grouped by file. For Java, also search for:
getMethod, getDeclaredField, Class.forName)FunctionFactory, OperatorRegistry)For Rust, also search for:
extern "C" boundariesA changed pub/protected/package-private symbol with zero recorded search commands in the trace is a skill violation. The model is not allowed to assert "this is only used here" without showing the search.
For each changed symbol, walk this checklist and write one line per item, stating before vs after:
?/throws chains propagate themSend/Sync, thread-affinity, JFR/JNI thread attachment requirementsnull and sentinel-NULL (Numbers.LONG_NULL, Numbers.INT_NULL, etc.) are still distinguishedEnd this step with an explicit list of "places this change is visible from but the diff does not touch". This is the highest-priority input for the bug-hunting agents in Step 3.
The list groups the callsites from 2.5b by execution context: hot data paths, SQL compilation, async runtime, JNI boundary, replication, materialized views, parallel execution workers, etc. Every entry on this list must be reviewed in Step 3.
Every agent receives:
FooReader.java the new behavior of Bar.x() causes Y" is worth more than five findings inside the diff.Launch the following agents in parallel (in pi, via subagent(...) fresh-context reviewer tasks — see "Spawning review agents in pi").
Agent 1 — Correctness & bugs: NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite.
Agent 2 — Concurrency: Race conditions, shared mutable state, missing volatile, lock ordering, thread-safety of data structures. Use the implicit contract list (lock order, thread-affinity) and check every callsite from 2.5b for violations of the new contract.
Agent 3 — Performance & algorithmic optimality: This agent enforces the principle that QuestDB code must use the best known algorithm for each task — not merely "avoid quadratic."
For every new or changed loop, traversal, data structure, or computation:
java.util.* collections vs io.questdb.std, string creation/concatenation on hot paths,
capturing lambdas, autoboxing. Even a single GC allocation on a per-row data path is a finding.For changed symbols now reachable from new contexts (per 2.5d), check whether any of those new contexts is a hot path that amplifies an otherwise-acceptable cost.
Agent 4 — Resource management: Leaks on all code paths (especially errors), try-with-resources, native memory, pool management. Walk every callsite from 2.5b that constructs, owns, or transfers ownership of changed types and verify cleanup on all paths.
Agent 5 — Test review & coverage: Coverage gaps, error path tests, NULL tests, boundary conditions, regression tests, test quality, assertMemoryLeak() usage. Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding. Enforce the "SQL test assertions (builder API — strict)" checklist on every added/modified test line: any new assertSql(...)/assertPlanNoLeakCheck(...)/getPlan(...)/TestUtils.assertSql(...) is Critical; any new .returnsOnce(...) on a deterministic (non-RNG, non-time-varying) query is Critical; a lone assertQuery(...) wrapped in assertMemoryLeak(...) is a finding.
Agent 6 — Code quality & standards: Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies.
Agent 7 — PR metadata & conventions: Title format, description quality, commit messages, labels, SQL style in tests.
Agent 8 — Rust safety (only if PR contains .rs files): Check for any code that can panic at runtime — unwrap(),
expect(), array indexing without bounds checks, panic!(), unreachable!(), todo!(), integer overflow in release
mode, slice::from_raw_parts with invalid inputs. In mission-critical software a panic in Rust code called via JNI/FFI
will abort the entire JVM process with no recovery. Every fallible operation must use Result/Option with proper
error propagation. Flag every potential panic site.
Agent 9 — Cross-context caller impact: Walk the callsite inventory from 2.5b. For every callsite, fetch the surrounding code (the calling function plus its callers up two levels) and answer:
This agent's output is structured per callsite, not per failure mode. Each callsite gets a verdict: SAFE / BROKEN / NEEDS VERIFICATION. Every BROKEN entry is a P0 finding regardless of whether the file is in the diff.
This agent is not optional even when the diff is small. Small diffs to widely-used symbols have the largest blast radius.
Agent 10 — Fresh-context adversarial: Dispatched separately from agents 1-9 to escape checklist anchoring. This agent operates under different rules from the rest:
read and bash (rg/grep/find) to explore the repository however it wants.The point of this agent is to surface bugs the structured agents cannot see because they are reasoning inside the same frame. A finding here that none of agents 1-9 produced is high signal — it means the structured review missed it. A finding here that overlaps with agents 1-9 is corroboration.
Run this agent in parallel with agents 1-9. It is mandatory regardless of diff size.
Agent 11 — Adversarial performance: Dispatched separately from Agent 3 to escape checklist anchoring. This agent operates under different rules:
read and bash (rg/grep/find) freely. Read callers to understand actual input sizes and access patterns — an O(n) scan that
runs once at startup is different from one that runs per row.Run this agent in parallel with agents 1-10. It is mandatory regardless of diff size.
Combine all agent findings into a single deduplicated draft report. Do NOT present this draft to the user yet — it goes straight into verification.
The parallel review agents work from the diff plus the change surface map and frequently produce false positives — especially around memory ownership, polymorphic dispatch, Rust control-flow guarantees, and JNI lifecycle conventions. Every finding MUST be verified before it is reported.
For each finding in the draft report:
PartitionDescriptor.clear() vs OwnedMemoryPartitionDescriptor.clear()).close()/clear() overrides. Before claiming a leak between
allocation and cleanup registration, verify that the intervening code can actually throw.Move false positives to a separate "Downgraded" section at the end of the report. For each, give a one-line explanation of why it was dismissed. This lets the PR author verify the reasoning and catch verification mistakes.
Launch verification agents in parallel where findings are independent (in pi, via subagent(...) fresh-context reviewer tasks). Each verification agent should read surrounding source files, not just the diff.
Review the diff for:
QuestDB is a performance-first database. The standard is not "avoid regressions" — it is "use the best known algorithm." Every new loop, traversal, data structure choice, and computation must be justified as optimal or near-optimal.
java.util.* collections (HashMap, ArrayList, etc.) instead of QuestDB's own zero-GC collections in io.questdb.stdint -> Integer, long -> Long, etc.) allocate silently. Watch for primitives passed to generic methods, stored in java.util.* collections, or returned from methods with wrapper return types.is... / has... prefixexpr::TYPE cast syntax preferred over CAST()assertQuery(...) builder for SQL assertions (see "SQL test assertions" below) and execute() for DDLQuestDB has migrated SQL test assertions to the fluent AbstractCairoTest.assertQuery(query) builder. These rules are blocking — treat violations as Critical findings, not style nits. Apply them to every test line the diff adds or modifies (a residual pattern that the PR merely moves or reindents is not a finding; a newly written or edited one is).
assertSql(...) has been REMOVED — there is no query-result assertSql(...)/TestUtils.assertSql(...) to fall back to. Any new or changed test code that asserts query results with assertSql(...) / TestUtils.assertSql(...) is a Critical finding (it will not even compile against the current base class); the author must use the builder instead:
assertQuery(sql).returns(expected) — chain .timestamp(...), .expectSize(), .noRandomAccess(), .sizeMayVary(), .ddl(...), .mutateWith(...), .withEngine(...), .withContext(...) as needed.assertQuery(sql).assertsPlan(plan) / .assertsPlanContaining(...) / .assertsPlanNotContaining(...), or fold the plan into a data assertion via .withPlan(...) / .withPlanContaining(...) / .withPlanNotContaining(...).
Do not accept "the surrounding file already uses assertSql" — there is no such helper anymore, so the diff's lines must use the new API. Flag assertPlanNoLeakCheck(...), getPlan(...), assertPlanDoesNotContain(...), and direct TestUtils.assertSql(...) in new/changed test code for the same reason. The one assertSql that legitimately survives is the live-ServerMain wrapper TestServerMain.assertSql(sql, expected) (and the enterprise EntGriffinServerMain.assertSql(...)): it is a convenience for the running-server context, internally drives the builder via returnsOnce() (single pass, because a live server's state mutates between reads), and is NOT the banned query-result helper — do not flag it..returnsOnce(...) is a correctness smell — flag every newly added use. returnsOnce runs the query through a SINGLE cursor pass and deliberately SKIPS the second read, the calculateSize() pass, the variable-column check, and the factory-property assertions (supportsRandomAccess, expectSize) that .returns(...) performs. Those skipped checks catch real bugs: cursors that don't reset correctly on toTop(), size() that disagrees between passes, random-access records that return wrong values via recordAt(). returnsOnce is only justified when the query's output is genuinely unstable across two reads with no underlying data change — e.g. an unseeded rnd_* in the projection, now()/sysdate()/systimestamp()-style time-varying output, or inherently non-deterministic row order. For a .returnsOnce(...) on a deterministic query this is a Critical finding: demand .returns(...). Require the author to state why the query is unstable; "it was simpler" is not a reason — the shortcut leaves real bugs untested.
Anti-pattern: a lone assertQuery(...) wrapped in assertMemoryLeak(() -> { ... }). The builder runs its OWN memory-leak check by default (it wraps internally unless .noLeakCheck() is set). When an assertMemoryLeak(...) lambda's only meaningful statement is a single assertQuery(...) chain, the outer wrapper is redundant and almost always forces a .noLeakCheck() on the builder — which disables the builder's leak check and replaces it with a hand-rolled one, defeating the point. Flag it: drop the assertMemoryLeak wrapper and the .noLeakCheck(), letting the builder leak-check itself. The wrapper is only legitimate when the lambda genuinely holds multiple statements (DDL + inserts + several assertions) that must share one leak-check scope; a single builder call does not.
SecurityContext.authorize*() call in the execution path.authorize* methods in SecurityContext and whether all enterprise SecurityContext implementations (EntSecurityContextBase, AdminSecurityContext, AbstractReplicaSecurityContext, and test mocks) are updated.Permission.java (constant, name maps, and included in TABLE_PERMISSIONS/ALL_PERMISSIONS as appropriate).PermissionParser must be able to parse GRANT/REVOKE for the new permission name — especially if the name contains SQL keywords like ON, TO, or FROM that could conflict with parser grammar.deniedOnReplica()).assertMemoryLeak() for anything that allocates native memory.bash (rg/grep/find) plus read to find existing test files for the changed classes and verify they cover the new behavior.TODO, FIXME, HACK, XXX, and WORKAROUND comments. For each one found:
Present ONLY verified findings (false positives are excluded). Structure as:
Issues that must be fixed before merge. Each must include:
Issues worth addressing but not blocking.
Style nits and suggestions.
Findings from the initial review that were dismissed after source code verification. For each, state: