Back to Dyad

Faster PR Workflows Without Sacrificing Quality

plans/faster-pr-workflows.md

0.44.016.5 KB
Original Source

Faster PR Workflows Without Sacrificing Quality

Generated by swarm planning session on 2026-02-15

Summary

Improve the speed of landing PRs for the maintainer (wwwillchen/keppo-bot) by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The current workflow is already highly automated — this plan targets the remaining friction: CI wall-clock time, human-in-the-loop steps, and the "waiting black hole" between push and merge.

Problem Statement

The current PR landing flow involves a multi-cycle loop: push → wait for CI (~10-15 min) → wait for AI review (~10-30 min) → address comments → re-push → wait again. Each iteration adds latency, and for 3-4 PRs per day, this compounds into 1-2 hours of waiting plus context-switching costs. The quality gates (multi-reviewer AI review, comprehensive E2E testing, lint/type checks) are working well — the problem is throughput, not quality.

Scope

In Scope (MVP)

  • CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation)
  • Auto-trigger fix loop for privileged author PRs (remove label requirement)
  • Tiered test selection based on changed files
  • PR status watcher for terminal notifications
  • Unified /pr skill entry point
  • Move pr-review-responder to self-hosted runners (enable E2E fix capability)

Out of Scope (Follow-up)

  • Auto-merge changes (keep manual "merge-when-ready" label)
  • External contributor experience improvements
  • Bash script alternative to /fast-push (optimize existing skill instead)
  • Dry-run/preview mode for skills
  • Incremental Electron builds
  • Notification channels beyond terminal (desktop, Slack)
  • Multi-PR-in-flight status dashboard

User Stories

  • As a maintainer, I want CI to complete faster on my self-hosted runners, so I spend less time waiting between push and feedback.
  • As a maintainer, I want CI failures to be automatically diagnosed and fixed without me triggering a skill, so the fix-push-verify loop runs hands-free.
  • As a maintainer, I want to be notified in my terminal when CI completes (pass or fail), so I can confidently context-switch to other work during CI runs.
  • As a maintainer, I want a single /pr command that detects what my PR needs and does the right thing, so I don't need to remember which of 10+ skills to invoke.
  • As a maintainer, I want CI to skip E2E tests for low-risk changes (docs, config, test-only), so trivial PRs land faster without wasting runner time.
  • As a maintainer, I want the automated fix loop to be able to fix E2E failures (not just lint/type/unit), so fewer PRs get stuck at needs-human:review-issue.

UX Design

User Flow (Optimized)

  1. Developer writes code (with Claude Code assistance)
  2. Runs /pr — unified entry point detects uncommitted changes, runs lint/format/type-check/unit-tests locally, commits, pushes, creates/updates PR
  3. /pr kicks off a background PR watcher that polls CI status
  4. Developer context-switches to other work
  5. Terminal notification: "CI passed / CI failed on PR #123"
  6. If CI failed: auto-fix loop triggers automatically (no label needed), developer sees notification when fix is pushed and re-CI starts
  7. If all checks pass: developer adds "merge-when-ready" label → auto-merge
  8. If auto-fix loop fails after 4 iterations: needs-human:review-issue label applied, developer investigates

Key States

  • Pushing: /pr runs local validation and pushes. Terminal shows progress via task tracking.
  • Waiting: Background watcher polls. Developer sees no output unless something happens.
  • CI Passed: Terminal notification (bell + message). Developer returns to add merge label.
  • CI Failed: Terminal notification. Auto-fix loop kicks in automatically. Developer notified of each iteration.
  • Fix Loop Exhausted: needs-human:review-issue label. Terminal notification. Developer investigates.
  • Merged: After "merge-when-ready" label + CI pass, auto-merge fires. PR closed.

Interaction Details

  • /pr is the primary entry point. It inspects current state (uncommitted changes? open PR? failing checks? review comments?) and routes to the appropriate action.
  • Sub-commands available for power users: /pr push, /pr fix, /pr watch, /pr rebase
  • Background watcher uses gh pr checks --watch or polling gh pr checks every 30 seconds
  • Terminal notifications use bell character + colored status line

Accessibility

  • All feedback is text-based (terminal output), compatible with screen readers
  • Status notifications include text labels, not just colors
  • No custom UI components — leverages GitHub's accessible web interface for detailed status

Technical Design

Architecture

All changes are to the CI/CD pipeline, GitHub Actions workflows, and Claude Code skills. No application code changes.

Components Affected

ComponentChangeComplexity
.github/workflows/ci.ymlAdd persistent caching, merge build+E2E for self-hosted, tiered test selectionMedium
.github/workflows/pr-review-responder.ymlRemove label requirement for privileged authors, move to self-hosted runnersSmall
scripts/ci-cleanup-macos.shPreserve caches (Playwright, pnpm store, nextjs-template, scaffold deps)Small
playwright.config.tsAdd test tagging/grouping for tiered selectionMedium
.claude/skills/pr/SKILL.md (new)Unified /pr entry point that routes to sub-skillsMedium
.claude/skills/pr-watch/SKILL.md (new)Background CI status watcher with terminal notificationsSmall
.claude/skills/fast-push/SKILL.mdOptimize for speed — reduce unnecessary stepsSmall

Data Model Changes

None. All improvements are to pipeline/tooling configuration.

API Changes

None. Internal tooling only.

Implementation Plan

Phase 0: Instrument (prerequisite)

  • Add CI job timing annotations (start/end timestamps per job) to measure baseline
  • Track PR cycle time (first push to merge) — could be a simple GitHub Action that logs timestamps
  • Measure auto-fix loop iteration frequency (how often does cc:request:2, cc:request:3, cc:request:4 get reached?)
  • Measure CI first-push pass rate for privileged author PRs
  • Document current scripts/ci-cleanup-macos.sh behavior — confirm what's already cached vs. cleaned

Phase 1: CI Pipeline Quick Wins

  • Optimize self-hosted runner caching: Preserve Playwright browser binaries, pnpm store, nextjs-template clone + deps, scaffold deps between runs. Update scripts/ci-cleanup-macos.sh to keep these cached (keyed on lockfile/dependency hashes). Note: node_modules/ already persists (cleanup script doesn't remove it), so npm ci is already partially cached — focus caching effort on Playwright, pnpm store, and template deps.
  • Skip redundant installs: Check if Chromium is already installed before running npx playwright install chromium. Check if nextjs-template exists and is up-to-date before cloning.
  • Merge build + E2E into single job for self-hosted runners: Eliminate artifact tar/upload/download overhead and redundant setup steps (second checkout, second npm ci, second pnpm/scaffold/nextjs-template setup). Keep separate jobs for external contributors (they need cross-OS artifacts). Estimated savings: 3-5 min per CI run.
  • Validate caching gains: Compare CI timing before/after to confirm actual savings. Run a single diagnostic PR with date commands at each step boundary to establish baseline.

Phase 2: Auto-Fix Loop Enhancement

  • Auto-trigger fix loop for privileged authors: Modify pr-review-responder.yml to trigger for all privileged-author PRs, not just those with cc:request label. Auto-apply cc:request on PR creation for privileged authors, or remove the label check entirely.
  • Move pr-review-responder to self-hosted macOS runners: Currently runs on ubuntu-latest which cannot execute Electron Playwright tests. Moving to self-hosted enables the fix loop to actually fix E2E failures, not just lint/type/unit failures.
  • Add skip-review for trivial fixes: When the fix commit only changes lint/formatting or updates snapshots, automatically skip re-triggering the Claude PR Review. Use a heuristic: if diff is under N lines and only touches files mentioned in resolved review comments.
  • Add notification comments: When auto-fix triggers, post a brief PR comment ("CI failed on lint. Auto-fix triggered.") so the developer has context without reading Action logs.

Phase 3: Tiered Test Selection

  • Implement change classification in check-changes job: Classify changed files into tiers:
    • Tier 0 (docs/config/.claude/rules only): Skip CI entirely (already exists)
    • Tier 1 (test files only): Run lint + type-check + only the changed test files
    • Tier 2 (source code, no schema/config changes): Run lint + type-check + unit tests + targeted E2E
    • Tier 3 (broad changes, deps, config): Full CI suite
  • Tag E2E tests by feature area: Add metadata to Playwright tests indicating which source files/features they cover. Start with coarse groupings (chat, database, deployment, settings, etc.).
  • Add full-ci escape label: Allow forcing full CI via a label for cases where the tier classification seems wrong. Post a CI comment explaining which tier was selected and why.
  • Maintain nightly full-suite coverage: The existing deflake-e2e scheduled run provides a safety net for regressions missed by tiered selection.

Phase 4: Developer Experience

  • Create unified /pr skill: Single entry point that inspects state and routes:
    • Uncommitted changes → push flow (lint, commit, push, create/update PR)
    • Open PR with failing checks → fix flow (diagnose + fix CI failures)
    • Open PR with unresolved review comments → comment fix flow
    • Open PR with all checks passing → suggest adding "merge-when-ready" label
    • Open PR needing rebase → rebase flow
    • Fold /fast-push behavior into /pr as the default for deterministic operations (use haiku), with opus reserved for reasoning-heavy tasks (conflict resolution, complex review responses). Deprecate /fast-push as a separate entry point.
  • Create /pr-watch skill: Background watcher that polls gh pr checks and notifies on completion. Should surface specific failure type (lint, type-check, unit test, E2E) so developer knows whether to intervene or let auto-fix handle it. Could be as simple as gh pr checks --watch with a terminal bell on exit.
  • Add pre-push snapshot warning: If UI source files changed but E2E snapshot files didn't, warn that snapshots likely need updating before push. Lightweight heuristic to prevent the most common E2E failure cause (~30% of E2E failures).

Testing Strategy

  • CI timing: Compare wall-clock CI duration before/after each phase. Target: 30-50% reduction for typical PRs.
  • Cache correctness: Verify builds with cached dependencies produce identical outputs to clean builds. Run both in parallel for a verification period.
  • Tier classification accuracy: Run full CI alongside tiered CI for 2 weeks to measure false-negative rate (changes that should have triggered full CI but didn't).
  • Auto-fix loop effectiveness: Track how often the fix loop resolves issues automatically vs. escalating to human. Target: fix loop handles >80% of CI failures without human intervention.
  • Quality regression: Monitor for bugs/reverts within 24 hours of merge. Must not increase from current baseline.

Risks & Mitigations

RiskLikelihoodImpactMitigation
Stale caches cause false CI passes on self-hosted runnersMediumHighHash-based cache keys, periodic full cache clear, nightly clean build
Tiered test selection misses regressionsMediumHighConservative tier classification, full-ci escape label, nightly full suite, always full CI on main push
Auto-fix loop on self-hosted runners consumes capacityLowMedium2+ runners available; add concurrency limits if needed
Merged build+E2E job can't retry just E2E on flakesMediumMediumMonitor flake rate; revert to split jobs if flake-retry savings > artifact-transfer savings
Unified /pr skill makes wrong routing decisionLowLowShow what action will be taken before executing; sub-commands for manual override
Moving pr-review-responder to self-hosted increases runner loadLowLow2+ runners available; responder runs are short-lived

Open Questions

  1. What is the actual CI time breakdown per job? Phase 0 instrumentation will answer this. The answer determines whether Phase 1 (caching) or Phase 3 (test selection) delivers more value.
  2. How often does the auto-fix loop iterate beyond 1 cycle? If most PRs converge in 1 iteration, loop optimization matters less. If 3-4 iterations are common, optimizing the loop is critical path.
  3. What is the E2E flake rate? Determines whether merging build+E2E jobs (losing independent E2E retry) is a net win or net loss.
  4. How should E2E tests be tagged/grouped? Coarse groupings (by feature area) are easier to maintain but less precise. Fine-grained file-level dependency tracking is more accurate but harder to keep updated. Start coarse and refine.

Decision Log

DecisionOptions ConsideredChosenReasoning
Auto-merge approachDefault auto-merge, auto-merge on cc:done, keep currentKeep currentUser prefers the deliberate "merge-when-ready" checkpoint. Focus speed improvements elsewhere.
Target personaMaintainer only, both maintainer + contributorsMaintainer onlyFocused scope delivers faster. External contributor improvements deferred.
Fast push implementationBash script, optimize existing skill, both pathsOptimize existing skillAvoids maintaining two implementations. User prefers single path.
Runner capacityConstrained (1 runner), unconstrained (2+)2+ runners availableRunner capacity is not a constraint. Can add more concurrent workflows freely.
Review blocking mergeNon-blocking (post-merge review), blockingKeep blockingTeam consensus: post-merge review comments have low compliance. Keeping review as blocking preserves quality gate. Speed up the review instead of removing it.
Bash fast-push scriptBuild it, don't build itDon't buildUX team raised valid concern about fragmentation. Optimize /fast-push skill instead.

Generated by dyad:swarm-to-plan