docs/plans/2026-01-05-plan-check-run-ha-design.md
Make the plan check run scheduler HA (High Availability) compatible by following patterns established in the task run scheduler.
The plan check scheduler is not HA-compatible:

- It uses sync.Map to track running checks, so that state lives only in the memory of a single process.
- Its status flow is RUNNING → DONE/FAILED/CANCELED.

Follow the task run scheduler HA pattern:

- Use FOR UPDATE SKIP LOCKED to atomically claim work.
- Use the status flow AVAILABLE → RUNNING → DONE/FAILED/CANCELED.

Migration file: backend/migrator/migration/3.X/XXXX##plan_check_run_ha.sql
```sql
-- Add AVAILABLE to status check constraint
ALTER TABLE plan_check_run
    DROP CONSTRAINT plan_check_run_status_check,
    ADD CONSTRAINT plan_check_run_status_check
        CHECK (status IN ('AVAILABLE', 'RUNNING', 'DONE', 'FAILED', 'CANCELED'));

-- Convert existing RUNNING to AVAILABLE (will be re-claimed)
UPDATE plan_check_run SET status = 'AVAILABLE' WHERE status = 'RUNNING';

-- Update index to include AVAILABLE for efficient claiming
DROP INDEX IF EXISTS idx_plan_check_run_active_status;
CREATE INDEX idx_plan_check_run_active_status
    ON plan_check_run(status, id)
    WHERE status IN ('AVAILABLE', 'RUNNING');
```
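The status flow that this migration enables can be sketched as a small transition table. This is only an illustration: the `Status` type, the constants, and `CanTransition` are hypothetical names, not part of the existing store code. The design does not say whether an AVAILABLE run can move straight to CANCELED, so that edge is left out here.

```go
package main

import "fmt"

// Status mirrors the values allowed by plan_check_run_status_check.
// The type and constant names are hypothetical.
type Status string

const (
	StatusAvailable Status = "AVAILABLE"
	StatusRunning   Status = "RUNNING"
	StatusDone      Status = "DONE"
	StatusFailed    Status = "FAILED"
	StatusCanceled  Status = "CANCELED"
)

// allowedTransitions encodes AVAILABLE → RUNNING → DONE/FAILED/CANCELED.
// Assumption: DONE, FAILED and CANCELED are terminal, and AVAILABLE → CANCELED
// is not modeled because the design does not specify it.
var allowedTransitions = map[Status][]Status{
	StatusAvailable: {StatusRunning},
	StatusRunning:   {StatusDone, StatusFailed, StatusCanceled},
}

// CanTransition reports whether moving from one status to another follows the flow.
func CanTransition(from, to Status) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(StatusAvailable, StatusRunning)) // true
	fmt.Println(CanTransition(StatusRunning, StatusDone))      // true
	fmt.Println(CanTransition(StatusDone, StatusRunning))      // false
}
```

The one-off RUNNING → AVAILABLE conversion performed by the migration is deliberately not part of this table, since it happens once at deployment rather than during normal scheduling.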
File: backend/store/plan_check_run.go
Add claiming function:
```go
// ClaimAvailablePlanCheckRuns atomically claims all AVAILABLE plan check runs.
// Uses FOR UPDATE SKIP LOCKED for HA-safe concurrent claiming.
func (s *Store) ClaimAvailablePlanCheckRuns(ctx context.Context) ([]*PlanCheckRunMessage, error) {
	query := `
		UPDATE plan_check_run
		SET status = 'RUNNING', updated_at = now()
		WHERE id IN (
			SELECT id FROM plan_check_run
			WHERE status = 'AVAILABLE'
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, plan_id
	`
	// Execute and return claimed runs
}
```
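The "execute and return claimed runs" step is left open above. A minimal sketch of the row-draining part follows, assuming the query is run with `QueryContext` and yields `(id, plan_id)` rows. `rowScanner`, `claimedRun`, `collectClaimed`, and `fakeRows` are hypothetical names introduced for illustration. The real code would scan into `PlanCheckRunMessage`, whose field names are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// rowScanner is the subset of *sql.Rows used below; *sql.Rows satisfies it.
type rowScanner interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
}

// claimedRun is a hypothetical stand-in for PlanCheckRunMessage.
type claimedRun struct {
	ID     int64
	PlanID int64
}

// collectClaimed drains the rows produced by RETURNING id, plan_id.
func collectClaimed(rows rowScanner) ([]claimedRun, error) {
	var claimed []claimedRun
	for rows.Next() {
		var run claimedRun
		if err := rows.Scan(&run.ID, &run.PlanID); err != nil {
			return nil, fmt.Errorf("scan claimed plan check run: %w", err)
		}
		claimed = append(claimed, run)
	}
	// Next() returning false can also mean iteration failed, so check Err().
	if err := rows.Err(); err != nil {
		return nil, fmt.Errorf("iterate claimed plan check runs: %w", err)
	}
	return claimed, nil
}

// fakeRows is an in-memory rowScanner used only for this demonstration.
type fakeRows struct {
	data [][2]int64
	pos  int
}

func (f *fakeRows) Next() bool {
	if f.pos >= len(f.data) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeRows) Scan(dest ...any) error {
	if len(dest) != 2 {
		return errors.New("expected two destinations")
	}
	row := f.data[f.pos-1]
	for i, d := range dest {
		p, ok := d.(*int64)
		if !ok {
			return errors.New("destination must be *int64")
		}
		*p = row[i]
	}
	return nil
}

func (f *fakeRows) Err() error { return nil }

func main() {
	rows := &fakeRows{data: [][2]int64{{101, 7}, {102, 7}}}
	claimed, err := collectClaimed(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(claimed), claimed[0].ID, claimed[1].PlanID) // prints: 2 101 7
}
```

Taking an interface rather than `*sql.Rows` directly is only a convenience for exercising the loop without a database. In the store itself the rows would come from the claiming query and would need to be closed after use.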
Modify CreatePlanCheckRun: Change default status from RUNNING to AVAILABLE.
File: backend/runner/plancheck/scheduler.go
Remove in-memory tracking - Delete RunningPlanChecks and RunningPlanCheckRunsCancelFunc sync.Maps
Update runOnce() - Use atomic claiming instead of querying RUNNING status:
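```go
func (s *Scheduler) runOnce(ctx context.Context) {
	claimed, err := s.store.ClaimAvailablePlanCheckRuns(ctx)
	if err != nil {
		// Claiming failed; log it (log/slog) and let a later runOnce call retry.
		slog.Error("failed to claim plan check runs", "error", err)
		return
	}
	for _, planCheckRun := range claimed {
		go s.runPlanCheckRun(ctx, planCheckRun)
	}
}
```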
Simplify cancellation - Update database status to CANCELED directly
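To illustrate why the in-memory maps become unnecessary, here is a rough in-memory model of the claiming and cancellation behavior. It is a sketch under assumptions: `memStore` and its methods are hypothetical, the mutex merely stands in for the database's row locking, and the guard that only lets a RUNNING row change status is an assumption rather than something this design states.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is an in-memory stand-in for the plan_check_run table.
// The mutex plays the role of database locking; the real store uses SQL.
type memStore struct {
	mu     sync.Mutex
	status map[int]string
}

// Claim flips every AVAILABLE run to RUNNING and returns the claimed IDs,
// mimicking UPDATE ... RETURNING as a single atomic step.
func (m *memStore) Claim() []int {
	m.mu.Lock()
	defer m.mu.Unlock()
	var ids []int
	for id, st := range m.status {
		if st == "AVAILABLE" {
			m.status[id] = "RUNNING"
			ids = append(ids, id)
		}
	}
	return ids
}

// transition changes a run's status only if it currently equals from.
func (m *memStore) transition(id int, from, to string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.status[id] != from {
		return false
	}
	m.status[id] = to
	return true
}

// Cancel writes CANCELED directly to the store.
// Assumption: only a RUNNING run can be canceled in this model.
func (m *memStore) Cancel(id int) bool { return m.transition(id, "RUNNING", "CANCELED") }

// Finish records a result only if the run is still RUNNING (assumption),
// so a late result cannot overwrite an earlier cancellation.
func (m *memStore) Finish(id int, result string) bool { return m.transition(id, "RUNNING", result) }

func main() {
	store := &memStore{status: map[int]string{1: "AVAILABLE", 2: "AVAILABLE", 3: "AVAILABLE"}}

	// Two scheduler replicas race to claim work.
	var wg sync.WaitGroup
	results := make([][]int, 2)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = store.Claim()
		}(i)
	}
	wg.Wait()
	fmt.Println("total claimed:", len(results[0])+len(results[1])) // total claimed: 3

	fmt.Println("cancel run 1:", store.Cancel(1))          // cancel run 1: true
	fmt.Println("late finish 1:", store.Finish(1, "DONE")) // late finish 1: false
	fmt.Println("finish run 2:", store.Finish(2, "DONE"))  // finish run 2: true
}
```

In this model every run is claimed exactly once no matter how many replicas call `Claim`, and cancellation is just a status write that any replica can perform. How a running check learns that its row was canceled is not covered by this sketch.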
No changes needed - store layer handles status transition.
| File | Change |
|---|---|
| backend/migrator/migration/3.X/XXXX##plan_check_run_ha.sql | New migration |
| backend/migrator/migration/LATEST.sql | Update schema |
| backend/store/plan_check_run.go | Add claiming, change default status |
| backend/runner/plancheck/scheduler.go | Remove sync.Map, use claiming |
Existing RUNNING plan check runs are converted to AVAILABLE during migration. They will be re-claimed and re-executed by the scheduler after deployment.