The systematic-testing scheduler's three park sites must each guard their cond-wait with a re-park loop

.known-couplings/systematic-testing-park-sites.md

0.66.0, 2.2 KB

Under `use=systematic_testing` the scheduler serializes execution to one thread at a time. It hands off by setting the shared `active_thread` pointer to the next thread, waking that thread, and parking the current one. A thread parks to wait for its turn in three places, all in `src/libponyrt/sched/systematic_testing.c`: `ponyint_systematic_testing_wait_start` (the initial barrier), the yield handoff in `ponyint_systematic_testing_yield`, and the one-shot coordinator park at the end of `ponyint_systematic_testing_start`. Each must wrap its `ponyint_thread_suspend` in a `while(active_thread != <this thread's slot>)` re-check-and-re-park loop.

The loop is required because `pthread_cond_wait` (the suspend on the `scheduler_scaling_pthreads` path) is allowed by POSIX to return spuriously, and a stray wake is likelier under load. A bare suspend with no loop lets a spuriously woken thread fall through and run while `active_thread` still points at another thread. That desynchronizes the single-runner handoff and deadlocks every thread. Delete the loop at any one of the three sites and you silently reintroduce an intermittent hang.

This is not caught per-PR, because nothing in normal CI builds `systematic_testing`. The only standing coverage is the generative systematic stress jobs (`.github/workflows/stress-test-generative-systematic-*.yml`, daily) and `.ci-scripts/systematic-testing/determinism_smoke.py` (weekly). Both catch it only probabilistically: the trigger is a spurious wake that may not occur on any given run, and a hit surfaces only as a watchdog timeout with no thread state.

The generative engine (`test/rt-stress/generative/main.pony`) reaches natural quiescence on success; it no longer forces an exit once its conservation oracle passes. Its systematic stress runs therefore now exercise the coordinator park and the full runtime shutdown that the forced exit previously cut short, which strengthens this coverage, though it remains probabilistic.
Run: `make configure config=debug use=scheduler_scaling_pthreads,systematic_testing && make build config=debug`, then `python3 .ci-scripts/systematic-testing/determinism_smoke.py` and the generative harness under `test/rt-stress/generative`.
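
The re-park pattern described above can be sketched in isolation with plain pthreads. This is an illustrative stand-in, not the code in `systematic_testing.c`: `handoff_t`, `worker`, `run_handoff_demo`, and the integer `active_slot` are hypothetical names, and the sketch calls `pthread_cond_wait` directly rather than going through `ponyint_thread_suspend`. It uses `pthread_cond_broadcast` on every handoff so that threads whose turn it is not are woken anyway, which approximates a stray wake.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 3

// Hypothetical shared handoff state; active_slot plays the role of active_thread.
typedef struct
{
  pthread_mutex_t mut;
  pthread_cond_t cond;
  int active_slot;  // slot of the only thread allowed to run
  int steps_done;   // steps executed so far, across all threads
  int total_steps;  // stop once this many steps have run
  int out_of_turn;  // steps executed while active_slot named another thread
} handoff_t;

typedef struct
{
  handoff_t* h;
  int slot;
} worker_arg_t;

static void* worker(void* p)
{
  worker_arg_t* arg = (worker_arg_t*)p;
  handoff_t* h = arg->h;

  pthread_mutex_lock(&h->mut);
  for(;;)
  {
    // Re-park loop: a wake-up alone proves nothing, so re-check whose turn
    // it is and go back to sleep unless it is ours (or the run is over).
    while((h->active_slot != arg->slot) && (h->steps_done < h->total_steps))
      pthread_cond_wait(&h->cond, &h->mut);

    if(h->steps_done >= h->total_steps)
      break;

    // Invariant check: we only get here when it is our turn.
    if(h->active_slot != arg->slot)
      h->out_of_turn++;

    // Run one step, then hand off to the next slot. The broadcast wakes
    // every parked thread, including those whose turn it is not.
    h->steps_done++;
    h->active_slot = (arg->slot + 1) % NUM_THREADS;
    pthread_cond_broadcast(&h->cond);
  }
  pthread_mutex_unlock(&h->mut);
  return NULL;
}

// Returns the number of out-of-turn steps, or -1 if the step count is wrong.
int run_handoff_demo(int total_steps)
{
  handoff_t h;
  pthread_t threads[NUM_THREADS];
  worker_arg_t args[NUM_THREADS];
  int rc;

  rc = pthread_mutex_init(&h.mut, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&h.cond, NULL);
  assert(rc == 0);
  h.active_slot = 0;
  h.steps_done = 0;
  h.total_steps = total_steps;
  h.out_of_turn = 0;

  for(int i = 0; i < NUM_THREADS; i++)
  {
    args[i].h = &h;
    args[i].slot = i;
    rc = pthread_create(&threads[i], NULL, worker, &args[i]);
    assert(rc == 0);
  }

  for(int i = 0; i < NUM_THREADS; i++)
    pthread_join(threads[i], NULL);

  pthread_cond_destroy(&h.cond);
  pthread_mutex_destroy(&h.mut);
  (void)rc;

  return (h.steps_done == total_steps) ? h.out_of_turn : -1;
}
```

Calling `run_handoff_demo(1000)` from a `main` should return 0, because the predicate is re-checked after every wake. Replacing the `while` with a single unconditional `pthread_cond_wait` would let a thread woken by another thread's broadcast proceed out of turn, which is the failure mode the three park sites guard against.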