Systematic-testing replay determinism depends on sorting recipient-scheduling sends by actor id, not pointer order

.known-couplings/systematic-testing-send-ordering.md

Under use=systematic_testing, a fixed --ponysystematictestingseed must replay one scheduler interleaving. That holds only because every site that schedules actors by sending them a message while iterating a pointer-hash-keyed map (keyed by ponyint_hash_ptr(actor)) first sorts the recipients by the stable pony_actor_t.systematic_testing_id, which is assigned in pony_create in systematic-testing builds only. The reasoning: a send schedules its recipient (pony_sendv calls ponyint_sched_add when the recipient's queue goes from empty to non-empty), so the send order determines the run-queue order, which in turn determines the interleaving. ASLR randomizes actor addresses per run, so an unsorted walk of a pointer-keyed map would produce a different send order on each run.

The sorted sites today:

- src/libponyrt/sched/scheduler.c: ponyint_sched_unmute_senders (the muting/unmuting reschedule).
- src/libponyrt/gc/gc.c: the ORCA reference-counting sends ponyint_gc_sendacquire / ponyint_gc_sendrelease_manual, via gc_drain_acquire_ordered.
- src/libponyrt/gc/actormap.c: send_release from ponyint_actormap_sweep.
- src/libponyrt/gc/cycle.c: the cycle detector's own sends, all via view_systematic_testing_id_cmp: check_blocked (ACTORMSG_ISBLOCKED), send_conf (ACTORMSG_CONF), the deferred detect-processing order, and collect's per-cycle-member finalizer/release/destroy passes. collect's cross-member release order matters because ponyint_actormap_sweep sorts only the foreign releases within one member's sweep, not across members.

If you add a new per-recipient send loop on the GC/scheduler path that iterates such a map, you silently reintroduce layout-dependent replay unless you sort it the same way.

This is not caught per-PR: nothing in normal CI builds systematic_testing, so the guard is .ci-scripts/systematic-testing/determinism_smoke.py, which runs weekly from .github/workflows/ponyc-weekly-checks.yml. A regression passes its own PR and surfaces only at the next weekly run or when someone next replays a seed.
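The sort-before-send pattern described above can be sketched as follows. This is a minimal illustration, not the runtime's code: sketch_actor_t, sketch_actor_id_cmp, sketch_send_sorted, sketch_log_t and sketch_record_send are hypothetical names, and the recipients are assumed to have already been collected from the pointer-keyed map into a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for pony_actor_t; only the stable id matters here. */
typedef struct sketch_actor_t
{
  uint64_t systematic_testing_id; /* stable across runs, unlike the address */
} sketch_actor_t;

/* qsort comparator over an array of actor pointers: ascending stable id. */
static int sketch_actor_id_cmp(const void* a, const void* b)
{
  const sketch_actor_t* x = *(sketch_actor_t* const*)a;
  const sketch_actor_t* y = *(sketch_actor_t* const*)b;
  /* Compare instead of subtracting so large ids cannot overflow the result. */
  return (x->systematic_testing_id > y->systematic_testing_id)
    - (x->systematic_testing_id < y->systematic_testing_id);
}

/* Hypothetical send hook; a real site would send a runtime message here. */
typedef void (*sketch_send_fn)(sketch_actor_t* recipient, void* ctx);

/* Sort the collected recipients by id, then send in that order, so the
 * resulting run-queue order does not depend on where actors were allocated. */
static void sketch_send_sorted(sketch_actor_t** recipients, size_t count,
  sketch_send_fn send, void* ctx)
{
  assert(recipients != NULL || count == 0);
  if(count > 1)
    qsort(recipients, count, sizeof(sketch_actor_t*), sketch_actor_id_cmp);

  for(size_t i = 0; i < count; i++)
    send(recipients[i], ctx);
}

/* Example hook that records the order in which recipients were "sent" to. */
typedef struct sketch_log_t
{
  uint64_t ids[8];
  size_t count;
} sketch_log_t;

static void sketch_record_send(sketch_actor_t* recipient, void* ctx)
{
  sketch_log_t* log = (sketch_log_t*)ctx;
  assert(log->count < 8);
  log->ids[log->count++] = recipient->systematic_testing_id;
}
```

Whatever order the map walk yields the pointers in, the recorded send order is ascending by id, which is the property a replayed seed relies on.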
The check_blocked rate limiter is disabled under USE_SYSTEMATIC_TESTING. When enabled, it probes only a layout-dependent subset of d->views per sweep once more than CD_MAX_CHECK_BLOCKED actors are live, advancing the last_checked bucket-index cursor. With it disabled, every sweep probes all of d->views in one sorted pass and the cursor always resets. Keep it that way, or that subset selection becomes layout-dependent again.

The only remaining pointer-ordered loops are the shutdown finalizer/destroy passes in the static final() helper (reached via cycle_terminate). They are intentionally left unsorted: they run at runtime termination after all observable output, and _final cannot send messages, so their order cannot affect replay.

Note one deliberate consequence of the deferred() sort: because mark_grey clears view->deferred for every view its scan reaches, ordering the deferred set by id changes which detect() calls run, not just their order. A systematic-testing build therefore walks the detector's maps in an order no production build does. That is the point (reproducible replay), but it means detection-sequence-dependent behavior is outside what systematic testing can surface.

Run: make configure config=debug use=scheduler_scaling_pthreads,systematic_testing && make build config=debug, then python3 .ci-scripts/systematic-testing/determinism_smoke.py.
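The point that ordering the deferred set changes which detect() calls run, not just their order, can be seen in a toy model. The sketch below is an assumption-laden simplification, not the algorithm in cycle.c: toy_detector_t, toy_example and toy_process_deferred are hypothetical, and "a scan clears the deferred flag of every view it reaches" is modelled with a fixed reachability table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_VIEWS 2

/* Toy model: reach[i][j] is true if a scan starting at view i reaches view j. */
typedef struct toy_detector_t
{
  bool reach[TOY_VIEWS][TOY_VIEWS];
  bool deferred[TOY_VIEWS];
} toy_detector_t;

/* Two deferred views; a scan from view 0 reaches view 1, but not vice versa. */
static toy_detector_t toy_example(void)
{
  toy_detector_t d = {{{false, true}, {false, false}}, {true, true}};
  return d;
}

/* Visit views in the given order. A view that is still deferred gets a
 * detection pass, which clears the deferred flag on itself and on every view
 * its scan reaches. Returns the number of detection passes that ran and
 * records in detected[] which views started one. */
static size_t toy_process_deferred(toy_detector_t* d, const size_t* order,
  size_t count, bool* detected)
{
  size_t calls = 0;

  for(size_t j = 0; j < TOY_VIEWS; j++)
    detected[j] = false;

  for(size_t i = 0; i < count; i++)
  {
    size_t v = order[i];
    assert(v < TOY_VIEWS);

    if(!d->deferred[v])
      continue; /* already cleared by an earlier scan */

    detected[v] = true;
    calls++;

    for(size_t j = 0; j < TOY_VIEWS; j++)
    {
      if((j == v) || d->reach[v][j])
        d->deferred[j] = false;
    }
  }

  return calls;
}
```

In this model, visiting the views in order 0, 1 runs one detection pass (the scan from view 0 clears view 1), while order 1, 0 runs two. Fixing the order by id therefore fixes the set of passes as well as their sequence.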