examples/benchmark_py_vs_rs/BENCHMARK_REPORT.md
Results captured from .work/. Nothing was rerun to produce this report.
See README.md if you don't know what the pipeline or the four phases are — this file just reports numbers.
At 10,000+ input files:
xychart-beta
title "Cold speedup (py_ms / rust_ms) — codebase scenario"
x-axis ["tiny", "medium", "large", "xlarge"]
y-axis "speedup (x)" 0 --> 90
bar [2.22, 10.54, 16.89, 32.36]
bar [5.75, 30.04, 52.71, 81.61]
bar [3.13, 11.70, 13.16, 17.58]
Bars are mixed, cpu, io in that order. The docs scenario has the same
shape — bigger scale, bigger gap.
Peak wins (xlarge, cold phase):
| Scenario | Profile | Speedup |
|---|---|---|
| codebase | cpu | 81.61× |
| docs | cpu | 73.69× |
| codebase | mixed | 32.36× |
| docs | mixed | 30.04× |
| docs | io | 23.42× |
| codebase | io | 17.58× |
Yes, at every scale:
warm phases had cache_misses = 0 (every section hit the memo cache).shape phases invalidated only 9–15 sections and rewrote only 4–11
output files, at every scale from tiny to xlarge. Adding/renaming one
file doesn't cascade into a near-full rebuild.| Axis | Values |
|---|---|
| scales | tiny, medium, large, xlarge |
| scenarios | codebase, docs |
| profiles | io, cpu, mixed |
| phases | cold, warm, edit, shape |
Raw file counts generated by common.py. mixed and cpu share the same
counts; io doubles the fanout.
| Scale | all mixed/cpu | all io | codebase mixed/cpu | codebase io | docs mixed/cpu | docs io |
|---|---|---|---|---|---|---|
| tiny | 51 | 102 | 27 | 54 | 24 | 48 |
| medium | 368 | 736 | 176 | 352 | 192 | 384 |
| large | 1280 | 2560 | 704 | 1408 | 576 | 1152 |
| xlarge | 10752 | 21504 | 5632 | 11264 | 5120 | 10240 |
Each cell is rust_ms / py_ms / ratio. Medians across trials. Bigger ratio
is a bigger Rust win.
| Scenario | Profile | Cold | Warm | Edit | Shape |
|---|---|---|---|---|---|
| codebase | io | 68.7 / 215.4 / 3.13x | 47.2 / 99.7 / 2.11x | 45.0 / 106.5 / 2.37x | 77.6 / 142.1 / 1.83x |
| codebase | cpu | 58.0 / 333.7 / 5.75x | 40.7 / 54.8 / 1.34x | 41.8 / 62.8 / 1.50x | 67.2 / 84.0 / 1.25x |
| codebase | mixed | 46.2 / 102.5 / 2.22x | 40.7 / 50.9 / 1.25x | 44.3 / 55.1 / 1.24x | 70.4 / 57.9 / 0.82x |
| docs | io | 55.0 / 217.3 / 3.95x | 44.5 / 86.5 / 1.94x | 45.2 / 148.7 / 3.29x | 69.6 / 99.0 / 1.42x |
| docs | cpu | 50.0 / 319.3 / 6.38x | 42.2 / 52.7 / 1.25x | 43.4 / 60.1 / 1.38x | 66.0 / 73.8 / 1.12x |
| docs | mixed | 46.0 / 91.6 / 1.99x | 41.9 / 50.7 / 1.21x | 42.5 / 54.0 / 1.27x | 68.4 / 55.6 / 0.81x |
At tiny, fixed overhead (process start, LMDB open) dominates. The one row
where Python looks faster is shape / mixed — expected at this scale.
| Scenario | Profile | Cold | Warm | Edit | Shape |
|---|---|---|---|---|---|
| codebase | io | 123.8 / 1448.0 / 11.70x | 76.7 / 504.0 / 6.57x | 83.8 / 519.4 / 6.20x | 104.7 / 502.0 / 4.79x |
| codebase | cpu | 81.0 / 2435.1 / 30.04x | 50.8 / 219.6 / 4.32x | 50.9 / 261.4 / 5.13x | 74.0 / 241.5 / 3.26x |
| codebase | mixed | 54.1 / 569.7 / 10.54x | 50.1 / 191.8 / 3.83x | 55.2 / 209.0 / 3.79x | 71.8 / 206.1 / 2.87x |
| docs | io | 125.1 / 1754.9 / 14.03x | 70.5 / 528.2 / 7.49x | 74.9 / 550.6 / 7.35x | 109.3 / 532.9 / 4.88x |
| docs | cpu | 85.1 / 2995.8 / 35.18x | 51.9 / 233.4 / 4.49x | 53.2 / 248.4 / 4.67x | 76.1 / 259.5 / 3.41x |
| docs | mixed | 59.1 / 686.7 / 11.62x | 48.8 / 209.4 / 4.29x | 48.9 / 219.4 / 4.49x | 79.0 / 213.7 / 2.70x |
| Scenario | Profile | Cold | Warm | Edit | Shape |
|---|---|---|---|---|---|
| codebase | io | 435.4 / 5730.3 / 13.16x | 197.7 / 1878.2 / 9.50x | 308.8 / 1925.4 / 6.24x | 318.1 / 1885.5 / 5.93x |
| codebase | cpu | 193.0 / 10173.7 / 52.71x | 129.1 / 798.4 / 6.18x | 142.7 / 837.1 / 5.87x | 165.6 / 871.6 / 5.26x |
| codebase | mixed | 127.9 / 2161.2 / 16.89x | 117.7 / 674.8 / 5.74x | 129.3 / 702.9 / 5.44x | 137.7 / 693.8 / 5.04x |
| docs | io | 305.5 / 6109.2 / 20.00x | 159.5 / 1727.5 / 10.83x | 221.2 / 1718.6 / 7.77x | 195.3 / 1708.5 / 8.75x |
| docs | cpu | 189.3 / 11076.1 / 58.50x | 108.9 / 1164.0 / 10.69x | 82.2 / 934.1 / 11.37x | 110.9 / 990.8 / 8.93x |
| docs | mixed | 127.6 / 2438.3 / 19.11x | 98.4 / 653.8 / 6.64x | 95.7 / 802.9 / 8.39x | 116.0 / 723.2 / 6.24x |
| Scenario | Profile | Cold | Warm | Edit | Shape |
|---|---|---|---|---|---|
| codebase | io | 3566.3 / 62710.1 / 17.58x | 2111.0 / 19738.8 / 9.35x | 1624.9 / 17727.2 / 10.91x | 2070.2 / 15623.2 / 7.55x |
| codebase | cpu | 1099.3 / 89710.7 / 81.61x | 670.4 / 6798.1 / 10.14x | 669.9 / 6912.0 / 10.32x | 679.2 / 7081.1 / 10.42x |
| codebase | mixed | 579.9 / 18764.9 / 32.36x | 567.6 / 5843.3 / 10.29x | 667.4 / 5843.7 / 8.76x | 642.1 / 5788.8 / 9.02x |
| docs | io | 2498.9 / 58518.0 / 23.42x | 1781.3 / 15456.5 / 8.68x | 1977.7 / 15362.5 / 7.77x | 1981.5 / 15480.9 / 7.81x |
| docs | cpu | 1361.1 / 100299.5 / 73.69x | 698.1 / 6749.0 / 9.67x | 640.2 / 6433.4 / 10.05x | 837.0 / 6495.4 / 7.76x |
| docs | mixed | 678.1 / 20372.6 / 30.04x | 662.1 / 5451.0 / 8.23x | 675.9 / 5507.9 / 8.15x | 604.1 / 5677.6 / 9.40x |
Shape is the mutation-heaviest phase. Tracking the ratio across scales shows the Rust advantage compounds as the dataset grows:
| Profile | tiny | medium | large | xlarge |
|---|---|---|---|---|
| io | 1.30x | 4.79x | 5.93x | 7.55x |
| cpu | 1.24x | 3.08x | 5.26x | 10.42x |
| mixed | 0.85x | 2.54x | 5.04x | 9.02x |
| Profile | tiny | medium | large | xlarge |
|---|---|---|---|---|
| io | 1.27x | 5.01x | 8.75x | 7.81x |
| cpu | 1.15x | 3.58x | 8.93x | 7.76x |
| mixed | 0.83x | 2.68x | 6.24x | 9.40x |
| Scale | Use for |
|---|---|
tiny | CI smoke test only — fixed overhead noise |
medium | first scale where numbers are meaningful |
large | full-matrix production runs |
xlarge | headline / press-release numbers |
runner.py overwrites rust_metrics.json and python_metrics.json on every
phase. That means the on-disk JSON for any given trial only keeps the final
(shape) phase. The per-phase tables above come from what was captured
in-memory during the runs that produced this report; they are reproducible,
just not all present as files.
If you need per-phase persistence, pipe the runner output with --format json into your own store instead.