Back to Cocoindex

Benchmark Report — Rust SDK vs Python SDK

examples/benchmark_py_vs_rs/BENCHMARK_REPORT.md

1.0.49.2 KB
Original Source

Benchmark Report — Rust SDK vs Python SDK

Results captured from .work/. Nothing was rerun to produce this report.

See README.md if you don't know what the pipeline or the four phases are — this file just reports numbers.

The one-line answer

At 10,000+ input files:

  • Rust SDK cold run is 30–80× faster than the Python SDK, depending on profile.
  • Rust SDK warm run is 8–10× faster.
  • Incremental edits stay small in both — the incremental contract isn't the thing that separates them; raw per-call overhead is.

Speedup at a glance — codebase / cold phase

mermaid
xychart-beta
    title "Cold speedup (py_ms / rust_ms) — codebase scenario"
    x-axis ["tiny", "medium", "large", "xlarge"]
    y-axis "speedup (x)" 0 --> 90
    bar [2.22, 10.54, 16.89, 32.36]
    bar [5.75, 30.04, 52.71, 81.61]
    bar [3.13, 11.70, 13.16, 17.58]

Bars are mixed, cpu, io in that order. The docs scenario has the same shape — bigger scale, bigger gap.

Peak wins (xlarge, cold phase):

ScenarioProfileSpeedup
codebasecpu81.61×
docscpu73.69×
codebasemixed32.36×
docsmixed30.04×
docsio23.42×
codebaseio17.58×

Did the incremental contract hold?

Yes, at every scale:

  • warm phases had cache_misses = 0 (every section hit the memo cache).
  • shape phases invalidated only 9–15 sections and rewrote only 4–11 output files, at every scale from tiny to xlarge. Adding/renaming one file doesn't cascade into a near-full rebuild.

Where the numbers come from (the matrix)

AxisValues
scalestiny, medium, large, xlarge
scenarioscodebase, docs
profilesio, cpu, mixed
phasescold, warm, edit, shape

Input dataset sizes

Raw file counts generated by common.py. mixed and cpu share the same counts; io doubles the fanout.

Scaleall mixed/cpuall iocodebase mixed/cpucodebase iodocs mixed/cpudocs io
tiny5110227542448
medium368736176352192384
large1280256070414085761152
xlarge1075221504563211264512010240

Full per-phase tables

Each cell is rust_ms / py_ms / ratio. Medians across trials. Bigger ratio is a bigger Rust win.

Tiny — noisy, treat as smoke test

ScenarioProfileColdWarmEditShape
codebaseio68.7 / 215.4 / 3.13x47.2 / 99.7 / 2.11x45.0 / 106.5 / 2.37x77.6 / 142.1 / 1.83x
codebasecpu58.0 / 333.7 / 5.75x40.7 / 54.8 / 1.34x41.8 / 62.8 / 1.50x67.2 / 84.0 / 1.25x
codebasemixed46.2 / 102.5 / 2.22x40.7 / 50.9 / 1.25x44.3 / 55.1 / 1.24x70.4 / 57.9 / 0.82x
docsio55.0 / 217.3 / 3.95x44.5 / 86.5 / 1.94x45.2 / 148.7 / 3.29x69.6 / 99.0 / 1.42x
docscpu50.0 / 319.3 / 6.38x42.2 / 52.7 / 1.25x43.4 / 60.1 / 1.38x66.0 / 73.8 / 1.12x
docsmixed46.0 / 91.6 / 1.99x41.9 / 50.7 / 1.21x42.5 / 54.0 / 1.27x68.4 / 55.6 / 0.81x

At tiny, fixed overhead (process start, LMDB open) dominates. The one row where Python looks faster is shape / mixed — expected at this scale.

Medium — first scale where the gap is consistent

ScenarioProfileColdWarmEditShape
codebaseio123.8 / 1448.0 / 11.70x76.7 / 504.0 / 6.57x83.8 / 519.4 / 6.20x104.7 / 502.0 / 4.79x
codebasecpu81.0 / 2435.1 / 30.04x50.8 / 219.6 / 4.32x50.9 / 261.4 / 5.13x74.0 / 241.5 / 3.26x
codebasemixed54.1 / 569.7 / 10.54x50.1 / 191.8 / 3.83x55.2 / 209.0 / 3.79x71.8 / 206.1 / 2.87x
docsio125.1 / 1754.9 / 14.03x70.5 / 528.2 / 7.49x74.9 / 550.6 / 7.35x109.3 / 532.9 / 4.88x
docscpu85.1 / 2995.8 / 35.18x51.9 / 233.4 / 4.49x53.2 / 248.4 / 4.67x76.1 / 259.5 / 3.41x
docsmixed59.1 / 686.7 / 11.62x48.8 / 209.4 / 4.29x48.9 / 219.4 / 4.49x79.0 / 213.7 / 2.70x

Large — full matrix separates cleanly

ScenarioProfileColdWarmEditShape
codebaseio435.4 / 5730.3 / 13.16x197.7 / 1878.2 / 9.50x308.8 / 1925.4 / 6.24x318.1 / 1885.5 / 5.93x
codebasecpu193.0 / 10173.7 / 52.71x129.1 / 798.4 / 6.18x142.7 / 837.1 / 5.87x165.6 / 871.6 / 5.26x
codebasemixed127.9 / 2161.2 / 16.89x117.7 / 674.8 / 5.74x129.3 / 702.9 / 5.44x137.7 / 693.8 / 5.04x
docsio305.5 / 6109.2 / 20.00x159.5 / 1727.5 / 10.83x221.2 / 1718.6 / 7.77x195.3 / 1708.5 / 8.75x
docscpu189.3 / 11076.1 / 58.50x108.9 / 1164.0 / 10.69x82.2 / 934.1 / 11.37x110.9 / 990.8 / 8.93x
docsmixed127.6 / 2438.3 / 19.11x98.4 / 653.8 / 6.64x95.7 / 802.9 / 8.39x116.0 / 723.2 / 6.24x

XLarge — the headline run (10k+ files)

ScenarioProfileColdWarmEditShape
codebaseio3566.3 / 62710.1 / 17.58x2111.0 / 19738.8 / 9.35x1624.9 / 17727.2 / 10.91x2070.2 / 15623.2 / 7.55x
codebasecpu1099.3 / 89710.7 / 81.61x670.4 / 6798.1 / 10.14x669.9 / 6912.0 / 10.32x679.2 / 7081.1 / 10.42x
codebasemixed579.9 / 18764.9 / 32.36x567.6 / 5843.3 / 10.29x667.4 / 5843.7 / 8.76x642.1 / 5788.8 / 9.02x
docsio2498.9 / 58518.0 / 23.42x1781.3 / 15456.5 / 8.68x1977.7 / 15362.5 / 7.77x1981.5 / 15480.9 / 7.81x
docscpu1361.1 / 100299.5 / 73.69x698.1 / 6749.0 / 9.67x640.2 / 6433.4 / 10.05x837.0 / 6495.4 / 7.76x
docsmixed678.1 / 20372.6 / 30.04x662.1 / 5451.0 / 8.23x675.9 / 5507.9 / 8.15x604.1 / 5677.6 / 9.40x

Shape-phase trend

Shape is the mutation-heaviest phase. Tracking the ratio across scales shows the Rust advantage compounds as the dataset grows:

Codebase

Profiletinymediumlargexlarge
io1.30x4.79x5.93x7.55x
cpu1.24x3.08x5.26x10.42x
mixed0.85x2.54x5.04x9.02x

Docs

Profiletinymediumlargexlarge
io1.27x5.01x8.75x7.81x
cpu1.15x3.58x8.93x7.76x
mixed0.83x2.68x6.24x9.40x

How to read these numbers

  • Cold is the raw build-from-scratch cost. Big ratios here = Rust handles the fanout and per-section work more efficiently.
  • Warm is the cached-everything rerun. Python still pays more per cache-hit than Rust does, so the gap is still there — just narrower than cold.
  • Edit / Shape prove the incremental contract: cache misses and rebuilt outputs stay small and bounded. A regression here would show up as a near-cold-sized wall time.

When to use which scale

ScaleUse for
tinyCI smoke test only — fixed overhead noise
mediumfirst scale where numbers are meaningful
largefull-matrix production runs
xlargeheadline / press-release numbers

Footnote: artifact caveat

runner.py overwrites rust_metrics.json and python_metrics.json on every phase. That means the on-disk JSON for any given trial only keeps the final (shape) phase. The per-phase tables above come from what was captured in-memory during the runs that produced this report; they are reproducible, just not all present as files.

If you need per-phase persistence, pipe the runner output with --format json into your own store instead.