tools/llm-sequential-upgrade/perf-benchmark/README.md
Runtime performance harness for the Level 12 chat apps the LLM built in the sequential upgrade benchmark. Measures messages-per-second throughput and latency so we have showcase numbers for the marketing one-pager.
This is not a synthetic benchmark of PostgreSQL vs SpacetimeDB. It's a benchmark of the apps the LLM built on each stack, run as-is.
| Scenario | What it measures |
|---|---|
| stress | N writers flooding send_message for D seconds. Sustained msgs/sec + p99 latency. |
| realistic | M users at human cadence (5-15s jitter) for D seconds. Sustained msgs/sec + latency under realistic load. |
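
As a rough illustration of the metrics named in the table, the sketch below computes a nearest-rank percentile, a sustained msgs/sec figure, and a 5-15s jittered delay. The function names (`percentile`, `msgsPerSec`, `nextHumanDelayMs`) are hypothetical and are not taken from the harness source; the harness may compute these values differently.

```typescript
// Hypothetical helpers for illustration only; not the harness's actual code.

/** Nearest-rank percentile of latency samples in ms. p is in (0, 100]. */
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) return NaN;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // 1-based rank of the sample at or above the requested percentile.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

/** Sustained throughput: acknowledged messages divided by run duration. */
function msgsPerSec(ackedMessages: number, durationSec: number): number {
  return ackedMessages / durationSec;
}

/** Human-cadence delay for the realistic scenario: uniform in [5s, 15s). */
function nextHumanDelayMs(rand: () => number = Math.random): number {
  return 5000 + rand() * 10000;
}

const samples = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(samples, 99)); // 99
console.log(msgsPerSec(1200, 30)); // 40
```

With a uniform 5-15s delay the mean gap is 10s, so M realistic users would be expected to produce roughly M/10 msgs/sec in aggregate.
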
npm install
# Generate SpacetimeDB bindings against the target Level 12 app's backend.
# Re-run this if you change which app you're benchmarking.
spacetime generate --lang typescript --out-dir src/module_bindings \
--module-path ../sequential-upgrade/sequential-upgrade-20260406/spacetime/results/chat-app-20260406-153727/backend/spacetimedb
The target apps must already be running:
- PG: cd <pg-app>/server && npm run dev (Express on :6001), plus the exhaust-test-postgres-1 Docker container (port 6432).
- STDB: spacetime start running, and the target module must be published (the apps publish themselves automatically when generated).

# PG stress, 30s, 20 writers
npm run run -- --backend pg --scenario stress --writers 20 --duration 30
# STDB stress, 30s, 50 writers
npm run run -- --backend stdb --scenario stress --writers 50 --duration 30 \
--module chat-app-20260406-153727
# Both throughput scenarios for one backend
npm run run -- --backend pg --scenario all
npm run run -- --backend stdb --scenario all --module chat-app-20260406-153727
Results land in results/<timestamp>/<backend>-<scenario>.json.
Saved optimized-reference snapshots also live under
results/optimized-reference/.
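
The output layout above can be expressed as a small path helper. This is a sketch only: `resultPath` is a hypothetical name, and the timestamp is passed through unchanged because the harness's timestamp format is not stated here.

```typescript
// Hypothetical helper for illustration; not part of the harness.
type Backend = "pg" | "stdb";
type Scenario = "stress" | "realistic";

/** Builds results/<timestamp>/<backend>-<scenario>.json for one run. */
function resultPath(timestamp: string, backend: Backend, scenario: Scenario): string {
  // Assumption: the caller supplies the timestamp directory name as-is.
  return `results/${timestamp}/${backend}-${scenario}.json`;
}

console.log(resultPath("<timestamp>", "stdb", "stress"));
// results/<timestamp>/stdb-stress.json
```
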
Tracked reference implementations and methodology live in
optimized-reference/.
The PG app's send_message handler enforces a 500ms-per-user rate limit
in application code. Each PG writer can therefore issue at most ~2 msgs/sec (1000ms / 500ms).
Throughput scales with writers, not with cadence. The harness paces writers
at ~510ms to avoid drops. SpacetimeDB has no equivalent limit, so its
per-writer ceiling is much higher.
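
The arithmetic behind the PG ceiling, and one way a writer could be paced to stay under it, might look like the sketch below. `maxThroughput` and `pacedWriter` are hypothetical names, and `send` is a caller-supplied stand-in for the real send_message call; the harness's own pacing loop may differ.

```typescript
// Hypothetical sketch; names and structure are assumptions, not harness code.

/** Upper bound on aggregate msgs/sec when each writer sends once per intervalMs. */
function maxThroughput(writers: number, intervalMs: number): number {
  return writers * (1000 / intervalMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Sends at most one message per intervalMs until durationMs has elapsed. */
async function pacedWriter(
  send: () => Promise<void>, // stand-in for the app's send_message call
  intervalMs: number,
  durationMs: number,
): Promise<number> {
  const deadline = Date.now() + durationMs;
  let sent = 0;
  while (Date.now() < deadline) {
    const started = Date.now();
    await send();
    sent++;
    // Wait out the rest of the interval so the next send is not rate-limited.
    const remaining = intervalMs - (Date.now() - started);
    if (remaining > 0) await sleep(remaining);
  }
  return sent;
}

console.log(maxThroughput(20, 500)); // 40 msgs/sec at the 500ms limit
console.log(maxThroughput(20, 510).toFixed(1)); // "39.2" at ~510ms pacing
```

Under these assumptions, 20 PG writers top out near 39 msgs/sec regardless of how fast the database is, which is why PG throughput in the stress scenario grows with the writer count.
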