
LLM One-Shot Benchmark Summary




Generated: 2026-01-30

Total Runs: 13

Overall Results by Backend

| Backend | Runs | Avg Score | Best | Worst |
|---|---|---|---|---|
| SpacetimeDB | 7 | 91.7% | 100.0% | 76.0% |
| PostgreSQL | 6 | 62.7% | 76.4% | 41.7% |

SpacetimeDB advantage: +29.0 percentage points
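A minimal sketch of how the backend averages and the advantage can be recomputed from the per-run scores in the All Runs table. It assumes Avg Score is the unweighted mean of each run's percentage and that displayed values are rounded half up to one decimal place; both assumptions reproduce the published figures, but the helper names (`pct`, `fmt1`, `backend_average`) are hypothetical and not taken from the benchmark's own tooling.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# Per-run (points, max points) pairs, copied from the All Runs table.
RUNS = {
    "SpacetimeDB": [(20.5, 24), (24, 24), (18.25, 24), (36, 36), (32.5, 36), (34.5, 36), (34, 36)],
    "PostgreSQL": [(17.25, 24), (10, 24), (11.25, 24), (27.5, 36), (27.25, 36), (23, 36)],
}


def pct(points: float, max_points: int) -> Fraction:
    """Exact percentage for one run, e.g. 17.25/24 -> 71.875."""
    return Fraction(str(points)) / max_points * 100


def fmt1(value: Fraction) -> str:
    """Assumed display rule: round half up to one decimal place."""
    d = Decimal(value.numerator) / Decimal(value.denominator)
    return str(d.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def backend_average(runs) -> Fraction:
    """Assumed aggregation: unweighted mean of the per-run percentages."""
    return sum(pct(p, m) for p, m in runs) / len(runs)


averages = {name: backend_average(runs) for name, runs in RUNS.items()}
advantage = averages["SpacetimeDB"] - averages["PostgreSQL"]

for name, avg in averages.items():
    print(f"{name}: {fmt1(avg)}%")
print(f"advantage: +{fmt1(advantage)} pp")
```

Exact fractions matter here: 17.25/24 is exactly 71.875%, and Python's built-in `round()` sends exact ties to the even digit (71.8), whereas the table shows 71.9.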

Results by LLM

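| LLM | STDB Runs | STDB Avg | PG Runs | PG Avg | Delta (pp) |
|---|---|---|---|---|---|
| gemini-3-pro | 1 | 85.4% | 1 | 71.9% | +13.5 |
| gpt-5-2 | 1 | 100.0% | 1 | 41.7% | +58.3 |
| grok-code | 1 | 76.0% | 1 | 46.9% | +29.1 |
| opus-4-5 | 4 | 95.1% | 3 | 72.0% | +23.1 |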
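The Delta column is the SpacetimeDB average minus the PostgreSQL average, in percentage points. A small sketch, assuming the difference is taken between the already-rounded averages shown in the table; the function name `delta_pp` is hypothetical.

```python
from decimal import Decimal

# (STDB Avg, PG Avg) per LLM, copied as strings from the Results by LLM table.
BY_LLM = {
    "gemini-3-pro": ("85.4", "71.9"),
    "gpt-5-2": ("100.0", "41.7"),
    "grok-code": ("76.0", "46.9"),
    "opus-4-5": ("95.1", "72.0"),
}


def delta_pp(stdb_avg: str, pg_avg: str) -> str:
    """STDB minus PG in percentage points, using the displayed (rounded) averages."""
    diff = Decimal(stdb_avg) - Decimal(pg_avg)  # Decimal keeps one-decimal inputs exact
    sign = "+" if diff >= 0 else ""
    return f"{sign}{diff}"


for llm, (stdb, pg) in BY_LLM.items():
    print(f"{llm}: {delta_pp(stdb, pg)} pp")
```

The rounded-inputs assumption matters for grok-code: its unrounded run scores give 76.0417 - 46.875 = 29.1667 points, which would round to 29.2, while the difference of the displayed averages is exactly the 29.1 in the table.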

Feature Scores (Average)

| Feature | Max | STDB Avg | PG Avg | Winner |
|---|---|---|---|---|
| 1. Basic Chat Features | 3 | 2.79 | 1.71 | STDB |
| 2. Typing Indicators | 3 | 2.93 | 2.50 | STDB |
| 3. Read Receipts | 3 | 3.00 | 1.58 | STDB |
| 4. Unread Message Counts | 3 | 2.86 | 1.33 | STDB |
| 5. Scheduled Messages | 3 | 2.36 | 1.75 | STDB |
| 6. Ephemeral/Disappearing Messages | 3 | 2.64 | 2.67 | PG |
| 7. Message Reactions | 3 | 2.89 | 1.79 | STDB |
| 8. Message Editing with History | 3 | 2.93 | 1.83 | STDB |
| 9. Real-Time Permissions | 3 | 2.25 | 1.58 | STDB |
| 10. Rich User Presence | 3 | 2.88 | 2.67 | STDB |
| 11. Message Threading | 3 | 3.00 | 2.00 | STDB |
| 12. Private Rooms & Direct Messages | 3 | 3.00 | 2.17 | STDB |
| 13. Room Activity Indicators | 3 | 0.00 | - | - |
| 14. Draft Sync | 3 | 0.00 | - | - |
| 15. Anonymous to Registered Migration | 3 | 0.00 | - | - |
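One way to read the Winner column, sketched under the assumption that the backend with the higher average wins each feature and that no winner is declared when a score is missing ("-"), as for features 13 to 15. The `winner` helper is illustrative, not part of the benchmark.

```python
from collections import Counter
from typing import Optional

# (feature, STDB avg, PG avg) as printed in the Feature Scores table.
# None stands for a "-" cell (no PG score).
FEATURES = [
    ("Basic Chat Features", 2.79, 1.71),
    ("Typing Indicators", 2.93, 2.50),
    ("Read Receipts", 3.00, 1.58),
    ("Unread Message Counts", 2.86, 1.33),
    ("Scheduled Messages", 2.36, 1.75),
    ("Ephemeral/Disappearing Messages", 2.64, 2.67),
    ("Message Reactions", 2.89, 1.79),
    ("Message Editing with History", 2.93, 1.83),
    ("Real-Time Permissions", 2.25, 1.58),
    ("Rich User Presence", 2.88, 2.67),
    ("Message Threading", 3.00, 2.00),
    ("Private Rooms & Direct Messages", 3.00, 2.17),
    ("Room Activity Indicators", 0.00, None),
    ("Draft Sync", 0.00, None),
    ("Anonymous to Registered Migration", 0.00, None),
]


def winner(stdb: float, pg: Optional[float]) -> Optional[str]:
    """Assumed rule: higher average wins; a missing score means no winner."""
    if pg is None:
        return None
    if stdb > pg:
        return "STDB"
    if pg > stdb:
        return "PG"
    return "tie"


tally = Counter(winner(stdb, pg) for _, stdb, pg in FEATURES)
print(tally["STDB"], tally["PG"], tally[None])  # 11 1 3
```

Under this rule SpacetimeDB wins 11 of the 12 scored features, and PostgreSQL wins only Ephemeral/Disappearing Messages.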

All Runs

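| LLM | Backend | Date | Score | % | Level |
|---|---|---|---|---|---|
| gemini-3-pro | PG | 2026-01-08 | 17.25/24 | 71.9% | 5 |
| gemini-3-pro | STDB | 2026-01-07 | 20.5/24 | 85.4% | 5 |
| gpt-5-2 | PG | 2026-01-08 | 10/24 | 41.7% | 5 |
| gpt-5-2 | STDB | 2026-01-07 | 24/24 | 100.0% | 5 |
| grok-code | PG | 2026-01-28 | 11.25/24 | 46.9% | 5 |
| grok-code | STDB | 2026-01-07 | 18.25/24 | 76.0% | 5 |
| opus-4-5 | PG | 2026-01-04 | 27.5/36 | 76.4% | 9 |
| opus-4-5 | PG | 2026-01-04 | 27.25/36 | 75.7% | 9 |
| opus-4-5 | PG | 2026-01-04 | 23/36 | 63.9% | 9 |
| opus-4-5 | STDB | 2026-01-05 | 36/36 | 100.0% | 9 |
| opus-4-5 | STDB | 2026-01-02 | 32.5/36 | 90.3% | 9 |
| opus-4-5 | STDB | 2026-01-02 | 34.5/36 | 95.8% | 9 |
| opus-4-5 | STDB |  | 34/36 | 94.4% | 9 |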