tools/llm-oneshot/README.md
This project benchmarks how well Cursor rules enable AI to one-shot SpacetimeDB apps — generate and deploy a working app in a single attempt.
This benchmark compares AI-generated apps across two platforms:

- **SpacetimeDB** — the target platform, guided by Cursor rules
- **PostgreSQL** — a familiar baseline
By generating equivalent apps for both platforms, we can evaluate how well Cursor rules guide the AI to produce working SpacetimeDB applications compared to a familiar baseline (PostgreSQL).
1. **Open this folder (`tools/llm-oneshot`) as a workspace in Cursor.** This loads the `.cursor/rules/` files.
2. **Open a new Agent chat.** Press `Ctrl+I` (Windows/Linux) or `Cmd+I` (Mac) to open the AI panel.
3. **Select your model.**
4. **Add the prompt files.** Drag these two files from the file explorer directly into the chat:
   - `apps/chat-app/prompts/language/typescript-spacetime.md` (or your desired stack)
   - `apps/chat-app/prompts/composed/12_full.md` (or your desired feature level)
5. **Send the instructions.** Type this message:

   ```
   Read all rules first. Do not reference AI-generated apps in apps/ for guidance.
   Execute these prompts.
   ```

6. **Let the AI generate the app.**
7. **Deploy when prompted.**
**Why isolate from existing apps?** To ensure clean results. If the AI references previous attempts, we can't tell whether success came from the rules or from copying.
**TypeScript + SpacetimeDB (full features):**

- `apps/chat-app/prompts/language/typescript-spacetime.md`
- `apps/chat-app/prompts/composed/12_full.md`

**TypeScript + PostgreSQL (full features):**

- `apps/chat-app/prompts/language/typescript-postgres.md`
- `apps/chat-app/prompts/composed/12_full.md`

| Language File | Stack |
|---|---|
| `typescript-spacetime.md` | TypeScript + SpacetimeDB (React) |
| `typescript-postgres.md` | TypeScript + PostgreSQL (Express) |
Each level is cumulative.
| Level | Features Added |
|---|---|
| 01 | Basic Chat, Typing, Read Receipts, Unread |
| 02 | + Scheduled Messages |
| 03 | + Ephemeral Messages |
| 04 | + Reactions |
| 05 | + Edit History |
| 06 | + Permissions |
| 07 | + Presence |
| 08 | + Threading |
| 09 | + Private Rooms |
| 10 | + Activity Indicators |
| 11 | + Draft Sync |
| 12 | + Anonymous Migration (ALL) |
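Because levels are cumulative, the feature set at level N is everything from levels 01 through N. A minimal TypeScript sketch of that relationship (the `featuresUpTo` helper and the array below are illustrative, derived from the table above, not part of this repo):

```typescript
// Features added at each prompt level (index 0 = level 01), per the table above.
const levelFeatures: string[] = [
  "Basic Chat, Typing, Read Receipts, Unread", // 01
  "Scheduled Messages",  // 02
  "Ephemeral Messages",  // 03
  "Reactions",           // 04
  "Edit History",        // 05
  "Permissions",         // 06
  "Presence",            // 07
  "Threading",           // 08
  "Private Rooms",       // 09
  "Activity Indicators", // 10
  "Draft Sync",          // 11
  "Anonymous Migration", // 12
];

// Everything a level-N composed prompt covers (levels are cumulative).
function featuresUpTo(level: number): string[] {
  return levelFeatures.slice(0, level);
}
```

For example, `featuresUpTo(4)` covers basic chat plus scheduled messages, ephemeral messages, and reactions.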
The AI will ask for confirmation (per `deployment.mdc` rules) before deploying and before creating the `GRADING_RESULTS.md` file.

Grading is done manually, with AI doing a shallow pass before manual review. The grading rubric is in `apps/{app}/prompts/grading_rubric.md`.
Each graded app gets a `GRADING_RESULTS.md` file in its folder.
To generate summary reports from all graded apps:
```sh
cd tools/llm-oneshot
pnpm install
pnpm run summarize
```
This outputs to `docs/llms/`:

- `oneshot-summary.md` — Combined summary with feature scores
- `oneshot-grades.json` — Structured data for websites

Generated apps are stored in:

```
apps/{app-name}/{language}/{model}/{platform}/{app-name}-{YYYYMMDD-HHMMSS}/
```
Example:

```
apps/chat-app/typescript/opus-4-5/spacetime/chat-app-20260107-120000/
apps/chat-app/typescript/opus-4-5/postgres/chat-app-20260108-140000/
```
This structure allows comparing results across:

- apps
- languages
- models
- platforms (SpacetimeDB vs. PostgreSQL)
- runs over time (via the timestamp)
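Every one of those dimensions is encoded in the run's path, so it can be recovered mechanically. A minimal TypeScript sketch, assuming the naming scheme above (`parseRunPath` is a hypothetical helper, not part of this repo):

```typescript
interface RunInfo {
  app: string;
  language: string;
  model: string;
  platform: string;
  timestamp: string; // YYYYMMDD-HHMMSS
}

// Parse apps/{app-name}/{language}/{model}/{platform}/{app-name}-{timestamp}/
// into its components; returns null for paths that don't match the layout.
function parseRunPath(path: string): RunInfo | null {
  const m = path.match(
    /^apps\/([^/]+)\/([^/]+)\/([^/]+)\/([^/]+)\/\1-(\d{8}-\d{6})\/?$/
  );
  if (!m) return null;
  const [, app, language, model, platform, timestamp] = m;
  return { app, language, model, platform, timestamp };
}
```

Given the first example path above, this yields `platform: "spacetime"` and `model: "opus-4-5"`, making it easy to group runs by any dimension.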