Data model

tools/baml-bench/docs/data-model.md

The Convex schema lives in convex/schema.ts. It defines five tables (tasks, trophies, issues, bamlBuilds, workers) plus the append-only taskEvents log. The four queue tables (tasks, trophies, issues, bamlBuilds) share a common queueFields block; workers is used only for presence and observability.

Shared queue fields

These fields are spread into tasks, trophies, issues, and bamlBuilds so the generic claimable-queue logic in convex/lib.ts can treat them uniformly:

| Field | Type | Purpose |
| --- | --- | --- |
| status | string | Per-table state machine (the claimable field). |
| claimedBy | string? | Worker id holding the current claim. |
| claimedAt | number? | When the row was claimed. |
| leaseExpiresAt | number? | Claim expiry; the reaper requeues anything past it. |
| attempts | number | Claim count; drives max-attempts → failed. |
| lastError | string? | Last failure string. |
| createdAt | number | Insert time. |
| updatedAt | number | Last mutation time. |
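
To make the shared block concrete, here is a minimal plain-TypeScript sketch of the fields above, with a hypothetical `claim` helper showing how the claim-related fields might move together. The names `QueueFields`, `claim`, and `leaseMs` are illustrative assumptions; the real validators and claim logic live in convex/schema.ts and convex/lib.ts and may differ.

```typescript
// Plain-TypeScript mirror of the shared queue fields in the table above.
interface QueueFields {
  status: string;
  claimedBy?: string;
  claimedAt?: number;
  leaseExpiresAt?: number;
  attempts: number;
  lastError?: string;
  createdAt: number;
  updatedAt: number;
}

// Hypothetical helper: returns a claimed copy of a row without mutating it.
// Assumption: a claim bumps `attempts` and starts a lease of `leaseMs`.
function claim(
  row: QueueFields,
  workerId: string,
  claimedStatus: string,
  now: number,
  leaseMs: number,
): QueueFields {
  return {
    ...row,
    status: claimedStatus,
    claimedBy: workerId,
    claimedAt: now,
    leaseExpiresAt: now + leaseMs,
    attempts: row.attempts + 1,
    updatedAt: now,
  };
}
```

Returning a copy keeps the sketch easy to test in isolation; inside a Convex mutation the same values would presumably be written as a patch to the row.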

tasks - the task / event DB

A unit of benchmark work: a prompt to run with the canary baml on PATH.

| Field | Type | Notes |
| --- | --- | --- |
| source | string | slack, cron, or bug_report. |
| prompt | string | The agent prompt. |
| repo | string? | Target repo. |
| ref | string? | Git ref. |
| sha | string? | Git sha. |
| bamlVersion | string? | Sha of the baml CLI the run used. |
| slackChannel | string? | Slack reply routing. |
| slackThreadTs | string? | Slack reply routing. |
| slackUser | string? | Slack reply routing. |
| notionProposerPageId | string? | Notion page that proposed the task. |
| transcriptStorageId | string? | Pointer the api resolves to a transcript blob on its own volume. |
| rawMetrics | any? | Raw run metrics. |

(+ shared queue fields)

Indexes: by_status_created (status, createdAt) · by_source (source) · by_lease (status, leaseExpiresAt).

Lifecycle: queued → running → done. A running task whose lease has expired is reaped back to queued, or moved to failed once it hits the max attempt count.
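
The reaping rule in the task lifecycle can be sketched as a pure function. This is an illustration only: `reapTask`, `TaskLease`, and the `maxAttempts` parameter are hypothetical names, and the `>=` comparison against the attempt limit is an assumption, since the actual reaper is not shown here.

```typescript
type TaskStatus = "queued" | "running" | "done" | "failed";

// Only the fields the reaper decision needs.
interface TaskLease {
  status: TaskStatus;
  leaseExpiresAt?: number;
  attempts: number;
}

// Hypothetical reaper step: returns the status the task should have at `now`.
// Assumption: a task is failed once `attempts` reaches `maxAttempts`.
function reapTask(task: TaskLease, now: number, maxAttempts: number): TaskStatus {
  const expired =
    task.status === "running" &&
    task.leaseExpiresAt !== undefined &&
    task.leaseExpiresAt < now;
  if (!expired) return task.status; // nothing to reap
  return task.attempts >= maxAttempts ? "failed" : "queued";
}
```

Only running tasks with an expired lease are touched; every other row keeps its current status.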

taskEvents - append-only task audit log

Rows are written automatically by createDoc, claimDoc, and transitionDoc whenever the target table is tasks.

| Field | Type | Notes |
| --- | --- | --- |
| taskId | id("tasks") | The task this event belongs to. |
| eventType | string | The status / lifecycle event (e.g. created, running, done). |
| details | any? | Optional payload (e.g. { workerId } on claim). |

Index: by_task (taskId).
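
As a rough illustration of the audit log, the sketch below uses an in-memory stand-in for the taskEvents table and applies the rule that events are only recorded for the tasks table. `TaskEventLog` and `recordIfTask` are hypothetical names, and `taskId` is a plain string here rather than a Convex id; the real writes happen inside createDoc, claimDoc, and transitionDoc.

```typescript
interface TaskEvent {
  taskId: string; // id("tasks") in the real schema; a plain string in this sketch
  eventType: string;
  details?: unknown;
}

// Hypothetical in-memory stand-in for the append-only taskEvents table.
class TaskEventLog {
  private readonly events: TaskEvent[] = [];

  // Append only: there is no update or delete method.
  append(event: TaskEvent): void {
    this.events.push({ ...event });
  }

  // Mirrors a lookup through the by_task index.
  forTask(taskId: string): TaskEvent[] {
    return this.events.filter((e) => e.taskId === taskId);
  }
}

// Records an event only when the affected table is "tasks".
function recordIfTask(
  log: TaskEventLog,
  table: string,
  taskId: string,
  eventType: string,
  details?: unknown,
): void {
  if (table !== "tasks") return;
  log.append({ taskId, eventType, details });
}
```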

trophies - the result DB

The verbose self-reported outcome of one task run.

| Field | Type | Notes |
| --- | --- | --- |
| taskId | id("tasks") | The task that produced this trophy. |
| outcome | string | success, partial, failed, or quota_skipped. |
| compileOk | boolean? | Whether the produced baml compiled. |
| compileStderr | string? | Compiler stderr. |
| bamlVersion | string? | Sha of the baml CLI used. |
| metrics | any | Full ported metric bag (see bench_core.schemas.Metrics). |
| hostMetadata | any? | Runner host info. |
| transcriptStorageId | string? | Transcript blob pointer. |
| turnLog | array(any)? | Per-turn log. |
| summary | string? | Self-reported narrative summary. |
| whatWentWell | array(string)? | Self-reported positives. |
| whatFailed | array(string)? | Self-reported failures. |
| reportMd | string? | Rendered markdown report. |
| findings | array(any)? | {kind, title, description, anchor, suggestion?, repro?}. |
| suggestions | array(any)? | {target, suggestion, rationale}. |

(+ shared queue fields)

Indexes: by_status_created (status, createdAt) · by_task (taskId) · by_lease (status, leaseExpiresAt).

Lifecycle: queued → deduping → done.

issues - the deduplicated issue DB

A cross-run merged skill/language issue, with two independent queues.

| Field | Type | Notes |
| --- | --- | --- |
| kind | string | skill or language. |
| category | string? | bug or suggestion. |
| title | string | Issue title. |
| description | string | Issue description. |
| suggestion | string? | Definitive skill/language fix. |
| evidence | array(any) | {trophyId, turnIndex?, callIndex?, note?} anchors. |
| repro | string? | Minimal repro. |
| notionPageId | string? | Linked Notion page id. |
| fixSlackTs | string? | Dispatch reference for the fix (the Cursor cloud-agent id). |
| firstSeenAt | number | First observation time. |
| lastSeenAt | number | Most recent observation time. |
| notionSyncStatus | string | dirty, syncing, or synced (separate sync queue). |

(+ shared queue fields)

Indexes: by_status_created (status, createdAt) · by_kind_status (kind, status) · by_notion_sync (notionSyncStatus, lastSeenAt) · by_notion_page (notionPageId) · by_lease (status, leaseExpiresAt).

Lifecycles:

  • Bug-fix: open → confirmed → approved → fixing → closed | rejected.
  • Notion sync (notionSyncStatus): dirty → syncing → synced.
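
Because an issue carries two independent state machines, it can help to see them as two separate transition tables. The sketch below encodes only the edges written in the lifecycles above; edges that are not shown (for example a synced issue becoming dirty again, or a rejection from an earlier state) are deliberately left out rather than guessed. `bugFixNext`, `notionSyncNext`, and `canTransition` are hypothetical names.

```typescript
// Allowed next states, read literally from the two lifecycles above.
const bugFixNext: Record<string, readonly string[]> = {
  open: ["confirmed"],
  confirmed: ["approved"],
  approved: ["fixing"],
  fixing: ["closed", "rejected"],
  closed: [],
  rejected: [],
};

const notionSyncNext: Record<string, readonly string[]> = {
  dirty: ["syncing"],
  syncing: ["synced"],
  synced: [],
};

// True when `to` is listed as a next state of `from`; unknown states allow nothing.
function canTransition(
  table: Record<string, readonly string[]>,
  from: string,
  to: string,
): boolean {
  return (table[from] ?? []).includes(to);
}
```

Keeping the tables separate reflects the point that status and notionSyncStatus advance independently: moving an issue to fixing says nothing about whether its Notion page is in sync.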

bamlBuilds - version registry + build queue

One row per attempted build of the baml CLI.

| Field | Type | Notes |
| --- | --- | --- |
| sha | string | Source sha being built. |
| ref | string | Source ref. |
| binaryStorageId | string? | Pointer to the uploaded binary blob. |
| sizeBytes | number? | Binary size. |
| contentHash | string? | Hash of the binary. |
| buildLogTail | string? | Tail of the build log. |
| builtAt | number? | Completion time. |

(+ shared queue fields)

Indexes: by_status_created (status, createdAt) · by_sha (sha) · by_ref_status (ref, status) · by_lease (status, leaseExpiresAt).

Lifecycle: queued → building → ready | failed.

workers - presence / observability

Not a queue. Tracks live processors for the dashboard.

| Field | Type | Notes |
| --- | --- | --- |
| workerId | string | Worker identity. |
| role | string | Processor role (e.g. baml-worker). |
| status | string | idle or busy. |
| currentItemId | string? | Id of the item currently being processed. |
| lastHeartbeat | number | Last presence ping. |

Indexes: by_role_status (role, status) · by_worker (workerId).
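
One way a dashboard might decide which workers count as live is to compare lastHeartbeat against a staleness window. This is a sketch under assumptions: `WorkerRow`, `liveWorkers`, and the `staleAfterMs` threshold are hypothetical, and the actual dashboard logic is not described here.

```typescript
// Plain-TypeScript mirror of the workers table.
interface WorkerRow {
  workerId: string;
  role: string;
  status: "idle" | "busy";
  currentItemId?: string;
  lastHeartbeat: number;
}

// Keeps workers whose last heartbeat is at most `staleAfterMs` old at `now`.
function liveWorkers(workers: WorkerRow[], now: number, staleAfterMs: number): WorkerRow[] {
  return workers.filter((w) => now - w.lastHeartbeat <= staleAfterMs);
}
```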