tools/baml-bench/docs/reference.md
Consolidated index of every function, method, and class (including private,
underscore-prefixed helpers), each with a one-line summary. Generated from
the in-code docstrings / JSDoc; run `python docs/_gen_reference.py` to
refresh. Full parameter and return detail lives in the docstrings themselves.
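
To make the "one-line summary" convention concrete, here is a minimal sketch of how the first docstring line of each Python definition could be collected with the standard `ast` module. It is not the contents of `docs/_gen_reference.py`; the function name `first_line_summaries` is hypothetical and the real generator may work differently.

```python
import ast


def first_line_summaries(source: str) -> list[tuple[str, str]]:
    """Return (name, first docstring line) for each function and class in source.

    Hypothetical helper for illustration only; definitions without a
    docstring get an empty summary.
    """
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)  # cleaned docstring, or None
            summary = doc.splitlines()[0] if doc else ""
            rows.append((node.name, summary))
    return rows
```

Note that `ast.walk` does not guarantee source order, so a real index would likely sort or group the rows per file before rendering them.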

## libs/bench_core (Python)

### `libs/bench_core/cursor_client.py`

- `launch_agent(api_key, prompt_text, repo_url, ref, auto_create_pr, model, timeout)` - Launch a Cursor cloud agent to work a fix on a GitHub repo.

### `libs/bench_core/jsonl.py`

- `_scan(s, start)` - Find the brace-balanced object that opens at a given index.
- `extract_first_json_object(s)` - Parse and return the first top-level JSON object found in a string.
- `extract_last_json_object(s)` - Parse and return the last top-level JSON object found in a string.

### `libs/bench_core/notion_client.py`

- `_chunks(text, size)` - Split text into chunks each no larger than a size limit.
- `_paragraph(text)` - Build a Notion paragraph block wrapping the given text.
- `class NotionClient` - Minimal Notion REST client for creating and updating issue pages.
  - `__init__(token)` - Build the auth, version, and content-type headers for Notion requests.
  - `create_issue_page(database_id, title, status_name, body, evidence_links, suggestion, category)` - Create a Notion issue page with a title, status, and chunked body.
  - `set_status(page_id, status_name)` - Update the Status select property on an existing issue page.

### `libs/bench_core/prices.py`

- `_load()` - Lazily load and memoize the model rate table from prices.toml.
- `prices_for(model)` - Look up the rate card for a model.
- `compute_cost(input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, prices)` - Compute the USD cost of a token breakdown at the given per-million rates.

### `libs/bench_core/processor.py`

- `class Processor` - Event-driven claim loop: a subclass declares its queue and implements `process()`.
  - `__init__(service)` - Bind the service client and mint a unique worker id for this process.
  - `process(item)` - Handle one claimed item; subclasses implement the stage's work.
  - `_claim_one()` - Claim a single item using the subclass's queue config.
  - `_drain()` - Claim and run items until the queue is empty, or just one in batch mode.
  - `_run_one(item)` - Run `process()` for one item under a heartbeat task, failing it on error.
  - `_heartbeat(item_id)` - Periodically extend the item's lease until the task is cancelled.
  - `_poll_backstop()` - Periodically drain the queue to backstop dropped SSE wake-ups.
  - `run()` - Run the main loop: drain on startup, then drain on each SSE wake-up.
- `run_processor(proc_factory)` - Run a processor to completion from a synchronous entry point.

### `libs/bench_core/proxy_client.py`

- `class ProxyClient` - Load-spreading client for a pool of stateless claude-proxy instances.
  - `__init__(urls, token)` - Initialize the client from a pool of proxy URLs and a bearer token.
  - `from_env()` - Build a client from the `CLAUDE_PROXY_URLS` and `CLAUDE_PROXY_TOKEN` env vars.
  - `run_agent(req, timeout)` - Run an agent by POSTing `/run-agent` to a randomly chosen proxy.
  - `check_baml(req, timeout)` - Check a baml repro by POSTing `/check-baml` to a randomly chosen proxy.

### `libs/bench_core/schemas.py`

- `class Prices` - Per-million-token USD rate card for a single model.
- `class RunAgentRequest` - Request to spawn a claude agent against a cell on the version-cached baml CLI.
- `class CheckBamlRequest` - Request to run a baml command against a minimal repro on the version-cached CLI.
- `class CheckBamlResult` - Outcome of a CheckBamlRequest: exit code, timeout flag, and output tails.
- `class AgentResult` - Result of a run-agent invocation: status, token/cost metrics, and posted files.
- `class Metrics` - Aggregated trophy metric bag rolled up across a run's invocations.
- `class EvidenceAnchor` - Pointer back to the trophy and transcript location that a finding cites.
- `class Finding` - A single skill or language issue the worker agent surfaced in a run, with its transcript anchor.

### `libs/bench_core/service_client.py`

- `class ServiceClient` - Async HTTP client for the baml-bench service's CRUD, queue, and blob endpoints.
  - `__init__(base_url, token, timeout)` - Open a bearer-authenticated httpx client against the service base URL.
  - `aclose()` - Close the underlying httpx client and release its connections.
  - `create(table, doc)` - Insert a document into a table via the service's `POST /{table}` endpoint.
  - `get(table, id)` - Fetch a single document by id.
  - `list(table, **query)` - List documents in a table, filtered by the supplied query params.
  - `update(table, id, patch)` - Apply a partial update to a document.
  - `remove(table, id)` - Delete a document by id.
  - `claim(table, worker_id, lease_ms, value, claimed_value, field, index)` - Atomically claim one queued document, flipping its field and stamping a lease.
  - `transition(table, id, to, field, patch, release_claim)` - Transition a claimed document's field to a new value and release its lease.
  - `heartbeat(table, id, lease_ms)` - Extend a claimed document's lease so a long-running job keeps its claim.
  - `events(table, value, field, index)` - Stream the claimable-document count over SSE, yielding on each change.
  - `put_transcript(table, id, text)` - Upload a transcript blob for a document and return its storage id.
  - `get_transcript(storage_id)` - Fetch the text of a previously uploaded transcript blob.
  - `baml_current()` - Fetch the currently pinned baml build.
  - `baml_update()` - Trigger the service to refresh the pinned baml build.
  - `put_baml_binary(build_id, data)` - Upload the compiled baml CLI binary for a build.

### `libs/bench_core/slack_client.py`

- `post_message(token, channel, text, thread_ts, blocks)` - Post a message to a Slack channel via `chat.postMessage`.
- `verify_signature(signing_secret, timestamp, body, signature, max_skew)` - Verify a Slack request signature using the v0 HMAC-SHA256 scheme.

## services (Python)

### `services/api/__main__.py`

### `services/api/app.py`

- `require_bearer(authorization)` - Enforce bearer-token auth on a request.
- `create_app()` - Build and return the configured FastAPI application.

### `services/api/blobs.py`

- `_path(storage_id)` - Resolve a storage id to an absolute path inside the blob directory.
- `put_text(kind, key, text)` - Write a text blob to the volume and return its storage id.
- `get_text(storage_id)` - Read and return a stored text blob.
- `put_binary(kind, key, data)` - Write a binary blob to the volume and return its identifying metadata.
- `get_binary(storage_id)` - Read and return a stored binary blob.
- `exists(storage_id)` - Report whether a blob exists for the given storage id.

### `services/api/convex_gateway.py`

- `class ConvexGateway` - Async client for Convex's HTTP function API (query/mutation/action).
  - `__init__(url, admin_key, poll_interval)` - Initialize the gateway and its underlying HTTP client.
  - `_run(kind, name, args)` - Invoke a Convex function over the HTTP API and return its value.
  - `query(name, args)` - Run a Convex query function.
  - `mutation(name, args)` - Run a Convex mutation function.
  - `action(name, args)` - Run a Convex action function.
  - `subscribe_counts(name, args)` - Poll a count query and yield whenever the count changes.
- `gateway_from_env()` - Construct a ConvexGateway from environment variables.

### `services/api/routers/baml_builds.py`

- `_gh_headers(accept)` - Build GitHub API request headers, adding auth when available.
- `_resolve_alpha(slug)` - Resolve the latest baml-language alpha pre-release from GitHub.
- `make_baml_router(convex)` - Build the baml version registry and build-coordination router.

### `services/api/routers/table.py`

- `class ClaimBody` - Request body for claiming the next item off a table's queue.
- `class TransitionBody` - Request body for transitioning an item to a new status.
- `class HeartbeatBody` - Request body for extending a claimed item's lease.
- `make_router(table, convex)` - Build a CRUD + queue + SSE router for a single Convex table.

### `services/baml_builder/__main__.py`

- `class BamlBuilder` - Processor that fetches a queued BAML alpha build and marks it ready.
  - `process(item)` - Download the alpha release binary for a claimed build and store it.

### `services/baml_builder/build.py`

- `_gh_headers()` - Build the GitHub API request headers, adding auth when a token is set.
- `_platform_triple()` - Return the glibc Linux target triple for this machine's architecture.
- `_extract_baml(targz)` - Extract the baml binary from a gzipped release tarball.
- `fetch_baml(tag)` - Download the alpha release tag asset for this platform; return the baml binary bytes.

### `services/baml_dedup/__main__.py`

- `class BamlDedup` - Batch queued trophies through the classify/dedup agent and upsert the issues DB.
  - `__init__(service)` - Initialize the processor and build a proxy client from the environment.
  - `process(first)` - Dedup a window of queued trophies and upsert the resulting issues.
  - `_open_issues()` - Fetch the currently-open issues to give the agent dedup context.
  - `_parse_issues(result)` - Extract the issue list the agent produced from its run result.
  - `_upsert(it)` - Create a new issue or merge evidence into an existing one.
  - `_merge_evidence(old, new)` - Append new evidence entries, skipping ones already present.

### `services/baml_dedup/prompts.py`

### `services/baml_dedup/render.py`

- `render_reports_md(trophies)` - Render a batch of trophies into the reports.md document fed to the agent.
- `render_open_issues_json(issues)` - Render open issues into the open_issues.json document fed to the agent.

### `services/baml_worker/__main__.py`

- `_load_skill()` - Load and cache the BAML skill text injected into the agent's workspace.
- `_derive_outcome(agent_status, task_completed)` - Map the agent's run status and self-report into a trophy outcome.
- `_mine_baml_errors(turn_log)` - Mine baml-attributable tool errors from the turn log as tentative findings.
- `class BamlWorker` - Claim a task, run the BAML agent, verify its repros, and create a trophy.
  - `__init__(service)` - Bind the service client and build a proxy client from the environment.
  - `process(item)` - Run one task end to end and persist its trophy.
  - `_verify_repros(findings, baml_version, task_id)` - Verify each finding's repro through the proxy's baml, in place.
  - `_parse_trophy_json(result)` - Parse the agent's self-reported trophy.json from the run result.
  - `_render_report_md(analysis, metrics, outcome)` - Render a fallback markdown report when the agent supplied none.
  - `_notify(item, outcome, summary, metrics, findings, trophy_id)` - Post the run result to Slack, linking the trophy on the dashboard.

### `services/baml_worker/prompts.py`

### `services/claude_proxy/__main__.py`

### `services/claude_proxy/app.py`

- `_safe_staging(prefix, raw, field)` - Resolve a staging directory for a caller-supplied id, rejecting traversal.
- `_get_api_key()` - Return the Anthropic API key.
- `_require_bearer(authorization)` - Reject the request unless its bearer token matches the proxy token.
- `_ensure_baml(sha)` - Return the dir containing the baml binary for sha, caching on miss.
- `healthz()` - Liveness probe that always reports the service is up.
- `run_agent(req, authorization)` - Run a Claude Code agent over staged files and return transcript + metrics.
- `check_baml(req, authorization)` - Compile/run a minimal repro with the version-cached baml on PATH (no claude). Used by the worker to verify a finding's repro reproduces.

### `services/claude_proxy/runner.py`

- `validate_relative_path(rel)` - Reject paths that are empty, absolute, or contain parent traversal.
- `materialize_files(staging, files)` - Write each file's content into the staging directory, creating parents.
- `spawn_claude(claude_bin, cwd, prompt, model, max_turns, system_prompt, baml_bin_dir, timeout_secs, anthropic_api_key)` - Run claude -p - and return (stdout, stderr, exit_code).
- `run_command(cwd, command, baml_bin_dir, timeout_secs)` - Run a shell command (e.g. baml build) in cwd with the version-cached baml on PATH. Returns (stdout, stderr, exit_code, timed_out). exit_code -9 signals a wall-clock timeout.
- `parse_claude_session(stdout)` - Extract the final JSON summary line of claude -p --output-format json.
- `session_log_path(staging, session_id)` - Compute the path to claude's session jsonl log for a staging dir.
- `_preview(s, is_error)` - Truncate a string to a preview, keeping head and tail for errors.
- `_result_text(c)` - Extract the text payload from a tool_result content block.
- `parse_turn_log(jsonl)` - Parse claude's session jsonl into per-assistant-turn structured rows.
- `compute_cost(session, prices)` - Compute the USD cost of a session from its token counts and prices.
- `host_metadata()` - Capture OS, architecture, timestamp, and optional hostname of the host.
- `collect_post_files(staging, patterns, max_file_bytes, max_total_bytes)` - Collect text files under staging matching glob patterns within size caps.

### `services/cron/__main__.py`

- `_tasks()` - Return the prompt(s) to enqueue this cycle.
- `_cycle(service)` - Run one cron cycle: refresh the baml build, then enqueue the day's task(s).
- `_amain()` - Run the cron loop: one cycle on start, then every INTERVAL seconds.

### `services/ingress/__main__.py`

### `services/ingress/app.py`

- `_is_duplicate(event_id)` - Record an event id and report whether it was already seen.
- `healthz()` - Liveness probe.
- `_create_slack_task(event, text, eid)` - Create a Slack-sourced task off the request path.
- `slack_events(request, background_tasks, x_slack_signature, x_slack_request_timestamp, x_slack_retry_num)` - Handle the Slack Events API callback (URL verification + app mentions).
- `notion_webhook(request, x_notion_signature)` - Approve the issue a Notion webhook points at (the fix dispatcher claims it).
- `_toggle_uuid_hyphens(value)` - Return the alternate hyphenation of a Notion id.
- `bug_trigger(payload)` - Create a task from a bug report.

### `services/notion_fixer/__main__.py`

- `class NotionPush` - Claim loop that mirrors dirty issues onto the Notion board.
  - `__init__(service)` - Build the processor and its Notion client.
  - `_db_for(kind)` - Return the Notion database id for an issue kind.
  - `process(issue)` - Sync one claimed issue to Notion and mark it synced.
  - `_confirm(issue_id, issue)` - Promote a just-boarded open issue to confirmed.
  - `_map_status(status)` - Map an internal issue status to its Notion board status label.
- `class FixDispatch` - Claim loop that dispatches approved issues to Cursor for a fix.
  - `__init__(service)` - Build the processor and its Notion client.
  - `process(issue)` - Dispatch a fix for one claimed approved issue.
- `_amain()` - Run the NotionPush and FixDispatch claim loops together until cancelled.

### `services/notion_fixer/fixer.py`

- `choose_repo(kind)` - Return the GitHub org/repo that owns issues of this kind.
- `repo_url(kind)` - Return the full GitHub URL for the repo that owns this issue kind.
- `_pr_instructions()` - Return the boilerplate telling the agent how to write the pull request.
- `cursor_prompt(issue)` - Build the instruction text for a Cursor cloud agent.
- `evidence_links(issue)` - Build dashboard URLs for each trophy cited as evidence on an issue.

## convex (TypeScript)

### `convex/bamlBuilds.ts`

- `get` - Fetch one baml build by id.
- `list` - List baml builds newest-first, optionally filtered by an index field/value.
- `countClaimable` - Count baml builds in a claimable state for queue-depth gauges.
- `create` - Insert a new baml build row.
- `update` - Patch fields on a baml build.
- `remove` - Delete a baml build.
- `claim` - Atomically claim the oldest queued baml build for a worker.
- `transition` - Move a baml build to a new status and release its claim.
- `heartbeat` - Extend a claimed baml build's lease so a live worker isn't reaped.

### `convex/issues.ts`

- `get` - Fetch one issue by id.
- `list` - List issues newest-first, optionally filtered by an index field/value.
- `countClaimable` - Count issues in a claimable state for queue-depth gauges.
- `create` - Insert a new issue row.
- `update` - Patch fields on an issue.
- `remove` - Delete an issue.
- `claim` - Atomically claim the oldest queued issue for a worker.
- `transition` - Move an issue to a new status and release its claim.
- `heartbeat` - Extend a claimed issue's lease so a live worker isn't reaped.

### `convex/lib.ts`

- `getDoc` - Fetch a single row by id.
- `listDocs` - List rows newest-first, optionally filtered to a single index field/value.
- `countClaimable` - Count rows currently in a claimable state for queue-depth gauges.
- `createDoc` - Insert a row with default attempts and timestamps.
- `updateDoc` - Patch a row and bump its updatedAt timestamp.
- `removeDoc` - Delete a row by id.
- `claimDoc` - Atomically claim the oldest claimable row for a table.
- `transitionDoc` - Move a row to a new status and, unless told otherwise, release its claim.
- `heartbeatDoc` - Extend a claimed row's lease so a live worker isn't reaped.

### `convex/maintenance.ts`

- `reap` - Cron entry point that sweeps every queue rule for expired leases.
- `reapNow` - Public wrapper around the reaper for ops/testing ("force a reap now").

### `convex/tasks.ts`

- `get` - Fetch one task by id.
- `list` - List tasks newest-first, optionally filtered by an index field/value.
- `countClaimable` - Count tasks in a claimable state for queue-depth gauges.
- `create` - Insert a new task row.
- `update` - Patch fields on a task.
- `remove` - Delete a task.
- `claim` - Atomically claim the oldest queued task for a worker.
- `transition` - Move a task to a new status and release its claim.
- `heartbeat` - Extend a claimed task's lease so a live worker isn't reaped.

### `convex/trophies.ts`

- `get` - Fetch one trophy by id.
- `list` - List trophies newest-first, optionally filtered by an index field/value.
- `countClaimable` - Count trophies in a claimable state for queue-depth gauges.
- `create` - Insert a new trophy row.
- `update` - Patch fields on a trophy.
- `remove` - Delete a trophy.
- `claim` - Atomically claim the oldest queued trophy for a worker.
- `transition` - Move a trophy to a new status and release its claim.
- `heartbeat` - Extend a claimed trophy's lease so a live worker isn't reaped.

## ui (TypeScript)

### `ui/app/api/state/route.ts`

- `GET` - `GET /api/state` - returns the current live snapshot as JSON for the polling dashboard.

### `ui/app/db/[table]/db-view.tsx`

- `DbView` - Client component rendering a live table view of tasks, trophies, or issues with

### `ui/app/db/[table]/page.tsx`

- `Page` - Server component for the "/db/[table]" route. Validates the table slug against

### `ui/app/graph-view.tsx`

- `GraphView` - Client component rendering the interactive Cytoscape pipeline graph. Nodes link

### `ui/app/layout.tsx`

- `RootLayout` - Root server-component layout wrapping every page in the html/body shell and

### `ui/app/lib/data.ts`

- `Turn` - One Claude Code transcript turn (a single API call) in a trophy's turn log.
- `Metrics` - Aggregate run metrics (turns, tokens, cost, wall clock) recorded on a trophy.
- `Finding` - A single skill- or language-level finding surfaced by a run, optionally anchored to a transcript call.
- `Trophy` - A completed run record: its outcome, metrics, report, findings, and full turn log.
- `Task` - A benchmark task queued for an agent to attempt (from Slack, cron, etc.).
- `Issue` - A deduplicated issue aggregated from run findings, tracked through to Notion/Cursor.
- `issueStatusLabel` - Maps an issue's raw status to the label shown in the UI.
- `RunRow` - A flattened run row joining a trophy to its task, sized for the dashboard runs table.
- `DashboardData` - The full payload rendered by the static dashboard: runs, open issues, and headline totals.
- `loadDashboardData` - Loads and reshapes trophies, tasks, and issues into the static dashboard payload.
- `Build` - A BAML canary build row from the bamlBuilds registry (sha, channel ref, status).
- `Inflight` - A unit of work an agent is actively processing right now (task, trophy, issue, or build).
- `TaskRow` - A task row enriched for the live view, including its claiming worker and resolved report id.
- `LiveState` - The full live-polled snapshot driving the graph, db tables, and live dashboard.
- `loadState` - Loads the live snapshot: status tallies, in-flight work, runs, issues, builds, and agent counts.
- `RunDetail` - A run-detail bundle for the run page: the trophy, its task, and a readable baml version.
- `loadRun` - Loads a single run's detail: its trophy, the originating task, and a readable baml label.
- `loadTask` - Loads a task plus the id of any trophy it has produced (used to redirect to the result).

### `ui/app/lib/format.ts`

- `ago` - Formats an elapsed duration in milliseconds as a compact relative age.

### `ui/app/live-dashboard.tsx`

- `usePolledState` - Hook that polls `/api/state` every 3s and exposes the latest live snapshot,
- `LiveDashboard` - Client component rendering the live dashboard: the tasks -> trophies -> issues

### `ui/app/page.tsx`

- `Page` - Server component for the dashboard home route ("/"). Loads the initial live

### `ui/app/runs/[id]/call-scroller.tsx`

- `CallScroller` - Client component that reads `?call=N` from the URL and, on mount, opens the matching

### `ui/app/runs/[id]/expand-all.tsx`

- `ExpandAll` - Client component button that opens or closes every `<details>` in the transcript

### `ui/app/runs/[id]/page.tsx`

- `RunPage` - Server component for the "/runs/[id]" route rendering a trophy's full detail:

### `ui/app/tasks/[id]/page.tsx`

- `TaskPage` - Server component for the "/tasks/[id]" route. Loads the task; 404s if not found,
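
As an illustration of one entry above, `verify_signature` in `libs/bench_core/slack_client.py` is described as using Slack's v0 HMAC-SHA256 scheme. The sketch below shows that scheme as Slack documents it: HMAC-SHA256 over `v0:{timestamp}:{body}` keyed with the signing secret, compared against the `v0=`-prefixed header value. It is not the project's implementation. The function name, the 300-second default for `max_skew`, and the `now` parameter (added so the example can be tested deterministically) are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature_sketch(
    signing_secret: str,
    timestamp: str,
    body: bytes,
    signature: str,
    max_skew: int = 300,          # assumed default: five minutes
    now: Optional[float] = None,  # assumption: injectable clock for testing
) -> bool:
    """Return True if signature matches Slack's v0 scheme and is fresh."""
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False

    # Reject stale or future-dated requests to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - ts) > max_skew:
        return False

    # Base string is "v0:<timestamp>:<raw request body>".
    base = b"v0:" + timestamp.encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    expected = "v0=" + digest

    # Constant-time comparison avoids leaking match length via timing.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The body must be the raw request bytes, not a re-serialized JSON payload, since any change in whitespace or key order would change the digest.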