Ruflo Cost - Ruflo — ContextQMD

Cost tracking commands:

cost report [--period today|week|month] -- Generate a cost report for the specified period.

Recall token usage records from cost-tracking namespace for the period
Compute costs by model using current pricing (haiku/sonnet/opus input/output rates)
Aggregate by agent, task, and model
Show budget utilization percentage if a budget is configured
Display: total cost, breakdown by model, breakdown by agent, budget status

cost breakdown [--by agent|model|task] -- Detailed cost breakdown by dimension.

Recall all usage records from cost-tracking namespace
Group by the specified dimension (agent, model, or task)
For each group: total tokens (input/output/cache), total cost, percentage of total
Sort by cost descending
Display: dimension value, input tokens, output tokens, cache tokens, total cost, share %

cost budget set <amount> -- Set a budget limit in USD (real implementation, persisted to cost-tracking:budget-config).

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs set <amount> to write the config to the cost-tracking namespace
Thresholds default to: info 50% · warning 75% · critical 90% · hard_stop 100%
Report: confirmed amount + namespace key

cost budget get -- Show the current budget config.

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs get
Report: amount, when set, threshold ladder

cost budget check [--period today|week|month|all] -- Compute utilization + alert level (50/75/90/100% ladder).

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs check
Filter by BUDGET_PERIOD=today|week|month|all (default all)
Sum total_cost_usd across all session-* records in cost-tracking
Compute utilization vs. budget; emit 🟢 OK / 🟡 INFO / 🟠 WARNING / 🔴 CRITICAL / 🛑 HARD_STOP
Exit code 1 on HARD_STOP — wrap agent spawns in budget check && spawn ... to fail closed

cost optimize -- Analyze usage and suggest cost optimizations.

Recall recent usage data from cost-tracking namespace
For each agent, analyze: average task complexity, model used, token efficiency
Identify agents using expensive models for low-complexity tasks
Check cache hit rates and suggest caching improvements
Look for redundant agent spawns or duplicate work
Calculate estimated savings for each recommendation
Display: recommendation, current cost, projected cost, savings, impact assessment

cost track -- Auto-capture token usage for the active Claude Code session and persist to the cost-tracking namespace. Run after significant work or at session end so cost report has real data.

Invoke node plugins/ruflo-cost-tracker/scripts/track.mjs (no flags = current cwd's most-recent session)
Print: total cost, per-model and per-tier breakdown, persisted memory key
Sets the cost-tracking namespace record at key session-<sessionId> (consumed by cost-report step 1)

cost outcome <task> <model> <outcome> -- Emit a hooks_model-outcome event so the router learns from applied recommendations. Auto-wired into cost-optimize step 8.

Validates outcome ∈ {success, escalated, failure}
Runs node plugins/ruflo-cost-tracker/scripts/outcome.mjs "<task>" <model> <outcome>
The script wraps npx @claude-flow/cli hooks model-outcome -t ... -m ... -o ... with explicit-argv spawnSync so quoting is safe
Without this, the router doesn't learn from cost-optimize recommendations and the Tier 1 bypass rate doesn't tighten over time

cost summary [--format json|markdown] -- Single-shot programmatic dump of all cost data. Other plugins/scripts can shell out and parse the JSON.

Run node plugins/ruflo-cost-tracker/scripts/summary.mjs --format json
Output: total_cost_usd, sessionCount, byTier, byModel, topSession, budget, federation aggregate
Default --format markdown; JSON contract is stable for programmatic consumers
ADR-0002 considered an MCP-tool form but deferred (requires v3 source change); this is the plugin-local equivalent

cost federation -- Consumer-side wiring for ADR-097 Phase 3 federation_spend events. Aggregates per-peer 1h/24h/7d rolling windows and flags peers exceeding the suspension threshold (default $5/24h).

Run node plugins/ruflo-cost-tracker/scripts/federation.mjs
Optional: FED_FORMAT=json, FED_NAMESPACE=federation-spend, FED_SUSPEND_THRESHOLD_USD=5.0
Reports gracefully when no events present (Phase 3 not yet landed upstream)
Activates automatically when upstream publishes {peerId, taskId, tokensUsed, usdSpent, ts} to the federation-spend namespace

cost export [--prometheus <path>] [--webhook <url>] -- Export cost-tracking telemetry to external observability systems.

--prometheus <path> writes the node_exporter textfile-collector format (gauges + counters with session labels)
--webhook <url> POSTs JSON; auth via EXPORT_WEBHOOK_HEADER='K: V'
No flag → stdout JSON
Metrics emitted: cost_tracker_total_usd, cost_tracker_tier_total_usd{tier=...}, cost_tracker_session_total_usd{session=...}, cost_tracker_session_messages{session=...}, cost_tracker_budget_usd, cost_tracker_budget_utilization

cost conversation -- Per-conversation cost view: list every session in cost-tracking with started-at, message count, top model, total cost. Different lens from cost report (which is per-agent/per-model).

Run node plugins/ruflo-cost-tracker/scripts/conversation.mjs
Optional CONV_FORMAT=json, CONV_LIMIT=N, CONV_NAMESPACE=...
Reports: total across conversations, per-tier rollup, per-session table

cost trend -- Read all docs/benchmarks/runs/*.json and surface drift in the gate metrics — win rate, avg latency, p99, escalation rate, speedup vs LLM. Flags regressions the binary smoke gate misses.

Run node plugins/ruflo-cost-tracker/scripts/trend.mjs
Optional TREND_FORMAT=json for machine-readable output, TREND_LIMIT=N to truncate
Reports: first→last deltas + per-run series + regression flags (win rate drop or ≥1.5× latency rise)

cost projection [--window 7d] [--horizons 7d,30d,90d,365d] [--format table|json] -- Forward-looking spend extrapolation. Predictive counterpart to cost budget check (reactive).

Run node plugins/ruflo-cost-tracker/scripts/projection.mjs
Compute USD-per-day from sessions in the measurement window (default last 7d)
Linear-extrapolate to 7d/30d/90d/365d horizons (configurable via --horizons)
If cost budget set has run: surface "days until 75%/90%/100% consumed" tables
JSON output for dashboards / CI gates (e.g. jq '.budget.exhaustion[2].daysUntilReached < 7' to fail builds when 100% exhaustion is < 1 week away)
Env: PROJECTION_NAMESPACE, PROJECTION_QUIET=1

cost counterfactual [--since 7d] [--baseline always-haiku|always-sonnet|always-opus|all] [--format table|json] -- Multi-baseline counterfactual cost analysis. Comparative counterpart to budget-check (reactive) and projection (predictive): answers "is the routing earning its keep?".

Run node plugins/ruflo-cost-tracker/scripts/counterfactual.mjs
Sum tokens across all sessions in window (default all-time)
For each baseline tier, compute hypothetical cost if every token had run at that tier's pricing
Surface savings $ + % across all three baselines (default --baseline all)
Negative always-haiku savings = over-escalation signal (router picked sonnet/opus when haiku could have done it). Positive always-sonnet quantifies the router's win against the "safe default" baseline.
JSON output for CI gates: jq '.baselines[1].savingsPct > 30' to flag workload shifts where routing isn't saving ≥30% vs sonnet

cost burn [--bucket 1d] [--lookback 14d] [--alert-on-acceleration-pct N] [--format table|json] -- Burn-rate trend over time with optional drift-alert exit code. Trend counterpart to reactive/predictive/comparative: answers "is daily burn accelerating?".

Run node plugins/ruflo-cost-tracker/scripts/burn.mjs
Bin sessions into --bucket duration windows (default 1d) over --lookback (default 14d)
Compute delta: latest bucket vs mean of prior non-empty buckets
With --alert-on-acceleration-pct N: exit 1 when latest exceeds prior mean by N%+. Independent of budget — catches "hot loop burning 10× normal" before budget alarm fires
Distinct from cost trend (which surfaces BENCHMARK drift across docs/benchmarks/runs/*.json); this tracks PRODUCTION spend.
Edge cases: no prior data → alert SKIPPED (no spurious cold-start alerts). --bucket > --lookback → exit 2 (config error).

cost anomaly [--since 7d] [--threshold 3.5] [--alert-on-outliers N] [--format table|json] -- MAD-based outlier detection on session spend. Point-anomaly counterpart to cost-burn's aggregate-trend signal: answers "which specific session is the outlier?".

Run node plugins/ruflo-cost-tracker/scripts/anomaly.mjs
Compute median(total_cost_usd) and MAD = median(|x - median|) over the filtered window
Per-session modified z-score z = 0.6745 × (x - median) / MAD (Iglewicz-Hoaglin 1993)
Flag sessions with |z| > --threshold (default 3.5)
With --alert-on-outliers N: exit 1 when ≥N outliers found
MAD beats mean+sigma because outliers themselves can't inflate it — robust on n=10. Sessions table labels high (over-spending) vs low (crash/drop) direction.
Edge cases: n<3 → "insufficient data" exit 0. MAD=0 (half the sessions share exact spend) → explainer exit 0.

cost session [--session-id <id>] [--top 20] [--since <iso-ts>] [--format table|json] -- Per-message cost breakdown within ONE session. Drill-down companion to cost-anomaly.

Run node plugins/ruflo-cost-tracker/scripts/session.mjs
Resolves session jsonl via --session-id (scans ~/.claude/projects/*/) or --latest (default)
Lists top-N most expensive messages with full token breakdown (input / output / cache_write / cache_read)
Surfaces p50/p90/p99 message-cost percentiles so operators can judge "is this top message a 2× or 380× outlier?"
Flags the top message as in-session outlier when cost > 2× the p99
The Cache W column is critical: a 569-token output message at $16 looks insane until you see "881898 cache writes" beside it.

cost diff --baseline <path> --current <path> [--alert-on-pct N] [--alert-on-usd N] [--alert-on-class-pct <class>:N[,<class>:N]] [--format table|json] -- Snapshot delta between two cost-summary JSON outputs. PR-level regression detection.

Run node plugins/ruflo-cost-tracker/scripts/diff.mjs --baseline <path> --current <path>
Both files must be cost-summary JSON shape (validated: total_cost_usd + sessionCount required)
Computes total delta + per-tier + per-model breakdowns; entries tagged added / removed / changed
Tables sorted by |delta| descending so biggest movers bubble to the top
--alert-on-pct N exits 1 when total grew >N%; --alert-on-usd N exits 1 when total grew >$N; both can be set, first to trigger wins
Composes with cost summary --format json — the stable JSON contract is the protocol between snapshot capture and snapshot diffing

cost health [--alert-acceleration 100] [--alert-outliers 1] [--alert-days-to-exhaust 14] [--skip burn,anomaly] [--format table|json] -- Composite CI gate. Runs all four alert ladders (budget / burn / anomaly / projection) in parallel and returns max(exit_codes). One shell-out replaces four separate CI steps.

Run node plugins/ruflo-cost-tracker/scripts/health.mjs
Spawn budget-check, burn, anomaly, projection subchecks via Promise.all
Each subcheck runs --format json; parse exit codes
Projection synthesizes exit code from daysUntilReached[100%] < --alert-days-to-exhaust
Final exit = max(subcheck exits) — any failure fails the gate
Print one-line summary per check + overall HEALTHY/UNHEALTHY badge
--skip <list> to disable specific subchecks (e.g. --skip burn for fast-feedback smoke runs).

cost benchmark [--llm] [--anthropic] -- Run the corpus benchmark to verify booster claims with measured numbers.

Without flags: booster-only (free, ~85 ms wall-time, no API keys needed)
--llm: also run Gemini 2.0 Flash baseline (uses GCP GOOGLE_AI_API_KEY secret)
--anthropic: also run Claude Sonnet 4.6 + Opus 4.7 (uses GCP ANTHROPIC_API_KEY secret)
Writes results to docs/benchmarks/runs/latest.json and timestamped sibling
Print: win rate (Tier 1 cases), escalation rate (adversarial cases), per-endpoint avg latency, cost/edit, measured speedup
Smoke step 23 fails the build if winRate < 0.80. See cost-benchmark skill for env-var overrides.

cost workers -- Inspect the optimize and benchmark background workers consumed from ruflo-loop-workers.

Call mcp__claude-flow__hooks_worker-status --worker optimize -- report last-run timestamp, outcome, and any pending recommendations
Call mcp__claude-flow__hooks_worker-status --worker benchmark -- report last-run timestamp, outcome, and any pending benchmark deltas
Cross-link ruflo-loop-workers ADR-0001 §"12-worker trigger map" — the contract this command honors
Display: worker name, status, last-run timestamp, outcome, last-summary

cost history -- Show cost tracking history over time.

Recall all cost reports from cost-tracking namespace
Show daily/weekly totals with trend direction
Highlight days with unusual spending (>2x average)
Display: date, total cost, top agent, top model, budget status