Ruflo Cost

Cost tracking commands:

cost report [--period today|week|month] -- Generate a cost report for the specified period.

Recall token usage records from cost-tracking namespace for the period
Compute costs by model using current pricing (haiku/sonnet/opus input/output rates)
Aggregate by agent, task, and model
Show budget utilization percentage if a budget is configured
Display: total cost, breakdown by model, breakdown by agent, budget status
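
The report steps above can be sketched as a small aggregation. This is only an illustration: the record shape (`agent`, `model`, `inputTokens`, `outputTokens`) and the per-million-token rates in `RATES` are assumptions made for the example, not the plugin's actual schema or current pricing.

```javascript
// Placeholder USD rates per million tokens; NOT real pricing (assumption).
const RATES = {
  haiku: { input: 1, output: 5 },
  sonnet: { input: 3, output: 15 },
  opus: { input: 15, output: 75 },
};

// Assumed record shape: { agent, model, inputTokens, outputTokens }.
function costOf(record, rates = RATES) {
  const rate = rates[record.model];
  if (!rate) throw new Error(`no pricing for model: ${record.model}`);
  return (record.inputTokens * rate.input + record.outputTokens * rate.output) / 1e6;
}

// Aggregate cost by model and agent; add budget utilization if a budget exists.
function buildReport(records, budgetUsd = null) {
  const byModel = {};
  const byAgent = {};
  let total = 0;
  for (const r of records) {
    const cost = costOf(r);
    total += cost;
    byModel[r.model] = (byModel[r.model] ?? 0) + cost;
    byAgent[r.agent] = (byAgent[r.agent] ?? 0) + cost;
  }
  const budgetUtilizationPct = budgetUsd ? (total / budgetUsd) * 100 : null;
  return { total, byModel, byAgent, budgetUtilizationPct };
}
```

When no budget is configured, `budgetUtilizationPct` stays `null`, so a caller can omit the budget line from the report.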

cost breakdown [--by agent|model|task] -- Detailed cost breakdown by dimension.

Recall all usage records from cost-tracking namespace
Group by the specified dimension (agent, model, or task)
For each group: total tokens (input/output/cache), total cost, percentage of total
Sort by cost descending
Display: dimension value, input tokens, output tokens, cache tokens, total cost, share %
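
A minimal sketch of the grouping described above, assuming each usage record already carries a computed `costUsd` plus `inputTokens`, `outputTokens`, and `cacheTokens` fields. These field names are hypothetical; the real records may differ.

```javascript
// Assumed record shape:
// { agent, model, task, inputTokens, outputTokens, cacheTokens, costUsd }.
function breakdown(records, dimension) {
  if (!['agent', 'model', 'task'].includes(dimension)) {
    throw new Error(`unsupported dimension: ${dimension}`);
  }
  const groups = new Map();
  let grandTotal = 0;
  for (const r of records) {
    const key = r[dimension];
    const g = groups.get(key) ??
      { value: key, inputTokens: 0, outputTokens: 0, cacheTokens: 0, costUsd: 0 };
    g.inputTokens += r.inputTokens;
    g.outputTokens += r.outputTokens;
    g.cacheTokens += r.cacheTokens;
    g.costUsd += r.costUsd;
    groups.set(key, g);
    grandTotal += r.costUsd;
  }
  // Attach each group's share of the total, then sort by cost descending.
  return [...groups.values()]
    .map((g) => ({ ...g, sharePct: grandTotal > 0 ? (g.costUsd / grandTotal) * 100 : 0 }))
    .sort((a, b) => b.costUsd - a.costUsd);
}
```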

cost budget set <amount> -- Set a budget limit in USD. This is a real implementation: the config is persisted to cost-tracking:budget-config.

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs set <amount> to write the config to the cost-tracking namespace
Thresholds default to: info 50% · warning 75% · critical 90% · hard_stop 100%
Report: confirmed amount + namespace key

cost budget get -- Show the current budget config.

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs get
Report: amount, when set, threshold ladder

cost budget check [--period today|week|month|all] -- Compute utilization + alert level (50/75/90/100% ladder).

Run node plugins/ruflo-cost-tracker/scripts/budget.mjs check
Filter by BUDGET_PERIOD=today|week|month|all (default all)
Sum total_cost_usd across all session-* records in cost-tracking
Compute utilization vs. budget; emit 🟢 OK / 🟡 INFO / 🟠 WARNING / 🔴 CRITICAL / 🛑 HARD_STOP
Exits with code 1 on HARD_STOP; wrap agent spawns in budget check && spawn ... so that they fail closed
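
The threshold ladder can be illustrated with a short sketch. The percentages and the exit-code rule come from the text above; treating each threshold as inclusive (`>=`) and the function names are assumptions, and budget.mjs may differ in detail.

```javascript
// Ladder from the text: info 50%, warning 75%, critical 90%, hard_stop 100%.
// Ordered highest first so the first match is the most severe level reached.
const LADDER = [
  { level: 'HARD_STOP', minPct: 100 },
  { level: 'CRITICAL', minPct: 90 },
  { level: 'WARNING', minPct: 75 },
  { level: 'INFO', minPct: 50 },
];

// Assumes thresholds are inclusive; the source does not say.
function alertLevel(spentUsd, budgetUsd) {
  if (!(budgetUsd > 0)) throw new Error('budget must be a positive number');
  const pct = (spentUsd / budgetUsd) * 100;
  const hit = LADDER.find((step) => pct >= step.minPct);
  return { pct, level: hit ? hit.level : 'OK' };
}

// Mirrors the documented behavior: non-zero exit only on HARD_STOP.
function exitCodeFor(level) {
  return level === 'HARD_STOP' ? 1 : 0;
}
```

Because only HARD_STOP maps to a non-zero exit code, chaining a command after the check with `&&` blocks it only once the budget is fully spent; the lower levels are advisory.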

cost optimize -- Analyze usage and suggest cost optimizations.

Recall recent usage data from cost-tracking namespace
For each agent, analyze: average task complexity, model used, token efficiency
Identify agents using expensive models for low-complexity tasks
Check cache hit rates and suggest caching improvements
Look for redundant agent spawns or duplicate work
Calculate estimated savings for each recommendation
Display: recommendation, current cost, projected cost, savings, impact assessment

cost track -- Auto-capture token usage for the active Claude Code session and persist to the cost-tracking namespace. Run after significant work or at session end so cost report has real data.

Invoke node plugins/ruflo-cost-tracker/scripts/track.mjs (with no flags, it uses the most recent session for the current working directory)
Print: total cost, per-model and per-tier breakdown, persisted memory key
Writes the record to the cost-tracking namespace at key session-<sessionId> (consumed by cost-report step 1)

cost outcome <task> <model> <outcome> -- Emit a hooks_model-outcome event so the router learns from applied recommendations. Auto-wired into cost-optimize step 8.

Validates outcome ∈ {success, escalated, failure}
Runs node plugins/ruflo-cost-tracker/scripts/outcome.mjs "<task>" <model> <outcome>
The script wraps npx @claude-flow/cli hooks model-outcome -t ... -m ... -o ... with explicit-argv spawnSync so quoting is safe
Without this, the router doesn't learn from cost-optimize recommendations and the Tier 1 bypass rate doesn't tighten over time
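
To show why an explicit argument vector makes quoting safe, here is a hedged sketch of the pattern. It is not the contents of outcome.mjs: the function names are hypothetical, and mapping `-t`, `-m`, and `-o` to task, model, and outcome is an assumption drawn from the command's argument order.

```javascript
const OUTCOMES = new Set(['success', 'escalated', 'failure']);

// Build the arguments as an array. No shell command string is assembled,
// so quotes or shell metacharacters in `task` are passed through literally.
function buildOutcomeArgv(task, model, outcome) {
  if (!OUTCOMES.has(outcome)) {
    throw new Error(`outcome must be one of: ${[...OUTCOMES].join(', ')}`);
  }
  return ['@claude-flow/cli', 'hooks', 'model-outcome',
    '-t', task, '-m', model, '-o', outcome];
}

// spawnSync with an argv array and no `shell` option runs the program directly.
async function emitOutcome(task, model, outcome) {
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync('npx', buildOutcomeArgv(task, model, outcome), {
    stdio: 'inherit',
  });
  return result.status;
}
```

A task description such as `fix "login" && cleanup` reaches the CLI as a single argument, where string concatenation into a shell command would have split or executed parts of it.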

cost summary [--format json|markdown] -- Single-shot programmatic dump of all cost data. Other plugins/scripts can shell out and parse the JSON.

Run node plugins/ruflo-cost-tracker/scripts/summary.mjs --format json
Output: total_cost_usd, sessionCount, byTier, byModel, topSession, budget, federation aggregate
The default is --format markdown; the JSON contract is stable for programmatic consumers
ADR-0002 considered an MCP-tool form but deferred (requires v3 source change); this is the plugin-local equivalent

cost federation -- Consumer-side wiring for ADR-097 Phase 3 federation_spend events. Aggregates per-peer 1h/24h/7d rolling windows and flags peers exceeding the suspension threshold (default $5/24h).

Run node plugins/ruflo-cost-tracker/scripts/federation.mjs
Optional: FED_FORMAT=json, FED_NAMESPACE=federation-spend, FED_SUSPEND_THRESHOLD_USD=5.0
Reports gracefully when no events are present (Phase 3 has not yet landed upstream)
Activates automatically when upstream publishes {peerId, taskId, tokensUsed, usdSpent, ts} to the federation-spend namespace
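
The rolling-window aggregation can be sketched as follows. The event fields match the shape quoted above; treating `ts` as epoch milliseconds, checking the threshold against the 24h window with a strict `>`, and the function name are assumptions for illustration.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const WINDOWS = { '1h': HOUR_MS, '24h': 24 * HOUR_MS, '7d': 7 * 24 * HOUR_MS };

// Assumes `ts` is epoch milliseconds; the source does not state the unit.
function aggregatePeers(events, now, suspendThresholdUsd = 5.0) {
  const peers = new Map();
  for (const e of events) {
    const age = now - e.ts;
    if (age < 0) continue; // skip events stamped in the future
    const p = peers.get(e.peerId) ?? { peerId: e.peerId, '1h': 0, '24h': 0, '7d': 0 };
    // An event counts toward every window that is at least as long as its age.
    for (const [name, span] of Object.entries(WINDOWS)) {
      if (age <= span) p[name] += e.usdSpent;
    }
    peers.set(e.peerId, p);
  }
  // Flag peers whose 24h spend exceeds the suspension threshold.
  return [...peers.values()].map((p) => ({
    ...p,
    overThreshold: p['24h'] > suspendThresholdUsd,
  }));
}
```

An empty event list yields an empty result, which matches the "reports gracefully" behavior described for the period before upstream events exist.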

cost export [--prometheus <path>] [--webhook <url>] -- Export cost-tracking telemetry to external observability systems.

--prometheus <path> writes the node_exporter textfile-collector format (gauges + counters with session labels)
--webhook <url> POSTs JSON; auth via EXPORT_WEBHOOK_HEADER='K: V'
With no flag, prints JSON to stdout
Metrics emitted: cost_tracker_total_usd, cost_tracker_tier_total_usd{tier=...}, cost_tracker_session_total_usd{session=...}, cost_tracker_session_messages{session=...}, cost_tracker_budget_usd, cost_tracker_budget_utilization
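
A rough sketch of rendering these metrics in the Prometheus text format that the node_exporter textfile collector reads. The metric names are taken from the list above; the input shape (`totalUsd`, `byTier`, `sessions`, `budgetUsd`, `utilization`) and the choice of gauge or counter for each metric are assumptions, since the source says only "gauges + counters".

```javascript
// Escape label values per the Prometheus text exposition format.
function escapeLabel(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Assumed input shape (hypothetical):
// { totalUsd, byTier: { tier: usd }, sessions: [{ id, totalUsd, messages }],
//   budgetUsd, utilization }
function renderPrometheus(summary) {
  const lines = [
    '# TYPE cost_tracker_total_usd gauge',
    `cost_tracker_total_usd ${summary.totalUsd}`,
    '# TYPE cost_tracker_tier_total_usd gauge',
  ];
  for (const [tier, usd] of Object.entries(summary.byTier)) {
    lines.push(`cost_tracker_tier_total_usd{tier="${escapeLabel(tier)}"} ${usd}`);
  }
  lines.push('# TYPE cost_tracker_session_total_usd gauge');
  for (const s of summary.sessions) {
    lines.push(`cost_tracker_session_total_usd{session="${escapeLabel(s.id)}"} ${s.totalUsd}`);
  }
  lines.push('# TYPE cost_tracker_session_messages counter');
  for (const s of summary.sessions) {
    lines.push(`cost_tracker_session_messages{session="${escapeLabel(s.id)}"} ${s.messages}`);
  }
  lines.push('# TYPE cost_tracker_budget_usd gauge');
  lines.push(`cost_tracker_budget_usd ${summary.budgetUsd}`);
  lines.push('# TYPE cost_tracker_budget_utilization gauge');
  lines.push(`cost_tracker_budget_utilization ${summary.utilization}`);
  // The text format requires a trailing newline after the last sample.
  return lines.join('\n') + '\n';
}
```

All samples of one metric are kept together under a single `# TYPE` line, which is why sessions are iterated twice rather than interleaving the two session metrics.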

cost conversation -- Per-conversation cost view: list every session in cost-tracking with started-at, message count, top model, total cost. Different lens from cost report (which is per-agent/per-model).

Run node plugins/ruflo-cost-tracker/scripts/conversation.mjs
Optional CONV_FORMAT=json, CONV_LIMIT=N, CONV_NAMESPACE=...
Reports: total across conversations, per-tier rollup, per-session table

cost trend -- Read all docs/benchmarks/runs/*.json and surface drift in the gate metrics — win rate, avg latency, p99, escalation rate, speedup vs LLM. Flags regressions the binary smoke gate misses.

Run node plugins/ruflo-cost-tracker/scripts/trend.mjs
Optional TREND_FORMAT=json for machine-readable output, TREND_LIMIT=N to truncate
Reports: first→last deltas + per-run series + regression flags (win rate drop or ≥1.5× latency rise)
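
The regression rule in the last step can be sketched like this. Comparing the first run against the last, and the field names `winRate` and `avgLatencyMs`, are assumptions; trend.mjs may compare runs differently or cover more metrics.

```javascript
// Assumed run shape: { winRate, avgLatencyMs }, ordered oldest to newest.
function detectRegressions(runs, latencyFactor = 1.5) {
  if (runs.length < 2) return { deltas: null, flags: [] };
  const first = runs[0];
  const last = runs[runs.length - 1];
  const flags = [];
  // Any drop in win rate counts as a regression.
  if (last.winRate < first.winRate) flags.push('win_rate_drop');
  // Latency regresses when it reaches latencyFactor times the first run.
  if (first.avgLatencyMs > 0 && last.avgLatencyMs >= latencyFactor * first.avgLatencyMs) {
    flags.push('latency_rise');
  }
  return {
    deltas: {
      winRate: last.winRate - first.winRate,
      avgLatencyMs: last.avgLatencyMs - first.avgLatencyMs,
    },
    flags,
  };
}
```

This catches drift that a pass/fail gate misses: a win rate sliding from 0.95 to 0.85 still clears a 0.80 threshold but is flagged here.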

cost benchmark [--llm] [--anthropic] -- Run the corpus benchmark to verify booster claims with measured numbers.

Without flags: booster-only (free, ~85 ms wall-time, no API keys needed)
--llm: also run Gemini 2.0 Flash baseline (uses GCP GOOGLE_AI_API_KEY secret)
--anthropic: also run Claude Sonnet 4.6 + Opus 4.7 (uses GCP ANTHROPIC_API_KEY secret)
Writes results to docs/benchmarks/runs/latest.json and a timestamped sibling file
Print: win rate (Tier 1 cases), escalation rate (adversarial cases), per-endpoint avg latency, cost/edit, measured speedup
Smoke step 23 fails the build if winRate < 0.80. See cost-benchmark skill for env-var overrides.

cost workers -- Inspect the optimize and benchmark background workers consumed from ruflo-loop-workers.

Call mcp__claude-flow__hooks_worker-status --worker optimize -- report last-run timestamp, outcome, and any pending recommendations
Call mcp__claude-flow__hooks_worker-status --worker benchmark -- report last-run timestamp, outcome, and any pending benchmark deltas
Cross-link ruflo-loop-workers ADR-0001 §"12-worker trigger map" — the contract this command honors
Display: worker name, status, last-run timestamp, outcome, last-summary

cost history -- Show cost tracking history over time.

Recall all cost reports from cost-tracking namespace
Show daily/weekly totals with trend direction
Highlight days with unusual spending (>2x average)
Display: date, total cost, top agent, top model, budget status
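
The "unusual spending" rule can be illustrated with a short sketch. The input shape (`date`, `totalUsd`) and the choice to average over all listed days, including the outlier itself, are assumptions; the actual command may use a different baseline.

```javascript
// Assumed input: [{ date: 'YYYY-MM-DD', totalUsd }], one entry per day, oldest first.
function flagUnusualDays(days, factor = 2) {
  if (days.length === 0) return [];
  const average = days.reduce((sum, d) => sum + d.totalUsd, 0) / days.length;
  return days.map((d, i) => {
    // Trend direction compares each day with the previous one.
    const diff = i === 0 ? 0 : d.totalUsd - days[i - 1].totalUsd;
    const trend = diff > 0 ? 'up' : diff < 0 ? 'down' : 'flat';
    // A day is unusual when it costs more than `factor` times the average.
    return { ...d, trend, unusual: d.totalUsd > factor * average };
  });
}
```

Including the outlier in the average makes the check slightly conservative, since a large spike raises the baseline it is compared against.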