plugins/ruflo-cost-tracker/skills/cost-counterfactual/SKILL.md
Multi-baseline counterfactual cost analysis. Pairs with the existing observability surface:
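- `cost-budget-check`: "have we crossed a threshold?" (reactive)
- `cost-projection`: "when will we cross a threshold?" (predictive)
- `cost-counterfactual`: "is the routing earning its keep?" (comparative) ← this one

For each run, the skill:

1. Reads `session-*` records from the `cost-tracking` namespace.
2. Applies the `--since` window filter (default all-time).
3. Aggregates the `byModel[*]` token counts for each session.
4. Prices the same token totals against each baseline tier (`always-haiku`, `always-sonnet`, `always-opus`):
   `counterfactualUsd = (input × tier.input + output × tier.output + cache_write × tier.cache_write + cache_read × tier.cache_read) / 1M`
5. Reports `savings = counterfactualUsd − actualUsd` for each baseline.

Example output:

| Metric | Value |
|---|---|
| Sessions considered | 2 |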
| Total input tokens | 100,000 |
| Actual spend | $0.162500 |
| Baseline | Hypothetical | Actual | Savings | % |
|---|---:|---:|---:|---:|
| `always-haiku` | $0.025000 | $0.162500 | -$0.137500 | -550.00% |
| `always-sonnet` | $0.300000 | $0.162500 | +$0.137500 | 45.83% |
| `always-opus` | $1.500000 | $0.162500 | +$1.337500 | 89.17% |
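To make the arithmetic concrete, here is a minimal Python sketch of the counterfactual formula and the savings calculation that reproduces the example tables. Everything in it is an assumption rather than the skill's actual code: the `Tier` and `Usage` types, the session shape (a plain `model -> Usage` dict standing in for `byModel`), and the prices. The input prices ($0.25, $3, and $15 per million tokens) are the ones the example implies, since 100,000 input tokens cost $0.025, $0.30, and $1.50 under the three baselines; an actual spend of $0.1625 is consistent with, for instance, 50,000 tokens on sonnet plus 50,000 on haiku. The output and cache prices are placeholders the input-only example never touches, and the sketch recomputes actual spend from the tier each entry used, where the real skill may read recorded spend instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical USD price per 1M tokens for one model tier."""
    input: float
    output: float
    cache_write: float
    cache_read: float


# Assumed prices. Input prices match what the example table implies; the output
# and cache prices are placeholders (the example below uses input tokens only).
TIERS = {
    "haiku": Tier(input=0.25, output=1.25, cache_write=0.30, cache_read=0.03),
    "sonnet": Tier(input=3.00, output=15.00, cache_write=3.75, cache_read=0.30),
    "opus": Tier(input=15.00, output=75.00, cache_write=18.75, cache_read=1.50),
}


@dataclass
class Usage:
    """Token counts for one byModel entry (hypothetical shape)."""
    input: int = 0
    output: int = 0
    cache_write: int = 0
    cache_read: int = 0


def cost_usd(u: Usage, t: Tier) -> float:
    # (input × tier.input + output × tier.output
    #  + cache_write × tier.cache_write + cache_read × tier.cache_read) / 1M
    return (u.input * t.input + u.output * t.output
            + u.cache_write * t.cache_write + u.cache_read * t.cache_read) / 1_000_000


def counterfactual(sessions: list[dict[str, Usage]]) -> tuple[float, dict[str, tuple[float, float, float]]]:
    """Return (actual_usd, {baseline: (hypothetical_usd, savings_usd, savings_pct)})."""
    total = Usage()
    actual = 0.0
    for by_model in sessions:                     # one dict per session record
        for model, u in by_model.items():         # each byModel[*] entry
            actual += cost_usd(u, TIERS[model])   # priced at the tier actually used
            total.input += u.input
            total.output += u.output
            total.cache_write += u.cache_write
            total.cache_read += u.cache_read
    rows = {}
    for model, tier in TIERS.items():
        hypo = cost_usd(total, tier)              # every token on a single tier
        savings = hypo - actual                   # counterfactualUsd − actualUsd
        pct = 100 * savings / hypo if hypo else 0.0   # inferred from the example's % column
        rows[f"always-{model}"] = (hypo, savings, pct)
    return actual, rows


# Two hypothetical sessions: 50,000 input tokens routed to sonnet, 50,000 to haiku.
actual, rows = counterfactual([{"sonnet": Usage(input=50_000)}, {"haiku": Usage(input=50_000)}])
print(f"actual ${actual:.6f}")
for name, (hypo, savings, pct) in rows.items():
    print(f"{name:14} ${hypo:.6f}  {savings:+.6f}  {pct:.2f}%")
```

Summing tokens across all sessions first and then pricing the totals once per baseline keeps the counterfactual a single multiply-and-add per tier, which is all the formula needs.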
A negative `always-haiku` result means the router spent more than an all-haiku run would have, because it chose models more expensive than haiku. If haiku could have handled those tasks, that's an over-escalation signal: run `cost optimize` (or inspect specific sessions via `cost conversation`) to investigate.

Positive savings quantify the router's win against that baseline. The most informative number is usually `always-sonnet`: it's the standard "safe default" baseline most teams would pick if they didn't have routing.
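The `%` column in the example is consistent with dividing savings by the hypothetical (baseline) cost. That formula is inferred from the table rather than documented, but it explains why a negative row can look dramatic: the denominator is the cheap baseline, so a modest dollar overspend becomes a large negative percentage. Worked through with the example's figures:

```math
\begin{aligned}
\text{savingsPct} &= 100 \times \frac{\text{counterfactualUsd} - \text{actualUsd}}{\text{counterfactualUsd}} \\
\text{always-haiku}: &\quad 100 \times \frac{0.025 - 0.1625}{0.025} = 100 \times \frac{-0.1375}{0.025} = -550.00\% \\
\text{always-sonnet}: &\quad 100 \times \frac{0.300 - 0.1625}{0.300} = 100 \times \frac{0.1375}{0.300} \approx 45.83\% \\
\text{always-opus}: &\quad 100 \times \frac{1.500 - 0.1625}{1.500} = 100 \times \frac{1.3375}{1.500} \approx 89.17\%
\end{aligned}
```

The same $0.1375 gap is +45.83% against sonnet but -550% against haiku, so compare dollar savings as well as percentages when judging over-escalation.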
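For CI, `cost counterfactual --format json | jq -e '.baselines[1].savingsPct >= 30'` fails the build if routing isn't saving at least 30% against the sonnet baseline (a workload-shift detector). The `-e` flag is what makes `jq` exit nonzero when the expression evaluates to `false`, and index `1` refers to `always-sonnet` in the haiku, sonnet, opus order shown above.

Like all counterfactual analyses, this assumes the same tokens at the same complexity would have produced the same outcome from the baseline model. That's an upper bound on how well the baseline would have done: it might have failed and required retries, which the math doesn't capture. Treat the numbers as a quality-blind ceiling.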
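The CI gate above can also be written as a small Python check that prints a readable message and sets the exit code explicitly (plain `jq` prints `true` or `false` but exits 0 either way unless `-e` is passed). This is a sketch under assumptions: the report shape (a `baselines` array in haiku, sonnet, opus order whose entries carry `savingsPct` in percent) is taken from the `jq` expression, and the function names are hypothetical.

```python
import json

# Hypothetical CI check mirroring the jq example. Assumes the JSON report has a
# "baselines" array ordered always-haiku, always-sonnet, always-opus, with a
# numeric "savingsPct" given in percent (45.83, not 0.4583).
SONNET_INDEX = 1
MIN_SAVINGS_PCT = 30.0


def sonnet_savings_ok(report: dict, minimum: float = MIN_SAVINGS_PCT) -> bool:
    """True when routing saves at least `minimum` percent vs the always-sonnet baseline."""
    return report["baselines"][SONNET_INDEX]["savingsPct"] >= minimum


def main(raw_json: str) -> int:
    """Return a process exit code: 0 = pass, 1 = routing savings below threshold."""
    report = json.loads(raw_json)
    pct = report["baselines"][SONNET_INDEX]["savingsPct"]
    if sonnet_savings_ok(report):
        print(f"ok: {pct:.2f}% savings vs always-sonnet")
        return 0
    print(f"FAIL: {pct:.2f}% savings vs always-sonnet (< {MIN_SAVINGS_PCT:.0f}%)")
    return 1
```

A CI step could call it with something like `cost counterfactual --format json | python3 -c 'import sys, ci_gate; sys.exit(ci_gate.main(sys.stdin.read()))'`, where `ci_gate.py` is a hypothetical file holding the functions above. Keeping `main` free of stdin handling makes it easy to test against a canned report.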