docs/design/model-aware-molecules.md
Plan for adding model-specific constraints to molecule steps with subscription-aware routing.
Status: In Progress Owner: Design Related: molecules.md | agent-provider-interface.md
Consensus and model-aware molecules are complementary layers that share the same session awareness infrastructure but serve different purposes:
| Consensus | Molecules | |
|---|---|---|
| Pattern | Fan-out | DAG routing |
| Shape | Same prompt → N agents → compare | N steps → best model per step |
| Session infra | GT_AGENT + AgentPresetInfo readiness | Same — reused, not rebuilt |
| Routing goal | Diversity (multiple perspectives) | Optimality (right model for each step) |
The provider resolution pipeline that Consensus v2 established — GT_AGENT env lookup → AgentPresetInfo → readiness detection (prompt polling or delay fallback) — is exactly the session awareness the molecule router needs for dispatch. See §5.3 (Two-Phase Routing).
Molecules currently support dependency-based DAG execution, but lack the ability to specify which AI model should execute each step. With multiple AI providers (Anthropic, OpenAI, DeepSeek, Google, etc.) and access types (API keys and subscriptions like Claude Code), we need:
~/.gt/usage.jsonl (OTel additive/optional)| Goal | Description |
|---|---|
| Molecule-Level Constraints | Add model/capability constraints to molecule steps |
| Subscription Support | Support both API key AND subscription-based access |
| Live Pricing | Fetch pricing from OpenRouter with 24h local cache |
| Static Benchmarks | Bundle MMLU/SWE scores; override via ~/.gt/models.toml |
| Meta-Model Routing | Heuristic-only scoring: no LLM calls |
| Local Usage Tracking | ~/.gt/usage.jsonl always written; OTel is additive |
| DAG Compatible | Works with existing molecule DAG structure |
| Backward Compatible | Existing formulas work without modification |
All implementation stories in this plan must pass these quality gates:
go test ./...golangci-lint runDescription: As a Gas Town operator, I want to configure Claude Code subscription so that it is automatically preferred over API keys due to cost reasons.
Acceptance Criteria:
CLAUDE_CODE_SUBSCRIPTION=active enables subscription detectionbd ready --json includes subscription quota informationDescription: As a developer, I want a built-in database of model capabilities (MMLU, SWE, costs) that is used by the routing system without requiring manual configuration.
Acceptance Criteria:
internal/models/database.go contains static model entries with benchmark scoreshttps://openrouter.ai/api/v1/models) with 24h cache~/.gt/models_pricing_cache.json; fetching fails gracefully (zero pricing used)~/.gt/models.toml overrides or extends any field including prices, benchmarks, new modelsGetModel(db, id) returns model metadata or nilLoadDatabase(gtDir) = static + OpenRouter pricing + user overridesDescription: As a system, I want a lightweight routing algorithm that selects which model to use based on task requirements and cost constraints without calling another LLM.
Acceptance Criteria:
internal/models/router.go implements SelectModel() with heuristics onlyDescription: As a formula author, I want to specify model constraints in molecule steps using a simple TOML syntax.
Acceptance Criteria:
model = "claude-sonnet-4-5" for exact modelprovider = "anthropic" for any model from a providermodel = "auto" for heuristic routingmin_mmlu = 85 and min_swe = 70 for quality thresholdsrequires = ["vision", "code_execution"] for capability constraintsaccess_type = "subscription" to require subscription accessmax_cost = 0.01 for cost constraints (USD per 1K tokens, combined)model and provider cannot be set simultaneously (parser error)Description: As a system, I want to track model usage locally so that operators can monitor costs without depending on OTel.
Acceptance Criteria:
internal/models/usage.go records usage to ~/.gt/usage.jsonl (always)LoadUsage(gtDir, since) reads and filters entriesMonthlyStats(entries, year, month) aggregates by modelTotalCost(entries) sums USD costGT_OTEL_LOGS_URL is setgt prime with Model InfoDescription: As an operator, I want gt prime to show which models are available for each step and which model will be used.
Acceptance Criteria:
gt step <step-id> executes a specific step with model routing✓ subscription vs $0.003/K api_keyDescription: As an operator, I want to execute an entire molecule with automatic model assignment per step.
Acceptance Criteria:
gt mol execute --auto-route <mol-id> reads constraints and routes per stepDescription: As an operator, I want gt usage to show comprehensive usage statistics.
Acceptance Criteria:
gt usage shows monthly summary: total cost, invocations, subscription usesgt usage --month 2025-02 filters to a specific month~/.gt/usage.jsonl// internal/models/database.go
// SubscriptionEligible bool on ModelEntry indicates the model can be accessed
// via a subscription (e.g. Claude Code for Anthropic models).
// The caller detects subscription availability from env vars and passes it
// as StepConstraints.SubscriptionActive.
Note: Claude Code is an access method, not a model. Do not create a fake "claude-code" model entry. The correct modelling is SubscriptionEligible: true on Anthropic model entries and AccessType: "subscription" on the routing decision when a subscription is active.
// internal/models/database.go
type ModelEntry struct {
ID string // "claude-sonnet-4-5"
Provider string // "anthropic"
Name string // "Claude Sonnet 4.5"
OpenRouterID string // "anthropic/claude-sonnet-4-5" (for pricing fetch)
// Benchmark scores (static, overridable via ~/.gt/models.toml)
MMLUScore float64
SWEScore float64
// Capabilities
Vision bool
CodeExecution bool
ContextWindow int
// Pricing in USD per 1K tokens (fetched from OpenRouter, cached 24h)
CostPer1KIn float64
CostPer1KOut float64
SubscriptionEligible bool
GoodFor []string
}
// LoadDatabase merges: static benchmarks → OpenRouter pricing → ~/.gt/models.toml overrides
func LoadDatabase(gtDir string) []ModelEntry
External pricing source: OpenRouter (https://openrouter.ai/api/v1/models)
~/.gt/models_pricing_cache.json for 24hBenchmark data: Bundled statically in staticDB (from published evaluations).
Override or extend via ~/.gt/models.toml:
# Override a built-in model's benchmark
[models.claude-sonnet-4-5]
mmlu = 84.5
swe = 52.0
# Add a new model not in the static DB
[models.my-local-model]
provider = "custom"
mmlu = 70.0
cost_per_1k_in = 0.0
cost_per_1k_out = 0.0
good_for = ["coding"]
Routing happens in two sequential phases. No LLM calls are made at any point.
SelectModel)Picks the optimal model from the capability database based on step constraints and scoring heuristics.
// internal/models/router.go
type StepConstraints struct {
Model string // exact ID or "auto"
Provider string
AccessType string // "subscription" | "api_key"
MinMMLU float64
MinSWE float64
Requires []string
MaxCost float64 // USD per 1K tokens (combined)
// Filled by caller from env/config:
SubscriptionActive bool
}
type RoutingDecision struct {
// Model selection (Phase 1)
ModelID string
Provider string
AccessType string // "subscription" | "api_key"
Reason string
CostPer1KIn float64
CostPer1KOut float64
MMLUScore float64
SWEScore float64
// Session resolution (Phase 2) — nil when no live session found
SessionID string // tmux session name, e.g. "gt-gastown-polecat-Toast"
AgentPreset string // resolved GT_AGENT value, e.g. "claude", "gemini"
}
func SelectModel(constraints StepConstraints, db []ModelEntry) (*RoutingDecision, error)
Scoring:
| Factor | Weight |
|---|---|
| Subscription active + model eligible | +40 pts |
| MMLU score (normalized 0–100) | up to 30 pts |
| SWE score (normalized 0–100) | up to 20 pts |
| Cost savings (inverse of $0.10/1K ceiling) | up to 10 pts |
ResolveSession)After a model is selected, find a live, idle tmux session running that model. This reuses the existing GT_AGENT + AgentPresetInfo infrastructure from the provider resolution pipeline — the same logic Consensus v2 uses.
// internal/models/router.go (planned)
// ResolveSession scans running tmux sessions and returns the first one that is
// idle and running the selected model. Returns nil if no matching session is found.
//
// Resolution:
// 1. List active tmux sessions
// 2. Read GT_AGENT env var from each session
// 3. Look up AgentPresetInfo for that agent name
// 4. Check readiness: prompt polling (ReadyPromptPrefix e.g. "❯ ") or delay fallback (ReadyDelayMs)
// 5. Return first session that matches ModelID and is idle
func ResolveSession(decision *RoutingDecision, tmux Tmux) *RoutingDecision
Readiness detection is taken directly from AgentPresetInfo — no new mechanism:
| Agent type | Detection method | Source |
|---|---|---|
| Claude | Prompt prefix polling (❯ ) | AgentPresetInfo.ReadyPromptPrefix |
| OpenCode, Codex | Delay-based fallback | AgentPresetInfo.ReadyDelayMs |
| Custom agents | Delay-based fallback | Same |
Dispatch outcome:
AgentPresetInfo.Command + Args)This means molecule steps target live sessions by model capability, not just by name. A step specifying min_mmlu = 85 will route to whichever idle session happens to be running a qualifying model, without the formula author needing to know session names.
# All constraint fields are optional and backward-compatible.
# Existing steps without constraints accept any available agent.
[[steps]]
id = "analyze-requirements"
title = "Analyze requirements"
needs = ["load-context"]
# Option A: exact model
model = "claude-sonnet-4-5"
[[steps]]
id = "code-generation"
title = "Code generation"
needs = ["analyze-requirements"]
# Option B: heuristic routing with quality and cost constraints
model = "auto"
min_mmlu = 85
min_swe = 50
max_cost = 0.01
[[steps]]
id = "quick-scan"
title = "Quick scan"
# Option C: provider + capability filter
provider = "openai"
requires = ["code_execution"]
[[steps]]
id = "security-audit"
title = "Security audit"
# Option D: prefer subscription (zero cost)
access_type = "subscription"
model and provider are mutually exclusive (parser error if both are set).
// internal/models/usage.go
type UsageEntry struct {
Timestamp time.Time `json:"timestamp"`
ModelID string `json:"model_id"`
Provider string `json:"provider"`
AccessType string `json:"access_type"`
TaskType string `json:"task_type"`
TokensIn int `json:"tokens_in"`
TokensOut int `json:"tokens_out"`
CostUSD float64 `json:"cost_usd"`
Success bool `json:"success"`
LatencyMs int `json:"latency_ms"`
Reason string `json:"reason,omitempty"`
}
func RecordUsage(gtDir string, entry UsageEntry) error // appends to usage.jsonl
func LoadUsage(gtDir string, since time.Time) ([]UsageEntry, error)
func MonthlyStats(entries []UsageEntry, year int, month time.Month) map[string]*ModelStats
func EstimateCost(model *ModelEntry, tokensIn, tokensOut int) float64
OTel integration: callers that want OTel observability emit an agent.usage OTel log event separately (see docs/otel-data-model.md). usage.jsonl is always written and does not depend on OTel being configured.
# Subscription detection
export CLAUDE_CODE_SUBSCRIPTION=active # enables subscription preference
export [email protected] # informational
export CLAUDE_CODE_PLAN=pro # informational
# API Key Access (existing)
export ANTHROPIC_API_KEY=sk-ant-xxx
export OPENAI_API_KEY=sk-openai-xxx
export GOOGLE_API_KEY=xxx
export DEEPSEEK_API_KEY=xxx
# Model Defaults (new)
export GT_DEFAULT_MODEL=claude-sonnet-4-5 # fallback for unconstrained steps
export GT_PREFERRED_PROVIDER=anthropic
# Thresholds (new)
export GT_MIN_MMLU=80
export GT_MIN_SWE=50
export GT_MAX_COST=0.005
# Usage tracking
export GT_TRACK_USAGE=true # default true; set false to disable
Note: CLAUDE_CODE_QUOTA is not a real env var — Claude Code does not expose token quota programmatically. If quota tracking is needed, derive it from ~/.gt/usage.jsonl entries with access_type="subscription".
~/.gt/models.toml — Model Database Override# Override built-in benchmark scores
[models.claude-sonnet-4-5]
mmlu = 84.5
# Add a new model
[models.deepseek-v3-local]
provider = "deepseek"
mmlu = 88.0
swe = 48.0
cost_per_1k_in = 0.00014
cost_per_1k_out = 0.00028
context_window = 131072
good_for = ["coding", "reasoning"]
~/.gt/models_pricing_cache.json — OpenRouter pricing cache (auto-managed)Written by LoadDatabase; refreshed after 24h. Do not edit manually.
gt prime output (planned)### Step 2: Analyze requirements
Constraint: model=auto, min_mmlu=85
Recommended: claude-opus-4-5 (subscription, $0.00)
Fallback: claude-sonnet-4-5 (api_key, $0.003/1K)
### Step 3: Code generation
Constraint: provider=openai, requires=[code_execution]
Recommended: gpt-4o ($0.0025/1K in)
gt step <step-id> # execute step with model routing
gt mol execute --auto-route <mol-id> # batch DAG execution with routing
gt usage # monthly cost summary
gt usage --month 2025-02 # filter to specific month
gt model route --task coding --mmlu 85 # debug: test routing logic
formula = "mol-subscription-aware"
version = 1
[[steps]]
id = "code-review"
title = "Code review"
access_type = "subscription"
model = "auto"
description = "Review code changes"
[[steps]]
id = "implement-fixes"
title = "Implement fixes"
needs = ["code-review"]
model = "auto"
description = "Implement the fixes"
formula = "mol-multi-model-review"
version = 1
[[steps]]
id = "claude-review"
title = "Review with Claude"
model = "claude-sonnet-4-5"
description = "Review the code changes"
[[steps]]
id = "gpt-review"
title = "Review with GPT-4o"
model = "gpt-4o"
parallel = true
description = "Review the same code"
[[steps]]
id = "synthesize"
title = "Synthesize findings"
needs = ["claude-review", "gpt-review"]
min_mmlu = 85
description = "Combine both reviews"
formula = "mol-cost-optimized"
version = 1
[[steps]]
id = "quick-scan"
title = "Quick scan"
model = "auto"
max_cost = 0.001
description = "Fast overview with cheapest capable model"
[[steps]]
id = "deep-work"
title = "Deep work"
needs = ["quick-scan"]
model = "auto"
min_mmlu = 85
max_cost = 0.01
description = "Thorough work with quality model within budget"
internal/models/database.go — static benchmarks + OpenRouter pricing + TOML overridesinternal/models/router.go — SelectModel() heuristic scoringinternal/models/usage.go — local JSONL tracking; MonthlyStats, EstimateCostinternal/formula/types.go Step structinternal/formula/parser.goCLAUDE_CODE_SUBSCRIPTION and pass SubscriptionActive into StepConstraintsImplement ResolveSession() using the existing GT_AGENT + AgentPresetInfo infrastructure:
GT_AGENT env var per session (already done in sling_helpers.go)AgentPresetInfo by agent name to get ReadyPromptPrefix / ReadyDelayMsReadyPromptPrefix, delay fallback otherwiseRoutingDecision.ModelID; set SessionID + AgentPresetAgentPresetInfo.Command + Args for the selected modelgt prime to show model constraints, routing recommendation, and live session per stepgt step for single-step execution with two-phase routinggt mol execute --auto-route for batch DAG executiongt usage and gt usage --monthRecordUsage into the agent dispatch pathTokensIn/TokensOut from agent.usage OTel events when available, or estimatemodel.route event when a routing decision is madeWhen both subscription and API key are available for the same provider:
SelectModel() is pure heuristics — no LLM calls:
| Factor | Weight | Notes |
|---|---|---|
| Subscription active + model eligible | +40 pts | Free = always prefer |
| MMLU score | up to 30 pts | General knowledge quality |
| SWE score | up to 20 pts | Code-specific quality |
| Cost savings | up to 10 pts | Inverse of $0.10/1K ceiling |
| Quota availability | Hard filter | Applied before scoring |
Steps without any routing fields accept any idle agent — unchanged behaviour:
[[steps]]
id = "simple-step"
title = "Simple step"
needs = ["previous-step"]
# No model constraint → any idle agent
~/.gt/usage.jsonl regardless of OTel configuration| Question | Discussion |
|---|---|
| Dispatch mechanism | Resolved: ResolveSession() targets the live tmux session directly. The routing decision (SessionID, AgentPreset) is the dispatch target — no separate env var injection needed. The step description is sent via the existing tmux send-keys / nudge path. |
| Model ID ↔ GT_AGENT mapping | GT_AGENT values are agent preset names ("claude", "gemini"), not model IDs ("claude-sonnet-4-5"). Need a mapping: AgentPresetInfo could carry a DefaultModelID field, or sessions could set an additional GT_MODEL env var at spawn time for precise matching. |
| Multiple sessions for same model | If two Claude sessions are idle and both qualify, which gets the step? Current proposal: first idle session wins (FIFO). Alternative: round-robin or load-based. |
| Cost-based auto-switch | Should the system switch to cheaper models mid-session if budget is nearly exhausted? |
| Model performance learning | Should historical success rates (from usage.jsonl) influence routing weights? |
| Multi-subscription support | Support for multiple Claude Code team subscriptions simultaneously? |