doc/plans/2026-04-06-smart-model-routing.md
- Status: Proposed
- Date: 2026-04-06
- Audience: Product and engineering
- Related:
  - `doc/SPEC-implementation.md`
  - `doc/PRODUCT.md`
  - `doc/plans/2026-03-14-adapter-skill-sync-rollout.md`

This document defines a V1 plan for "smart model routing" in Paperclip.
The goal is not to build a generic cross-provider router in the server. The goal is:
The motivating use case is a local coding adapter where a cheap model can handle the first fast pass:
Then the primary model does the substantive work.
Hermes does have a real "smart model routing" feature, but it is narrower than the name suggests.
Observed behavior:
- `agent/smart_model_routing.py` implements a conservative classifier for "simple" turns
- … `debug`, `implement`, `test`, `plan`, `tool`, `docker`, and similar terms

Important architectural detail:
- … `cron/scheduler.py` and passed into agent creation as the active provider/model/runtime

More useful than the routing heuristic itself is Hermes' broader model-slot design:
That separation is a better fit for Paperclip than copying Hermes' exact keyword heuristic.
Paperclip already has the right execution shape for adapter-specific routing, but it currently assumes one model per heartbeat run.
Current implementation facts:
- `server/src/services/heartbeat.ts` builds rich run context, including `paperclipWake`, workspace metadata, and session handoff context
- … `config` object and executes once
- … `config.model` and pass it directly to the underlying CLI
- … `model` field plus adapter-specific thinking-effort controls
- … `AdapterExecutionResult`

What this means:
Paperclip should implement smart model routing as an adapter-local, opt-in execution pattern.
V1 decision:
Rationale:
Supported adapters should add an optional routing block to `adapterConfig`.
Proposed shape:
```ts
smartModelRouting?: {
  enabled: boolean;
  cheapModel: string;
  cheapThinkingEffort?: string;
  maxPreflightTurns?: number;
  allowInitialProgressComment?: boolean;
}
```
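To make the proposed shape concrete, here is a minimal sketch of how a supported adapter might normalize the block before using it. The names `normalizeSmartModelRouting` and `NormalizedSmartModelRouting`, the default of one preflight turn, and treating a missing `allowInitialProgressComment` as `false` are assumptions for illustration, not decisions in this plan.

```typescript
// Proposed config block, as stored in adapterConfig.
interface SmartModelRoutingConfig {
  enabled: boolean;
  cheapModel: string;
  cheapThinkingEffort?: string;
  maxPreflightTurns?: number;
  allowInitialProgressComment?: boolean;
}

// Hypothetical normalized form an adapter could work with after parsing.
interface NormalizedSmartModelRouting {
  enabled: boolean;
  cheapModel: string;
  cheapThinkingEffort: string | null;
  maxPreflightTurns: number;
  allowInitialProgressComment: boolean;
}

// Assumption: the plan does not specify a default turn budget.
const DEFAULT_MAX_PREFLIGHT_TURNS = 1;

function normalizeSmartModelRouting(raw: unknown): NormalizedSmartModelRouting | null {
  // An absent or non-object block means routing is not configured.
  if (typeof raw !== "object" || raw === null) return null;
  const cfg = raw as Partial<SmartModelRoutingConfig>;
  const turns = cfg.maxPreflightTurns;
  return {
    // Opt-in: anything other than an explicit `true` leaves routing off.
    enabled: cfg.enabled === true,
    cheapModel: typeof cfg.cheapModel === "string" ? cfg.cheapModel.trim() : "",
    cheapThinkingEffort:
      typeof cfg.cheapThinkingEffort === "string" ? cfg.cheapThinkingEffort : null,
    // Keep the cheap phase bounded: missing or invalid values fall back to the default.
    maxPreflightTurns:
      typeof turns === "number" && Number.isInteger(turns) && turns > 0
        ? turns
        : DEFAULT_MAX_PREFLIGHT_TURNS,
    // Assumption: cheap-phase progress comments stay off unless requested.
    allowInitialProgressComment: cfg.allowInitialProgressComment === true,
  };
}
```

The normalizer only cleans up the stored block; whether a given run actually uses preflight is a separate, per-run decision.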
Notes:
- … `model` as the primary model
- `cheapModel` is adapter-specific, not global

For adapters with provider-specific model fields later, the shape can expand to include provider/base-url overrides. V1 should start simple.
Supported adapters should run cheap preflight only when all are true:
- `smartModelRouting.enabled` is `true`
- `cheapModel` is configured

Supported adapters should skip cheap preflight when any are true:
This is intentionally phase-based, not text-heuristic-based.
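A minimal sketch of such a phase-based gate follows. The two configuration checks mirror the "run only when all are true" conditions above; `isFreshRun` is an assumed input standing in for the "fresh runs" wording in this plan's summary, and the plan's own skip conditions are not encoded. `shouldRunCheapPreflight`, `RoutingSettings`, and `RunState` are hypothetical names.

```typescript
// Hypothetical routing settings, already normalized by the adapter.
interface RoutingSettings {
  enabled: boolean;
  cheapModel: string;
}

// Hypothetical run facts available to the adapter at launch time.
interface RunState {
  // Assumption: true when the adapter is not resuming an existing session.
  isFreshRun: boolean;
}

function shouldRunCheapPreflight(routing: RoutingSettings | null, run: RunState): boolean {
  // Decided from config and run phase only; the prompt text is never inspected.
  if (routing === null || !routing.enabled) return false;
  if (routing.cheapModel.trim() === "") return false;
  return run.isFreshRun;
}
```

Because the inputs are configuration and run state rather than prompt wording, the same run always routes the same way, which keeps behavior predictable on the board.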
The cheap phase should be narrow and bounded.
Allowed responsibilities:
Not allowed in V1:
Implementation detail:
After preflight, the adapter launches the normal primary execution using the existing prompt and primary model.
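The two-phase flow can be sketched as below, assuming a hypothetical `PhaseRunner` callback that launches the adapter's CLI for one phase; it is not an existing Paperclip API. The preflight prompt is treated as a separate, narrow prompt whose contents this plan does not specify, and error handling is omitted.

```typescript
type Phase = "cheap_preflight" | "primary";

interface PhaseRequest {
  phase: Phase;
  model: string;
  prompt: string;
  maxTurns?: number;
}

interface PhaseResult {
  phase: Phase;
  model: string;
  summary: string | null;
}

// Hypothetical: launches the underlying CLI for a single phase.
type PhaseRunner = (req: PhaseRequest) => Promise<PhaseResult>;

interface PreflightPlan {
  cheapModel: string;
  maxTurns: number;
  prompt: string; // assumption: narrow preflight prompt, contents unspecified
}

async function executeWithRouting(
  runPhase: PhaseRunner,
  primaryPrompt: string,
  primaryModel: string,
  preflight: PreflightPlan | null,
): Promise<PhaseResult[]> {
  const results: PhaseResult[] = [];
  if (preflight !== null) {
    // Cheap phase: separate model, hard turn budget.
    results.push(
      await runPhase({
        phase: "cheap_preflight",
        model: preflight.cheapModel,
        prompt: preflight.prompt,
        maxTurns: preflight.maxTurns,
      }),
    );
  }
  // Primary phase always runs unchanged: existing prompt, primary model.
  results.push(await runPhase({ phase: "primary", model: primaryModel, prompt: primaryPrompt }));
  return results;
}
```

Returning one result per phase keeps the door open for per-phase cost reporting without changing how the primary phase itself is launched.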
The primary phase should receive:
The primary phase remains the source of truth for:
The current `AdapterExecutionResult` is too narrow for truthful multi-model accounting.
Add an optional segmented execution report, for example:
```ts
executionSegments?: Array<{
  phase: "cheap_preflight" | "primary";
  provider?: string | null;
  biller?: string | null;
  model?: string | null;
  billingType?: AdapterBillingType | null;
  usage?: UsageSummary;
  costUsd?: number | null;
  summary?: string | null;
}>
```
V1 server behavior:
- If `executionSegments` is absent, keep current single-result behavior unchanged
- … `cost_events` row per segment that has cost or token usage
- … `provider` / `model` fields as a summary, preferably the primary phase when present

This avoids breaking existing adapters while giving routed adapters truthful reporting.
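A minimal sketch of the server-side handling, assuming simplified stand-ins for the real `UsageSummary` type and `cost_events` schema (`SimpleUsage` and `CostEventRow` are hypothetical, as are the function names). Falling back to the last segment when no primary segment exists is also an assumption.

```typescript
// Simplified stand-in for UsageSummary; the real type may differ.
interface SimpleUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ExecutionSegment {
  phase: "cheap_preflight" | "primary";
  provider?: string | null;
  model?: string | null;
  usage?: SimpleUsage;
  costUsd?: number | null;
}

// Hypothetical row shape; not the actual cost_events schema.
interface CostEventRow {
  phase: ExecutionSegment["phase"];
  provider: string | null;
  model: string | null;
  inputTokens: number;
  outputTokens: number;
  costUsd: number | null;
}

function hasCostOrUsage(s: ExecutionSegment): boolean {
  const tokens = (s.usage?.inputTokens ?? 0) + (s.usage?.outputTokens ?? 0);
  return (s.costUsd ?? 0) > 0 || tokens > 0;
}

// One row per segment that reported cost or token usage.
function segmentsToCostEvents(segments: ExecutionSegment[]): CostEventRow[] {
  return segments.filter(hasCostOrUsage).map((s) => ({
    phase: s.phase,
    provider: s.provider ?? null,
    model: s.model ?? null,
    inputTokens: s.usage?.inputTokens ?? 0,
    outputTokens: s.usage?.outputTokens ?? 0,
    costUsd: s.costUsd ?? null, // unknown cost stays null rather than 0
  }));
}

// Run-level summary fields: prefer the primary phase, else the last segment.
function summaryProviderModel(
  segments: ExecutionSegment[],
): { provider: string | null; model: string | null } {
  const pick = segments.find((s) => s.phase === "primary") ?? segments[segments.length - 1];
  return { provider: pick?.provider ?? null, model: pick?.model ?? null };
}
```

Keeping an unknown cost as `null` instead of `0` avoids under-reporting a cheap phase that returned tokens but no price.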
Work:
Success criteria:
`codex_local`

Why first:
Implementation work:
- … `smartModelRouting`.

Important guardrail:
`claude_local`

Implementation work is similar, but switching models mid-session is even less attractive here.
Same rule:
Candidates:
- `cursor`
- `gemini_local`
- `opencode_local`
- … `createServerAdapter()`

These should come later because each runtime has different session and model-switch semantics.
For supported built-in adapters, the agent config UI should expose:
- `model` as the primary model
- smart model routing toggle
- cheap model
- allow initial progress comment toggle

The run detail UI should also show when routing occurred, for example:
This matters because Paperclip's board UI is supposed to make cost and behavior legible.
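As one illustration of legible reporting, a run-detail line could be derived directly from the execution segments. This is a minimal sketch; `describeRouting`, `SegmentView`, the phase labels, and the four-decimal cost format are hypothetical choices, not a specified UI.

```typescript
// Hypothetical view model for one execution segment in the run detail UI.
interface SegmentView {
  phase: "cheap_preflight" | "primary";
  model?: string | null;
  costUsd?: number | null;
}

const PHASE_LABELS: Record<SegmentView["phase"], string> = {
  cheap_preflight: "Cheap preflight",
  primary: "Primary",
};

function describeRouting(segments: SegmentView[]): string | null {
  // Only runs that actually had a cheap preflight get a routing line.
  if (!segments.some((s) => s.phase === "cheap_preflight")) return null;
  return segments
    .map((s) => {
      const model = s.model ?? "unknown model";
      // Show cost only when the segment reported one.
      const cost = typeof s.costUsd === "number" ? ` ($${s.costUsd.toFixed(4)})` : "";
      return `${PHASE_LABELS[s.phase]}: ${model}${cost}`;
    })
    .join(" → ");
}
```

For a routed run this yields a single line such as `Cheap preflight: cheap-model ($0.0012) → Primary: primary-model`, so the board can see both the routing decision and its cost at a glance.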
Hermes' cheap-route heuristic is useful precedent, but Paperclip should not start there.
Reasons:
If Paperclip later wants a cheap-only completion path for trivial runs, that can be a second-stage feature built on observed run data, not the first implementation.
If the cheap phase posts an update and the primary phase posts another near-identical update, the issue thread gets worse.
Mitigation:
If we only record the primary model, the board loses visibility into the routing cost tradeoff.
Mitigation:
Cross-model session reuse may fail or degrade context quality.
Mitigation:
A cheap model with full tools and permissions may do too much low-quality work.
Mitigation:
Required tests:
Manual checks:
- … `codex_local` cheap preflight.
- … `claude_local` cheap preflight.

Paperclip should ship smart model routing as:
The right V1 is not "choose the cheapest model for simple prompts." The right V1 is "use a cheap model for bounded orchestration work on fresh runs, then hand off to the primary model for the real task."