doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md
Date: 2026-03-13
Related discussion: https://github.com/paperclipai/paperclip/discussions/449
Goal: Reduce token consumption materially without reducing agent capability, control-plane visibility, or task completion quality.
This plan is based on the linked discussion plus a review of the code and local run data. The discussion is directionally right in its overall diagnosis, but that is not enough on its own.
After reviewing the code and local run data, the token problem appears to have four distinct causes:
1. Token usage numbers from at least one adapter, `codex_local`, appear to be recorded as cumulative session totals instead of per-heartbeat deltas, so the telemetry itself is suspect.
2. Timer wakes reset saved task sessions by default, discarding session continuity and prompt-cache reuse.
3. The `paperclip` skill tells agents to re-fetch assignments, issue details, ancestors, and full comment threads on every heartbeat.
4. The API does not currently offer efficient delta-oriented alternatives.

The correct approach is to address each cause directly, starting with telemetry, as the sections below lay out.
Observed from the local default instance:
- `heartbeat_runs`: 11,360 runs between 2026-02-18 and 2026-03-13
- `usage_json.inputTokens`: 2,272,142,368,952
- `usage_json.cachedInputTokens`: 2,217,501,559,420

Those totals are not credible as true per-heartbeat usage for the observed prompt sizes.
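A quick arithmetic sanity check makes the implausibility concrete, using the figures above:

```typescript
// Sanity-check the recorded totals against the run count. If these
// were true per-heartbeat deltas, the average run would have consumed
// roughly 200 million input tokens, which is implausible for prompts
// that average under 200 characters.
const totalInputTokens = 2_272_142_368_952;
const heartbeatRuns = 11_360;

const avgTokensPerRun = totalInputTokens / heartbeatRuns;
console.log(Math.round(avgTokensPerRun)); // roughly 200 million tokens per "run"
```

This is consistent with cumulative session counters being re-logged on every heartbeat rather than genuine per-run consumption.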
Supporting evidence:
`adapter.invoke.payload.prompt` averages were small:

- `codex_local`: ~193 chars average, 6,067 chars max
- `claude_local`: ~160 chars average, 1,160 chars max

Yet:

- `codex_local` runs report millions of input tokens
- `inputTokens` grows up to 1,155,283,166

Interpretation:
This does not mean there is no real token problem. It means we need a trustworthy baseline before we can judge optimization impact.
In server/src/services/heartbeat.ts, shouldResetTaskSessionForWake(...) returns true for:
- `wakeReason === "issue_assigned"`
- `wakeSource === "timer"`

That means many normal heartbeats skip saved task-session resume even when the workspace is stable.
Local data supports the impact:
- timer/system runs: 6,587 total

So timer wakes are the largest heartbeat path and are mostly not resuming prior task state.
The paperclip skill currently tells agents to do this on essentially every heartbeat:
Current API shape reinforces that pattern:
- `GET /api/issues/:id/comments` returns the full thread; there is no `since`, cursor, digest, or summary endpoint for heartbeat consumption
- `GET /api/issues/:id` returns full enriched issue context, not a minimal delta payload

This is safe but expensive: it forces the model to repeatedly consume unchanged information.
The user discussion suggested a bootstrap prompt. That is the right direction.
Current state:
- `bootstrapPromptTemplate` exists in configuration but is not wired into adapter execution.
- Adapters inject `instructionsFilePath` content directly into the per-run prompt or system prompt.

Result: static instructions are re-sent on every run instead of being established once per session where they could be cached.
Local adapters inject repo skills into runtime skill directories.
Important codex_local nuance:
Codex resolves skills from `$CODEX_HOME/skills` or `~/.codex/skills`, not from the worktree it is running in.

Current repo skill sizes:

- `skills/paperclip/SKILL.md`: 17,441 bytes
- `.agents/skills/create-agent-adapter/SKILL.md`: 31,832 bytes
- `skills/paperclip-create-agent/SKILL.md`: 4,718 bytes
- `skills/para-memory-files/SKILL.md`: 3,978 bytes

That is nearly 58 KB of skill markdown before any company-specific instructions.
Not all of that is necessarily loaded into model context every run, but it increases startup surface area and should be treated as a token budget concern.
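The sizes above could be guarded by a simple budget check in CI. The 64 KB budget below is an illustrative placeholder, not an agreed limit, and the hard-coded byte counts stand in for what a real check would read via `fs.statSync`:

```typescript
// Illustrative skill-size budget check. The file sizes are the ones
// measured above; in CI they would come from fs.statSync on each path.
const skillSizes: Record<string, number> = {
  "skills/paperclip/SKILL.md": 17_441,
  ".agents/skills/create-agent-adapter/SKILL.md": 31_832,
  "skills/paperclip-create-agent/SKILL.md": 4_718,
  "skills/para-memory-files/SKILL.md": 3_978,
};

const SKILL_BUDGET_BYTES = 64 * 1024; // hypothetical budget, not a decided value

const totalBytes = Object.values(skillSizes).reduce((a, b) => a + b, 0);
console.log(totalBytes, totalBytes <= SKILL_BUDGET_BYTES ? "within budget" : "OVER BUDGET");
```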
We should optimize tokens under these rules:
This should happen first.
Add per-run telemetry fields:

- `sessionReused`
- `taskSessionReused`
- `promptChars`
- `instructionsChars`
- `hasInstructionsFile`
- `skillSetHash` or skill count
- `contextFetchMode` (`full`, `delta`, `summary`)

Without this, we cannot tell whether a reduction came from a real optimization or a reporting artifact.
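A sketch of what the per-run record could look like. Only the field names come from this plan; the types, the delta fields, and the `tokenDelta` helper are illustrative:

```typescript
// Sketch of the proposed per-run telemetry record. Field names are
// from the plan; everything else here is an assumption.
type ContextFetchMode = "full" | "delta" | "summary";

interface HeartbeatRunTelemetry {
  runId: string;
  sessionReused: boolean;
  taskSessionReused: boolean;
  promptChars: number;
  instructionsChars: number;
  hasInstructionsFile: boolean;
  skillSetHash: string; // or a plain skill count
  contextFetchMode: ContextFetchMode;
  // Record *deltas*, not cumulative session totals (the codex_local bug).
  inputTokensDelta: number;
  cachedInputTokensDelta: number;
}

// Turn a cumulative counter into a per-run delta.
function tokenDelta(cumulative: number, previousCumulative: number): number {
  // Guard against counter resets (a new session starts back near 0).
  return cumulative >= previousCumulative ? cumulative - previousCumulative : cumulative;
}

console.log(tokenDelta(1_500, 1_200)); // 300
```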
This is the highest-leverage behavior change.
- Stop resetting task sessions on `wakeSource === "timer"` by default.
- Support an explicit `forceFreshSession: true` override for when the board wants a reset.

Timer wakes are the dominant heartbeat path. Resetting them destroys both session continuity and prompt cache reuse.
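A hedged sketch of the proposed default, assuming a `Wake` shape for illustration (`shouldResetTaskSessionForWake` and `forceFreshSession` are named in this plan; the rest is not the real server code):

```typescript
// Proposed default: timer wakes resume the saved task session unless
// the board explicitly forces a reset. The Wake shape is assumed.
interface Wake {
  wakeReason?: string;
  wakeSource?: string;
  forceFreshSession?: boolean;
}

function shouldResetTaskSessionForWake(wake: Wake): boolean {
  // Explicit board-driven resets always win.
  if (wake.forceFreshSession) return true;
  // A new assignment still starts fresh.
  if (wake.wakeReason === "issue_assigned") return true;
  // Timer wakes -- the dominant heartbeat path -- now resume by default.
  return false;
}

console.log(shouldResetTaskSessionForWake({ wakeSource: "timer" })); // false
```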
This is the right version of the discussion’s bootstrap idea.
- Wire `bootstrapPromptTemplate` into adapter execution paths.
- Keep `promptTemplate` intentionally small and stable: per-run content should be limited to the dynamic wake context.
- Flag cases where `promptTemplate` contains high-churn or large inline content.

Static instructions and dynamic wake context have different cache behavior and should be modeled separately.
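The separation can be sketched as a two-part prompt build. All names here are illustrative, not the actual adapter API:

```typescript
// Model static and dynamic content separately so the stable part can
// hit the provider's prompt cache. Names are illustrative.
interface PromptParts {
  bootstrap: string; // stable: instructions file + bootstrapPromptTemplate
  perRun: string;    // dynamic: wake reason, issue id, deltas
}

function buildPrompt(
  instructionsFileContent: string,
  bootstrapPromptTemplate: string,
  wakeContext: string,
): PromptParts {
  return {
    // Sent once per session (or as a cacheable system prompt).
    bootstrap: `${bootstrapPromptTemplate}\n\n${instructionsFileContent}`,
    // Small and high-churn; re-sent on every heartbeat.
    perRun: wakeContext,
  };
}

const parts = buildPrompt(
  "Follow company policy X.",
  "You are a Paperclip agent.",
  "wake: timer",
);
console.log(parts.perRun.length < parts.bootstrap.length); // true
```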
For codex_local, this also requires isolating the Codex skill home per worktree or teaching Paperclip to repoint its own skill symlinks when the source checkout changes. Otherwise prompt and skill improvements in the active worktree may not reach the running agent.
This is the biggest product change and likely the biggest real token saver after session reuse.
Add heartbeat-oriented endpoints and skill behavior:
- `GET /api/agents/me/inbox-lite`
- `GET /api/issues/:id/heartbeat-context`
- `GET /api/issues/:id/comments?after=<cursor>` or `?since=<timestamp>`
- `GET /api/issues/:id/context-digest`
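A minimal server-side sketch of the `?after=<cursor>` filtering, assuming a monotonically increasing comment id serves as the cursor (the `Comment` shape and helper are illustrative, not the real server API):

```typescript
// Cursor-based comment filtering for the proposed delta endpoint.
interface Comment {
  id: number; // assumed monotonically increasing, usable as a cursor
  createdAt: string;
  body: string;
}

function commentsAfter(all: Comment[], after?: number): Comment[] {
  // No cursor: caller gets the full thread (today's behavior).
  if (after === undefined) return all;
  // With a cursor: only comments the agent has not seen yet.
  return all.filter((c) => c.id > after);
}

const thread: Comment[] = [
  { id: 1, createdAt: "2026-03-12T10:00:00Z", body: "triaged" },
  { id: 2, createdAt: "2026-03-13T09:00:00Z", body: "fix pushed" },
];
console.log(commentsAfter(thread, 1).length); // 1
```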
Update the `paperclip` skill so the default pattern becomes delta-first:

1. Fetch the context digest first.
2. If nothing changed, do no further context fetches.
3. If something changed, fetch only the deltas (for example, comments after the last seen cursor).
4. Fall back to full context only on first assignment or after an explicit session reset.
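The delta-first heartbeat pattern can be sketched as follows; the two fetchers stand in for the proposed digest and comment-delta endpoints, and the stored cursor state is hypothetical:

```typescript
// Digest-first heartbeat fetch: skip all detail fetches when nothing
// changed. fetchDigest/fetchCommentDeltas are hypothetical clients for
// the proposed endpoints.
interface HeartbeatState {
  lastDigest: string | null;
  lastCommentCursor: number;
}

function heartbeatFetch(
  state: HeartbeatState,
  fetchDigest: () => string,                       // GET .../context-digest
  fetchCommentDeltas: (after: number) => string[], // GET .../comments?after=
): { changed: boolean; newComments: string[] } {
  const digest = fetchDigest();
  if (digest === state.lastDigest) {
    // Nothing changed since the last heartbeat: fetch nothing else.
    return { changed: false, newComments: [] };
  }
  state.lastDigest = digest;
  return { changed: true, newComments: fetchCommentDeltas(state.lastCommentCursor) };
}

const state: HeartbeatState = { lastDigest: "abc", lastCommentCursor: 7 };
console.log(heartbeatFetch(state, () => "abc", () => ["x"]).changed); // false
```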
Today we are using full-fidelity board APIs as heartbeat APIs. That is convenient but token-inefficient.
This protects against long-lived session bloat.
Even when reuse is desirable, some sessions become too expensive to keep alive indefinitely.
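A recycling guard can be sketched as a simple predicate over session stats. The thresholds below are illustrative placeholders, not agreed limits:

```typescript
// Recycle a session once it has grown too expensive, even though
// reuse is the default. Both caps are hypothetical.
interface SessionStats {
  cumulativeInputTokens: number;
  heartbeatsServed: number;
}

const MAX_SESSION_INPUT_TOKENS = 2_000_000; // hypothetical cap
const MAX_SESSION_HEARTBEATS = 200;         // hypothetical cap

function shouldRecycleSession(stats: SessionStats): boolean {
  return (
    stats.cumulativeInputTokens >= MAX_SESSION_INPUT_TOKENS ||
    stats.heartbeatsServed >= MAX_SESSION_HEARTBEATS
  );
}

console.log(shouldRecycleSession({ cumulativeInputTokens: 50_000, heartbeatsServed: 10 })); // false
```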
Split skill injection by need:

- `paperclip`
- `paperclip-create-agent`
- `para-memory-files`
- `create-agent-adapter`

For `codex_local`, either:

- isolate `CODEX_HOME` per worktree, or
- repoint the skill symlinks when the source checkout changes.

Most agents do not need adapter-authoring or memory-system skills on every run.
Recommended order:
1. Telemetry fixes, so gains are measurable.
2. Timer-wake session reuse.
3. `bootstrapPromptTemplate` wiring.
4. Delta-oriented heartbeat APIs plus the `paperclip` skill rewrite.
5. Skill slimming and `codex_local` skill-home fixes.

We should treat this plan as successful only if we improve both efficiency and task outcomes.
Primary metrics: per-heartbeat input tokens and cached-token share, measured from corrected telemetry.
Guardrail metrics: agent capability, control-plane visibility, and task completion quality.
Initial targets:
- Change `shouldResetTaskSessionForWake(...)` so timer wakes do not reset by default.
- Wire `bootstrapPromptTemplate` end-to-end in adapter execution.
- Rewrite `skills/paperclip/SKILL.md` around delta-fetch behavior.
- Fix `codex_local` skill resolution so worktree-local skill changes reliably reach the runtime.

Treat this as a two-track effort: Track A covers telemetry and session mechanics; Track B covers delta-oriented context fetching and the skill rewrite.
If we only do Track A, we will improve things, but agents will still re-read too much unchanged task context.
If we only do Track B without fixing telemetry first, we will not be able to prove the gains cleanly.