doc/plans/2026-03-13-paperclip-skill-tightening-plan.md
Deferred follow-up. Do not include in the current token-optimization PR beyond documenting the plan.
The paperclip skill is part of the critical control-plane safety surface. Tightening it may reduce fresh-session token use, but it also carries prompt-regression risk. We do not yet have evals that would let us safely prove behavior preservation across assignment handling, checkout rules, comment etiquette, approval workflows, and escalation paths.
The current PR should ship the lower-risk infrastructure wins first:
Fresh runs still spend substantial input tokens even after the context-path fixes. The remaining large startup cost appears to come from loading the full paperclip skill and related instruction surface into context at run start.
The skill currently mixes three kinds of content in one file:
That structure is safe but expensive.
Restructure the skill into:
The core should cover only what is needed on nearly every wake:
heartbeat-context firstThe same rules are currently expressed multiple times across:
Refactor so each operational fact has one primary home:
This reduces prompt weight and lowers the chance of internal instruction drift.
Rewrite the hot path using compact operational forms:
Reduce:
These workflows should remain available but should not dominate fresh-run context:
<plan/> writeback flowRecommended approach:
The skill should distinguish:
That separation makes it easier to evaluate prompt changes later and lets adapters or orchestration choose what must always be loaded.
Run scenario checks for:
After validation, decide whether:
Do not change this loading policy without validation.