docs/skill-guidelines.md
The authoritative checklist for writing a NanoClaw skill: the bar that conformance tooling and registry review will hold every skill to. customizing.md is the short introduction; skills-model.md explains why the model works this way. This document evolves with the system; when a rule here proves wrong, fix the rule.
Every customization is an additive skill: not an edit buried in core, but a skill that carries its own code and knows how to install and remove itself. Two principles make a skill maintainable; everything else in this document follows from them.
A skill adds files and makes the smallest possible reach-ins into existing code. Adding a file or a dependency never breaks on upgrade; reaching into existing code is the only thing that does, so the integration surface is the upgrade risk. Keep reach-ins few, tiny, and ideally a single line that calls into the skill's own code.
Follows from this:
Every reach-in with a functional consequence gets a test that goes red if the wiring is deleted or drifts. That's what protects the fork from upstream changes. The tests are also the verification: there is no separate "verify" step.
Follows from this:
src/; container code uses bun:test under container/agent-runner/.The two interlock: a minimal surface keeps the integration points few and testable; a test per point keeps the surface safe. Maintainable = small surface, every functional point guarded.
A skill carries everything it needs:
SKILL.md, written as prose an agent can run. Apply must be safe to re-run: upgrades re-run it, and a skill that half-applies twice is a bug.REMOVE.md that reverses every change apply made: barrel lines deleted (not commented out), every copied file removed including tests, dependencies uninstalled, Dockerfile edits reverted, env lines removed. REMOVE.md is required exactly when apply leaves anything behind. A pure instruction-only skill that copies nothing needs none, and an empty one is noise.In rough order of safety:
git show origin/<branch>:path > path)..env, an entry at the end of a list.package.json field.Fetching from a registry branch is additive, never a merge. git fetch origin <branch> then git show origin/<branch>:path > path per file. Never git merge a registry branch into an install.
The integration point is wherever the skill reaches into existing code. Make it minimal, colocated, and self-contained:
All real logic lives in the skill's own file behind a single entry function; the edit to core is just the call.
Prefer one colocated block over edits in two places. For an inserted call, a dynamic import at the call site keeps the import and call together and avoids touching the top-of-file import block (itself a merge hotspot):
const { startDashboard } = await import('./dashboard-pusher.js');
await startDashboard();
A static import + call is acceptable too; this is a recommendation, not a mandate.
Keep any gating (feature flags, env checks) inside the skill's function, so the core edit stays a single call.
When the reach-in lands inside an entangled function, extract a tiny skill-owned helper so the core touch is one line, like args.push(...mySkillEnvArgs()), rather than exporting the whole function or inlining the logic.
What the standard requires: integration with the NanoClaw system.
For a code-edit integration point, how you test the wiring depends on whether you can invoke the function the edit lives in. Prefer behavior; fall back to structure.
main() or boot), use a structural / AST test. Use the TypeScript compiler API and assert not just that the symbol exists but its placement: awaited, a direct statement of the right function, importing the right module path, correctly ordered. A present-but-misplaced call must go red.Two more legs apply when relevant:
createChatSdkBridge), the build leg already guards it and no separate behavior test is required. The behavior-test requirement targets runtime consumption: core DB state, data shapes, registries.Together these cover deletion, misplacement, drift, and core consumption. Only true runtime-reachability (a call stranded behind a dead branch) needs the heavy option of booting the real entry point, a rare "real run" reserved for critical wiring.
A registry queryable at runtime gets a behavior test: import the real barrel, assert the registry contains the entry. A structural parse only proves the source line exists. It stays green when the barrel can't evaluate or the package isn't installed, which is exactly when the thing is actually broken. The behavior test goes red on a deleted barrel line, a barrel that won't evaluate, and an uninstalled package (the unmocked import throws), so it covers the dependency integration point for free.
Two consequences. First, don't mock the adapter's package in the shipped test: that would defeat the dependency check, and the test runs in the composed install where the package is present. Second, the only reason to fall back to a structural parse is an adapter with real import-time side effects (spawns a process, opens a socket, needs creds at load), which is an adapter smell to fix, not a reason to weaken the test. Conformant adapters do all side-effectful work in the factory or setup(), never at import.
The test matches the kind of integration point:
main() call, an entry in an mcpServers map): behavior test via the registry where queryable (see above); structural / AST test where not.src/providers/index.ts; container container/agent-runner/src/providers/index.ts), plus Dockerfile edits and a CLI or SDK dependency. Each is a separate way to break, and each needs its own guard. Ship a barrel-driven registration test per tree that imports only the real barrel and asserts the registry contains the provider. The trap: a *.factory.test.ts that imports the provider module directly self-registers it and stays green when the barrel line is deleted; that's a unit test, not a registration guard. REMOVE.md must reverse both barrel lines, all copied files in both trees, the dependency, and the Dockerfile edits.A skill that installs a package has made a reach-in: the code now assumes it's there. Guard it so a missing package goes red, in order of preference:
ARG <X>_VERSION= and install line are present, optionally backed by a <bin> --version container probe. Pin the version; reject latest.You do not need to test the dependency's own API contract; that's optional external-service coverage.
Some skills' only functional integration is a runtime operator action with no source footprint: registering an MCP server through ncl, or a mount through the sanctioned query wrapper (until the ncl add-mount verb lands). There's no line in the tree whose deletion a test could catch, so a registration test is structurally inapplicable. State this explicitly in SKILL.md rather than inventing a hollow test; conformance is then anatomy plus the dependency guard. This is a conformant outcome, valid only when the reach-in has no in-tree representation. (A raw-SQL write into core's schema to achieve the same thing is a smell, not a workaround.)
Each with its fix. These are patterns to remove, not to test around: a drift-prone, untestable reach-in is usually a symptom of a bad pattern, not a missing test. Reviewers reject them; the conformance linter will flag them automatically.
rm every file the skill copied.ncl verb; the in-tree query wrapper is the sanctioned last resort. Never the sqlite3 binary.-e KEY=… or a stdin secrets payload into the container). OneCLI gateway only; it injects credentials per request.git merge of a registry branch or any code branch). Install by additive fetch: git fetch origin <branch>, then git show origin/<branch>:path > path per file. For an update/reapply workflow, re-run each installed skill's additive apply, never merge.In-tree exemplars for the code archetypes. (Two carry known smells, kept deliberately pending architectural fixes; they demonstrate the test shapes, not perfection.)
add-dashboard: in-process seam with core (the pusher against the central DB), plus an AST wiring test for its main() call.add-slack: Chat SDK channel registration; the template for the whole channel family.add-deltachat: native channel registration.add-atomic-chat-tool: MCP-tool wiring across both runtimes (container registration and host env-helper call).add-opencode / add-codex: the provider multi-point archetype, with two barrels, Dockerfile pins, and per-tree registration tests.