.agents/skills/paperclip-dev-workspace-run-verify-fix/SKILL.md
This skill is for Paperclip-specific development workspaces whose service is
started through project execution workspace runtime services, typically a
worktree service such as paperclip-dev.
Success means all of these are true:
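- /api/health returns status: ok and bootstrapStatus: ready
- the root page returns 200 and does not show the first-admin setup gate
- the main control plane shows the service running / healthy with the expected URL
- the served workspace app shows the same service running / healthy

If any item fails, keep fixing. Do not mark the issue done because one probe passed.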
- Read doc/DEVELOPING.md before running Paperclip CLI, dev server, worktree, database, build, or test commands.
- Do not use psql, raw embedded-Postgres commands, or ad hoc row copying for the normal fix path.
- Do not use pnpm dev or detached shell processes when the task is about a reusable workspace service.

Use environment variables when available. Do not print API keys or passwords.
- PAPERCLIP_API_URL: main control-plane API URL
- PAPERCLIP_API_KEY: agent API key
- PAPERCLIP_RUN_ID: current run id for runtime-service mutations
- PAPERCLIP_TASK_ID: current issue id
- PAPERCLIP_COMPANY_ID: company id
- PAPERCLIP_AGENT_ID: current agent id
- service command id, such as service:paperclip-dev
- service URL, such as http://paperclip-dev:40631

If the execution workspace id or service command id is missing, read the issue, project, or execution-workspace API records first. Do not guess and start an unmanaged server on a random port.
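One way to confirm the required variables are present without leaking them is to report names only. The following is a minimal POSIX shell sketch; the function name missing_paperclip_env is hypothetical, and EXECUTION_WORKSPACE_ID is assumed to be the variable that holds the execution workspace id, as in the curl examples in this skill.

```shell
# Print the names (never the values) of required variables that are unset or
# empty. The body runs in a subshell so the helper variables stay local.
missing_paperclip_env() (
  missing_count=0
  for name in PAPERCLIP_API_URL PAPERCLIP_API_KEY PAPERCLIP_RUN_ID \
              EXECUTION_WORKSPACE_ID; do
    # Indirect lookup: copy the value of the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      missing_count=$((missing_count + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing_count" -eq 0 ]
)
```

Calling missing_paperclip_env before any runtime-service mutation prints one name per missing variable and returns non-zero, so it can gate the rest of a script.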
Use this when the user says to start the workspace, start it again, or fix a workspace that should be freshly ready.
Use the runtime-service endpoints on the main control plane. Include
X-Paperclip-Run-Id so the mutation is associated with the current heartbeat.
curl -sS -X POST \
"$PAPERCLIP_API_URL/api/execution-workspaces/$EXECUTION_WORKSPACE_ID/runtime-services/stop" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" \
-H "X-Paperclip-Run-Id: $PAPERCLIP_RUN_ID" \
-H "Content-Type: application/json" \
--data-binary '{"workspaceCommandId":"service:paperclip-dev"}'
curl -sS -X POST \
"$PAPERCLIP_API_URL/api/execution-workspaces/$EXECUTION_WORKSPACE_ID/runtime-services/start" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" \
-H "X-Paperclip-Run-Id: $PAPERCLIP_RUN_ID" \
-H "Content-Type: application/json" \
--data-binary '{"workspaceCommandId":"service:paperclip-dev"}'
If the API returns an existing service, treat that as a candidate only. Verify
its real /api/health before trusting it.
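Because a returned service is only a candidate, it can help to require several consecutive healthy probes before trusting it. The sketch below is one possible approach, not part of Paperclip: wait_for_stable and probe_health are hypothetical names, and the defaults (3 consecutive successes, 2 seconds apart, 30 attempts) are assumptions to tune.

```shell
# Run PROBE repeatedly until it succeeds REQUIRED times in a row, or give up
# after MAX_ATTEMPTS probes. Any failure resets the streak.
wait_for_stable() (
  probe=$1 required=${2:-3} interval=${3:-2} max_attempts=${4:-30}
  streak=0
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    if "$probe"; then
      streak=$((streak + 1))
      [ "$streak" -ge "$required" ] && exit 0
    else
      streak=0
    fi
    sleep "$interval"
  done
  exit 1
)

# Example probe: healthy only when curl succeeds and jq confirms both fields.
probe_health() {
  curl -fsS "$SERVICE_URL/api/health" |
    jq -e '.status == "ok" and .bootstrapStatus == "ready"' >/dev/null
}
```

With SERVICE_URL set, wait_for_stable probe_health returns zero only after the streak is reached, which guards against a process that answers once and then loses its database.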
Use a full reseed when the app says setup is incomplete, login works but data is missing, the cloned app does not have the expected companies/issues/agents, or the user explicitly asks for the normal isolated-workspace database.
pnpm paperclipai worktree reseed --from-instance default --seed-mode full --yes
After reseed, restart through the managed runtime path. A reseed can copy runtime-service rows whose ids no longer match the local process registry, so runtime adoption and reconciliation must be verified after the start.
Do not consider the reseed complete until the served app has both auth and populated product data. A one-off inserted user/account row is a diagnostic clue, not the final state.
Set SERVICE_URL to the service URL returned by the runtime API.
curl -sS "$SERVICE_URL/api/health" | jq
curl -sS -I "$SERVICE_URL/" | head
Expected health:
- status: "ok"
- bootstrapStatus: "ready"
- bootstrapInviteActive: false

Failures to reject:
- bootstrap_pending: the instance will show the first-admin setup gate
- database_unreachable: a web process is listening but its database is dead
- 200 with unhealthy /api/health: stale process adoption bug or a dead embedded database behind a live Node process

Check the port owner when a process is already listening:
lsof -nP -iTCP:"$SERVICE_PORT" -sTCP:LISTEN || true
Use this only to identify and remove a stale matching Paperclip dev-runner process after managed stop fails. Do not kill unrelated processes.
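The expected health values listed earlier in this section can be checked mechanically rather than by eye. This is a minimal sketch: health_is_acceptable and check_health are hypothetical helper names, and the check rejects any value other than the three expected ones, so it does not need to know which field carries bootstrap_pending or database_unreachable.

```shell
# Accept a health response only when all three expected values match.
# Arguments: the values of status, bootstrapStatus, bootstrapInviteActive.
health_is_acceptable() {
  [ "$1" = "ok" ] && [ "$2" = "ready" ] && [ "$3" = "false" ]
}

# Fetch /api/health once and check it; print the observed values on rejection.
check_health() (
  body=$(curl -fsS "$SERVICE_URL/api/health") || {
    echo "health request failed" >&2
    exit 1
  }
  status=$(printf '%s' "$body" | jq -r '.status')
  bootstrap=$(printf '%s' "$body" | jq -r '.bootstrapStatus')
  invite=$(printf '%s' "$body" | jq -r '.bootstrapInviteActive')
  if health_is_acceptable "$status" "$bootstrap" "$invite"; then
    exit 0
  fi
  echo "rejected: status=$status bootstrapStatus=$bootstrap bootstrapInviteActive=$invite" >&2
  exit 1
)
```

Printing the observed values on rejection gives concrete evidence for the issue comment without exposing any credentials.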
Read the execution workspace from the main API and inspect the runtime service record.
curl -sS \
"$PAPERCLIP_API_URL/api/execution-workspaces/$EXECUTION_WORKSPACE_ID" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" | jq
The target service should show:
- service:paperclip-dev
- status: "running"

If the main app says running but /api/health is bad, stop and replace the stale process through the managed runtime.
The cloned Paperclip app must also know about the service. Query the same execution workspace through the served app when agent auth is available there:
curl -sS \
"$SERVICE_URL/api/execution-workspaces/$EXECUTION_WORKSPACE_ID" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" | jq
The served app should agree that the service is running / healthy at the
same URL. If the main control plane and served app disagree after a reseed,
the cloned database may contain copied runtime-service ids that do not match
the local process registry. Use the normal start/adoption path again and verify
both sides. If code changed in this area, add a focused regression test.
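The agreement check between the main control plane and the served app can be sketched as a comparison of two reduced views. The JSON field names used here (runtimeServices, workspaceCommandId, status, url) are assumptions about the response shape, not confirmed by this skill, and service_view and compare_service_views are hypothetical helper names.

```shell
# Reduce an execution-workspace JSON document (stdin) to "status url" for one
# service. ASSUMPTION: the record exposes runtimeServices[] entries with
# workspaceCommandId, status, and url fields; adjust to the real record.
service_view() {
  jq -r --arg id "$1" '
    .runtimeServices[]?
    | select(.workspaceCommandId == $id)
    | "\(.status) \(.url)"'
}

# Compare the two views; print a one-line verdict, non-zero on any mismatch.
compare_service_views() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "missing: main='$1' served='$2'"
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "agree: $1"
  else
    echo "disagree: main='$1' served='$2'"
    return 1
  fi
}
```

Each side's curl output from the commands above can be piped through service_view service:paperclip-dev, and the two results passed to compare_service_views. An empty view on either side is treated as a mismatch, which covers the case where a copied row id hides the service.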
Auth and data checks should use product APIs and browser/QA review, not raw DB queries.
Minimum API checks:
curl -sS "$SERVICE_URL/api/health" | jq '.status, .bootstrapStatus'
curl -sS "$SERVICE_URL/api/companies" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" | jq
curl -sS "$SERVICE_URL/api/agents/me" \
-H "Authorization: Bearer $PAPERCLIP_API_KEY" | jq
Then verify at least one expected cloned product record through the API, such as a known project, issue key, company, or execution workspace that should exist in the primary instance. Pick a record relevant to the current issue rather than a random table count.
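For the auth side of these checks, the HTTP status code is a quick first signal before reading the response body. The sketch below uses curl's --write-out with %{http_code}; api_status and expect_status are hypothetical helper names. A 200 alone does not prove the data is populated, so the body still needs the record-level review described above.

```shell
# Print only the HTTP status code for an authenticated GET. The body is
# discarded and the API key is never echoed.
api_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $PAPERCLIP_API_KEY" \
    "$1"
}

# Succeed only when the endpoint answers with the expected status code.
expect_status() (
  expected=$1 url=$2
  actual=$(api_status "$url") || actual="curl-error"
  if [ "$actual" = "$expected" ]; then
    echo "ok $url -> $actual"
  else
    echo "FAIL $url -> $actual (expected $expected)"
    exit 1
  fi
)
```

For example, expect_status 200 "$SERVICE_URL/api/agents/me" distinguishes a working agent key from a 401 or 403 without printing the key.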
Browser or QA check:
If this environment cannot launch a browser, ask QA to do the visual/login check and still complete all API checks you can run. Report that browser verification was delegated and why.
Symptom: the page says no admin has claimed the instance.
Fix: run a full worktree reseed from the primary instance, then restart the managed service. Claiming a first admin can clear the gate, but if the user asked for the normal isolated workspace database, full reseed is the correct fix.
Verify: /api/health has bootstrapStatus: ready, login works, and populated
data exists.
Symptom: bootstrap is ready, but the user's normal dev credentials do not work.
Likely cause: the isolated DB has roles or bootstrap state but lacks the primary instance auth users/accounts.
Fix: full reseed. Do not manually copy only Better Auth user/account rows as the final fix.
Verify: user login through browser/QA and /api/agents/me with the agent key.
Symptom: user can sign in, but companies/issues/projects/runs are empty or clearly incomplete.
Likely cause: a partial auth repair was done instead of a full cloned database.
Fix: full reseed, managed restart, then verify representative cloned records.
Symptom: curl -I / returns a response, but /api/health reports
database_unreachable.
Likely cause: stale Node/web process remained alive after embedded Postgres died.
Fix: managed stop first. If the process survives, identify the matching Paperclip dev-runner process group for the target port and terminate only that group. Then managed start.
Verify: /api/health is ok after a stability wait and the runtime record is
healthy.
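The "terminate only that group" step can be sketched as follows, for use only after a managed stop has failed. This is an illustration under assumptions: listener_pid and terminate_matching_group are hypothetical names, and the caller must supply the command-line pattern, since this skill does not state what a Paperclip dev-runner command line looks like.

```shell
# Print the first PID listening on TCP port $1, if any (-t prints PIDs only).
listener_pid() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN -t 2>/dev/null | head -n 1
}

# Terminate the listener's process group, but only when its command line
# contains PATTERN. Anything that does not match is left alone.
terminate_matching_group() (
  port=$1 pattern=$2
  pid=$(listener_pid "$port")
  [ -n "$pid" ] || { echo "nothing listening on $port"; exit 0; }
  cmdline=$(ps -o args= -p "$pid")
  case $cmdline in
    *"$pattern"*) ;;
    *) echo "refusing: pid $pid does not match '$pattern'"; exit 1 ;;
  esac
  pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
  echo "terminating process group $pgid (pid $pid)"
  # A negative target tells kill to signal the whole process group.
  kill -TERM -- "-$pgid"
)
```

Refusing on a pattern mismatch keeps the helper from touching unrelated processes that happen to hold the port. After it succeeds, the managed start and the health verification still apply.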
Symptom: the service URL works, but the main app says the service was not created or is stopped.
Likely cause: detached workaround process, stale provider ref, or service adoption trusted the root URL instead of health.
Fix: shut down the unmanaged process and restart through the managed runtime.
If code repair is needed, ensure adoption checks /api/health, replaces
unhealthy adopted processes, and records the current provider ref.
Verify: main runtime row and /api/health agree.
Symptom: the main app sees paperclip-dev running, but the cloned app copied a
runtime-service row whose id does not match the local registry.
Likely cause: normal DB clone copied persisted runtime rows from the primary instance into an isolated environment with different local process metadata.
Fix: use managed start/adoption again. If code repair is needed, adoption should reconcile by service identity and port, not only by copied row id.
Verify: main app and served app both show the same service as
running / healthy.
Symptom: starting from the served app returns 403 or a mutation partially
applies before activity logging fails.
Likely causes: the cloned issue/run/agent state does not match the current heartbeat, or the run id is absent in the cloned database after reseed.
Fix: prefer the main control-plane managed runtime path and full reseed. If the cloned app state itself must be repaired, use normal Paperclip issue/run transitions first. Do not hide the condition with raw DB edits; report the exact guard or missing row if it blocks the normal path.
Make a code change only when the normal operational repair exposes a product bug. Examples from this failure class:
- lsof arguments
- 200 instead of /api/health

Add focused tests in the affected service test file. For workspace runtime repairs, the narrow verification is usually:
pnpm exec vitest run server/src/__tests__/workspace-runtime.test.ts
git diff --check
Commit logical code changes and link the commit in the issue comment. If no tracked code changed, say so explicitly.
Use concrete evidence, not a vague "it works".
Fixed and verified the workspace service.
Root cause:
- <why it broke>
Fix:
- <normal reseed/start/repair steps>
- <code commit if any>
Verified:
- main control plane shows <service> running/healthy at <url>
- served workspace app shows the same service running/healthy
- <url>/api/health is ok with bootstrapStatus ready
- root page returns 200 and no setup gate
- dev login verified by <agent browser / QA / user> without posting credentials
- cloned data verified via <specific API records>
- targeted tests: <commands>
Remaining:
- <none, or named owner/action if blocked>
Mark the issue done only when every success-condition item is satisfied. If
not, mark blocked with a named unblock owner and the exact action needed.
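The done-or-blocked rule can be expressed as a small aggregator over individual check results. This is a minimal sketch with a hypothetical function name, final_state, and a hypothetical name=pass / name=fail argument format. It reports which items failed, but naming the unblock owner and the exact action remains a manual step.

```shell
# Decide the final issue state from "name=pass" / "name=fail" arguments.
# Prints "done" only when at least one check ran and every check passed;
# otherwise prints "blocked:" followed by the failing check names.
final_state() (
  failed=""
  for result in "$@"; do
    name=${result%%=*}
    outcome=${result#*=}
    [ "$outcome" = "pass" ] || failed="$failed $name"
  done
  if [ -z "$failed" ] && [ "$#" -gt 0 ]; then
    echo "done"
  else
    echo "blocked:${failed:- no checks were run}"
    exit 1
  fi
)
```

For example, final_state health=pass root=pass main=pass served=fail prints "blocked: served" and returns non-zero, so a single passing probe can never produce "done".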