doc/plans/2026-04-08-agent-browser-process-cleanup-plan.md
Status: Proposed
Date: 2026-04-08
Related issue: PAP-1231
Audience: Engineering
Explain why browser processes accumulate during local agent runs and define a cleanup plan that fixes the general process-ownership problem rather than treating agent-browser as a one-off.
Yes, there is a likely root cause in Paperclip's local execution model.
Today, heartbeat-run local adapters persist and manage only the top-level spawned PID. Their timeout/cancel path uses direct child.kill() semantics. That is weaker than the runtime-service path, which already tracks and terminates whole process groups.
If Codex, Claude, Cursor, or a skill launched through them starts Chrome or Chromium helpers, Paperclip can lose ownership of those descendants even when it still believes it handled the run correctly.
packages/adapter-utils/src/server-utils.ts
- runChildProcess() spawns the adapter command and records only child.pid
- It sends SIGTERM and then SIGKILL to the direct child
packages/db/src/schema/heartbeat_runs.ts
- heartbeat_runs stores process_pid
- No process_group_id is stored
server/src/services/heartbeat.ts
- Uses child.kill() on the direct child
server/src/services/workspace-runtime.ts
- Spawns with detached: process.platform !== "win32"
- Tracks processGroupId
- Uses terminateLocalService() with group-aware killing
server/src/services/local-service-supervisor.ts
- terminateLocalService() prefers process.kill(-processGroupId, signal) on POSIX
- It escalates from SIGTERM to SIGKILL
This is the clearest internal comparison point: Paperclip already has one local-process subsystem that treats process-group ownership as the right abstraction.
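The group-aware pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in local-service-supervisor.ts: the OwnedProcess shape, the function names, the 100 ms poll interval, and the 5 second grace period are all assumptions made for the example.

```typescript
// Hypothetical record of what a run owns; the field names are assumptions.
interface OwnedProcess {
  pid: number;
  processGroupId: number | null;
}

/**
 * Picks the numeric target for process.kill(): the negated group ID on POSIX
 * when a usable group ID is known, otherwise the plain PID.
 */
export function killTarget(
  owned: OwnedProcess,
  platform: string = process.platform,
): number {
  const pgid = owned.processGroupId;
  if (platform !== "win32" && pgid !== null && pgid > 1) {
    return -pgid; // a negative PID addresses the whole process group
  }
  return owned.pid;
}

/** Signal 0 delivers nothing; it only checks whether the target still exists. */
function isAlive(target: number): boolean {
  try {
    process.kill(target, 0);
    return true;
  } catch (err) {
    // EPERM means the target exists but we may not signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Sends SIGTERM, waits up to graceMs, then escalates to SIGKILL. */
export async function terminateOwned(
  owned: OwnedProcess,
  graceMs = 5_000,
): Promise<void> {
  const target = killTarget(owned);
  try {
    process.kill(target, "SIGTERM");
  } catch {
    return; // nothing left to signal, or not ours to signal
  }
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    if (!isAlive(target)) return;
    await sleep(100);
  }
  try {
    process.kill(target, "SIGKILL");
  } catch {
    // The group exited between the last liveness check and the escalation.
  }
}
```

With a direct child.kill(), only the PID is signalled; with the negated group ID, every process still in the group receives the signal, including browser helpers started by the adapter, unless they moved themselves into a different group.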
If the direct adapter process exits, hangs, or is cancelled after launching a browser subtree:
That makes the failure look like an agent-browser problem when the more general bug is "executor descendants are not owned strongly enough."
agent-browser makes the problem obvious
Inference:
So agent-browser is probably not the root cause. It is the workload that exposes the weak ownership model fastest.
This work is successful when Paperclip can:
Do not:
- Special-case agent-browser only
- Rely on pkill chrome cleanup as the primary fix
Objective:
Work:
Deliverable:
Objective:
Work:
- Change runChildProcess() to create a dedicated process group on POSIX
Likely touched surfaces:
- packages/adapter-utils/src/server-utils.ts
- packages/db/src/schema/heartbeat_runs.ts
- packages/shared/src/types/heartbeat.ts
- server/src/services/heartbeat.ts
Important design choice:
Objective:
Work:
Recommendation:
Reason:
Objective:
Work:
This should replace ad hoc scripts as the general-purpose escape hatch.
Objective:
Tests to add:
The first shipping slice should be narrow:
- heartbeat_runs
That should address the main Chrome accumulation path without taking on the full restart-recovery design in the same patch.
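As a rough illustration of the spawn side of such a slice, the sketch below starts a child in its own process group and returns both IDs so they could be persisted with the run. It is not the current runChildProcess() implementation: RunProcessHandle, groupSpawnOptions, and spawnOwned are hypothetical names, and the stdio layout is an assumption.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";

// Hypothetical shape for what a run would persist; the names are assumptions.
interface RunProcessHandle {
  pid: number;
  processGroupId: number | null;
}

/** Spawn options that put the child in its own process group on POSIX. */
export function groupSpawnOptions(
  platform: string = process.platform,
): SpawnOptions {
  return {
    // On non-Windows platforms, detached makes the child the leader of a new
    // process group, so its group ID equals its PID.
    detached: platform !== "win32",
    stdio: ["ignore", "pipe", "pipe"],
  };
}

/** Spawns a command and returns the child plus the IDs needed for cleanup. */
export function spawnOwned(
  command: string,
  args: string[],
): { child: ChildProcess; handle: RunProcessHandle } {
  const child = spawn(command, args, groupSpawnOptions());
  const pid = child.pid;
  if (pid === undefined) {
    // Spawn failed; swallow the async 'error' event because we throw here.
    child.once("error", () => {});
    throw new Error(`failed to spawn ${command}`);
  }
  const handle: RunProcessHandle = {
    pid,
    processGroupId: process.platform === "win32" ? null : pid,
  };
  return { child, handle };
}
```

Because the child is not unref'd and its stdout/stderr stay piped, the parent keeps observing the run as before; only the group boundary changes.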
If process-group boundaries are created incorrectly, cleanup could terminate more than the run owns.
Mitigation:
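One guard that could support a mitigation here is to refuse group signals unless the recorded group is clearly one the run created. The sketch below is an assumption for illustration, not the plan's stated approach; canSignalGroup and its parameters are hypothetical names.

```typescript
// Hypothetical persisted handle; the field names are assumptions.
interface RunProcessHandle {
  pid: number;
  processGroupId: number | null;
}

/**
 * Returns true only when signalling -processGroupId is expected to stay
 * within the process tree this run created.
 */
export function canSignalGroup(
  handle: RunProcessHandle,
  supervisorPid: number,
): boolean {
  const pgid = handle.processGroupId;
  if (pgid === null || !Number.isInteger(pgid)) return false;
  // Negating 0 or 1 would address the caller's own group (0) or every
  // process the caller is allowed to signal (-1).
  if (pgid <= 1) return false;
  // A child spawned as a group leader has a group ID equal to its PID;
  // anything else means the child may share a group it does not own.
  if (pgid !== handle.pid) return false;
  // Never target a group led by the supervising server process itself.
  if (pgid === supervisorPid) return false;
  return true;
}
```

When the guard returns false, the caller can fall back to signalling only the direct PID, which matches the existing behavior.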
Windows does not support the POSIX negative-PID kill pattern used elsewhere in the repo.
Mitigation:
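One possible Windows fallback is to shell out to taskkill, whose /T flag terminates the given process together with the child processes it started and whose /F flag forces termination. The sketch below is an assumption about how that could look, not a mitigation the plan specifies; windowsTreeKillArgs and killWindowsTree are hypothetical names.

```typescript
import { execFile } from "node:child_process";

/** Builds taskkill arguments that target a PID and its child process tree. */
export function windowsTreeKillArgs(pid: number, force: boolean): string[] {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new RangeError(`invalid pid: ${pid}`);
  }
  const args = ["/PID", String(pid), "/T"]; // /T includes child processes
  if (force) args.push("/F"); // /F forces termination
  return args;
}

/** Runs taskkill; rejects when it exits non-zero, e.g. the PID is gone. */
export function killWindowsTree(pid: number, force: boolean): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile("taskkill", windowsTreeKillArgs(pid, force), (error) => {
      if (error) reject(error);
      else resolve();
    });
  });
}
```

A caller would select this path only when process.platform is "win32" and keep the negative-PID path for POSIX.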
Adopting a still-running orphaned group may look attractive but can break observability if stdout/stderr pipes are already gone.
Mitigation:
Treat this as a Paperclip executor ownership bug, not an agent-browser bug.
agent-browser should remain a useful repro case, but the implementation should be shared across all local child-process adapters so any descendant process tree spawned by Codex, Claude, Cursor, Gemini, Pi, or OpenCode is owned and cleaned up consistently.