doc/plans/2026-03-17-docker-release-browser-e2e.md
Today release smoke testing for published Paperclip packages is manual and shell-driven:
HOST_PORT=3232 DATA_DIR=./data/release-smoke-canary PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
HOST_PORT=3233 DATA_DIR=./data/release-smoke-stable PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
That is useful because it exercises the same public install surface users hit:
- npx paperclipai@canary
- npx paperclipai@latest

But it still leaves the most important release questions to a human with a browser:
The repo already has two adjacent pieces:
- tests/e2e/onboarding.spec.ts covers the onboarding wizard against the local source tree
- scripts/docker-onboard-smoke.sh boots a published Docker install and auto-bootstraps authenticated mode, but only verifies the API/session layer

What is missing is one deterministic browser test that joins those two paths.
Add a release-grade Docker-backed browser E2E that validates the published canary and latest installs end to end:
Then wire that test into GitHub Actions so release validation is no longer manual-only.
Turn the current Docker smoke script into a machine-friendly test harness, add a dedicated Playwright release-smoke spec that drives the authenticated browser flow against published Docker installs, and run it in GitHub Actions for both canary and latest.
tests/e2e/onboarding.spec.ts already proves the onboarding wizard can:
That is a good base, but it does not validate the public npm package, Docker path, authenticated login flow, or release dist-tags.
scripts/docker-onboard-smoke.sh already does useful setup work:
- Dockerfile.onboard-smoke
- paperclipai@${PAPERCLIPAI_VERSION} inside Docker
- /api/companies

That means the hard bootstrap problem is mostly solved already. The main gap is that the script is human-oriented and never hands control to a browser test.
The repo already has:
- .github/workflows/e2e.yml for manual Playwright runs against local source
- .github/workflows/release.yml for canary publish on master and manual stable promotion

So the right move is to extend the current test/release system, not create a parallel one.
The first version should not require OpenAI, Anthropic, or external agent credentials.
Use the onboarding flow with a deterministic adapter that can run on a stock GitHub runner and inside the published Docker install. The existing process adapter with a trivial command is the right base path for this release gate.
That keeps this test focused on:
Later we can add a second credentialed smoke lane for real model-backed agents.
The current defaults in scripts/docker-onboard-smoke.sh should be treated as stable test fixtures:
- [email protected]
- paperclip-smoke-password

The browser test should log in with those exact values unless overridden by env vars.
Keep two lanes:
They overlap on onboarding assertions, but they guard different failure classes.
Refactor scripts/docker-onboard-smoke.sh so it can run in two modes:
Recommended shape:
- scripts/docker-onboard-smoke.sh as the public entry point
- SMOKE_DETACH=true or --detach mode
- .env file containing:
  - SMOKE_BASE_URL
  - SMOKE_ADMIN_EMAIL
  - SMOKE_ADMIN_PASSWORD
  - SMOKE_CONTAINER_NAME
  - SMOKE_DATA_DIR

The workflow and Playwright tests can then consume the emitted metadata instead of scraping logs.
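As a rough illustration of consuming that metadata, the sketch below parses a KEY=VALUE file into a typed object and fails loudly when a key is missing. The five key names come from the plan. The helper name `parseSmokeMetadata`, the comment and quoting rules, and the error wording are assumptions, not an existing part of the repo.

```typescript
// Hypothetical helper: parses the KEY=VALUE metadata the smoke script is
// proposed to emit. Only the key names are taken from the plan; the
// handling of comments and quotes is an assumption about the file format.
const REQUIRED_KEYS = [
  "SMOKE_BASE_URL",
  "SMOKE_ADMIN_EMAIL",
  "SMOKE_ADMIN_PASSWORD",
  "SMOKE_CONTAINER_NAME",
  "SMOKE_DATA_DIR",
] as const;

type SmokeKey = (typeof REQUIRED_KEYS)[number];
type SmokeMetadata = Record<SmokeKey, string>;

function parseSmokeMetadata(contents: string): SmokeMetadata {
  const values: Record<string, string> = {};
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && first === last) {
      value = value.slice(1, -1);
    }
    values[key] = value;
  }

  const missing = REQUIRED_KEYS.filter((key) => !values[key]);
  if (missing.length > 0) {
    throw new Error("Smoke metadata is missing: " + missing.join(", "));
  }

  const result = {} as SmokeMetadata;
  for (const key of REQUIRED_KEYS) {
    result[key] = values[key];
  }
  return result;
}
```

Failing on a missing key up front keeps a broken harness from surfacing later as a confusing login or navigation failure in the browser test.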
The current script always tails logs and then blocks on wait "$LOG_PID". That is convenient for manual smoke testing, but it is the wrong shape for CI orchestration.
Create a second Playwright entry point specifically for published Docker installs, for example:
- tests/release-smoke/playwright.config.ts
- tests/release-smoke/docker-auth-onboarding.spec.ts

This suite should not use Playwright webServer, because the app server will already be running inside Docker.
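One possible shape for that config is sketched below as a plain function, so it can be read and tested without Playwright installed. `testDir`, `timeout`, `retries`, `use.baseURL`, and `use.trace` are standard Playwright config options. The function name `buildReleaseSmokeConfig`, the timeout value, and reading the URL from `SMOKE_BASE_URL` are assumptions for illustration.

```typescript
// Hypothetical sketch of the options a release-smoke Playwright config might
// carry. It deliberately has no `webServer` entry: the app under test is
// already running inside Docker, so Playwright must not start one.
interface ReleaseSmokeConfig {
  testDir: string;
  timeout: number;
  retries: number;
  use: { baseURL: string; trace: "retain-on-failure" };
}

function buildReleaseSmokeConfig(
  env: Record<string, string | undefined>,
): ReleaseSmokeConfig {
  const baseURL = env.SMOKE_BASE_URL;
  if (!baseURL) {
    throw new Error("SMOKE_BASE_URL must point at the running Docker install");
  }
  return {
    testDir: ".", // resolved relative to the config file
    timeout: 120000, // assumed per-test budget in milliseconds
    retries: 0, // surface flakiness rather than hide it
    use: {
      baseURL,
      trace: "retain-on-failure", // keep traces only for failed tests
    },
  };
}
```

In a real tests/release-smoke/playwright.config.ts, the returned object would typically be passed to `defineConfig` from `@playwright/test` with `process.env` as the argument.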
The first release-smoke scenario should validate:
- /
- /auth
- process

The test should tolerate the run completing quickly. For this reason, the assertion should accept:
- queued
- running
- succeeded

and similarly for issue progression if the issue status changes before the assertion runs.
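A minimal sketch of that tolerant assertion follows. The three accepted statuses come from the plan. The helper name `assertRunStatusAcceptable` is hypothetical, and the full set of statuses the API can return is not specified here.

```typescript
// Statuses the plan says the first release-smoke assertion should accept.
const ACCEPTABLE_RUN_STATUSES = ["queued", "running", "succeeded"];

// Hypothetical helper: passes for any accepted status, so the test does not
// race against a run that finishes before the assertion executes.
function assertRunStatusAcceptable(status: string): void {
  if (ACCEPTABLE_RUN_STATUSES.indexOf(status) === -1) {
    throw new Error(
      'Unexpected run status "' + status + '"; expected one of: ' +
        ACCEPTABLE_RUN_STATUSES.join(", "),
    );
  }
}
```

Checking membership in a set, rather than equality with one status, is what keeps the assertion stable whether the trivial command is still queued or has already finished.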
tests/e2e/onboarding.spec.ts

The local-source test and release-smoke test have different assumptions:
Trying to force both through one spec will make both worse.
Add a workflow dedicated to this surface, ideally reusable:
- .github/workflows/release-smoke.yml

Recommended triggers:
- workflow_dispatch
- workflow_call

Recommended inputs:

- paperclip_version
  - canary or latest
- host_port
- artifact_name
  - docker logs

This lets us:
- release.yml
- canary and latest

First ship the workflow as manual-only so the harness and test can be stabilized without blocking releases.
After publish_canary succeeds in .github/workflows/release.yml, call the reusable release-smoke workflow with:
- paperclip_version=canary

This proves the just-published public canary really boots and onboards.
After publish_stable succeeds, call the same workflow with:
- paperclip_version=latest

This gives us post-publish confirmation that the stable dist-tag is healthy.
Testing latest from npm cannot happen before stable publish, because the package under test does not exist under latest yet. So the latest smoke is a post-publish verification, not a pre-publish gate.
If we later want a true pre-publish stable gate, that should be a separate source-ref or locally built package smoke job.
This workflow is only valuable if failures are fast to debug.
Always capture:
- docker logs output
- curl /api/health snapshot

Without that, the test will become a flaky black box and people will stop trusting it.
Files:
- scripts/docker-onboard-smoke.sh
- scripts/lib/docker-onboard-smoke.sh or similar helper
- doc/DOCKER.md
- doc/RELEASING.md

Tasks:
Acceptance:
Files:
- tests/release-smoke/playwright.config.ts
- tests/release-smoke/docker-auth-onboarding.spec.ts
- package.json

Tasks:

- test:release-smoke

Acceptance:

- PAPERCLIPAI_VERSION=canary
- PAPERCLIPAI_VERSION=latest

Files:

- .github/workflows/release-smoke.yml

Tasks:
Acceptance:
- canary or latest

Files:

- .github/workflows/release.yml
- doc/RELEASING.md

Tasks:
Acceptance:
- latest browser smoke result

Not part of the first implementation, but this should be the next layer after the deterministic lane is stable.
Possible additions:
- claude_local or codex_local adapter validation in Docker-capable environments

This should stay optional until the token-free lane is trustworthy.
The plan is complete when the implemented system can demonstrate all of the following:
- paperclipai@canary Docker install can be smoke-tested by Playwright in CI.
- paperclipai@latest Docker install can be smoke-tested by Playwright in CI.

That is expected. The assertions should prefer API polling for run existence/status rather than only visual indicators.
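The polling approach could look roughly like the sketch below. `pollUntil` is a hypothetical helper, not something in the repo. The fetcher and the sleep function are injected so the sketch stays independent of any particular API endpoint or timer, and the attempt count is an assumption.

```typescript
// Hypothetical helper: calls `fetchValue` repeatedly until `isDone` accepts
// the result or the attempt budget is exhausted.
async function pollUntil<T>(
  fetchValue: () => Promise<T>,
  isDone: (value: T) => boolean,
  options: { attempts: number; sleep: () => Promise<void> },
): Promise<T> {
  let last: T | undefined;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    last = await fetchValue();
    if (isDone(last)) {
      return last;
    }
    // Do not sleep after the final attempt; fail immediately instead.
    if (attempt < options.attempts - 1) {
      await options.sleep();
    }
  }
  throw new Error(
    "Condition not met after " + options.attempts +
      " attempts; last value: " + JSON.stringify(last),
  );
}
```

In a Playwright spec, `fetchValue` would wrap an API request that returns the run's status, and `isDone` would check it against the accepted statuses. Playwright's built-in `expect.poll` covers the same need and may be preferable to a hand-rolled helper.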
latest smoke is post-publish, not preventive

This is a real limitation of testing the published dist-tag itself. It is still valuable, but it should not be confused with a pre-publish gate.
The important contract is flow success, created entities, and run creation. Use visible labels sparingly and prefer stable semantic selectors where possible.
For release safety, the first test should use the most boring runnable adapter possible. This is not the place to validate every adapter.
If we want the fastest path to value, ship this in order:
- scripts/docker-onboard-smoke.sh
- release-smoke.yml
- release.yml
- latest smoke into release.yml

That gives release confidence quickly without turning the first version into a large CI redesign.