showcase/harness/docs/rotation-drill.md
This runbook walks through how to swap the shared password that
showcase-harness uses to check that deploy-result webhooks are really
coming from our own showcase_deploy.yml workflow (spec §4.5).
The service accepts either SHARED_SECRET or SHARED_SECRET_PREV
as a valid signing key (see orchestrator.ts — both are loaded into
the webhookSecrets array). This is what makes a clean, no-downtime
rotation possible.
1. Generate the new secret.
python -c 'import secrets; print(secrets.token_urlsafe(48))'
2. Stage it as the NEW value on Railway.
Set SHARED_SECRET_NEW on the showcase-harness service — a temporary
holding slot. Do NOT yet promote it to SHARED_SECRET.
railway variables --service showcase-harness --set SHARED_SECRET_NEW="<new>"
3. Roll the verifier forward (step A).
First, assert the staged NEW value is actually present — skipping this check turns a fat-fingered step 2 into a silent outage (the verifier would promote an empty string as the new signer).
STAGED=$(railway variables --service showcase-harness --json | jq -r '.SHARED_SECRET_NEW // empty')
if [ -z "$STAGED" ]; then
echo "FATAL: SHARED_SECRET_NEW is empty on showcase-harness; re-run step 2 first" >&2
exit 1
fi
Then on showcase-harness:
SHARED_SECRET → SHARED_SECRET_PREVSHARED_SECRET_NEW → SHARED_SECRETSHARED_SECRET_NEWCURRENT=$(railway variables --service showcase-harness --json | jq -r .SHARED_SECRET)
railway variables --service showcase-harness --set SHARED_SECRET_PREV="$CURRENT"
railway variables --service showcase-harness --set SHARED_SECRET="$STAGED"
# Remove the staging slot. Recent Railway CLI uses `--unset`; older
# versions used `--remove`. Detect CLI capability upfront rather than
# relying on `A || B` — a transient auth/network failure on `--unset`
# would incorrectly fall through to `--remove`, which on a modern CLI
# is itself an unknown-flag error and could mask the real cause.
if railway variables --help 2>&1 | grep -q -- '--unset'; then
UNSET_FLAG=--unset
elif railway variables --help 2>&1 | grep -q -- '--remove'; then
UNSET_FLAG=--remove
else
echo "railway CLI supports neither --unset nor --remove for variables; upgrade CLI" >&2
exit 1
fi
railway variables --service showcase-harness "$UNSET_FLAG" SHARED_SECRET_NEW
Railway will redeploy. Wait for /health to return 200. The verifier
now accepts both old + new signatures.
4. Roll the signer forward (step B).
Update the GH Actions secret SHOWCASE_HARNESS_SHARED_SECRET in the repo
to the new value. From the CI side, this is a single write:
gh secret set SHOWCASE_HARNESS_SHARED_SECRET --repo CopilotKit/CopilotKit --body "<new>"
Trigger a test deploy (e.g. re-run showcase_deploy.yml against a
scratch branch) and confirm the webhook.deploy.accepted log appears
on showcase-harness. If it does, the signer is now using the new key.
5. Close the overlap (step C).
After ≥ 10 minutes — confirmed by zero webhook.deploy.reject {reason=bad-signature} logs in the interim — remove the previous key:
railway variables --service showcase-harness --unset SHARED_SECRET_PREV \
|| railway variables --service showcase-harness --remove SHARED_SECRET_PREV
The service redeploys and from this point forward only the new
SHARED_SECRET is accepted.
/health returns 200 throughout the drill.webhook.deploy.reject {reason=bad-signature}
appear in the logs (except intentionally during a negative test).grep SHARED_SECRET_PREV returns no match in the
service env.If step 4 surfaces signer issues, revert SHOWCASE_HARNESS_SHARED_SECRET in
GH Actions to the old value. The verifier on showcase-harness still accepts
the old key (SHARED_SECRET_PREV), so rolling back the signer requires
no service change.
If step 3 surfaces verifier issues (the SHARED_SECRET_PREV slot is
kept specifically to give us an undo path without having to regenerate
the secret from scratch), set SHARED_SECRET back to the old value and
unset SHARED_SECRET_PREV.
Rotate every 90 days OR immediately on suspicion of compromise. Mark the next rotation date in the team calendar when step 5 completes.