docs/release-runbook.md
Single source of truth for the six steps that, if broken, block an MCPProxy release. Each section names the SPOF, points at the exact workflow job, lists the secrets or external systems it depends on, and says what to do when it fails.
This runbook is paired with:
- docs/releasing.md: branch model (main / next / hotfix) and tag flow
- docs/prerelease-builds.md: prerelease artifact distribution
- docs/release-notes-generation.md: Claude-powered notes detail
- specs/043-linux-package-repos/: apt/rpm repo spec + quickstart
- .github/workflows/release.yml: the stable release pipeline
- .github/workflows/prerelease.yml: the next pipeline
- .github/workflows/retry-sign-release.yml: SignPath retry

| Event | Workflow | Produces |
|---|---|---|
| Push to next | prerelease.yml | Signed + notarized prerelease artifacts (no tag) |
| Push a vX.Y.Z tag on main | release.yml | GitHub Release, Homebrew bump, apt/rpm republish |
| Push a vX.Y.Z-rc* / -beta* tag | release.yml (partial) | GitHub Release only (apt/rpm skipped) |
| workflow_dispatch on Retry Sign | retry-sign-release.yml | Re-submit Windows EXE to SignPath, cut release |
Everything below assumes a stable vX.Y.Z tag on main unless stated otherwise.
Goal: DMG and PKG installers pass Gatekeeper on first-launch with no user override (right-click → Open).
Where: release.yml → build job, steps Import Apple Developer ID certificate, Sign macOS binaries, Create DMG installer (macOS), Create PKG installer (macOS), Submit for notarization (macOS). Prerelease runs the same
steps in prerelease.yml.
What we use:
- codesign --force --sign "$CERT_IDENTITY" --timestamp --options runtime for
  each Mach-O and the DMG.
- xcrun notarytool submit … --wait then xcrun stapler staple for both the
  PKG and the installer DMG.

Required secrets (GitHub → Settings → Secrets → production environment):
| Secret | Purpose |
|---|---|
| APPLE_DEVELOPER_ID_CERT | base64-encoded .p12 containing the Developer ID Application identity (and ideally the Installer identity too) |
| APPLE_DEVELOPER_ID_CERT_PASSWORD | Password for the above .p12 |
| APPLE_DEVELOPER_ID_INSTALLER_CERT | Fallback .p12 if the installer identity is separate |
| APPLE_DEVELOPER_ID_INSTALLER_CERT_PASSWORD | Password for the installer .p12 |
| APPLE_ID_USERNAME | Apple ID email used for notarytool |
| APPLE_ID_APP_PASSWORD | App-specific password from appleid.apple.com |
| APPLE_TEAM_ID | Apple Developer Team ID (fallback identity) |
Expiry windows to watch:
Recovery:
- If the Import Apple Developer ID certificate step fails, the cert was
  rotated or the secret is wrong. Re-export the .p12 with the "Developer ID
  Application" and "Developer ID Installer" identities in the same bundle,
  run base64 -i cert.p12 | pbcopy, and update both the *_CERT and *_CERT_PASSWORD
  secrets in the production environment.
- If notarization returns Invalid, fetch the log with
  xcrun notarytool log <submission-id> --apple-id … --password …. The most
  common cause is an unsigned helper binary inside the bundle (the core binary,
  mcpproxy-tray, or a Swift app bundle pulled in by create-pkg.sh). Re-sign
  with --options runtime and resubmit.

Verify locally after the release is out:
spctl --assess --type exec -vvv /Applications/MCPProxy.app
xcrun stapler validate mcpproxy-*-darwin-*.dmg
codesign --verify --verbose --deep --strict /Applications/MCPProxy.app
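The certificate recovery step above encodes the .p12 with base64 before it goes into the secret, so a quick round-trip check before updating the secret can catch a truncated paste. The following is a minimal sketch, assuming the secret is plain base64 of the .p12; the helper name decode_p12_secret and the file paths are hypothetical and are not taken from release.yml.

```shell
# Hypothetical helper: decode a base64-encoded .p12 secret into a file and
# fail if the decode errors out or the result is empty.
decode_p12_secret() {
  local encoded="$1" out="$2"
  # tr strips newlines so wrapped and unwrapped base64 both decode.
  printf '%s' "$encoded" | tr -d '\n' | base64 --decode > "$out" || return 1
  # -s: the file exists and is non-empty.
  [ -s "$out" ]
}
```

For example, `decode_p12_secret "$(base64 < cert.p12)" /tmp/roundtrip.p12 && cmp cert.p12 /tmp/roundtrip.p12` should exit 0 when the encoded value is intact.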
Goal: Windows SmartScreen reputation is inherited by each new installer — no "Unknown publisher" Defender prompt for users on recent builds of the app.
Where: release.yml → sign-windows job (matrix: amd64, arm64). The
build job produces dist/mcpproxy-setup-<tag>-<arch>.exe via Inno Setup
(scripts/build-windows-installer.ps1), uploads it as
unsigned-installer-windows-<arch>, and the sign-windows job feeds it to
SignPath with signpath/github-action-submit-signing-request@v1.
External system: SignPath.io organisation
84efd51b-c11c-4a85-82e6-7c3b1157d7ca, project mcpproxy-go, policy
release-signing. Signing is manual-approval with a 1-hour wait
(wait-for-completion-timeout-in-seconds: 3600). Missing the approval window
fails the whole release job.
Required secrets:
| Secret | Purpose |
|---|---|
| SIGNPATH_API_TOKEN | SignPath API token for the release policy |
Recovery when signing times out (most common failure):
The release workflow fails in sign-windows because the SignPath approver did
not click "Approve" inside 60 minutes. All other artifacts (macOS DMGs, Linux
tarballs, release notes) are already built. Do not re-tag — use the retry
workflow:
- Trigger Retry Sign (workflow_dispatch) with these inputs:
  - tag: vX.Y.Z
  - run_id: the failed run's ID
- The workflow (retry-sign-release.yml) re-downloads the unsigned EXEs,
  resubmits to SignPath, and creates the GitHub Release from the original
  artifacts + the freshly signed installers.

Watch items:
- SIGNPATH_API_TOKEN in the production environment.
- Approver coverage on the release-signing policy, so a single vacation
  doesn't freeze releases.

Decision doc pending (D30-6, GH #45):
Whether to keep SignPath-hosted signing, move to a self-managed EV cert, or
migrate to Azure Trusted Signing is tracked in MCP-7 as
D30-6. The decision doc will land at docs/decisions/windows-signing.md. Until
that lands, keep using the current SignPath flow — rotating to a different
signing backend mid-sprint risks resetting SmartScreen reputation.
Verify locally:
# On a Windows host
Get-AuthenticodeSignature mcpproxy-setup-vX.Y.Z-amd64.exe
# Status should be "Valid", signer should contain "SignPath" or the EV CN
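For the SignPath timeout recovery described earlier in this section, the retry workflow can also be dispatched from the GitHub CLI instead of the Actions UI. The sketch below only prints the command so it can be reviewed before running. The input names tag and run_id come from this runbook; the helper name retry_sign_command and its validation rules are assumptions for illustration.

```shell
# Hypothetical helper: validate the two inputs, then print the gh invocation
# for the retry workflow. It does not run gh itself.
retry_sign_command() {
  local tag="$1" run_id="$2"
  case "$tag" in
    v[0-9]*) ;;
    *) echo "tag must look like vX.Y.Z, got: $tag" >&2; return 1 ;;
  esac
  case "$run_id" in
    ''|*[!0-9]*) echo "run_id must be numeric, got: $run_id" >&2; return 1 ;;
  esac
  printf 'gh workflow run retry-sign-release.yml -f tag=%s -f run_id=%s\n' \
    "$tag" "$run_id"
}
```

The run ID is the number in the failed run's URL; `gh run list --workflow release.yml` is one way to look it up.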
Goal: Each GitHub Release page has categorised, human-readable notes ("New
Features", "Bug Fixes", "Breaking Changes", "Improvements") instead of a raw
commit dump. The notes are also bundled into the DMG (RELEASE_NOTES.md) and
the Windows installer's docs folder.
Where: release.yml → generate-notes job → scripts/generate-release-notes.sh.
Runs in parallel with the build job; its output is consumed by the release
job when assembling the GitHub Release body.
External system: Anthropic Messages API, model
claude-sonnet-4-5-20250929 (override with CLAUDE_MODEL).
Required secrets:
| Secret | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Claude API key (console.anthropic.com) |
Cost: ~ $0.01–0.05 per release.
Failure behaviour: Non-blocking. If the API call fails, the script writes the fallback "Release notes could not be generated automatically" stub and the release still ships. See release-notes-generation.md.
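The non-blocking behaviour described above can be pictured as a wrapper that never returns failure. This is a minimal sketch of the pattern, not the contents of generate-release-notes.sh; the function name write_notes_or_fallback and the temp-file handling are assumptions.

```shell
# Run a notes generator (passed as arguments). If it fails or produces no
# output, write the fallback stub instead. Always returns 0 so the caller
# is never blocked.
write_notes_or_fallback() {
  local out="$1"
  shift
  if "$@" > "$out.tmp" 2>/dev/null && [ -s "$out.tmp" ]; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    printf '%s\n' 'Release notes could not be generated automatically' > "$out"
  fi
  return 0
}
```

Writing to a temporary file first means a half-written response never ends up as the release body.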
Cadence knobs (env vars on generate-notes step):
| Variable | Default | When to change |
|---|---|---|
| CLAUDE_MODEL | claude-sonnet-4-5-20250929 | Migrate model on Anthropic deprecation notices |
| MAX_TOKENS | 1024 | Raise for big sprints (>100 commits behind) |
| MAX_COMMITS | 200 | Lower only if we hit the context limit |
| API_TIMEOUT | 30 | Raise on Anthropic incidents |
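
To make the override semantics of the table above concrete, here is a small sketch of environment variables falling back to the documented defaults. It assumes the script resolves its knobs with plain shell parameter expansion, which this runbook does not confirm; notes_config is a hypothetical name.

```shell
# Print the effective knob values: the environment wins, otherwise the
# defaults from the table apply.
notes_config() {
  printf 'model=%s max_tokens=%s max_commits=%s api_timeout=%s\n' \
    "${CLAUDE_MODEL:-claude-sonnet-4-5-20250929}" \
    "${MAX_TOKENS:-1024}" \
    "${MAX_COMMITS:-200}" \
    "${API_TIMEOUT:-30}"
}
```

Running `MAX_TOKENS=2048 notes_config` changes only the max_tokens field, which is the shape of change a big sprint would need.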
Recovery:
export ANTHROPIC_API_KEY="..."
./scripts/generate-release-notes.sh vX.Y.Z
gh release edit vX.Y.Z --notes-file release-notes-vX.Y.Z.md
To migrate models, update the CLAUDE_MODEL env var on the
generate-notes job in release.yml (and keep it in sync in
prerelease.yml if that job ever gains notes generation). Model IDs are in
~/.claude/CLAUDE.md / generate-release-notes.sh.

Goal: apt-get install mcpproxy / dnf install mcpproxy work after a
release, backed by apt.mcpproxy.app and rpm.mcpproxy.app — both Cloudflare
R2 buckets with custom domains. Metadata is GPG-signed with the MCPProxy
Packages key.
Where: release.yml → publish-linux-repos job. Runs only on stable
tags — if: startsWith(github.ref, 'refs/tags/v') && !contains(github.ref_name, '-'). Pre-release tags (v0.24.0-rc1, v0.24.0-beta) intentionally skip this
step so repository metadata never publishes unstable versions.
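The stable-only condition above is easy to misread, so here is the same predicate as a shell function for checking a ref by hand. It mirrors the startsWith / !contains expression quoted in this section; the function name is_stable_tag_ref is hypothetical.

```shell
# Succeeds only for refs under refs/tags/v whose tag name has no hyphen,
# mirroring the publish-linux-repos condition.
is_stable_tag_ref() {
  local ref="$1"
  case "$ref" in
    refs/tags/v*) ;;
    *) return 1 ;;
  esac
  local name="${ref#refs/tags/}"
  case "$name" in
    *-*) return 1 ;;
  esac
  return 0
}
```

`is_stable_tag_ref refs/tags/v0.24.0` succeeds, while `refs/tags/v0.24.0-rc1` and `refs/heads/main` both fail.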
What it does (per spec 043-linux-package-repos):
- Downloads the linux-packages-* artifacts (the .deb and .rpm files
  produced by build).
- Keeps only the newest RETAIN_N=10 versions.
- Regenerates metadata with apt-ftparchive (Debian) and createrepo_c
  (Fedora), signs Release / repomd.xml with the imported GPG key.
- Smoke-tests apt-get install in debian:stable-slim and
  dnf install in fedora:latest containers.

External systems:
- Cloudflare R2 buckets mcpproxy-apt and mcpproxy-rpm (custom domains
  apt.mcpproxy.app / rpm.mcpproxy.app, account-scoped R2 token).
- GPG signing key (MCPProxy Packages <[email protected]>, 5-year expiry). Fingerprint is in the
  PACKAGES_GPG_KEY_ID GitHub variable.

Required secrets + variables (production environment):
| Name | Kind | Purpose |
|---|---|---|
| R2_ACCOUNT_ID | secret | Cloudflare account ID (endpoint URL) |
| R2_ACCESS_KEY_ID | secret | R2 token scoped to both buckets |
| R2_SECRET_ACCESS_KEY | secret | R2 token secret |
| PACKAGES_GPG_PRIVATE_KEY | secret | ASCII-armored private key + metadata header |
| PACKAGES_GPG_PASSPHRASE | secret | Passphrase (stored in 1Password too) |
| PACKAGES_GPG_KEY_ID | variable | GPG key fingerprint (selects the right key at sign time) |
One-time setup (generate key, create buckets, bind custom domains, upload
public key) is in
specs/043-linux-package-repos/quickstart.md.
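The RETAIN_N=10 retention rule mentioned in the job description can be sketched as a filter over version strings. This is an illustration of the rule, not the logic in publish.sh; it assumes one version per line on stdin and a sort that supports -V (GNU coreutils), and versions_to_prune is a hypothetical name.

```shell
# Read versions on stdin, newest-first sort them, and print everything
# beyond the newest RETAIN_N (default 10). The printed lines are the
# candidates for deletion.
versions_to_prune() {
  local keep="${RETAIN_N:-10}"
  sort -rV | tail -n "+$((keep + 1))"
}
```

Version-aware sorting matters here: a plain lexical sort would rank 0.9.0 above 0.10.0 and prune the wrong packages.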
Recovery matrix:
| Symptom | Likely cause | Fix |
|---|---|---|
| gpg: signing failed: No secret key | PACKAGES_GPG_KEY_ID variable out of sync | Reset with gh variable set PACKAGES_GPG_KEY_ID --body "<FPR>" |
| InvalidAccessKeyId from S3 client | R2 token rotated/expired | Mint new R2 token (scoped to both buckets), update both secrets |
| Release file is not valid | Partial sync; bucket in mid-upload state | Re-run the job. publish.sh is idempotent. |
| Smoke test fails on apt-get install mcpproxy | Metadata signed but propagated to edge slowly | Wait 5 min (Cloudflare cache) and re-run smoke test; if still failing, inspect R2 object listing |
| Customers report "repository not signed" after rotation | Public key on R2 not refreshed | Re-upload mcpproxy.gpg / .gpg.asc per quickstart step 5 |
Key rotation (annual, or on suspected leak):
Follow the script embedded at the top of ~/repos/PACKAGES_GPG_PRIVATE_KEY.txt
(steps also in the quickstart doc). After rotation, tag a patch release so the
next CI run republishes metadata signed with the new key, then announce the
rotation on the install page.
Goal: brew install smart-mcp-proxy/mcpproxy/mcpproxy (formula) and
brew install --cask smart-mcp-proxy/mcpproxy/mcpproxy (cask, for the DMG
installer) resolve to the latest tag within minutes of a stable release.
Where: release.yml → update-homebrew job, needs: release. Fires only
on startsWith(github.ref, 'refs/tags/v').
External system: tap repository
smart-mcp-proxy/homebrew-mcpproxy.
The job checks it out, downloads each platform tarball + installer DMG from the
fresh GitHub Release, calculates SHA256s, regenerates Formula/mcpproxy.rb,
patches Casks/mcpproxy.rb in place, commits, pushes.
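The download-and-checksum part of the job described above, together with the retry behaviour noted under common failures, follows a familiar shell pattern. The sketch below shows that pattern in isolation; it is not copied from release.yml, and the helper names retry and sha256_of are assumptions.

```shell
# retry ATTEMPTS DELAY_SECONDS CMD...: run CMD until it succeeds or the
# attempts run out, sleeping DELAY_SECONDS between tries.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Print the SHA-256 of a file, using whichever tool the runner has.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}
```

A call such as `retry 5 10 curl -fsSL "$url" -o "$file" && sha256_of "$file"` (with url and file as placeholders) would match the 5 attempts and 10s backoff described in this section.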
Required secrets:
| Secret | Purpose |
|---|---|
| HOMEBREW_TAP_TOKEN | Fine-scoped PAT with contents:write on the tap repo |
Retention: The tap carries only one version (Homebrew expects "latest"), so no purge logic is needed.
Common failures:
- curl -fsSL … -o mcpproxy-<version>-darwin-arm64.tar.gz returns 404. GitHub
  Release assets propagate a few seconds after the release is published; the
  job already sleeps 15s and retries 5× with 10s backoff. If it still fails,
  asset upload from the release job is incomplete, so inspect the earlier job
  first; re-running update-homebrew alone will succeed once assets are there.
- git push rejected: the token expired or someone force-pushed to the tap.
  Regenerate the PAT (smart-mcp-proxy/homebrew-mcpproxy → Settings → PATs,
  or an org-level fine-scoped PAT) and update HOMEBREW_TAP_TOKEN.
- Casks/mcpproxy.rb is missing on the tap. Restore it from git history;
  the formula bump is not a replacement for the cask bump.

Local sanity check:
brew update
brew info smart-mcp-proxy/mcpproxy/mcpproxy # expect new version
brew info --cask smart-mcp-proxy/mcpproxy/mcpproxy
brew install smart-mcp-proxy/mcpproxy/mcpproxy # ensure SHAs resolve
next branch hygiene (prerelease pipeline)

Goal: Every push to next produces a fully signed, notarized set of
prerelease artifacts (DMG, Windows installer, Linux tarballs) with versions
like v0.24.0-next.5b63e2d. This lets us dogfood sprint work without
tagging main.
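The version shape shown above (base version, the literal next, then a short commit SHA) can be reproduced with a few lines of shell. How prerelease.yml actually derives the base version is not covered here, so this sketch takes it as an argument; prerelease_version is a hypothetical name and the 7-character SHA length is inferred from the example.

```shell
# Build a prerelease version string from a base version and a commit SHA,
# keeping the first 7 characters of the SHA.
prerelease_version() {
  local base="$1" sha="$2" short
  short=$(printf '%s' "$sha" | cut -c1-7)
  printf '%s-next.%s\n' "$base" "$short"
}
```

Because the result contains a hyphen, a tag with this shape would also be skipped by the stable-only apt/rpm condition.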
Where: prerelease.yml — triggered on push: branches: [next]. Mirrors
the macOS signing + notarization and Windows signing legs of release.yml so
the stable release path is never the first place a signing regression lands.
What next does and does NOT do:
| Action | main (release.yml) | next (prerelease.yml) |
|---|---|---|
| Build all platform artifacts | ✅ | ✅ |
| Sign + notarize macOS DMG / PKG | ✅ | ✅ |
| Sign Windows installer via SignPath | ✅ | ✅ |
| Publish GitHub Release | ✅ | ❌ (artifacts only) |
| Bump Homebrew tap | ✅ | ❌ |
| Publish to apt/rpm repos | ✅ (stable only) | ❌ |
| Deploy Docusaurus docs | ✅ | ❌ |
| Trigger marketing-site version bump | ✅ | ❌ |
| Publish to the MCP Registry | ✅ | ❌ |
Branching rules (see docs/releasing.md):
- Feature work targets next. next accumulates the sprint until we cut a release from main.
- Hotfixes: branch from the released vX.Y.Z tag, land in main, tag
  vX.Y.(Z+1), and immediately merge the hotfix back into next. Skipping
  the backport silently reintroduces the bug on the next release cut.
- main only receives hotfixes and vetted releases from next. Never push
  feature commits directly to main.

Common regressions caught on next (keep it that way):
- macOS deployment target (MACOSX_DEPLOYMENT_TARGET=13.0);
  a bump here breaks codesign + notarization in subtle ways, and next
  catches it.

When prerelease breaks but stable just shipped clean:
The prerelease pipeline shares secrets with stable. If next fails while a
recent stable release succeeded, the delta is code — not infra:
- Run git log --oneline main..next to see what landed since the last release.
- Diff prerelease.yml vs release.yml. They should stay structurally
  similar; if one of them was edited in isolation, port the fix to the other.
- Hold merges next → main until the prerelease build is green.

Weekly hygiene:
- git log --oneline main..next should not grow silently beyond one sprint.
  If it does, cut a release or prune stale branches; a stale next makes the
  eventual merge high-risk.
- Keep next green ahead of any merge to main. A red next build is the earliest warning
  that a stable release will fail too.

Bundled for reference; full detail in docs/releasing.md.
# From a clean main that matches next
git checkout main
git pull
git tag -a vX.Y.Z -m "Release vX.Y.Z"
git push origin vX.Y.Z
# Watch https://github.com/smart-mcp-proxy/mcpproxy-go/actions
Expected green jobs on tag push:
1. generate-notes (Claude)
2. build (all platforms)
3. sign-windows (SignPath; approve within 60 min)
4. release (GitHub Release + DMGs + EXEs + tarballs)
5. update-homebrew (tap bump)
6. publish-linux-repos (apt + rpm to R2)
7. deploy-docs (Docusaurus → Cloudflare Pages, non-blocking)
8. trigger-marketing-update (non-blocking)
9. mcp-registry (non-blocking)

If any of 1–6 go red, consult the matching SPOF section above. 7–9 are marked
continue-on-error: true; they do not block the release but should be
back-filled the same week.