docs/operations/high-availability.md
This runbook describes the current operator-facing behavior for running Bytebase in a high-availability (HA) topology.
- The license exposes an `ha` field on `Subscription`.
- `helm-charts/bytebase` currently renders a single-replica StatefulSet (`replicas: 1`). It does not provide a multi-replica deployment switch.

In other words: the runtime has HA awareness, but operators must still supply the multi-replica deployment mechanics outside the current Helm chart.
Before you keep more than one Bytebase server active at the same time, make sure all of the following are true:
- Call `GET /v1/subscription` and confirm `ha: true`.
- Start every replica with `--ha` and the same shared metadata database.
- Point every replica at the same `PG_URL` value instead of the embedded database.
- Give every replica the same `--external-url` value so Bytebase reports one `externalUrl` for user access and callbacks.

Bytebase tracks live replicas with heartbeats:

- The active replica count is reported as `replicaCount` on `GET /v1/actuator/info`.

If more than one active replica is detected and the license does not enable HA, Bytebase does not permit the HA topology.
The current runtime behavior is to log warnings such as:
multiple replicas detected (<count>) but HA is not enabled in license
When that condition is present, background runners that check the replica limit skip work instead of continuing in an unsupported topology. This includes scheduler and cleaner paths used for task execution, plan checks, schema sync, approvals, and stale-run cleanup.
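The skip behavior described above can be illustrated with a short Python sketch. It is not Bytebase's implementation: the function names `replica_limit_exceeded` and `run_if_allowed` are hypothetical, and only the condition (more than one active replica without an HA license) and the warning text come from this runbook.

```python
import logging

logger = logging.getLogger("replica_guard")


def replica_limit_exceeded(replica_count: int, ha_licensed: bool) -> bool:
    """Return True when more than one replica is active without an HA license."""
    return replica_count > 1 and not ha_licensed


def run_if_allowed(replica_count: int, ha_licensed: bool, work) -> bool:
    """Run work() unless the replica limit is exceeded; return True if it ran."""
    if replica_limit_exceeded(replica_count, ha_licensed):
        # Same wording as the warning quoted in this runbook.
        logger.warning(
            "multiple replicas detected (%d) but HA is not enabled in license",
            replica_count,
        )
        return False  # skip this cycle rather than work in an unsupported topology
    work()
    return True
```

A runner built this way does nothing on a cycle where the check fails and resumes on the first cycle after the topology or license is corrected.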
Use this checklist when enabling or validating HA:
- Call `GET /v1/subscription` and verify `ha` is `true`.
- Call `GET /v1/actuator/info` and record:
  - `version`
  - `externalUrl`
  - `replicaCount`
- Confirm that every replica starts with `--ha`, points to the same `PG_URL`, and uses the same external URL.

`replicaCount` stays at 1

Check the following:
`replicaCount` is lower than expected during a restart

A replica falls out of the active set after roughly 30 seconds without a heartbeat. A brief drop during restarts or node moves can therefore be expected.
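To make the 30-second window concrete, here is a minimal Python sketch of counting active replicas from heartbeat timestamps. The data shape (a mapping from replica ID to last heartbeat time) and the name `count_active_replicas` are assumptions for illustration, as is treating the boundary at exactly 30 seconds as still active. Bytebase's actual storage and query may differ.

```python
from datetime import datetime, timedelta, timezone

# From this runbook: a replica drops out after roughly 30 s without a heartbeat.
ACTIVE_WINDOW = timedelta(seconds=30)


def count_active_replicas(last_heartbeats: dict, now: datetime) -> int:
    """Count replicas whose most recent heartbeat falls inside the active window.

    last_heartbeats maps a (hypothetical) replica ID to its last heartbeat time.
    """
    return sum(1 for seen in last_heartbeats.values() if now - seen <= ACTIVE_WINDOW)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    heartbeats = {
        "replica-a": now - timedelta(seconds=5),   # healthy
        "replica-b": now - timedelta(seconds=45),  # restarting, heartbeat is stale
    }
    print(count_active_replicas(heartbeats, now))
```

In this example only `replica-a` counts as active, which is the kind of temporary drop a restart produces until `replica-b` sends a fresh heartbeat.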
`multiple replicas detected ... but HA is not enabled in license`

This means the deployment topology and license do not match. Resolve it by doing one of the following:

- Update the license so that `GET /v1/subscription` returns `ha: true`.

Stale heartbeat rows are cleaned up separately and do not define the active replica count. Active counting only considers heartbeats from the last 30 seconds.
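The two endpoints this runbook relies on, `GET /v1/subscription` and `GET /v1/actuator/info`, can be polled from a small script when validating a deployment. The sketch below uses only the Python standard library. It assumes that `ha`, `version`, `externalUrl`, and `replicaCount` are top-level JSON keys in the responses and that a bearer token is an acceptable way to authenticate; both are assumptions to verify against your Bytebase version. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def fetch_json(base_url: str, path: str, token: Optional[str] = None) -> dict:
    """GET base_url + path and decode the JSON body."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    if token:
        # Assumption: the server accepts a bearer token; adjust for your auth setup.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(subscription: dict, info: dict) -> dict:
    """Collect the values the checklist asks for and flag a license mismatch."""
    ha = bool(subscription.get("ha", False))
    replica_count = int(info.get("replicaCount", 0))
    return {
        "ha": ha,
        "version": info.get("version"),
        "externalUrl": info.get("externalUrl"),
        "replicaCount": replica_count,
        # More than one active replica without HA in the license is unsupported.
        "licenseMismatch": replica_count > 1 and not ha,
    }


def check(base_url: str, token: Optional[str] = None) -> dict:
    """Query both endpoints on one Bytebase instance and summarize the result."""
    subscription = fetch_json(base_url, "/v1/subscription", token)
    info = fetch_json(base_url, "/v1/actuator/info", token)
    return summarize(subscription, info)
```

Calling `check("https://bytebase.example.com")` (a placeholder URL) returns one dictionary to record during validation. A `licenseMismatch` of `True` corresponds to the warning condition described earlier in this runbook.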