doc/administration/geo/disaster_recovery/bring_primary_back.md
{{< details >}}
{{< /details >}}
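After a failover, you can bring the demoted primary site back as a new secondary site, or fail back to it to restore your original primary site. This process consists of two steps: configuring the former primary site as a secondary site, and then promoting that secondary site to a primary site.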
[!warning]
If you have any doubts about the consistency of the data on this site, you should set it up from scratch.
The demoted primary is considered a standalone GitLab server that is not in sync with Geo anymore.
Make sure that any remnant configuration of it as a former primary is removed prior to adding it back as a new secondary site.
Because the former primary site is out of sync with the current primary site, the first step is to bring the former primary site up to date. Deletions of data stored on disk, such as repositories and uploads, are not replayed when the former primary site is brought back into sync, which may result in increased disk usage. To avoid this, you can set up a new secondary GitLab instance instead.
To bring the former primary site up to date:
SSH into the former primary site that has fallen behind.
Remove /etc/gitlab/gitlab-cluster.json if it exists. (What is the gitlab-cluster.json file?)
If the site to be re-added as a secondary site was promoted with the `gitlab-ctl geo promote` command, it may contain `/etc/gitlab/gitlab-cluster.json`. For example, during `gitlab-ctl reconfigure`, you may notice output like:
The 'geo_primary_role' is defined in /etc/gitlab/gitlab-cluster.json as 'true' and overrides the setting in the /etc/gitlab/gitlab.rb
If so, delete `/etc/gitlab/gitlab-cluster.json` from every Sidekiq, PostgreSQL, Gitaly, and Rails node in the site (if you use a multi-node setup), so that `/etc/gitlab/gitlab.rb` is the single source of truth again.
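The removal can be sketched as a small shell helper to run on each node. This is a minimal sketch, not an official GitLab tool: the function name `remove_cluster_json` is hypothetical, and the only detail taken from the text above is the file path.

```shell
# Hypothetical helper (not part of GitLab): remove the cluster override
# file if it exists. The optional argument defaults to the path named above.
remove_cluster_json() {
  local file="${1:-/etc/gitlab/gitlab-cluster.json}"
  if [ -e "$file" ]; then
    rm -f -- "$file" && echo "removed $file"
  else
    echo "not present: $file"
  fi
}

# Usage on each node, from a root shell:
#   remove_cluster_json
```

Reporting the "not present" case makes it easier to confirm that every node in a multi-node site was checked, not only the nodes that had the file.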
Make sure all the services are up:
sudo gitlab-ctl start
[!note]
- If you disabled the primary site permanently, you need to undo those steps now. For distributions with systemd, such as Debian/Ubuntu/CentOS7+, you must run `sudo systemctl enable gitlab-runsvdir`. For distributions without systemd, such as CentOS 6, you need to install the GitLab instance from scratch and set it up as a secondary site by following Setup instructions. In this case, you don't need to follow the next step.
- If you changed the DNS records for this site during the disaster recovery procedure, you may need to block all writes to this site during this procedure.
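On a systemd distribution, re-enabling the supervisor unit and starting the services can be sketched as one function. The `restore_services` name and the `RUNNER` variable are hypothetical conveniences for this sketch (`RUNNER` defaults to `sudo` and can be set to `echo` to preview the commands). The final `gitlab-ctl status` call is an extra check that the text above does not require.

```shell
# Hypothetical wrapper: RUNNER defaults to sudo; set RUNNER=echo to
# print the commands instead of running them.
restore_services() {
  local runner="${RUNNER:-sudo}"
  # Undo the permanent disable so services start again after a reboot.
  "$runner" systemctl enable gitlab-runsvdir || return 1
  # Start all GitLab services on this node.
  "$runner" gitlab-ctl start || return 1
  # Optional extra check: list the state of each service.
  "$runner" gitlab-ctl status
}

# Preview without changing anything:
#   RUNNER=echo restore_services
```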
Set up Geo. In this case, the secondary site refers to the former primary site.
If PgBouncer was enabled on the current secondary site (when it was a primary site), disable it by editing `/etc/gitlab/gitlab.rb` and running `sudo gitlab-ctl reconfigure`.
You can then set up database replication on the secondary site.
Configure JWT audience for OpenBao. If you have enabled GitLab Secrets Manager
and the primary and secondary sites don't share the same JWT audience,
set jwt_audience to the new primary's OpenBao URL in the re-added secondary's Helm values:
global:
  openbao:
    enabled: true
    url: https://openbao.old-primary.example.com:8200
    jwt_audience: https://openbao.promoted.example.com:8200
If you have lost your original primary site, follow the setup instructions to set up a new secondary site.
When the initial replication is complete and the primary site and secondary site are closely in sync, you can do a planned failover.
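Before a planned failover, one way to inspect the state of replication is with the Geo Rake tasks, run on the secondary site. This is a sketch under assumptions: the `geo_precheck` name and the `RUNNER` variable are hypothetical, and reading the output to decide whether the sites are "closely in sync" is left to the operator.

```shell
# Hypothetical wrapper: RUNNER defaults to sudo; set RUNNER=echo to
# print the commands instead of running them.
geo_precheck() {
  local runner="${RUNNER:-sudo}"
  # Health checks for the Geo configuration on this site.
  "$runner" gitlab-rake gitlab:geo:check || return 1
  # Summary of replication and verification progress.
  "$runner" gitlab-rake geo:status
}

# Run on the secondary site:
#   geo_precheck
```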
If your objective is to have two sites again, you need to bring your secondary site back online as well by repeating the first step (configure the former primary site to be a secondary site) for the secondary site.
If there is more than one secondary site, the remaining sites can be brought online now. For each of the remaining sites, initiate the replication process with the primary site.
When a secondary site is added, if it contains data that would otherwise be synced from the primary, then Geo avoids re-transferring the data.
For example, Git repositories are synced with `git fetch`, which only transfers missing refs.

Use-cases:
{{< history >}}

- Introduced with a flag named `geo_skip_download_if_exists`. Disabled by default.
- Feature flag `geo_skip_download_if_exists` removed.

{{< /history >}}
When you add a secondary site that has preexisting blob data, the secondary Geo site avoids re-transferring that data. This applies to:
If the secondary site's copy is actually corrupted, then background verification will eventually fail, and the blob will be resynced.
Blobs will only be skipped in this manner if they do not have a corresponding registry record in the Geo tracking database. The conditions are strict because resyncing is almost always intentional, and we cannot risk mistakenly skipping a transfer.
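The skip condition described above can be illustrated with a small decision function. This is only a sketch of the rule as stated in the text, not Geo's implementation: the `should_skip_download` name is hypothetical, and the registry lookup is reduced to a `yes`/`no` argument because the real check is made against the Geo tracking database.

```shell
# Hypothetical illustration of the rule: skip the transfer only when the
# file already exists locally AND no registry record exists for it.
# Arguments: FILE, then "yes" or "no" for "has a registry record".
should_skip_download() {
  local file="$1" has_registry_record="$2"
  [ -e "$file" ] && [ "$has_registry_record" = "no" ]
}

# Example:
#   should_skip_download /path/to/blob no && echo "skip" || echo "download"
```

A blob with an existing registry record is always transferred under this rule, which matches the point that a resync is almost always intentional.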