doc/administration/backup_restore/backup_archive_process.md
When you run the backup command, a backup script creates a backup archive file to store your GitLab data.
To create the archive file, the backup script:
Archives the backup staging directory into a tar file.
To back up the database, the db sub-task:
Uses pg_dump to create an SQL dump.
Pipes the output of pg_dump through gzip and creates a compressed SQL file.
To back up Git repositories, the repositories sub-task:
Informs gitaly-backup which repositories to back up.
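Runs gitaly-backup to collect, through RPCs to Gitaly, the Git references, a Git bundle, and the custom hooks archive for each repository.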
Streams the collected data into a directory structure in the backup staging directory.
The following diagram illustrates the process:
%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
accTitle: Git repository backup workflow
accDescr: Sequence diagram showing the repositories sub-task calling gitaly-backup with a list of repositories. For each repository, gitaly-backup uses RPCs to collect refs, create a bundle, and retrieve custom hooks. It then returns success or failure.
box Backup host
participant Repositories sub-task
participant gitaly-backup
end
Repositories sub-task->>+gitaly-backup: List of repositories
loop Each repository
gitaly-backup->>+Gitaly: ListRefs request
Gitaly->>-gitaly-backup: List of Git references
gitaly-backup->>+Gitaly: CreateBundleFromRefList request
Gitaly->>-gitaly-backup: Git bundle file
gitaly-backup->>+Gitaly: GetCustomHooks request
Gitaly->>-gitaly-backup: Custom hooks archive
end
gitaly-backup->>-Repositories sub-task: Success/failure
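The per-repository loop in the diagram can be sketched roughly as follows. This is an illustrative model only: `FakeGitaly`, its method names, and the choice to continue after a failed repository are hypothetical stand-ins for the ListRefs, CreateBundleFromRefList, and GetCustomHooks RPCs, not the real Gitaly client API or the actual gitaly-backup behavior.

```python
from dataclasses import dataclass, field


@dataclass
class FakeGitaly:
    """Hypothetical in-memory stand-in for a Gitaly server (not the real API)."""

    refs: dict  # repository path -> list of Git reference names
    calls: list = field(default_factory=list)  # record of RPCs received

    def list_refs(self, repo):
        self.calls.append(("ListRefs", repo))
        return self.refs[repo]  # raises KeyError for an unknown repository

    def create_bundle_from_ref_list(self, repo, refs):
        self.calls.append(("CreateBundleFromRefList", repo))
        return ("bundle:" + ",".join(refs)).encode()

    def get_custom_hooks(self, repo):
        self.calls.append(("GetCustomHooks", repo))
        return b"custom-hooks-archive"


def back_up_repositories(gitaly, repositories):
    """Issue the three RPCs for each repository and collect the results."""
    collected, ok = {}, True
    for repo in repositories:
        try:
            refs = gitaly.list_refs(repo)
            bundle = gitaly.create_bundle_from_ref_list(repo, refs)
            hooks = gitaly.get_custom_hooks(repo)
        except KeyError:
            ok = False  # assumption: note the failure and continue
            continue
        collected[repo] = {"refs": refs, "bundle": bundle, "custom_hooks": hooks}
    return ok, collected
```

In this model, all collected data flows back to the caller, which mirrors how the backup host receives the data before it is written to the backup staging directory.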
Gitaly Cluster (Praefect) configured storages are backed up in the same way as standalone Gitaly instances.
When Gitaly Cluster (Praefect) receives the RPC calls from gitaly-backup, it rebuilds its own database.
Server-side repository backups are an efficient way to back up Git repositories. With this method, Gitaly uploads the backup files directly to object storage.
To back up Gitaly on the server-side, the repositories sub-task:
Runs gitaly-backup to make a single RPC call for each repository.
The following diagram illustrates the process:
%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
accTitle: Server-side repository backup workflow
accDescr: Sequence diagram showing server-side backups where the repositories sub-task calls gitaly-backup, which issues a BackupRepository request for each repository. Gitaly uploads files directly to object storage, then reports success or failure for that repository.
box Backup host
participant Repositories sub-task
participant gitaly-backup
end
Repositories sub-task->>+gitaly-backup: List of repositories
loop Each repository
gitaly-backup->>+Gitaly: BackupRepository request
Gitaly->>+Object-storage: Git references file
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Git bundle file
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Custom hooks archive
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Backup manifest file
Object-storage->>-Gitaly: Success/failure
Gitaly->>-gitaly-backup: Success/failure
end
gitaly-backup->>-Repositories sub-task: Success/failure
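As a rough model of the server-side flow, the sketch below issues one request per repository and lets the (simulated) Gitaly side write four objects to object storage. Everything here is an assumption made for illustration: the function names, the dictionary used as object storage, and the key layout are hypothetical and do not reflect the real Gitaly implementation.

```python
def backup_repository(object_storage, backup_id, repo):
    """Hypothetical model of Gitaly handling one BackupRepository request."""
    uploads = [
        ("refs", b"refs/heads/main"),               # Git references file
        ("bundle", b"git-bundle-bytes"),            # Git bundle file
        ("custom_hooks", b"custom-hooks-archive"),  # custom hooks archive
        ("manifest", b"backup-manifest"),           # backup manifest file
    ]
    for kind, payload in uploads:
        # Assumed key layout, chosen only for this example.
        object_storage[f"{backup_id}/{repo}/{kind}"] = payload
    return True


def server_side_backup(object_storage, backup_id, repositories):
    """Send a single request per repository; only success/failure comes back."""
    results = {}
    for repo in repositories:
        results[repo] = backup_repository(object_storage, backup_id, repo)
    return all(results.values()), results
```

Unlike the RPC-based flow, the caller in this model never handles repository data. It only receives a success or failure result for each repository.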
The following sub-tasks back up files:
uploads: Attachments
builds: CI/CD job output logs
artifacts: CI/CD job artifacts
pages: Page content
lfs: LFS objects
terraform_state: Terraform states
registry: Container registry images
packages: Packages
ci_secure_files: Project-level secure files
external_diffs: Merge request diffs (when stored externally)
Each sub-task identifies a set of files in a task-specific directory and:
Creates an archive of the identified files using the tar utility.
Compresses the archive through gzip without saving to disk.
Saves the tar file to the backup staging directory.
Because backups are created from live instances, files might be modified during the backup process.
In this case, an alternate strategy can be used to back up files. The rsync utility creates a copy of the
files to back up and passes them to tar for archiving.
[!note] If you are using this strategy, the machine running the backup Rake task must have sufficient storage for both the copied files and the compressed archive.
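The copy-then-archive idea can be sketched in a few lines. This is a minimal illustration, not the backup script itself: `archive_with_copy` is a hypothetical helper, `shutil.copytree` stands in for rsync, and Python's `tarfile` module stands in for the tar and gzip utilities.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def archive_with_copy(source_dir, archive_path):
    """Copy source_dir to a temporary location, then tar and gzip the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        # Stand-in for rsync: take a copy so that later changes to the live
        # files cannot affect the archive while it is being written.
        shutil.copytree(str(source_dir), str(snapshot))
        # "w:gz" compresses while writing, so no uncompressed tar file
        # is saved to disk first.
        with tarfile.open(str(archive_path), "w:gz") as tar:
            tar.add(str(snapshot), arcname=".")
    return archive_path
```

While the archive is being written, both the copy and the growing archive exist on disk, which is why this strategy needs storage for both.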
Backup IDs are unique identifiers for backup archives. When multiple backup archives are available, you use the backup ID to specify which archive to restore.
Backup archives are saved in a directory specified by the backup_path setting in the config/gitlab.yml file.
The default location is /var/opt/gitlab/backups.
The backup ID is composed of:
Timestamp of backup creation
Date (YYYY_MM_DD)
GitLab version
GitLab edition
The following is an example backup ID: 1493107454_2018_04_25_10.6.4-ce
By default, the filename follows the <backup-id>_gitlab_backup.tar structure. For example, 1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar.
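To make the structure concrete, the following sketch splits a backup ID (or a default archive filename) into its parts. The regular expression and the `parse_backup_id` helper are assumptions made for illustration, including the assumption that the edition suffix is either `ce` or `ee` and might be absent. They are not part of GitLab.

```python
import re

# Assumed pattern: <timestamp>_<YYYY_MM_DD>_<version>[-<edition>]
BACKUP_ID_PATTERN = re.compile(
    r"^(?P<timestamp>\d+)_(?P<date>\d{4}_\d{2}_\d{2})_"
    r"(?P<version>.+?)(?:-(?P<edition>ce|ee))?$"
)
ARCHIVE_SUFFIX = "_gitlab_backup.tar"


def parse_backup_id(name):
    """Return the components of a backup ID or default archive filename."""
    if name.endswith(ARCHIVE_SUFFIX):
        name = name[: -len(ARCHIVE_SUFFIX)]  # strip the default filename suffix
    match = BACKUP_ID_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognizable backup ID: {name!r}")
    return match.groupdict()


parts = parse_backup_id("1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar")
print(parts["timestamp"], parts["date"], parts["version"], parts["edition"])
```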
The backup information file, backup_information.yml, saves all the backup inputs that are not included
in the backup. The file is saved in the backup staging directory.
Sub-tasks use this file to determine how to restore and link data in the backup with external
services like server-side repository backups.
The backup information file includes the following:
The backup staging directory is a temporary storage location used during the backup and restore processes. This directory:
The backup staging directory is the same directory where completed backup archives are created. When creating an untarred backup, the backup artifacts remain in this directory, and no archive is created.
The following is an example of a backup staging directory that contains an untarred backup:
backups/
├── 1701728344_2023_12_04_16.7.0-pre_gitlab_backup.tar
├── 1701728447_2023_12_04_16.7.0-pre_gitlab_backup.tar
├── artifacts.tar.gz
├── backup_information.yml
├── builds.tar.gz
├── ci_secure_files.tar.gz
├── db
│ ├── ci_database.sql.gz
│ └── database.sql.gz
├── lfs.tar.gz
├── packages.tar.gz
├── pages.tar.gz
├── repositories
│ ├── manifests/
│ ├── @hashed/
│ └── @snippets/
├── terraform_state.tar.gz
└── uploads.tar.gz
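Because completed archives and untarred backup artifacts share this directory, it can help to tell them apart by name. The sketch below does this for a list of entry names, relying only on the default `<backup-id>_gitlab_backup.tar` filename structure. The `split_staging_entries` helper is hypothetical.

```python
ARCHIVE_SUFFIX = "_gitlab_backup.tar"


def split_staging_entries(names):
    """Separate completed archive backup IDs from other staging entries."""
    backup_ids, artifacts = [], []
    for name in names:
        if name.endswith(ARCHIVE_SUFFIX):
            # Completed archive: keep only the backup ID part of the name.
            backup_ids.append(name[: -len(ARCHIVE_SUFFIX)])
        else:
            # Anything else is treated as an untarred backup artifact.
            artifacts.append(name)
    return backup_ids, artifacts
```

Applied to the top-level entries in the example above, this returns two backup IDs and leaves entries such as `db`, `repositories`, and `uploads.tar.gz` in the artifacts list.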