Backing up a KurrentDB database is straightforward, but relies on carrying out the steps below in the correct order.

There are two main ways to perform backups:

- **Disk snapshots**: if your infrastructure is virtualized, disk snapshots are an option and the easiest way to perform backup and restore operations.
- **Regular file copy**: manually copying the database files to a backup location, following the procedures described below.
## Backup and restore best practices

- Backing up one node is recommended. However, ensure that the node you choose to back up is up to date and connected to the cluster.
- For additional safety, you can also back up at least a quorum of nodes.
- Do not attempt to back up a node using file copy while a scavenge operation is running. Your backup script should stop any ongoing scavenge on a node before taking a backup.
- Read-only replica nodes may be used as a backup source.
- When running a backup or restoring, do not mix backup files from different nodes.
- A restore must happen on a stopped node.
- The restore process can happen on any node of a cluster.
- You can restore any number of nodes in a cluster from the same backup source. This means, for example, in the event of a non-recoverable three-node cluster, that the same backup source can be used to restore a completely new three-node cluster.
- When you restore a node that was the backup source, perform a full backup after recovery.
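For the scavenge point above, a backup script might check for and stop a running scavenge before copying any files. A minimal sketch, assuming the HTTP admin endpoints described later in this page; the URL, credentials, and the `scavengeId` field name in the response body are placeholders to adapt to your deployment:

```shell
# Stop any ongoing scavenge on a node before backing it up (sketch).
KURRENT_URL=${KURRENT_URL:-http://localhost:2113}
CREDS=${CREDS:-admin:changeit}   # placeholder credentials

stop_scavenge() {
  # 1. Is a scavenge currently running on this node?
  id=$(curl -fsS -u "$CREDS" "$KURRENT_URL/admin/scavenge/current" |
       sed -n 's/.*"scavengeId" *: *"\([^"]*\)".*/\1/p')
  # 2. If so, abort it.
  if [ -n "$id" ]; then
    curl -fsS -u "$CREDS" -X DELETE "$KURRENT_URL/admin/scavenge/$id"
  fi
  # 3. Pause auto-scavenge so no new scavenge starts during the backup.
  curl -fsS -u "$CREDS" -X POST "$KURRENT_URL/auto-scavenge/pause"
}
```

Once the backup has completed, the script would issue a `POST` request to `/auto-scavenge/resume` to let scavenging continue.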
## Database files

By default, there are two directories containing data that needs to be included in the backup:

- `db/`, where the data is located
- `index/`, where the indexes are kept

The exact names and locations depend on your configuration.

`db/` contains:

- `chunk-X.Y` files, where X is the chunk number and Y the version
- `*.chk` checkpoint files (`chaser.chk`, `epoch.chk`, `proposal.chk`, `truncate.chk`, `writer.chk`)
- `kurrent.ddb` and `kurrent.ddb.wal`

`index/` contains:

- `indexmap`
- index files named with a GUID, e.g. `5a1a8395-94ee-40c1-bf93-aa757b5887f8`
- index files with a `.bloomfilter` suffix, e.g. `5a1a8395-94ee-40c1-bf93-aa757b5887f8.bloomfilter`
- `scavenging/scavenging.db`
- `stream-existence/streamExistenceFilter.chk` and `stream-existence/streamExistenceFilter.dat`

If the `db/` and `index/` directories are on the same volume, a snapshot of that volume is enough. However, if they are on different volumes, first take a snapshot of the volume containing the `index/` directory, and then a snapshot of the volume containing the `db/` directory.
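Before relying on a file-copy backup, you can verify that it contains the files listed above. A minimal sketch; the `check_backup` helper is hypothetical and assumes the flat layout produced by the file-copy procedure below (database files at the backup root, index files under `index/`), checking only a representative subset:

```shell
# Verify that a file-copy backup contains the expected KurrentDB files.
check_backup() {
  dir=$1
  missing=0
  for f in chaser.chk epoch.chk proposal.chk truncate.chk writer.chk \
           index/indexmap; do
    [ -f "$dir/$f" ] || { echo "missing: $f"; missing=1; }
  done
  # At least one chunk file (chunk-X.Y) must be present.
  ls "$dir"/chunk-*.* >/dev/null 2>&1 || { echo "missing: chunk files"; missing=1; }
  return $missing
}
```

Usage: `check_backup /path/to/backup && echo "backup looks complete"`.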
::: warning
Online backup using file copy is not supported when using secondary indexes, due to the risk of inconsistent backups. It is recommended to use volume snapshots instead, or to take the node offline before copying the files. Refer to the secondary indexes backup and restore section for more details.
:::
## Simple full backup & restore

### Backup

1. Copy all index checkpoint files (`<index directory>/**/*.chk`) to your backup location.
2. Copy the remaining index files (`<index directory>`, excluding the checkpoints) to your backup location.
3. Copy the database checkpoint files (`*.chk`) to your backup location.
4. Copy the chunk files (`chunk-X.Y`) to your backup location.

For example, with a database in `data` and index in `data/index`:

```bash
rsync -aIR data/./index/**/*.chk backup
rsync -aI --exclude '*.chk' data/index backup
rsync -aI data/*.chk backup
rsync -a data/*.0* backup
```
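To restore from a backup taken this way, the same files are copied back onto a stopped node and `truncate.chk` is recreated from `chaser.chk`. A minimal sketch; the `restore_backup` helper is hypothetical, and `cp -a` stands in for any faithful copy of the backup contents:

```shell
# Restore a file-copy backup into a (stopped) node's data directory.
restore_backup() {
  src=$1   # backup location, e.g. backup
  dst=$2   # node data directory, e.g. data
  mkdir -p "$dst"
  cp -a "$src"/. "$dst"/
  # Recreate truncate.chk from chaser.chk; this intentionally overwrites
  # the restored truncate.chk.
  cp "$dst/chaser.chk" "$dst/truncate.chk"
}
```

Usage: `restore_backup backup data`.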
### Restore

1. Ensure the node is stopped.
2. Copy all backed-up files to the desired location.
3. Create a copy of `chaser.chk` and call it `truncate.chk`. This effectively overwrites the restored `truncate.chk`.

## Differential backup & restore

The following procedure is designed to minimize the backup storage space, and can be used to do a full and differential backup.
::: warning
Online backup using file copy is not supported when using secondary indexes, due to the risk of inconsistent backups. It is recommended to use volume snapshots instead, or to take the node offline before copying the files. Refer to the secondary indexes backup and restore section for more details.
:::
First, back up the index:

1. Copy the `index/indexmap` file to the backup. If the source file does not exist, repeat until it does.
2. Make a list `indexFiles` of all the `index/<GUID>` and `index/<GUID>.bloomfilter` files in the source.
3. Copy the files listed in `indexFiles` to the backup, skipping file names already present in the backup.
4. Compare the `indexmap` file in the source and the backup. If they are different (i.e. the `indexmap` file has changed since step 2, or no longer exists), go back to step 2.
5. Remove the `index/<GUID>` and `index/<GUID>.bloomfilter` files from the backup that are not listed in `indexFiles`.
6. Copy the `index/stream-existence/streamExistenceFilter.chk` file (if present) to the backup.
7. Copy the `index/stream-existence/streamExistenceFilter.dat` file (if present) to the backup.
8. Copy the `index/scavenging/scavenging.db` file (if present) to the backup. It should be the only file in the scavenging directory.

Then, back up the log:
1. Rename the last chunk in the backup to have a `.old` suffix, e.g. rename `chunk-000123.000000` to `chunk-000123.000000.old`.
2. Copy `chaser.chk` to the backup.
3. Copy `epoch.chk` to the backup.
4. Copy `writer.chk` to the backup.
5. Copy `proposal.chk` to the backup.
6. Make a list `chunkFiles` of all chunk files (`chunk-X.Y`) in the source.
7. Copy the files listed in `chunkFiles` to the backup, skipping file names already present in the backup. All files should copy successfully; none should have been deleted, since scavenge is not running.
8. Remove any chunks from the backup that are not in the `chunkFiles` list. This will include the `.old` file from step 1.

To restore, follow the same procedure as for a simple full backup: with the node stopped, copy all backed-up files to the desired location, then create a copy of `chaser.chk` and call it `truncate.chk`.

## Scavenging while backing up

It is extremely important to stop any ongoing scavenge before taking a backup. If this step is not followed, the backed-up data may have missing or corrupted files. Your backup script can include the following steps to ensure that any ongoing scavenge is stopped before a node is backed up:
1. Issue a `GET` request to `/admin/scavenge/current` or `/admin/scavenge/last` to determine if there is an ongoing scavenge on the node.
2. If there is, issue a `DELETE` request to `/admin/scavenge/{scavengeId}` to abort the scavenge. `{scavengeId}` can be obtained from the HTTP response received in the first step.
3. If you use the auto-scavenge feature, issue a `POST` request to `/auto-scavenge/pause`. This will ensure that no new scavenges are launched on the node while it is being backed up.
4. Once the backup has completed, issue a `POST` request to `/auto-scavenge/resume`.
5. If you aborted a scavenge, restart it with a `POST` request to `/admin/scavenge`. If you are using the auto-scavenge feature, it will resume the scavenge automatically.

## Other backup methods

Increase the cluster size from 3 to 5 to keep further copies of data. This increase in the cluster size will slow the cluster's write performance, as two follower nodes will need to confirm each write.
Alternatively, you can use a read-only replica node, which is not a part of the cluster. In this case, the write performance will be minimally impacted.
Set up a durable subscription that writes all events to another storage mechanism, such as a key/value or column store. These methods would require a manual setup for restoring a cluster node or group.
Use a second KurrentDB cluster as a backup. Such a strategy is known as a primary/secondary backup scheme.
The primary cluster asynchronously pushes data to the second cluster using a durable subscription. The second cluster is available in case of a disaster on the primary cluster.
If you are using this strategy, we recommend that you only support manual failover from primary to secondary, as automated strategies risk causing a split-brain problem.