design/backup.md
This implementation of BackupAgentBase implements a mechanism to back up a keyspace into files. The files can live on the
local file system or in a remote blob store. The basic idea is that a backup task records KV ranges and mutation logs
into multiple files within a container (a directory). Restore reads the files from a container, applies the KV ranges,
and then replays the mutation logs.
Just like everything else in FoundationDB, backup and restore also depend on monotonically increasing versions. The simplest way to describe backup is: record KV ranges at a version v0 and record mutation logs from v1 to vn. That gives us the capability to restore to any version between v0 and vn. Restoring to a version vk (v0 <= vk <= vn) is done by applying the KV ranges recorded at v0 and then replaying the mutation logs from v1 to vk.
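As a sketch of that scheme (plain Python over in-memory dicts, not the FoundationDB API; `restore_to_version` and the log layout are made up for illustration):

```python
# Hypothetical in-memory model: a backup is a KV snapshot taken at
# version v0 plus a mutation log for versions v1..vn. Restoring to vk
# applies the snapshot, then replays mutations with version <= vk.

def restore_to_version(snapshot, mutation_log, vk):
    """snapshot: dict of key -> value recorded at version v0.
    mutation_log: list of (version, key, value) sorted by version;
    value None means the key was cleared at that version."""
    db = dict(snapshot)
    for version, key, value in mutation_log:
        if version > vk:
            break
        if value is None:
            db.pop(key, None)
        else:
            db[key] = value
    return db

snapshot = {"a": "1", "b": "2"}                         # recorded at v0
log = [(1, "a", "10"), (2, "c", "3"), (3, "b", None)]   # v1..v3
print(restore_to_version(snapshot, log, 2))  # {'a': '10', 'b': '2', 'c': '3'}
```

Restoring to vk = 2 applies the mutations at v1 and v2 but stops before the clear at v3, so `b` survives.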
There is a small flaw in the above simplistic design: versions in FoundationDB are short lived (expiring in 5 seconds), and there is no way to take a snapshot, so there is no way to record KV ranges for the complete keyspace at a single version. For a keyspace a-z, it is not possible to record the KV range (a-z, v0) unless the keyspace is small enough. Instead, we can record KV ranges {(a-b, v0), (c-d, v1), (e-f, v2), ..., (y-z, v10)}. With the mutation log recorded all along, we can still use the simple backup-restore scheme described above on each sub keyspace separately. Assuming we recorded the mutation log from v0 to vn, that allows us to restore sub keyspace a-b to any version between v0 and vn, c-d to any version between v1 and vn, and so on.
But we are not interested in restoring sub keyspaces; we want to restore a-z. We can restore a-z to any version between v10 (the version of the last recorded KV range) and vn by restoring the individual sub spaces separately.
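A sketch of that per-sub-keyspace restore, again over in-memory dicts with hypothetical names (`restore_keyspace`, the `(lo, hi, version, kvs)` snapshot tuples are made up for illustration):

```python
# Each KV range is recorded at a different version, so each sub keyspace
# replays its own tail of the mutation log. Restorable only for target
# versions at or after the latest range's snapshot version.

def restore_keyspace(range_snapshots, mutation_log, vk):
    """range_snapshots: list of (lo, hi, version, kv_dict), hi exclusive.
    mutation_log: list of (version, key, value_or_None)."""
    assert vk >= max(v for _, _, v, _ in range_snapshots)
    db = {}
    for lo, hi, v, kvs in range_snapshots:
        db.update(kvs)                       # restore the range as of v
        for mv, key, value in mutation_log:  # replay this range's log tail
            if v < mv <= vk and lo <= key < hi:
                if value is None:
                    db.pop(key, None)
                else:
                    db[key] = value
    return db

ranges = [("a", "c", 0, {"a": "1", "b": "2"}),  # recorded at v0
          ("c", "e", 1, {"c": "3"})]            # recorded at v1
log = [(1, "a", "10"), (2, "d", "4")]
print(restore_keyspace(ranges, log, 2))
# {'a': '10', 'b': '2', 'c': '3', 'd': '4'}
```

Note that the full mutation log is rescanned once per range here, which is exactly the inefficiency discussed below.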
Backing up KV ranges and restoring them uses the standard client API: tr->getRange() to read KV ranges and tr->set() to write them back.
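A rough simulation of that loop in Python (the `get_range` helper is a stand-in for `tr->getRange()` over a plain dict; the batch size and resume-key scheme are illustrative assumptions, since a real task must paginate large ranges across transactions):

```python
# Read a key range in limited batches, resuming just past the last key
# returned, then write the collected pairs back with plain sets.

def get_range(db, begin, end, limit):
    """Stand-in for tr->getRange(): first `limit` keys in [begin, end)."""
    keys = sorted(k for k in db if begin <= k < end)[:limit]
    return [(k, db[k]) for k in keys]

def backup_range(db, begin, end, batch=2):
    out, cursor = [], begin
    while True:
        kvs = get_range(db, cursor, end, batch)
        out.extend(kvs)
        if len(kvs) < batch:         # short batch => range exhausted
            return out
        cursor = kvs[-1][0] + "\x00" # resume just after the last key read

def restore_range(db, kvs):
    for k, v in kvs:                 # tr->set() equivalent
        db[k] = v

backed_up = backup_range({"a": "1", "b": "2", "c": "3"}, "a", "z")
print(backed_up)  # [('a', '1'), ('b', '2'), ('c', '3')]
```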
For a backup-enabled cluster, the proxy keeps a copy of all mutations in the system keyspace for the backup task to pick up and record. During restore, the restore task puts the mutations into the system keyspace, and the proxy reads them and applies them to the database.
As discussed above, KV ranges are per sub keyspace, whereas the mutation log is combined for the full keyspace. Taking the same example: restoring to version vk (v10 <= vk <= vn) requires restoring the KV ranges first and then replaying the mutation logs. For each KV range (k1-k2, vx) that is restored, we need to replay the mutation log entries [(k1-k2, vx+1), ..., (k1-k2, vk)]. But this requires scanning the complete mutation log once per range to extract the mutations for k1-k2, which is sub-optimal; for any decent-sized database it would take forever.
Instead of looking at restore in key space, we can replay events in version space, so the mutation log needs to be scanned only once. At each version vx, we apply any KV range recorded at vx and then apply the mutations logged at vx.
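The one-pass, version-ordered replay could be sketched like this (an illustrative in-memory simulation, not the actual restore task; all names are made up):

```python
# Walk versions in order, scanning the mutation log exactly once.
# A range snapshot recorded at vx already reflects every mutation in
# that range up to vx, so installing it at vx supersedes older state.

def replay_in_version_order(range_snapshots, mutation_log, vk):
    """range_snapshots: {version: (lo, hi, kv_dict)}, hi exclusive.
    mutation_log: list of (version, key, value_or_None) sorted by version."""
    db, i = {}, 0
    for vx in range(vk + 1):
        if vx in range_snapshots:
            lo, hi, kvs = range_snapshots[vx]
            for k in [k for k in db if lo <= k < hi]:
                del db[k]            # snapshot supersedes older mutations
            db.update(kvs)
        while i < len(mutation_log) and mutation_log[i][0] == vx:
            _, key, value = mutation_log[i]
            if value is None:
                db.pop(key, None)
            else:
                db[key] = value
            i += 1
    return db

snapshots = {0: ("a", "c", {"a": "1"}), 2: ("c", "e", {"c": "3"})}
log = [(1, "a", "10"), (1, "c", "old"), (3, "d", "4")]
print(replay_in_version_order(snapshots, log, 3))
# {'a': '10', 'c': '3', 'd': '4'}
```

Note how the mutation to `c` at v1 is overwritten by the c-e range snapshot taken at v2, which already contains its effect.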
Although logging mutations is described above as one continuous task, it is actually divided into two logical parts. While KV ranges are being backed up, we run a mutation log backup in parallel. Until both the KV range backup and the parallel mutation log backup are complete, the backup is not restorable. Once these two tasks complete, the cluster can be restored to the last version. To be able to restore to versions after the KV range backup as well, we continue to back up mutation logs; this is called differential log backup. With differential backup, we can restore to any version since the KV range backup completed. It looks like this:
```
Version space    ---------------------------------------------------------------
Start            x                                x
KV Range            ---------                        ---------
Mutations           ---------                        ---------
Diff backup                 -----------------                -----------------
Backup Too large                            x                                x
                 a  b       c               d     e  f       g               h
```
To explain the sequence of events:
| Version | Event |
|---|---|
| a | Asked to start backup |
| b | Started backing up KV ranges and mutation logs |
| c | Completed backing up KV ranges and mutation logs; marked the backup restorable |
| c+1 | Started differential backup of logs |
| d | Decided the backup had grown too large and discontinued the differential backup |
| e | Asked to start backup |
| f | Started backing up KV ranges and mutation logs |
| g | Completed backing up KV ranges and mutation logs; marked the backup restorable |
| g+1 | Started differential backup of logs |
| h | Decided the backup had grown too large and discontinued the differential backup |
With the above backup scenario, we can restore to any version from c to d or from g to h. No other versions are restorable.
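That restorability rule is simple enough to sketch (the version numbers below are made up, standing in for c, d, g, and h):

```python
# With repeated backup cycles, the restorable versions are the union of
# the [restorable-marked, too-large] windows -- c..d and g..h above.

def restorable(windows, v):
    """windows: list of (lo, hi) inclusive version intervals."""
    return any(lo <= v <= hi for lo, hi in windows)

windows = [(30, 40), (70, 80)]   # hypothetical versions for c..d and g..h
print(restorable(windows, 35))   # True  (inside c..d)
print(restorable(windows, 50))   # False (between the two backups)
```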
Instead of going through the pain of monitoring for the backup becoming too large and restarting it, we could just run the backup continuously, with the KV range task restarting once in a while (with some policy, of course). The code for continuous backup has already been committed; its documentation will be added in the future.
FoundationDB implements checksum verification for S3 backup operations to ensure data integrity during upload and download. When BLOBSTORE_ENABLE_OBJECT_INTEGRITY_CHECK is enabled (default), the system uses SHA256 checksums for stronger cryptographic verification compared to MD5.
Important AWS S3 Limitation: Range requests (used by backup restore operations) cannot use S3 checksum verification due to AWS limitations. However, FoundationDB's internal checksums still provide data integrity protection.
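The verification idea can be illustrated with Python's hashlib (a hypothetical sketch of checksum-on-download, not the actual fdbbackup implementation):

```python
# Compute a SHA256 digest of the object bytes at upload time, store it
# alongside the object, and compare it against a fresh digest of the
# downloaded bytes.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_download(data: bytes, expected: str) -> bool:
    return sha256_hex(data) == expected

blob = b"backup file contents"
checksum = sha256_hex(blob)               # recorded at upload time
print(verify_download(blob, checksum))    # True
```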
For detailed information about S3 checksumming implementation, troubleshooting, and best practices, see design/s3-checksumming.md.