design/bulkload-restore-integration.md
# Design for Integrating BulkDump / BulkLoad into Backup / Restore
Integrate BulkDump and BulkLoad technologies into FoundationDB's backup and restore systems to enable faster snapshot operations and provide the foundation for future capabilities like restore into live clusters. The integration maintains backward compatibility during the transition period while allowing users to opt into the improved performance characteristics of direct Storage Server coordination.
FoundationDB's current backup and restore system consists of mature, production-proven mechanisms:
- Backup V2: Creates backups by generating range files (snapshots) and continuously capturing mutation logs. Both are uploaded to S3 for durable storage.
- Restore: Recovers data by reading range files and mutation logs from S3, then applying them to a target cluster via the transaction system.
- New Technologies: BulkDump and BulkLoad (introduced in FDB 7.4) provide experimental alternatives that coordinate directly with Storage Servers via the Data Distributor, bypassing transaction system overhead.
Using BulkDump and BulkLoad will help speed up backup and restore: both bypass the transaction system and run in parallel, enabling faster snapshots and restores. They also support our longer-term objective of loading data into a live cluster.
Currently, the BulkDump/BulkLoad system supports range-only data handling. Extending it to support range + mutation log restores will require significant additional work and is considered a long-term goal.
In the near term, this project focuses on integrating BulkDump into the backup pipeline for snapshot generation and BulkLoad into the restore pipeline for range data consumption. This design doc is scoped specifically to that integration effort.
When integrating BulkDump and BulkLoad into the backup and restore snapshot system, two key problems must be addressed in addition to the integration itself. Given these problems, we establish the following four requirements:
1. Single-Command Integration: When running backup or restore operations, users can choose to enable the new snapshot system (BulkDump/BulkLoad) with existing commands.
2. Backward Compatibility: Backup data generated with BulkDump can be restored using traditional range file restore methods, and traditional backup data can be restored using the existing range file + mutation log process through the transaction system.
3. Performance: Backup and restore times must be no longer than those of the current implementations when using BulkDump/BulkLoad.
4. Encryption Compatibility: BulkDump and BulkLoad must perform encryption when backup/restore encryption is configured.
Requirements 1 and 2 will be inherently satisfied once the design points outlined in the next section are implemented. Requirement 3 also appears feasible, as initial measurements show BulkDump/BulkLoad delivering performance benefits over the transaction-based approach currently used in backup and restore.
This integration represents one half of the evolution of Backup V2 to V3, specifically improving the snapshot system (the other part of V3 is range-partitioned backup mutation logs -- a separate effort). The key innovation here is the replacement of transaction-based snapshot handling with direct Storage Server coordination via BulkDump and BulkLoad (while maintaining backward compatibility during the transition).
```mermaid
graph TB
    subgraph FDB_Cluster["FDB Cluster"]
        SS1[Storage Server 1]
        SS2[Storage Server 2]
        SSN[Storage Server N]
        CP[Commit Proxy]
        TLog[Transaction Logs]
        BA["Backup Agents<br/>External TaskBucket processes"]
        BW["Backup Workers<br/>NEW in V2<br/>Recruited by CC"]
        SS1 -->|Read snapshots| BA
        SS2 -->|Read snapshots| BA
        SSN -->|Read snapshots| BA
        CP -->|Write mutations| TLog
        TLog -->|Pull mutations| BW
    end
    subgraph S3_V2["S3 Bucket - Backup V2"]
        RF["kvranges/<br/>Snapshots from Backup Agents"]
        ML["logs/<br/>Partitioned logs from Backup Workers"]
    end
    BA -->|Save snapshots| RF
    BW -->|Save partitioned logs| ML
    style BW fill:#C8E6C9
    style ML fill:#C8E6C9
```
V2 Key Features:
```mermaid
graph TB
    subgraph S3_Backup_V2["S3 Bucket - Backup V2"]
        RF2["kvranges/"]
        ML2["logs/<br/>Partitioned"]
    end
    RA["Restore Agents<br/>TaskBucket-based<br/>1. Read ranges<br/>2. Read partitioned logs<br/>3. Merge mutations"]
    subgraph Target_Cluster["Target FDB Cluster"]
        CP2[Commit Proxy]
        SS2[Storage Servers]
    end
    RF2 -->|Read| RA
    ML2 -->|Read| RA
    RA -->|Transactions| CP2
    CP2 --> SS2
```
V2 Restore Process:
```mermaid
graph TB
    subgraph FDB_Cluster_V3["FDB Cluster"]
        SS1[Storage Server 1]
        SS2[Storage Server 2]
        SSN[Storage Server N]
        DD["Data Distributor<br/>Coordinates BulkDump"]
        CP[Commit Proxy]
        TLog[Transaction Logs]
        BA["Backup Agents<br/>V2 TaskBucket<br/>For fallback"]
        BW["Backup Workers<br/>V2 unchanged"]
        BDTF["BulkDumpTaskFunc<br/>NEW - coordinates SST generation"]
        SS1 -->|Read snapshots V2| BA
        SS2 -->|Read snapshots V2| BA
        SSN -->|Read snapshots V2| BA
        SS1 -->|Generate & Upload SSTs V3| DD
        SS2 -->|Generate & Upload SSTs V3| DD
        SSN -->|Generate & Upload SSTs V3| DD
        DD -->|Coordinate| BDTF
        CP -->|Write mutations| TLog
        TLog -->|Pull mutations| BW
    end
    subgraph S3_V3["S3 Bucket - Backup V3"]
        RF["kvranges/<br/>V2 fallback"]
        ML["logs/<br/>V2 partitioned"]
        BD["bulkdump_data/<br/>V3 NEW SST files"]
    end
    BA -->|Save V2 snapshots| RF
    BW -->|Save partitioned logs| ML
    SS1 -->|Upload SSTs directly| BD
    SS2 -->|Upload SSTs directly| BD
    SSN -->|Upload SSTs directly| BD
    style BDTF fill:#FFF3CD
    style DD fill:#FFF3CD
    style BD fill:#FFF3CD
```
V3 Key Changes:
- `backup_agent` = Executable binary that runs as a long-running process to execute TaskBucket tasks
- Backup Agents = Running instances of the `backup_agent` executable that perform backup operations
- Restore Agents = Running instances of the `backup_agent` executable that perform restore operations (same processes, different tasks)
- `fdbbackup` = Command-line tool that submits backup jobs to TaskBucket (does not execute the backup itself)
- `fdbrestore` = Command-line tool that submits restore jobs to TaskBucket (does not execute the restore itself)

Flow:
1. `fdbbackup start` → submits a backup job to TaskBucket
2. `backup_agent` processes pick up and execute the backup tasks
3. `fdbrestore start` → submits a restore job to TaskBucket
4. `backup_agent` processes pick up and execute the restore tasks

```
fdbbackup start --mode <mode> \
    -d <backup_url> \
    -t <target_version> \
    [--timeout <seconds>]
```
New Parameter:
`--mode <mode>`: Controls which snapshot mechanism(s) to use:

- `rangefile` (default): Generate only traditional range files (V1/V2 method)
- `bulkdump`: Generate only BulkDump SST files
- `both`: Generate both formats (with unique filenames to prevent collision)

```
fdbrestore start \
    -r <backup_url> \
    -t <target_version> \
    --dest-cluster-file <cluster_file> \
    [--mode <mode>]
```
New Parameter:
`--mode <mode>`: Controls which restore mechanism to use for range data:

- `rangefile` (default): Use traditional range file restore from the `kvranges/` directory in S3
- `bulkload`: Use BulkLoad for range data restoration if a BulkDump dataset is available

Default behavior (traditional range files):

- Reads range data from the `kvranges/` directory in S3
- Applies even when a `bulkdump_data/` directory is present in S3

BulkLoad behavior (with `--mode bulkload`):

- Reads SST data from the `bulkdump_data/` directory in S3

```
s3://bucket/backup-2025-01-20-23-17-10.123456/
├── kvranges/                      # V2 snapshot format (fallback compatibility)
│   ├── snapshot.000000000001234567/
│   │   └── 0/
│   │       ├── range,1980422,c5c81efaa67c1b7bb5e17c756f3b2416,1048576
│   │       ├── range,1998818,192536233eafb59e5e854faf1b35d5ca,1048576
│   │       └── ...
│   └── ...
├── snapshots/                     # Snapshot metadata
│   └── snapshot,1980422,2025711,570
├── logs/                          # V2: Partitioned logs (unchanged from V2)
│   └── 0000/
│       └── 0000/
│           ├── log,1923285,21923285,392f2edb4fa32c2af5171686a6b7f8bb,1048576
│           └── ...
├── properties/                    # Backup properties
│   ├── log_begin_version
│   ├── log_end_version
│   └── mutation_log_type
└── bulkdump_data/                 # V3 NEW: BulkDump SST format
    └── <job-uuid>/                # Job-specific directory
        ├── job-manifest.txt       # Top-level job manifest
        ├── 0/                     # Shard/range directory (shard 0)
        │   ├── <version>-manifest.txt  # Shard manifest
        │   └── <version>-data.sst      # Shard SST data file
        ├── 1/                     # Shard/range directory (shard 1)
        │   ├── <version>-manifest.txt
        │   └── <version>-data.sst
        └── ...                    # Additional shards
```
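Given this layout, the restore method selection implied by the `--mode` semantics above can be sketched as follows. This is an illustrative sketch; the enum and function names are assumptions, not actual fdbrestore code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical snapshot-source selection mirroring the --mode semantics:
// rangefile (default) always reads kvranges/, even when bulkdump_data/
// exists; bulkload reads bulkdump_data/ only when the BulkDump dataset is
// complete, and otherwise fails early so the operator can retry with
// --mode rangefile.
enum class RestoreMode { RangeFile, BulkLoad };
enum class SnapshotSource { KvRanges, BulkDumpData };

SnapshotSource chooseSnapshotSource(RestoreMode mode, bool bulkDumpComplete) {
    if (mode == RestoreMode::RangeFile)
        return SnapshotSource::KvRanges;  // default path, regardless of bulkdump_data/
    if (!bulkDumpComplete)
        throw std::runtime_error("restore_bulkload_dataset_incomplete");
    return SnapshotSource::BulkDumpData;
}
```

Note that an incomplete dataset raises an error rather than silently falling back: the design below requires the operator to explicitly rerun with `--mode rangefile`.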
V3 Design Notes:
- `kvranges/` enables fallback to the V2 restore method and side-by-side validation
- `bulkdump_data/` provides faster restore via direct SST ingestion using BulkDump's native layout; no conversion or adaptation is needed
- The `logs/` format is unchanged from V2 (partitioned logs)
- The `bulkdump_data/` directory contains SST-formatted snapshot data at the same version as the traditional snapshots in the `snapshots/` and `kvranges/` directories. Each represents the same point-in-time snapshot in a different format (SST vs. range files).
- Long term, only `bulkdump_data/` will be generated and `kvranges/` will be deprecated

To achieve the requirements, we propose the following designs:
Details:
- `fdbbackup start --mode both` enables BulkDump-based snapshot generation alongside traditional range files
- A new task, `BulkDumpTaskFunc`, coordinates with the Data Distributor to have Storage Servers generate SST files directly
- The backup produces a `bulkdump_data/` folder containing SST files alongside the traditional `kvranges/` folder for fallback

Rationale: This replaces the current inefficient process where backup agents read data through transactions and generate range files, providing direct Storage Server coordination for faster snapshot generation.
Details:
- `fdbrestore start --mode bulkload` enables BulkLoad-based range data restoration; the default behavior uses traditional range files
- A new task, `BulkLoadTaskFunc`, is inserted before the normal restore tasks to handle SST ingestion when `--mode bulkload` is specified

Rationale: This provides an opt-in path to replace the current transaction-based range file consumption with direct SST ingestion for faster range data loading.
Details:
- The backup contains both `kvranges/` (traditional) and `bulkdump_data/` (SST-based) folders

Rationale:
Details:
Backup Integration:
- `BulkDumpTaskFunc` treats the BulkDump system as a black box, delegating coordination to the Data Distributor

Restore Integration:

- `BulkLoadTaskFunc` treats the BulkLoad system as a black box, delegating SST ingestion to the Data Distributor

Rationale: This approach preserves simplicity, reduces coupling between systems, and enables independent evolution of the BulkDump/BulkLoad and Backup/Restore systems.
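The black-box relationship can be illustrated with a minimal interface sketch. The types below are hypothetical (the real `BulkDumpTaskFunc`/`BulkLoadTaskFunc` are TaskBucket tasks written in Flow); the point is that the task only submits a job and polls for completion, never touching BulkLoad internals.

```cpp
#include <cassert>
#include <string>

// Hypothetical black-box facade: the restore task sees only submit/poll,
// so BulkLoad internals (DD coordination, SST ingestion) can evolve freely.
struct IBulkLoadSystem {
    virtual std::string submitJob(const std::string& manifestUrl) = 0;  // returns a job id
    virtual bool isJobComplete(const std::string& jobId) = 0;
    virtual ~IBulkLoadSystem() = default;
};

// Toy implementation that completes every job immediately, for illustration.
struct FakeBulkLoad : IBulkLoadSystem {
    std::string submitJob(const std::string&) override { return "job-1"; }
    bool isJobComplete(const std::string&) override { return true; }
};

// Sketch of BulkLoadTaskFunc's role: delegate, then report completion so
// TaskBucket can schedule the mutation-log phase afterwards.
bool runBulkLoadTask(IBulkLoadSystem& bulkLoad, const std::string& manifestUrl) {
    std::string jobId = bulkLoad.submitJob(manifestUrl);
    return bulkLoad.isJobComplete(jobId);
}
```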
```mermaid
graph TB
    subgraph S3_Backup_V3["S3 Bucket - Backup V3"]
        RF3["kvranges/<br/>V2 fallback"]
        ML3["logs/<br/>V2 partitioned"]
        BD3["bulkdump_data/<br/>V3 SST files"]
    end
    RC["fdbrestore<br/>Submits restore job<br/>to TaskBucket"]
    TB["TaskBucket<br/>Orchestrates task<br/>dependencies"]
    BLTF["BulkLoadTaskFunc NEW<br/>1. Verify manifest<br/>2. Start BulkLoad"]
    BLS["BulkLoad System<br/>Black box<br/>Direct SST ingestion"]
    RA3["Backup Agents<br/>TaskBucket execution<br/>Apply mutation logs"]
    subgraph Target_Cluster_V3["Target FDB Cluster"]
        DD3[Data Distributor]
        SS3[Storage Servers]
        CP3[Commit Proxy]
    end
    RC -->|Submit job| TB
    TB -->|1. Execute| BLTF
    BD3 -->|Read SSTs| BLTF
    BLTF -->|Delegate| BLS
    BLS -->|Direct injection<br/>Bypass transactions| DD3
    DD3 -->|Ingest SSTs| SS3
    TB -->|2. After BulkLoad done| RA3
    ML3 -->|Read partitioned logs| RA3
    RA3 -->|Transactions| CP3
    CP3 --> SS3
    style BLTF fill:#FFF3CD
    style BLS fill:#FFF3CD
    style BD3 fill:#FFF3CD
    style TB fill:#F8D7DA
```
V3 Restore Flow:
The `--mode` flag:

- `--mode bulkload`: executes `BulkLoadTaskFunc` to verify the manifest and run BulkLoad

Details:
Rationale: This resilient approach ensures BulkDump completes successfully under normal operational conditions while maintaining the proven reliability patterns of traditional backup systems.
Details:
- When `--mode bulkload` is specified, `fdbrestore` first verifies that a complete BulkDump dataset exists in `bulkdump_data/`
- If the dataset is incomplete, the restore fails early and the user is directed to retry with `--mode rangefile`

Rationale: This provides early detection of incomplete BulkDump datasets and clear guidance for fallback, fulfilling the backward compatibility requirement.
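A completeness check over the `bulkdump_data/` layout shown earlier might look like the following. This is an illustrative sketch using `std::filesystem` against a local mirror of the layout; the helper name and exact rule are assumptions (the real check would run against S3 and consult the manifest contents, not just file presence).

```cpp
#include <filesystem>
#include <fstream>  // used by the usage example below
#include <string>

namespace fs = std::filesystem;

// Hypothetical pre-flight check: a job directory is considered complete when
// the top-level job-manifest.txt exists and every shard subdirectory contains
// both a *-manifest.txt and a *-data.sst file (per the layout above).
bool bulkDumpJobLooksComplete(const fs::path& jobDir) {
    if (!fs::exists(jobDir / "job-manifest.txt"))
        return false;
    for (const auto& entry : fs::directory_iterator(jobDir)) {
        if (!entry.is_directory())
            continue;  // skip job-manifest.txt itself
        bool haveManifest = false, haveSst = false;
        for (const auto& f : fs::directory_iterator(entry.path())) {
            const std::string name = f.path().filename().string();
            if (name.size() > 13 && name.substr(name.size() - 13) == "-manifest.txt")
                haveManifest = true;
            if (name.size() > 9 && name.substr(name.size() - 9) == "-data.sst")
                haveSst = true;
        }
        if (!haveManifest || !haveSst)
            return false;  // shard is missing its manifest or its SST file
    }
    return true;
}
```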
BulkLoad operations require specific cluster configuration to function correctly. BulkLoad automatically validates these prerequisites and provides clear error messages if configuration is invalid.
Required Server Knobs (ALL Processes):
For BulkLoad Operations:

```
--knob_shard_encode_location_metadata=1   # Enhanced location metadata with shard IDs
--knob_enable_read_lock_on_range=1        # Exclusive range locking for data integrity
```

For BulkDump Operations:

```
# No additional knobs required - works with default configuration
```
Configuration Steps:
Database Wiggle Requirement:
The knob_shard_encode_location_metadata=1 setting changes how shard location metadata is encoded. Existing shards have metadata written in the old format, so a database wiggle is required to force all shards to rewrite their metadata with the new encoding that includes shard IDs required for BulkLoad operations.
```
# Trigger database wiggle after cluster restart
fdbcli --exec "configure perpetual_storage_wiggle=1"

# Monitor wiggle completion before using BulkLoad
fdbcli --exec "status details"
```
Example Process Configuration:
```
fdbserver --knob_shard_encode_location_metadata=1 \
          --knob_enable_read_lock_on_range=1 \
          [other standard options]
```
Automatic Validation:
```cpp
// Validation performed during BulkLoad submission
if (!SERVER_KNOBS->SHARD_ENCODE_LOCATION_METADATA) {
    throw bulkload_invalid_configuration("BulkLoad requires --knob_shard_encode_location_metadata=1. "
                                         "Restart cluster with this knob enabled.");
}
if (!SERVER_KNOBS->ENABLE_READ_LOCK_ON_RANGE) {
    throw bulkload_invalid_configuration("BulkLoad requires --knob_enable_read_lock_on_range=1. "
                                         "Restart cluster with this knob enabled.");
}
```
Important: The knob validation above only checks that knobs are enabled. It does not verify that the database wiggle has completed and all shard metadata is in the new format. If BulkLoad encounters shards with old-format metadata, it will fail at runtime. Operators must ensure the database wiggle has fully completed before using BulkLoad. Monitor wiggle progress via fdbcli --exec "status details" and verify no shards are pending migration.
Approach: Generate only BulkDump SST files and create a converter to traditional range files when needed.
Pros:
Cons:
Decision: Rejected in favor of dual dataset approach for reliability and performance.
Approach: Deprecate traditional backup immediately and require migration period.
Pros:
Cons:
Decision: Rejected in favor of gradual transition with fallback support.
Approach: Integrate BulkLoad directly into existing Backup/Restore task logic.
Pros:
Cons:
Decision: Rejected in favor of black box approach for maintainability.
The implementation is divided into four phases with specific testing criteria:
Implementation:
Completion Criteria: High confidence in correctness under failure conditions for both backup and restore operations.
Implementation:
Completion Criteria: Confidence that the new snapshot system runs correctly with S3.
Implementation:
Completion Criteria: Confidence that Backup/Restore with the new snapshot system works at large scale in cluster environments.
Implementation: Production environment validation (timeline to be determined).
- Backup with `--mode bulkdump`, restore with `--mode bulkload`
- Backup with `--mode both`, restore tested with both `--mode bulkload` and the default method
- Verify that restore with `--mode bulkload` fails gracefully when the BulkDump data is incomplete

Backup Integration Trace Events:
```cpp
TraceEvent("BackupBulkDumpIntegrationStart")
    .detail("BackupURL", url)
    .detail("SnapshotMode", mode); // bulkdump, rangefile, or both

TraceEvent("BackupDualSnapshotComplete")
    .detail("RangeFilesBytes", rangeBytes)
    .detail("BulkDumpBytes", bulkBytes); // NEW: for validation comparison
```
Restore Integration Trace Events:
```cpp
TraceEvent("RestoreSnapshotMethodSelected")
    .detail("Method", method) // bulkload or rangefile
    .detail("BulkLoadAvailable", available);
```
Backup Status Enhancements:
```
fdbbackup status -d <backup_url>

# New fields in output:
#   Snapshot Mode       : bulkdump | rangefile | both
#   BulkLoad Compatible : yes | no
```
Restore Status Enhancements:
```
fdbrestore status

# New fields in output:
#   Snapshot Method    : bulkload | rangefile
#   Snapshot Phase     : complete | in_progress | not_started
#   Mutation Log Phase : complete | in_progress | not_started
```
- `--mode both` flag for backup and `--mode bulkload` flag for restore

No configuration changes beyond those required by BulkLoad (see above) are needed for basic functionality. Optional performance-tuning knobs are available for advanced users.
General Philosophy:
BulkDump/BulkLoad integration uses the same error handling as current backup/restore: continuous retries with warnings until the operation succeeds (or an operator intervenes). The system is designed to avoid manual intervention; failures are rare and most issues resolve automatically.
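That philosophy, retry with warnings until success, can be sketched as follows. This is illustrative only; the real agents are Flow actors that retry indefinitely with backoff and emit `TraceEvent`s, while `maxAttempts` here just keeps the sketch bounded.

```cpp
#include <cassert>
#include <functional>

// Hypothetical retry helper mirroring the backup/restore error-handling
// philosophy: retry the operation, counting a warning per failure (a
// stand-in for a SevWarn TraceEvent), and only surface a hard error once
// the attempt budget is exhausted.
bool retryWithWarnings(const std::function<bool()>& op, int maxAttempts, int& warnings) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (op())
            return true;  // success: stop retrying
        ++warnings;       // failure: warn and try again
    }
    return false;  // budget exhausted: surface an error to the operator
}
```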
Failure Scenarios & Handling:
BulkDump Fails During Backup (BulkDump Mode)

- With `--mode bulkdump`, the backup fails completely (no fallback in bulkdump-only mode)
- Error: `backup_bulkdump_failed`
- Recovery: rerun with `--mode rangefile` (default) or `--mode both`

Incomplete BulkDump Dataset at Restore

- The restore fails early with a `restore_bulkload_dataset_incomplete` error

BulkLoad Fails During Restore

- With `--mode bulkload`, the BulkLoad task fails and logs an error
- Error: `restore_bulkload_failed`
- Recovery: if `kvranges/` exists (created with `--mode both`), retry with `--mode rangefile` using the same snapshot

Error Code Definitions:
```cpp
// New error codes for BulkLoad integration
error_code_actor restore_bulkload_dataset_incomplete()
error_code_actor restore_bulkload_failed()
error_code_actor backup_bulkdump_timeout()
error_code_actor backup_bulkdump_failed()
error_code_actor bulkload_invalid_configuration() // NEW: Configuration validation
```
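The recovery paths above can be summarized as a lookup from error to documented recovery step. This is a hypothetical mapping for illustration; the error names come from the definitions above, and the recovery strings paraphrase the failure scenarios.

```cpp
#include <cassert>
#include <string>

// Hypothetical mapping from integration error name to the documented
// recovery step (see the failure scenarios above).
std::string recoveryFor(const std::string& errorName) {
    if (errorName == "backup_bulkdump_failed")
        return "rerun backup with --mode rangefile or --mode both";
    if (errorName == "restore_bulkload_dataset_incomplete")
        return "retry restore with --mode rangefile";
    if (errorName == "restore_bulkload_failed")
        return "if kvranges/ exists (--mode both backup), retry with --mode rangefile";
    if (errorName == "bulkload_invalid_configuration")
        return "restart cluster with required knobs and complete the wiggle";
    return "no documented recovery; check trace logs";
}
```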
If critical issues are discovered post-deployment:

- Backups: revert to the traditional snapshot method (`--mode rangefile`)
- Restores: revert to the traditional snapshot method (`--mode rangefile`)

Encryption integration is not yet designed. For production use requiring encryption, use traditional backup/restore without `--mode bulkdump` or BulkLoad.