design/s3-checksumming.md
FoundationDB verifies checksums on S3 operations to protect data integrity during uploads, downloads, and backup/restore. This document describes the checksumming strategies and their implementation.
FoundationDB has a layered S3 implementation:
S3BlobStore (Foundation): Low-level S3 REST API implementation
  - Computes request checksums (SHA256 when enable_object_integrity_check is enabled, otherwise MD5)
  - Sends checksum headers (Content-MD5 or x-amz-checksum-sha256) for S3 server-side verification
AsyncFileS3BlobStore (Backup Layer): IAsyncFile interface wrapper
S3Client (Utility Layer): Command-line and bulk operations
  - Uses S3BlobStoreEndpoint
Checksum Usage by Layer:
SHA256 Flow (when enable_object_integrity_check is enabled):
Upload Flow:
1. Calculate SHA256 checksum of data
2. Send x-amz-checksum-algorithm: SHA256 header
3. Send x-amz-checksum-sha256: <base64-encoded-checksum> header
4. AWS S3 verifies checksum server-side
5. For multipart uploads: Include ChecksumSHA256 in completion XML
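The header values in steps 1-3 can be sketched as follows. This is a minimal, self-contained illustration, not FoundationDB's code: `sha256`, `base64`, and `sha256UploadHeaders` are hypothetical helpers, and the compact SHA-256 stands in for whatever crypto library the real implementation uses. The detail it shows is that x-amz-checksum-sha256 carries the base64 encoding of the raw 32-byte digest, not the hex string.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Compact SHA-256 (FIPS 180-4), for illustration only.
std::array<uint8_t, 32> sha256(const std::string& msg) {
    static const uint32_t K[64] = {
        0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
        0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
        0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
        0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
        0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
        0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
        0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
        0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    auto rotr = [](uint32_t x, int n) { return (x >> n) | (x << (32 - n)); };
    // Padding: 0x80, zero bytes, then the 64-bit big-endian message length in bits.
    std::string data = msg;
    uint64_t bits = uint64_t(msg.size()) * 8;
    data.push_back(char(0x80));
    while (data.size() % 64 != 56) data.push_back('\0');
    for (int i = 7; i >= 0; --i) data.push_back(char((bits >> (8 * i)) & 0xff));
    for (size_t off = 0; off < data.size(); off += 64) {
        uint32_t w[64];
        for (int i = 0; i < 16; ++i) {
            const auto* p = reinterpret_cast<const uint8_t*>(data.data() + off + 4 * i);
            w[i] = uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | p[3];
        }
        for (int i = 16; i < 64; ++i) {
            uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
            uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16] + s0 + w[i - 7] + s1;
        }
        uint32_t v[8];  // working variables a..h
        for (int i = 0; i < 8; ++i) v[i] = h[i];
        for (int i = 0; i < 64; ++i) {
            uint32_t S1 = rotr(v[4], 6) ^ rotr(v[4], 11) ^ rotr(v[4], 25);
            uint32_t ch = (v[4] & v[5]) ^ (~v[4] & v[6]);
            uint32_t t1 = v[7] + S1 + ch + K[i] + w[i];
            uint32_t S0 = rotr(v[0], 2) ^ rotr(v[0], 13) ^ rotr(v[0], 22);
            uint32_t maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            for (int j = 7; j > 0; --j) v[j] = v[j - 1];  // h=g, g=f, ..., b=a
            v[4] += t1;                                 // e = d + t1
            v[0] = t1 + S0 + maj;                       // a = t1 + t2
        }
        for (int i = 0; i < 8; ++i) h[i] += v[i];
    }
    std::array<uint8_t, 32> out{};
    for (int i = 0; i < 32; ++i) out[i] = uint8_t(h[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

// Standard base64 (RFC 4648) with '=' padding.
std::string base64(const uint8_t* p, size_t n) {
    static const char* tbl = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (size_t i = 0; i < n; i += 3) {
        uint32_t v = uint32_t(p[i]) << 16;
        if (i + 1 < n) v |= uint32_t(p[i + 1]) << 8;
        if (i + 2 < n) v |= p[i + 2];
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += i + 1 < n ? tbl[(v >> 6) & 63] : '=';
        out += i + 2 < n ? tbl[v & 63] : '=';
    }
    return out;
}

// Hypothetical helper: the two headers from steps 2 and 3 for a single-part upload.
std::map<std::string, std::string> sha256UploadHeaders(const std::string& body) {
    auto digest = sha256(body);
    return {{"x-amz-checksum-algorithm", "SHA256"},
            {"x-amz-checksum-sha256", base64(digest.data(), digest.size())}};
}
```

For an empty body the x-amz-checksum-sha256 value is `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=`.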
Download Flow:
1. Small files: Use x-amz-checksum-mode: ENABLED for S3 verification
2. Large files: Use custom XXH64 checksums stored in tags/companion files
3. Range requests: Cannot use S3 checksums (AWS limitation)
MD5 Flow (when enable_object_integrity_check is disabled):
Upload Flow:
1. Calculate MD5 checksum of data
2. Send Content-MD5: <base64-encoded-checksum> header
3. AWS S3 verifies checksum server-side
4. For multipart uploads: Include MD5 in completion XML
Download Flow:
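1. Small files: ETag comparison (limited protection: the ETag equals the object's MD5 only for single-part uploads that are not encrypted with SSE-KMS or SSE-C)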
2. Large files: Use custom XXH64 checksums stored in tags/companion files
3. Range requests: No checksum verification available
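The ETag comparison in step 1 can be sketched as below. This is a minimal sketch under stated assumptions, not FoundationDB's implementation: `compareEtagToMd5` and `EtagCheck` are hypothetical names, and the local MD5 is assumed to be already computed as a hex string. It rejects multipart ETags (`<hex>-<parts>`), but it cannot tell an SSE-KMS ETag (also 32 hex digits) from a real MD5, which is part of why this path offers only limited protection.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class EtagCheck { Match, Mismatch, NotComparable };

// S3 returns ETags wrapped in double quotes.
static std::string stripQuotes(const std::string& s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') return s.substr(1, s.size() - 2);
    return s;
}

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) { return char(std::tolower(c)); });
    return s;
}

// Compare an ETag with a locally computed hex MD5. Anything that is not exactly
// 32 hex digits (e.g. a multipart "<hex>-<parts>" ETag) is reported as NotComparable.
EtagCheck compareEtagToMd5(const std::string& etag, const std::string& localMd5Hex) {
    std::string v = toLower(stripQuotes(etag));
    bool plainHex = v.size() == 32 &&
                    std::all_of(v.begin(), v.end(), [](unsigned char c) { return std::isxdigit(c) != 0; });
    if (!plainHex) return EtagCheck::NotComparable;
    return v == toLower(localMd5Hex) ? EtagCheck::Match : EtagCheck::Mismatch;
}
```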
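Download paths:
AsyncFileS3BlobStoreRead::read (Range requests): no S3 checksum verification
S3BlobStoreEndpoint::readEntireFile (Small files): x-amz-checksum-mode: ENABLED (when enable_object_integrity_check is enabled)
S3Client::copyDownFile (Large files): custom XXH64 checksums stored in tags/companion files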
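The mapping from download path to verification mechanism, combining the SHA256 and MD5 download flows, can be summarized in a small decision function. This is an illustrative sketch only: `DownloadKind`, `Verification`, `chooseVerification`, and `extraGetHeader` are hypothetical names, not FoundationDB APIs.

```cpp
#include <string>

// Hypothetical enums mirroring the three download paths listed above.
enum class DownloadKind { RangeRead, SmallFile, LargeFile };
enum class Verification { None, S3ChecksumMode, EtagCompare, Xxh64Companion };

Verification chooseVerification(DownloadKind kind, bool enableObjectIntegrityCheck) {
    switch (kind) {
    case DownloadKind::RangeRead:
        return Verification::None;  // S3 checksums do not cover arbitrary byte ranges
    case DownloadKind::SmallFile:
        return enableObjectIntegrityCheck ? Verification::S3ChecksumMode : Verification::EtagCompare;
    case DownloadKind::LargeFile:
        return Verification::Xxh64Companion;  // custom XXH64 in tags/companion files, either mode
    }
    return Verification::None;
}

// Extra GET header needed for S3-side verification, if any.
std::string extraGetHeader(Verification v) {
    return v == Verification::S3ChecksumMode ? "x-amz-checksum-mode: ENABLED" : "";
}
```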
AWS S3 Design: The x-amz-checksum-mode: ENABLED header only works for full object downloads because:
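- The checksum S3 stores is computed over the whole object (or, for multipart uploads, over its parts), not over arbitrary byte ranges
- Range requests (e.g. Range: bytes=0-1023) therefore cannot be verified against it
Integrity Protection: While AsyncFileS3BlobStoreRead::read() cannot use S3 transport-level checksums, the backup/restore system provides integrity protection through: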
1. Application-level integrity checks in backup file formats:
   - decodeRangeFileBlock() validates file headers and versions
   - Corruption is reported through errors such as restore_corrupted_data(), restore_corrupted_data_padding(), and restore_unsupported_file_version()
2. Full-file checksum verification for small files:
   - S3BlobStoreEndpoint::readEntireFile() uses x-amz-checksum-mode: ENABLED
   - Active only when enable_object_integrity_check is enabled
   - Throws checksum_failed() if verification fails
3. Protocol version validation and encryption header validation for encrypted backups
4. Error detection and retry logic for different failure modes
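The application-level checks in item 1 follow a common pattern: validate a version header, bound the payload length, and require the unused tail of the block to hold an expected padding byte. The sketch below shows that pattern on an assumed, simplified layout (`[u32 version][u32 length][payload][0xFF padding]`); it is not FoundationDB's actual range-file format, and `decodeBlockSketch`, `kSupportedVersion`, and the exception types (named after the FDB errors) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical exception types named after FoundationDB's restore errors.
struct RestoreCorruptedData : std::runtime_error {
    RestoreCorruptedData() : std::runtime_error("restore_corrupted_data") {}
};
struct RestoreCorruptedDataPadding : std::runtime_error {
    RestoreCorruptedDataPadding() : std::runtime_error("restore_corrupted_data_padding") {}
};
struct RestoreUnsupportedFileVersion : std::runtime_error {
    RestoreUnsupportedFileVersion() : std::runtime_error("restore_unsupported_file_version") {}
};

constexpr uint32_t kSupportedVersion = 1;  // assumed value, for illustration only

static uint32_t readBE32(const std::vector<uint8_t>& b, size_t off) {
    return uint32_t(b[off]) << 24 | uint32_t(b[off + 1]) << 16 | uint32_t(b[off + 2]) << 8 | b[off + 3];
}

// Validates an assumed block layout and returns its payload.
std::vector<uint8_t> decodeBlockSketch(const std::vector<uint8_t>& block) {
    if (block.size() < 8) throw RestoreCorruptedData();
    if (readBE32(block, 0) != kSupportedVersion) throw RestoreUnsupportedFileVersion();
    uint32_t len = readBE32(block, 4);
    if (len > block.size() - 8) throw RestoreCorruptedData();  // length runs past the block
    for (size_t i = 8 + len; i < block.size(); ++i)
        if (block[i] != 0xFF) throw RestoreCorruptedDataPadding();  // tail must be 0xFF padding
    return std::vector<uint8_t>(block.begin() + 8, block.begin() + 8 + len);
}
```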
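Key Insight: The ChecksumSHA256 returned in the multipart completion response is a composite checksum: it is computed over the concatenated binary SHA256 digests of the individual parts, not over the object's bytes, so it cannot be compared directly with a SHA256 of the whole file.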
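The composite value can be reproduced locally. A minimal sketch, assuming AWS's documented "checksum of checksums" scheme: hash each part, concatenate the raw digests in part order, hash that buffer, base64-encode the result, and append `-<part count>`. `compositeSha256` and the helpers are hypothetical names, and the compact SHA-256 stands in for a real crypto library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Compact SHA-256 (FIPS 180-4), for illustration only.
std::array<uint8_t, 32> sha256(const std::string& msg) {
    static const uint32_t K[64] = {
        0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
        0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
        0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
        0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
        0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
        0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
        0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
        0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    auto rotr = [](uint32_t x, int n) { return (x >> n) | (x << (32 - n)); };
    std::string data = msg;  // pad: 0x80, zeros, 64-bit big-endian bit length
    uint64_t bits = uint64_t(msg.size()) * 8;
    data.push_back(char(0x80));
    while (data.size() % 64 != 56) data.push_back('\0');
    for (int i = 7; i >= 0; --i) data.push_back(char((bits >> (8 * i)) & 0xff));
    for (size_t off = 0; off < data.size(); off += 64) {
        uint32_t w[64];
        for (int i = 0; i < 16; ++i) {
            const auto* p = reinterpret_cast<const uint8_t*>(data.data() + off + 4 * i);
            w[i] = uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | p[3];
        }
        for (int i = 16; i < 64; ++i) {
            uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
            uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16] + s0 + w[i - 7] + s1;
        }
        uint32_t v[8];
        for (int i = 0; i < 8; ++i) v[i] = h[i];
        for (int i = 0; i < 64; ++i) {
            uint32_t t1 = v[7] + (rotr(v[4], 6) ^ rotr(v[4], 11) ^ rotr(v[4], 25)) +
                          ((v[4] & v[5]) ^ (~v[4] & v[6])) + K[i] + w[i];
            uint32_t t2 = (rotr(v[0], 2) ^ rotr(v[0], 13) ^ rotr(v[0], 22)) +
                          ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
            for (int j = 7; j > 0; --j) v[j] = v[j - 1];
            v[4] += t1;
            v[0] = t1 + t2;
        }
        for (int i = 0; i < 8; ++i) h[i] += v[i];
    }
    std::array<uint8_t, 32> out{};
    for (int i = 0; i < 32; ++i) out[i] = uint8_t(h[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

std::string base64(const uint8_t* p, size_t n) {
    static const char* tbl = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (size_t i = 0; i < n; i += 3) {
        uint32_t v = uint32_t(p[i]) << 16;
        if (i + 1 < n) v |= uint32_t(p[i + 1]) << 8;
        if (i + 2 < n) v |= p[i + 2];
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += i + 1 < n ? tbl[(v >> 6) & 63] : '=';
        out += i + 2 < n ? tbl[v & 63] : '=';
    }
    return out;
}

// Composite "checksum of checksums" over the parts of a multipart upload.
std::string compositeSha256(const std::vector<std::string>& parts) {
    std::string concatenated;  // raw 32-byte digests of each part, in part order
    for (const auto& part : parts) {
        auto d = sha256(part);
        concatenated.append(reinterpret_cast<const char*>(d.data()), d.size());
    }
    auto top = sha256(concatenated);
    return base64(top.data(), top.size()) + "-" + std::to_string(parts.size());
}
```

Because the composite depends on the part boundaries, the same bytes uploaded with a different part size produce a different value.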
Configuration (knob default):
```cpp
// Enabled by default for stronger security
init(BLOBSTORE_ENABLE_OBJECT_INTEGRITY_CHECK, true);
```
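How the knob selects between the two upload flows can be sketched as a single branch. This is a hypothetical helper, not FoundationDB's code: `uploadChecksumHeaders` and `Headers` are illustrative names, and both digest arguments are assumed to be already computed and base64-encoded.

```cpp
#include <string>
#include <utility>
#include <vector>

using Headers = std::vector<std::pair<std::string, std::string>>;

// Hypothetical: choose upload checksum headers from the knob value.
Headers uploadChecksumHeaders(bool enableObjectIntegrityCheck,
                              const std::string& sha256Base64,
                              const std::string& md5Base64) {
    if (enableObjectIntegrityCheck) {
        // SHA256 flow: S3 recomputes SHA256 server-side and rejects a mismatch.
        return {{"x-amz-checksum-algorithm", "SHA256"}, {"x-amz-checksum-sha256", sha256Base64}};
    }
    // MD5 flow: S3 recomputes MD5 server-side and rejects a mismatch.
    return {{"Content-MD5", md5Base64}};
}
```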
Missing Checksum in Completion:
InvalidRequest: The complete request must include the checksum for each part
Solution: Include <ChecksumSHA256> tags in completion XML
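A sketch of a completion body that carries a per-part `<ChecksumSHA256>` element. It assumes the per-part values were recorded from each part upload; `UploadedPart` and `buildCompleteMultipartXml` are hypothetical names, and the element order follows the CompleteMultipartUpload request syntax in the AWS API reference (checksum, ETag, PartNumber).

```cpp
#include <string>
#include <vector>

struct UploadedPart {             // hypothetical per-part record
    int partNumber;
    std::string etag;             // ETag returned by the part upload
    std::string checksumSha256;   // base64 x-amz-checksum-sha256 for the part
};

std::string buildCompleteMultipartXml(const std::vector<UploadedPart>& parts, bool includeSha256) {
    std::string xml = "<CompleteMultipartUpload>";
    for (const auto& p : parts) {
        xml += "<Part>";
        // Omitting this element when the upload was started with SHA256 triggers the InvalidRequest error above.
        if (includeSha256) xml += "<ChecksumSHA256>" + p.checksumSha256 + "</ChecksumSHA256>";
        xml += "<ETag>" + p.etag + "</ETag>";
        xml += "<PartNumber>" + std::to_string(p.partNumber) + "</PartNumber>";
        xml += "</Part>";
    }
    xml += "</CompleteMultipartUpload>";
    return xml;
}
```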
Checksum Mismatch:
InvalidDigest: The Content-MD5 you specified was invalid
Solution: Verify checksum calculation and encoding
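A frequent encoding mistake behind InvalidDigest is sending the 32-character hex MD5 (or the base64 of that hex text) instead of the base64 of the 16 raw digest bytes. The sketch below converts a hex MD5 into a correct Content-MD5 value; `contentMd5FromHex` and its helpers are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

static uint8_t hexNibble(char c) {
    if (c >= '0' && c <= '9') return uint8_t(c - '0');
    if (c >= 'a' && c <= 'f') return uint8_t(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return uint8_t(c - 'A' + 10);
    throw std::invalid_argument("bad hex digit");
}

static std::string base64(const std::vector<uint8_t>& b) {
    static const char* tbl = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (size_t i = 0; i < b.size(); i += 3) {
        uint32_t v = uint32_t(b[i]) << 16;
        if (i + 1 < b.size()) v |= uint32_t(b[i + 1]) << 8;
        if (i + 2 < b.size()) v |= b[i + 2];
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += i + 1 < b.size() ? tbl[(v >> 6) & 63] : '=';
        out += i + 2 < b.size() ? tbl[v & 63] : '=';
    }
    return out;
}

// Content-MD5 is base64 of the 16 raw MD5 bytes (24 characters), never the hex string.
std::string contentMd5FromHex(const std::string& md5Hex) {
    if (md5Hex.size() != 32) throw std::invalid_argument("MD5 hex must be 32 characters");
    std::vector<uint8_t> raw;
    for (size_t i = 0; i < md5Hex.size(); i += 2)
        raw.push_back(uint8_t(hexNibble(md5Hex[i]) << 4 | hexNibble(md5Hex[i + 1])));
    return base64(raw);
}
```

For the MD5 of an empty body (`d41d8cd98f00b204e9800998ecf8427e`) the correct header value is `1B2M2Y8AsgTpgAmY7PhCfg==`.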
Heap Corruption:
Check how x-amz-checksum-sha256 values are read from part upload responses and how they are written into the <ChecksumSHA256> tags of the completion XML.
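The section does not record the root cause of the heap corruption. One general defensive pattern, shown here as a hypothetical sketch rather than the actual fix, is to copy the response header into an owned string and check its shape before it is written into a `<ChecksumSHA256>` element. `HeaderMap` and `partChecksumSha256` are illustrative names; the only fixed fact used is that base64 of a 32-byte digest is 44 characters ending in a single `=`.

```cpp
#include <map>
#include <optional>
#include <string>

using HeaderMap = std::map<std::string, std::string>;  // hypothetical: lower-cased response headers

// Returns an owned, shape-checked copy of the part checksum, or nullopt if missing or malformed.
std::optional<std::string> partChecksumSha256(const HeaderMap& headers) {
    auto it = headers.find("x-amz-checksum-sha256");
    if (it == headers.end()) return std::nullopt;
    std::string value = it->second;  // owned copy; nothing points into the response buffer
    if (value.size() != 44 || value[43] != '=' || value[42] == '=') return std::nullopt;
    for (size_t i = 0; i < 43; ++i) {
        char c = value[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
                  c == '+' || c == '/';
        if (!ok) return std::nullopt;
    }
    return value;
}
```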