turbopack/crates/turbo-persistence/README.md
This crate provides a way to persist key value pairs into a folder and restore them later.
The API only allows a single write transaction at a time, but multiple threads can fill the transaction with (non-conflicting) data concurrently.
When pushing data into the WriteBatch it is already persisted to disk, but only becomes visible after the transaction is committed. On startup left-over uncommitted files on disk are automatically cleaned up.
The architecture is optimized for pushing a lot data to disk in a single transaction, while still allowing for fast random reads.
It supports having multiple key families, which are stored in separate files, but a write batch can contain keys from multiple families. Each key family defines a separate key space. Entries in different key families doesn't influence each other (also not performance-wise).
There is a single CURRENT file which stores the latest committed sequence number.
All other files have a sequence number as file name, e. g. 0000123.sst. All files are immutable once there sequence number is <= the committed sequence number. But they might be deleted when they are superseeded by other committed files.
There are four different file types:
*.sst): These files contain key value pairs.*.blob): These files contain large values.*.del): These files contain a list of sequence numbers of files that should be considered as deleted.*.meta): These files contain metadata about the SST files. They contains the hash range and a AMQF for quick filtering.Therefore there are there value types:
*.sst files.*.blob files.A meta file can contain metadata about multiple SST files. The metadata is stored in a single file to avoid having too many small files.
The SST file contains only data without any header.
n times
An Index block contains n 8 bytes hashes, which specify n - 1 hash ranges (eq hash goes into the prev range, except for the first key). Between these n hashes there are n - 1 2 byte block indicies that point to the block that contains the hash range.
The hashes are sorted.
n is (block size + 1) / 10
A Key block contains n keys, which specify n key value pairs.
Depending on the type field entry has a different format:
The entries are sorted by key hash and key.
TODO: 8 bytes key hash is a bit inefficient for small keys.
The plain value compressed with dynamic compression.
Reading start from the current sequence number and goes downwards.
Writing starts by creating a new WriteBatch. It maintains an atomic counter of the next free sequence number.
The WriteBatch has a thread local buffer that accumulates operations until a certain threshold is reached. Then the buffer is sorted and written to a new SST file (and maybe some blob files).
When the WriteBatch is committed all thread local buffers are merged into a single global buffer and written into new SST files (potentially multiple when threshold is reached).
fsync! The new sequence number is written to the CURRENT file.
After that optimization might take place.
For compaction we compute the "coverage" of the SST files. The coverage is the average number of SST files that need to be touched to figure out that a key is missing. The coverage can be computed by looking at the min_hash and max_hash of the SST files only.
For a single SST file we can compute (max_hash - min_hash) / u64::MAX as the coverage of the SST file. We sum up all these coverages to get the total coverage.
Compaction chooses a few SST files and runs the merge step of merge sort on tham to create a few new SST files with sorted ranges.
Example:
key hash range: | 0 ... u64::MAX |
SST 1: |----------------|
SST 2: |----------------|
SST 3: |-----|
can be compacted into:
key hash range: | 0 ... u64::MAX |
SST 1': |-------|
SST 2': |------|
SST 3': |-----|
The merge operation decreases the total coverage since the new SST files will have a coverage of < 1.
But we need to be careful to insert the SST files in the correct location again, since items in these SST files might be overriden in later SST file and we don't want to change that.
Since SST files that are smaller than the current sequence number are immutable we can't change the files and we can't insert new files at this sequence numbers. Instead we need to insert the new SST after the current sequence number and copy all SST files after the original SST files after them. (Actually we only need to copy SST files with overlapping key hash ranges. And we can hardlink them instead). Later we will write the current sequence number and delete them original and all copied SST files.
We can run multiple merge operations concurrently when the key hash ranges are not overlapping or they are from different key families. The copy operation need to be strictly after all merge operations.
There must not be another SST file with overlapping key hash range between files of a merge operation.
During the merge operation we eliminate duplicate keys. When blob references are eliminated we delete the blob file after the current sequence number was updated.
Since the process might exit unexpectedly, to avoid "forgetting" to delete the SST files we keep track of that in a *.del file. This file contains the sequence number of SST and blob files that should be deleted. We write that file before the current sequence number is updated. On restart we execute the deletes again.
We limit the number of SST files that are merged at once to avoid long compactions.
Full example:
Example:
key hash range: | 0 ... u64::MAX | Family
SST 1: |-| 1
SST 2: |----------------| 1
SST 3: |----------------| 1
SST 4: |-----| 2
SST 5: |-----| 2
SST 6: |-------| 1
SST 7: |-------| 1
SST 8: |--------| 2
SST 9: |--------| 2
CURRENT: 9
Compactions could selects SST 2, 3, 6 and SST 4, 5, 8 for merging (we limited to 3 SST files per merge operation). This also selects SST 7, 9 for copying. The current sequence number is 9.
We merge SST 2, 3, 6 into new SST files 10, 12, 14 and SST 4, 5, 8 into new SST files 11, 13. Both operations are done concurrently so they might choose free sequence numbers in random order. The operation might result in less SST files due to duplicate keys.
After that we copy SST files 7, 9 to new SST files 15, 16.
We write a "del" file at sequence number 17.
After that we write the new current sequence number 17.
Then we delete SST files 2, 3, 6 and 4, 5, 8 and 7, 9. The
SST files 1 stays unchanged.
key hash range: | 0 ... u64::MAX | Family
SST 1: |-| 1
SST 10: |-----| 1
SST 12: |-----| 1
SST 11: |------| 2
SST 14: |-------| 1
SST 13: |-----| 2
SST 15: |-------| 1
SST 16: |--------| 2
DEL 17: (2, 3, 4, 5, 6, 7, 8, 9)
CURRENT: 17
Configuration options for compations are:
CURRENT fileCURRENT file.*.del files and delete the files that are listed in there.*.sst files and memory map them.