docs/internals/mvcc/RECOVERY_SEMANTICS.md
This document describes the MVCC recovery and checkpoint behavior as implemented in
core/mvcc/database/mod.rs and core/mvcc/database/checkpoint_state_machine.rs.
The checkpoint model is stop-the-world (blocking checkpoint lock).
Startup decisions use four durable artifacts:
- Main database file (.db)
- WAL (.db-wal)
- Logical log (.db-log)
- Metadata row in __turso_internal_mvcc_meta (k='persistent_tx_ts_max')

The logical-log header (56 bytes) contains format metadata and a CRC chain seed:
magic, version, flags, hdr_len, salt (u64), reserved, hdr_crc32c.
The salt is regenerated on each log truncation; frame CRCs are chained
(crc32c_append(prev_frame_crc, data)) with the initial seed derived from the salt.
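
The CRC chaining described above can be illustrated with a small, dependency-free sketch. It is not the code in core/mvcc: the bitwise `crc32c_append` here is a slow stand-in for whatever CRC-32C routine the implementation uses, and `chain_seed`, `chain_frame_crcs`, and `first_bad_frame` are hypothetical helpers. How the seed is derived from the salt is an assumption; the document only says it is derived from the salt.

```rust
/// Bitwise CRC-32C (Castagnoli) continuation: feeds `data` into a running CRC.
/// Slow but dependency-free; shown only to make the chaining concrete.
fn crc32c_append(prev: u32, data: &[u8]) -> u32 {
    const POLY: u32 = 0x82F6_3B78; // reflected Castagnoli polynomial
    let mut crc = !prev;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ POLY } else { crc >> 1 };
        }
    }
    !crc
}

/// ASSUMPTION: the chain seed is the CRC-32C of the salt's little-endian bytes.
fn chain_seed(salt: u64) -> u32 {
    crc32c_append(0, &salt.to_le_bytes())
}

/// Computes the chained CRC of every frame payload, in log order.
fn chain_frame_crcs(salt: u64, frames: &[&[u8]]) -> Vec<u32> {
    let mut prev = chain_seed(salt);
    frames
        .iter()
        .map(|data| {
            prev = crc32c_append(prev, data);
            prev
        })
        .collect()
}

/// Returns the index of the first frame whose stored CRC breaks the chain, if any.
fn first_bad_frame(salt: u64, frames: &[&[u8]], stored: &[u32]) -> Option<usize> {
    chain_frame_crcs(salt, frames)
        .iter()
        .zip(stored)
        .position(|(expected, actual)| expected != actual)
}
```

Because every frame CRC depends on the seed, frames written under an old salt stop validating once the salt is regenerated. This fits the salt being regenerated on each log truncation.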
persistent_tx_ts_max in __turso_internal_mvcc_meta is the durable replay boundary, stored inside the main database file/WAL.
Recovery replays logical-log frames only when commit_ts > persistent_tx_ts_max.
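
As a minimal sketch of the replay boundary, the filter below keeps only frames strictly newer than the metadata cutoff. `LogFrame` and `frames_to_replay` are hypothetical names for illustration; the real frame type carries more than a timestamp and a payload.

```rust
/// Hypothetical, simplified logical-log frame: only the commit timestamp matters here.
#[derive(Debug, Clone, PartialEq)]
struct LogFrame {
    commit_ts: u64,
    payload: Vec<u8>,
}

/// Keeps only frames strictly newer than the durable replay boundary.
/// A frame with commit_ts == persistent_tx_ts_max is already reflected in the DB file.
fn frames_to_replay(frames: &[LogFrame], persistent_tx_ts_max: u64) -> Vec<&LogFrame> {
    frames
        .iter()
        .filter(|frame| frame.commit_ts > persistent_tx_ts_max)
        .collect()
}
```

The comparison is strict, so a frame whose commit_ts equals the boundary is skipped rather than applied a second time.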
MvStore::bootstrap() runs in this order:
1. maybe_complete_interrupted_checkpoint()
2. reparse_schema()
3. maybe_recover_logical_log()

Any committed WAL state is reconciled before logical-log replay. The replay boundary comes from the metadata row.
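
The ordering can be pictured with a toy model in which each step records its name and a failure stops the sequence. `BootstrapSketch` is a hypothetical stand-in, not MvStore; the assumption that a failing step aborts the remaining steps follows from the fail-closed behavior described in this document, not from the source code.

```rust
/// Hypothetical stand-in for the store; each step records its name so the order is observable.
#[derive(Default)]
struct BootstrapSketch {
    steps: Vec<&'static str>,
    /// Name of a step that should fail, to simulate an error during startup.
    fail_at: Option<&'static str>,
}

impl BootstrapSketch {
    fn step(&mut self, name: &'static str) -> Result<(), String> {
        if self.fail_at == Some(name) {
            return Err(format!("{name} failed"));
        }
        self.steps.push(name);
        Ok(())
    }

    /// Runs the three phases in the documented order; `?` aborts on the first error.
    fn bootstrap(&mut self) -> Result<(), String> {
        self.step("maybe_complete_interrupted_checkpoint")?;
        self.step("reparse_schema")?;
        self.step("maybe_recover_logical_log")?;
        Ok(())
    }
}
```

In this model, a failure in an earlier phase means logical-log recovery never starts, so replay cannot run against unreconciled WAL state.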
Recovery classifies startup state using two checks:
- Does the WAL contain committed frames? (wal.get_max_frame_in_wal() > 0)
- What does try_read_header() return? (Valid, Invalid, or NoLog)

| Case | Startup artifacts | Recovery behavior |
|---|---|---|
| 1 | WAL has committed frames + log header valid | Complete interrupted checkpoint: backfill WAL into DB, sync DB, truncate WAL. Then run logical-log recovery with metadata cutoff. |
| 2 | WAL has committed frames + log header missing (NoLog) | Fail closed with Corrupt. |
| 3 | WAL has committed frames + log header invalid/torn | Fail closed with Corrupt. |
| 4 | WAL has no committed frames | Truncate/discard WAL tail bytes and continue logical-log recovery. |
| 5 | No WAL + log header invalid/torn | Fail closed with Corrupt. |
| 6 | No WAL + valid header, no frames (size <= LOG_HDR_SIZE) | No replay needed; timestamp state comes from metadata row. |
| 7 | No WAL + empty log (0 bytes / NoLog) | Timestamp state loaded from metadata row if present; no replay. |
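
The case table can be read as a pure function of the two checks plus the log size. The sketch below is an illustration only: `HeaderState`, `StartupAction`, and `classify` are hypothetical names, case 4 is assumed to continue through the no-WAL rows once the WAL tail is discarded, and the `ReplayLog` outcome for a valid header followed by frames is implied by the table rather than stated in it.

```rust
/// Header size stated in this document (56 bytes).
const LOG_HDR_SIZE: u64 = 56;

/// Outcome of reading the logical-log header, mirroring Valid / Invalid / NoLog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HeaderState {
    Valid,
    Invalid,
    NoLog,
}

/// Hypothetical summary of what startup does next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StartupAction {
    /// Case 1: backfill WAL into the DB, sync, truncate WAL, then recover the log.
    CompleteCheckpointThenRecover,
    /// Cases 2, 3, 5: fail closed with Corrupt.
    FailCorrupt,
    /// Cases 6, 7: nothing to replay; timestamp state comes from the metadata row.
    NoReplay,
    /// ASSUMPTION: a valid header followed by frames leads to normal replay.
    ReplayLog,
}

fn classify(wal_has_committed_frames: bool, header: HeaderState, log_size: u64) -> StartupAction {
    use HeaderState::*;
    use StartupAction::*;
    match (wal_has_committed_frames, header) {
        (true, Valid) => CompleteCheckpointThenRecover,
        (true, NoLog) | (true, Invalid) => FailCorrupt,
        // Case 4 (no committed frames) is assumed to reach the rows below
        // after any WAL tail bytes have been discarded.
        (false, Invalid) => FailCorrupt,
        (false, NoLog) => NoReplay,
        (false, Valid) if log_size <= LOG_HDR_SIZE => NoReplay,
        (false, Valid) => ReplayLog,
    }
}
```

For example, `classify(true, HeaderState::NoLog, 0)` yields `FailCorrupt`, matching case 2: committed WAL frames without a log file are never silently accepted.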
Notes:
- [...] persistent_tx_ts_max in the same pager transaction.
- durable_txid_max advances on this transition.
- [...] SyncMode::Off).
- [...] SyncMode::Off).
- WAL truncation is last. Until the DB file and logical-log cleanup are durable, the WAL remains the authoritative recovery source.
- Replay applies only frames with commit_ts > persistent_tx_ts_max.
- persistent_tx_ts_max is advanced atomically with pager commit during checkpoint.
- [...] max(persistent_tx_ts_max, max_replayed_log_commit_ts) + 1.

SyncMode::Off skips fsync calls. This weakens durability but does not change
logical ordering or fail-closed validation behavior.
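
The expression max(persistent_tx_ts_max, max_replayed_log_commit_ts) + 1 can be written as a small function. Reading it as the first timestamp issued after recovery is an assumption, as is the name `next_timestamp` and the use of `Option` to model a recovery that replayed no frames.

```rust
/// Computes max(persistent_tx_ts_max, max_replayed_log_commit_ts) + 1.
/// `max_replayed_log_commit_ts` is None when no frame was replayed
/// (for example an empty log), in which case only the metadata row counts.
fn next_timestamp(persistent_tx_ts_max: u64, max_replayed_log_commit_ts: Option<u64>) -> u64 {
    let replayed = max_replayed_log_commit_ts.unwrap_or(0);
    persistent_tx_ts_max.max(replayed) + 1
}
```

Taking the maximum of both sources means the result is strictly greater than every timestamp that is durable in either the DB file or the replayed log.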
Key tests in core/mvcc/database/tests.rs:
- test_bootstrap_completes_interrupted_checkpoint_with_committed_wal
- test_bootstrap_rejects_committed_wal_without_log_file
- test_bootstrap_rejects_torn_log_header_with_committed_wal
- test_bootstrap_handles_committed_wal_when_log_truncated
- test_bootstrap_ignores_wal_frames_without_commit_marker
- test_bootstrap_rejects_corrupt_log_header_without_wal
- test_empty_log_recovery_loads_checkpoint_watermark
- test_meta_checkpoint_case_10_metadata_upsert_is_atomic_with_pager_commit
- test_meta_checkpoint_case_11_auto_checkpoint_failure_after_commit_remains_recoverable

Logical-log corruption and torn-tail tests are in core/mvcc/persistent_storage/logical_log.rs.