skills/litestream/SKILL.md
Litestream is a standalone disaster recovery tool for SQLite. It runs as a
background process, monitors the SQLite WAL (Write-Ahead Log), converts changes
to immutable LTX files, and replicates them to cloud storage. It uses
modernc.org/sqlite (pure Go, no CGO required).
```bash
# Build
go build -o bin/litestream ./cmd/litestream

# Test (always use race detector)
go test -race -v ./...

# Code quality
pre-commit run --all-files
```
These invariants must never be violated:
SQLite reserves the lock page at byte offset 0x40000000 (1 GiB). Always skip it during replication and compaction. The page number varies by page size:
| Page Size | Lock Page Number |
|---|---|
| 4 KB | 262145 |
| 8 KB | 131073 |
| 16 KB | 65537 |
| 32 KB | 32769 |
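These numbers follow directly from the offset: with SQLite's 1-based page numbering, the lock page is the page containing byte 0x40000000, i.e. 0x40000000 / pageSize + 1 (for 4 KB pages, 262144 + 1 = 262145). As a self-check of the table only; in real code use ltx.LockPgno, as in the guard shown below:

```go
// Self-check for the table above: page number of the SQLite lock page.
// 0x40000000 is the reserved byte offset; page numbers are 1-based.
func lockPageNumber(pageSize int64) int64 {
	return 0x40000000/pageSize + 1 // 4096 -> 262145, 8192 -> 131073, ...
}
```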
```go
lockPgno := ltx.LockPgno(pageSize)
if pgno == lockPgno {
	continue
}
```
Once an LTX file is written, it must never be modified. New changes create new files. This guarantees point-in-time recovery integrity.
Each database replicates to exactly one destination. The Replica component manages replication mechanics; database state belongs in the DB layer.
Cloud storage is eventually consistent. Always read from local disk first:
```go
f, err := os.Open(db.LTXPath(info.Level, info.MinTXID, info.MaxTXID))
if err == nil {
	return f, nil // Use local copy
}
return replica.Client.OpenLTXFile(...) // Fall back to remote
```
Set the compacted file's CreatedAt to the earliest source file timestamp to
maintain temporal granularity for point-in-time restoration.
```go
info.CreatedAt = oldestSourceFile.CreatedAt
```
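A minimal sketch of selecting that timestamp, assuming the source files are available as a non-empty slice of *ltx.FileInfo and that CreatedAt is a time.Time (variable names here are illustrative):

```go
// Choose the earliest CreatedAt among the source files for the compacted output.
createdAt := sourceFiles[0].CreatedAt
for _, info := range sourceFiles[1:] {
	if info.CreatedAt.Before(createdAt) {
		createdAt = info.CreatedAt
	}
}
compacted.CreatedAt = createdAt
```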
Guard writes to shared replication state with the exclusive lock, never the read lock:

```go
// CORRECT
r.mu.Lock()
defer r.mu.Unlock()
r.pos = pos

// WRONG - race condition
r.mu.RLock()
defer r.mu.RUnlock()
r.pos = pos
```
Always write to a temp file then rename. Never write directly to the final path.
```go
tmpFile, err := os.CreateTemp(dir, ".tmp-*")
// ... write data, sync ...
os.Rename(tmpFile.Name(), finalPath)
```
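A self-contained sketch of the same pattern with error handling and cleanup; writeFileAtomic is a hypothetical helper (only the standard-library os package is needed), not a function from the codebase:

```go
// writeFileAtomic illustrates temp-file-then-rename: data only becomes visible
// at finalPath after a successful rename, so readers never see a partial file.
func writeFileAtomic(dir, finalPath string, data []byte) error {
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before publishing
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), finalPath)
}
```

The rename is the publish step: readers see either the old file or the complete new one, never a partial write.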
| Layer | File(s) | Responsibility |
|---|---|---|
| App | cmd/litestream/ | CLI commands, YAML/env config |
| Store | store.go | Multi-DB coordination, compaction |
| DB | db.go | Single DB management, WAL monitoring |
| Replica | replica.go | Replication to one destination |
| Storage | */replica_client.go | Backend implementations (S3, GCS, etc.) |
Database state logic belongs in the DB layer, not the Replica layer.
All storage backends implement this interface from replica_client.go:
```go
type ReplicaClient interface {
	Type() string
	Init(ctx context.Context) error
	LTXFiles(ctx context.Context, level int, seek ltx.TXID, useMetadata bool) (ltx.FileIterator, error)
	OpenLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, offset, size int64) (io.ReadCloser, error)
	WriteLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, r io.Reader) (*ltx.FileInfo, error)
	DeleteLTXFiles(ctx context.Context, a []*ltx.FileInfo) error
	DeleteAll(ctx context.Context) error
}
```
Key contract details:
- `OpenLTXFile` must return `os.ErrNotExist` when the file is missing
- `WriteLTXFile` must set `CreatedAt` from backend metadata or the upload time
- `LTXFiles` with `useMetadata=true` fetches accurate timestamps (for PIT restore)
- `LTXFiles` with `useMetadata=false` uses fast timestamps (normal operations)
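To make the os.ErrNotExist contract concrete, here is an illustrative sketch of OpenLTXFile for a hypothetical file-backed client; the FileReplicaClient type, the ltxPath helper, and the simplified offset/size handling are assumptions, not the actual implementation:

```go
// Illustrative sketch of the OpenLTXFile contract for a hypothetical
// file-backed client; ltxPath and size handling are simplified.
func (c *FileReplicaClient) OpenLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, offset, size int64) (io.ReadCloser, error) {
	f, err := os.Open(c.ltxPath(level, minTXID, maxTXID))
	if os.IsNotExist(err) {
		return nil, os.ErrNotExist // contract: callers rely on this exact sentinel
	} else if err != nil {
		return nil, err
	}
	if offset > 0 {
		if _, err := f.Seek(offset, io.SeekStart); err != nil {
			f.Close()
			return nil, err
		}
	}
	return f, nil // size limiting omitted for brevity
}
```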
Always acquire locks in this order to prevent deadlocks:

1. `Store.mu`
2. `DB.mu`
3. `DB.chkMu`
4. `Replica.mu`

Core components:

DB (db.go): Manages the SQLite connection, WAL monitoring, checkpointing, and a
long-running read transaction for consistency. Key fields: path, db, rtx
(read transaction), pageSize, notify channel.
Replica (replica.go): Tracks replication position (ltx.Pos with TXID,
PageNo, Checksum). One replica per database.
Store (store.go): Coordinates multiple databases and schedules compaction
across levels.
LTX (Log Transaction) files are immutable, checksummed archives of database changes. Structure:
```
+------------------+
| Header           |  100 bytes (magic "LTX1", page size, TXID range, timestamp)
+------------------+
| Page Frames      |  4-byte pgno + pageSize bytes data, per page
+------------------+
| Page Index       |  Binary search index for page lookup
+------------------+
| Trailer          |  16 bytes (post-apply checksum, file checksum)
+------------------+
```
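As a small illustration of the header layout, a sketch that validates only the 4-byte magic of an LTX stream; the remaining header fields are not parsed here (see references/LTX_FORMAT.md for the full format):

```go
// Read and verify the "LTX1" magic from the start of an LTX stream.
// The remaining header fields (page size, TXID range, timestamp) are
// not parsed in this sketch.
func checkLTXMagic(r io.Reader) error {
	magic := make([]byte, 4)
	if _, err := io.ReadFull(r, magic); err != nil {
		return fmt.Errorf("read ltx magic: %w", err)
	}
	if string(magic) != "LTX1" {
		return fmt.Errorf("invalid ltx magic: %q", magic)
	}
	return nil
}
```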
File names encode the min and max TXID as zero-padded hexadecimal:

Format: `MMMMMMMMMMMMMMMM-NNNNNNNNNNNNNNNN.ltx`
Example: `0000000000000001-0000000000000064.ltx` (TXID 1-100)
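Producing such a name is a one-liner, assuming TXIDs are integer-valued:

```go
// LTX filename from a TXID range: two 16-digit, zero-padded hex fields.
// For example, TXIDs 1 and 100 yield "0000000000000001-0000000000000064.ltx".
name := fmt.Sprintf("%016x-%016x.ltx", minTXID, maxTXID)
```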
Files are stored in per-level directories:

```
Level 0: /ltx/0000/   Raw LTX files (no compaction)
Level 1: /ltx/0001/   Compacted periodically
Level 2: /ltx/0002/   Compacted less frequently
```
Default compaction levels: L0 (raw), L1 (30s), L2 (5min), L3 (1h), plus daily snapshots. Compaction merges files by deduplicating pages (latest version wins) and always skips the lock page.
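A rough sketch of the merge step, assuming the source files can be iterated oldest to newest and that pagesOf is a hypothetical helper returning a pgno-to-data map for one LTX file (a real implementation would likely stream pages instead of materializing maps):

```go
// Merge pages from oldest to newest, keeping the latest version of each page
// and always skipping the SQLite lock page. pagesOf is a hypothetical helper.
lockPgno := ltx.LockPgno(pageSize)
latest := make(map[uint32][]byte)
for _, src := range sourceFiles { // ordered oldest -> newest
	for pgno, data := range pagesOf(src) {
		if pgno == lockPgno {
			continue // invariant: never copy the lock page
		}
		latest[pgno] = data // a later file's version of a page wins
	}
}
// latest now holds the deduplicated page set for the compacted output.
```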
Codebase conventions:

- Use `fmt.Errorf("context: %w", err)` for error wrapping
- Use `db.verify()` to trigger snapshots (don't reimplement)
- Run tests with `go test -race`
- Iterate `LTXFiles` results (paginate, don't load all at once)
- Never use `RLock()` when writing shared state
- Return `os.ErrNotExist` for missing files

Load reference files on demand based on the task:
| Task | Reference File |
|---|---|
| Understanding system design | references/ARCHITECTURE.md |
| Writing or reviewing code | references/PATTERNS.md |
| Working with LTX files | references/LTX_FORMAT.md |
| WAL monitoring or page operations | references/SQLITE_INTERNALS.md |
| Implementing storage backends | references/REPLICA_CLIENT_GUIDE.md |
| Writing or debugging tests | references/TESTING_GUIDE.md |
When replication misbehaves, verify:

- `PRAGMA journal_mode` must return `wal`
- the `db.notify` channel is being signaled on WAL changes
- `replica.Pos()` should advance with writes
- `os.ErrNotExist` from `OpenLTXFile` (file not replicated yet)

When reviewing changes, check:

- the `ltx.LockPgno(pageSize)` lookup and `continue` guard for the lock page
- `CreatedAt` timestamps are preserved (earliest source)
- compaction level configuration (`Store.levels`)
- `os.ErrNotExist` for missing files (not generic errors)
- `offset`/`size` handling in `OpenLTXFile`
- the `-race` flag when running tests

LTX errors and recovery:

- `LTXError` messages include context (Op, Path, Level, TXID) and recovery hints
- `litestream reset <db-path>` clears local LTX state and forces a fresh snapshot on the next sync (the database file is not modified)
- Set `auto-recover: true` on the replica config to auto-reset on LTX errors (disabled by default)
- Relevant code: cmd/litestream/reset.go, replica.go (auto-recover logic), db.go (ResetLocalState)
- See references/PATTERNS.md

Before committing, run `go test -race -v ./...` and `pre-commit run --all-files`.

```bash
# Full test suite with race detection
go test -race -v ./...

# Specific areas
go test -race -v -run TestReplica_Sync ./...
go test -race -v -run TestDB_Sync ./...
go test -race -v -run TestStore_CompactDB ./...

# Coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```
Key testing areas include replica sync, DB sync, and store compaction, exercised by the targeted test commands above.
Run scripts/validate-setup.sh to verify that your development environment is
correctly configured for Litestream.