pkg/util/wal/wal_format.md
A WAL (Write-Ahead Log) segment is a file containing a sequence of records. Each segment is divided into 32KB pages, and records can span multiple pages but never cross segment boundaries. This document describes the binary format of WAL segment files as used in Prometheus TSDB.
┌─────────────────────────────────────────────────────────────────┐
│                        WAL SEGMENT FILE                         │
├─────────────────────────────────────────────────────────────────┤
│                          PAGE 0 (32KB)                          │
├─────────────────────────────────────────────────────────────────┤
│ RECORD 1 │ RECORD 2 │ RECORD 3 │ ... │         PADDING          │
├─────────────────────────────────────────────────────────────────┤
│                          PAGE 1 (32KB)                          │
├─────────────────────────────────────────────────────────────────┤
│ RECORD N │ RECORD N+1 │ ... │              PADDING              │
├─────────────────────────────────────────────────────────────────┤
│                               ...                               │
└─────────────────────────────────────────────────────────────────┘
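The page layout above implies some simple offset arithmetic. The following is a minimal sketch, assuming a page is exactly 32 * 1024 bytes; the helper names `page_index` and `bytes_left_in_page` are illustrative and do not come from any library.

```python
PAGE_SIZE = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes


def page_index(offset: int) -> int:
    """Return the zero-based page that a byte offset in the segment falls in."""
    return offset // PAGE_SIZE


def bytes_left_in_page(offset: int) -> int:
    """Return how many bytes remain between offset and the end of its page."""
    return PAGE_SIZE - (offset % PAGE_SIZE)
```

A writer could use `bytes_left_in_page` to decide whether the next record fits in the current page or has to be fragmented.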
Every record in a WAL segment follows this structure:
┌─────────────┬─────────────────────────────────────────────────────┐
│   HEADER    │                        DATA                         │
│  (7 bytes)  │                  (variable length)                  │
└─────────────┴─────────────────────────────────────────────────────┘
 Byte 0    Bytes 1-2         Bytes 3-6
┌─────────┬─────────────────┬─────────────────────────────────────┐
│  TYPE   │     LENGTH      │                CRC32                │
│(1 byte) │    (2 bytes)    │              (4 bytes)              │
└─────────┴─────────────────┴─────────────────────────────────────┘
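The 7-byte header maps directly onto a fixed-width binary layout. The sketch below uses Python's standard `struct` module with big-endian byte order, which this document specifies for the length and CRC32 fields. The function names `pack_header` and `unpack_header` are hypothetical helpers introduced for illustration.

```python
import struct

# ">" = big-endian, no padding; B = type (1 byte), H = length (2 bytes),
# I = CRC32 (4 bytes). Total size is 7 bytes.
HEADER = struct.Struct(">BHI")


def pack_header(type_byte: int, length: int, crc: int) -> bytes:
    """Serialize the three header fields into 7 bytes."""
    return HEADER.pack(type_byte, length, crc)


def unpack_header(buf: bytes, offset: int = 0) -> tuple:
    """Return (type_byte, length, crc) read from buf at offset."""
    return HEADER.unpack_from(buf, offset)
```

For example, `pack_header(1, 1024, 0x12345678)` produces the bytes `01 04 00 12 34 56 78`.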
Bit:   7   6   5   4   3   2   1   0
     ┌───┬───┬───┬───┬───┬───┬───┬───┐
     │ - │ - │ - │ Z │ S │ T │ T │ T │
     └───┴───┴───┴───┴───┴───┴───┴───┘
       │   │   │   │   │   └───┴───┘
       │   │   │   │   │       └─ Record Type (3 bits)
       │   │   │   │   └─ Snappy Compression Flag (1 bit)
       │   │   │   └─ Zstd Compression Flag (1 bit)
       └───┴───┴─ Unallocated (3 bits)
Record Types:
0 (recPageTerm): Rest of page is empty
1 (recFull): Complete record fits in current page
2 (recFirst): First fragment of a record spanning multiple pages
3 (recMiddle): Middle fragment of a record spanning multiple pages
4 (recLast): Final fragment of a record spanning multiple pages
Compression Flags:
Bit 3 (S): Data is Snappy-compressed
Bit 4 (Z): Data is Zstd-compressed
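The type byte can be decoded with three bit masks derived from the bit layout shown earlier: the low three bits carry the record type, bit 3 the Snappy flag, and bit 4 the Zstd flag. This is a sketch only; the constant names and the `decode_type_byte` helper are assumptions made for illustration.

```python
from typing import NamedTuple

REC_TYPE_MASK = 0b0000_0111  # bits 0-2: record type
SNAPPY_FLAG = 0b0000_1000    # bit 3: Snappy compression flag
ZSTD_FLAG = 0b0001_0000      # bit 4: Zstd compression flag

REC_TYPE_NAMES = {
    0: "recPageTerm",
    1: "recFull",
    2: "recFirst",
    3: "recMiddle",
    4: "recLast",
}


class TypeByte(NamedTuple):
    rec_type: str
    snappy: bool
    zstd: bool


def decode_type_byte(b: int) -> TypeByte:
    """Split a header type byte into its record type and compression flags."""
    code = b & REC_TYPE_MASK
    if code not in REC_TYPE_NAMES:
        raise ValueError(f"unknown record type {code}")
    return TypeByte(REC_TYPE_NAMES[code], bool(b & SNAPPY_FLAG), bool(b & ZSTD_FLAG))
```

For example, `decode_type_byte(0x0A)` yields `recFirst` with the Snappy flag set and the Zstd flag clear.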
LENGTH: Big-endian 16-bit unsigned integer giving the length of the data portion in bytes.
CRC32: Big-endian 32-bit CRC32 checksum (Castagnoli polynomial) of the data portion only; the header bytes are not covered.
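Python's standard library does not ship a Castagnoli CRC32 (`zlib.crc32` uses the IEEE polynomial), so the sketch below builds the lookup table by hand. It assumes the common CRC-32C parameters: reflected polynomial `0x82F63B78`, initial value `0xFFFFFFFF`, and a final XOR with `0xFFFFFFFF`. The name `crc32c` is an illustrative helper, not a library function.

```python
def _make_table() -> list:
    """Build the 256-entry lookup table for the reflected Castagnoli polynomial."""
    poly = 0x82F63B78
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC-32C (Castagnoli) checksum of data as an unsigned 32-bit int."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

A reader would compare `crc32c(data)` against the CRC32 field from the header and treat a mismatch as corruption. The standard check value for this algorithm is `crc32c(b"123456789") == 0xE3069283`.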
When a record is larger than the remaining space in a page, it gets fragmented:
            Page N                            Page N+1
┌─────────────────────────────┐  ┌─────────────────────────────────┐
│ [HEADER] [DATA PART 1]      │  │ [HEADER] [DATA PART 2] [HEADER] │
│ Type: recFirst              │  │ Type: recLast      Type: recFull│
│ Length: 1024                │  │ Length: 512        Length: 256  │
│ CRC: 0x12345678             │  │ CRC: 0x87654321    CRC: 0xABCD  │
└─────────────────────────────┘  └─────────────────────────────────┘
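The fragmentation shown above can be sketched as a planning function that decides how a record of a given length is split, starting at some offset inside a page. This is a simplified illustration, not the actual writer: `plan_fragments` is a hypothetical name, and the rule that a page with no room for any data after a header is skipped entirely is an assumption of this sketch.

```python
PAGE_SIZE = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes
HEADER_SIZE = 7        # type (1) + length (2) + CRC32 (4)


def plan_fragments(record_len: int, page_offset: int) -> list:
    """Return [(type_name, fragment_len), ...] for one record.

    page_offset is the write position inside the current page.
    """
    fragments = []
    remaining = record_len
    first = True
    while True:
        space = PAGE_SIZE - page_offset - HEADER_SIZE
        if space <= 0:
            # Assumption: no room for data after a header, so the rest of
            # the page is left as padding and writing resumes on the next page.
            page_offset = 0
            continue
        take = min(remaining, space)
        remaining -= take
        last = remaining == 0
        if first and last:
            rec_type = "recFull"
        elif first:
            rec_type = "recFirst"
        elif last:
            rec_type = "recLast"
        else:
            rec_type = "recMiddle"
        fragments.append((rec_type, take))
        if last:
            return fragments
        first = False
        page_offset = 0  # the fragment filled the page; continue on the next one
```

With 1024 data bytes of room left in the current page, a 1536-byte record is planned as a `recFirst` fragment of 1024 bytes followed by a `recLast` fragment of 512 bytes, matching the first two headers in the diagram.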
The recPageTerm record type indicates that the rest of the page is empty.
The data portion contains the actual record payload. Its format depends on the application using the WAL.
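Putting the pieces together, a reader walks the segment header by header, skips to the next page when it meets a recPageTerm type byte, and joins recFirst, recMiddle, and recLast fragments back into one payload. The sketch below is illustrative only: `read_records` is a hypothetical name, it assumes padding bytes are zero, and it omits CRC verification and decompression for brevity.

```python
import struct

PAGE_SIZE = 32 * 1024           # assumption: "32KB" means 32 * 1024 bytes
HEADER = struct.Struct(">BHI")  # type, length, CRC32 (big-endian, 7 bytes)
REC_TYPE_MASK = 0x07
REC_PAGE_TERM, REC_FULL, REC_FIRST, REC_MIDDLE, REC_LAST = range(5)


def read_records(segment: bytes):
    """Yield each reassembled record payload found in a segment."""
    pos = 0
    parts = []
    while pos < len(segment):
        rec_type = segment[pos] & REC_TYPE_MASK
        if rec_type == REC_PAGE_TERM:
            # Rest of this page is empty: jump to the start of the next page.
            pos += PAGE_SIZE - (pos % PAGE_SIZE)
            continue
        if pos + HEADER.size > len(segment):
            break  # truncated header at the end of the input
        _type_byte, length, _crc = HEADER.unpack_from(segment, pos)
        pos += HEADER.size
        data = segment[pos:pos + length]
        pos += length
        if rec_type == REC_FULL:
            yield data
        elif rec_type == REC_FIRST:
            parts = [data]
        elif rec_type == REC_MIDDLE:
            parts.append(data)
        elif rec_type == REC_LAST:
            parts.append(data)
            yield b"".join(parts)
            parts = []
        else:
            raise ValueError(f"unknown record type {rec_type}")
```

A production reader would also check each fragment's checksum and reject out-of-order fragments, such as a recMiddle with no preceding recFirst.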
When compression is enabled: