tsdb/docs/format/chunks.md
The following describes the format of a chunks file,
which is created in the chunks/ directory of a block.
The maximum size per segment file is 512MiB.
Chunks in the files are referenced from the index by a uint64 composed of the in-file offset (lower 4 bytes) and the segment sequence number (upper 4 bytes).
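As a small illustration of the reference layout described above, the following sketch packs and unpacks such a uint64. The function names `pack_chunk_ref` and `unpack_chunk_ref` are hypothetical and are not taken from the code base.

```python
def pack_chunk_ref(segment_seq: int, offset: int) -> int:
    """Combine a segment sequence number and an in-file offset into one uint64."""
    if not 0 <= segment_seq < 1 << 32 or not 0 <= offset < 1 << 32:
        raise ValueError("both parts must fit in 4 bytes")
    # Sequence number in the upper 4 bytes, offset in the lower 4 bytes.
    return (segment_seq << 32) | offset


def unpack_chunk_ref(ref: int) -> tuple[int, int]:
    """Split a reference back into (segment_seq, offset)."""
    return ref >> 32, ref & 0xFFFFFFFF


ref = pack_chunk_ref(3, 8)
print(hex(ref))               # 0x300000008
print(unpack_chunk_ref(ref))  # (3, 8)
```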
┌──────────────────────────────┐
│ magic(0x85BD40DD) <4 byte> │
├──────────────────────────────┤
│ version(1) <1 byte> │
├──────────────────────────────┤
│ padding(0) <3 byte> │
├──────────────────────────────┤
│ ┌──────────────────────────┐ │
│ │ Chunk 1 │ │
│ ├──────────────────────────┤ │
│ │ ... │ │
│ ├──────────────────────────┤ │
│ │ Chunk N │ │
│ └──────────────────────────┘ │
└──────────────────────────────┘
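A reader of a segment file might validate the 8-byte header shown above roughly as follows. This is only a sketch: it assumes the magic number is stored in big-endian byte order, and `parse_segment_header` is a hypothetical helper name.

```python
import struct

MAGIC = 0x85BD40DD
VERSION = 1
HEADER_SIZE = 8  # 4 bytes magic + 1 byte version + 3 bytes padding


def parse_segment_header(buf: bytes) -> int:
    """Validate the header and return the offset of the first chunk."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("segment file too short for header")
    # Assumption: big-endian magic number.
    magic, version, padding = struct.unpack(">IB3s", buf[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError(f"unexpected magic number {magic:#x}")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    if padding != b"\x00\x00\x00":
        raise ValueError("non-zero padding")
    return HEADER_SIZE


header = bytes.fromhex("85bd40dd01000000")
print(parse_segment_header(header))  # 8
```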
┌───────────────┬───────────────────┬─────────────┬───────────────────┐
│ len <uvarint> │ encoding <1 byte> │ data <data> │ checksum <4 byte> │
└───────────────┴───────────────────┴─────────────┴───────────────────┘
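The record layout above could be read along the following lines. This is a sketch under stated assumptions rather than the actual reader: it assumes that `len` counts only the bytes of `data`, and the helper names (`crc32c`, `read_uvarint`, `read_chunk`) are hypothetical. The CRC-32C is computed bit by bit for clarity; a real implementation would normally use a table or hardware instructions.

```python
def crc32c(data: bytes) -> int:
    """CRC-32 with the Castagnoli polynomial (reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, position after it)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # continuation bit not set: last byte
            return value, pos
        shift += 7


def read_chunk(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Return (encoding, data, position of the next chunk)."""
    length, pos = read_uvarint(buf, pos)  # assumption: length of data only
    end = pos + 1 + length                # 1 byte encoding + data
    encoding, data = buf[pos], buf[pos + 1:end]
    stored = int.from_bytes(buf[end:end + 4], "big")
    # The checksum covers the encoding byte and the data.
    if crc32c(buf[pos:end]) != stored:
        raise ValueError("checksum mismatch")
    return encoding, data, end + 4


body = bytes([1]) + b"\x01\x02\x03"  # encoding byte + 3 data bytes
record = bytes([3]) + body + crc32c(body).to_bytes(4, "big")
print(read_chunk(record, 0))
```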
Notes:
* len: Chunk size in bytes. 1 to 5 bytes long using the <uvarint> encoding.
* encoding: Currently either XOR, histogram, or floathistogram; see code for numerical values.
* data: See below for each encoding.
* checksum: Checksum of encoding and data. It is a cyclic redundancy check with the Castagnoli polynomial, serialised as an unsigned 32-bit big-endian number. It can be referred to as CRC-32C.

┌──────────────────────┬───────────────┬───────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬─────┬──────────────────────┬──────────────────────┬──────────────────┐
│ num_samples <uint16> │ ts_0 <varint> │ v_0 <float64> │ ts_1_delta <uvarint> │ v_1_xor <varbit_xor> │ ts_2_dod <varbit_ts> │ v_2_xor <varbit_xor> │ ... │ ts_n_dod <varbit_ts> │ v_n_xor <varbit_xor> │ padding <x bits> │
└──────────────────────┴───────────────┴───────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴─────┴──────────────────────┴──────────────────────┴──────────────────┘
* ts is the timestamp, v is the value.
* ... means to repeat the previous two fields as needed, with n starting at 2 and going up to num_samples – 1.
* <uint16> has 2 bytes in big-endian order.
* <varint> and <uvarint> have 1 to 10 bytes each.
* ts_1_delta is ts_1 – ts_0.
* ts_n_dod is the “delta of deltas” of timestamps, i.e. (ts_n – ts_n-1) – (ts_n-1 – ts_n-2).
* v_n_xor is the result of v_n XOR v_n-1.
* <varbit_xor> is a specific variable bitwidth encoding of the result of XORing the current and the previous value. It has between 1 bit and 77 bits. See code for details.
* <varbit_ts> is a specific variable bitwidth encoding for the “delta of deltas” of timestamps (signed integers that are ideally small). It has between 1 and 68 bits. See code for details.
* padding of 0 to 7 bits so that the whole chunk data is byte-aligned.
* ts_1, v_1, etc. are optional.

XOR2 uses the same structure as XOR for samples 0 and 1. Starting from sample 2, a joint control prefix encodes both the timestamp delta-of-delta (dod) and whether the value changed, with common dod cases byte-aligned for efficient writing.
XOR2 can optionally also encode a start timestamp (ST); see details further down.
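Both XOR and XOR2 build on the same per-sample quantities: an absolute first sample, a timestamp delta for the second sample, and a timestamp delta of deltas plus a value XOR from then on. The sketch below computes these quantities before any bit packing takes place. The helper names are hypothetical, and the variable bitwidth encodings themselves are not reproduced here.

```python
import struct


def float_bits(v: float) -> int:
    """Return the IEEE 754 bit pattern of a float64 as an integer."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


def xor_chunk_fields(samples: list[tuple[int, float]]) -> list[tuple[int, int]]:
    """Map (ts, v) samples to the (timestamp field, value field) pairs to encode."""
    fields = []
    for i, (ts, v) in enumerate(samples):
        if i == 0:
            # ts_0 and v_0 are stored as absolute values.
            fields.append((ts, float_bits(v)))
            continue
        prev_ts, prev_v = samples[i - 1]
        v_xor = float_bits(v) ^ float_bits(prev_v)
        if i == 1:
            fields.append((ts - prev_ts, v_xor))  # ts_1_delta
        else:
            prev_delta = prev_ts - samples[i - 2][0]
            fields.append(((ts - prev_ts) - prev_delta, v_xor))  # ts_n_dod
    return fields


samples = [(1000, 1.0), (1015, 1.0), (1030, 2.0), (1046, 2.0)]
for ts_field, v_field in xor_chunk_fields(samples):
    print(ts_field, hex(v_field))
```

With a regular scrape interval and slowly changing values, most timestamp fields come out as 0 and most value fields as 0, which is what makes the variable bitwidth encodings compact.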
┌──────────────────────┬───────────────────┬───────────────┬───────────────┬────────────────┬─-
│ num_samples <uint16> │ st_header <uint8> │ ts_0 <varint> │ v_0 <float64> │ ?st_0 <varint> │
└──────────────────────┴───────────────────┴───────────────┴───────────────┴────────────────┴─-
-─────────────────────┬───────────────────────┬─────────────────────────┬─-
ts_1_delta <uvarint> │ v_1_xor <varbit_xor2> │ ?st_1_delta <varbit_ts> │
-─────────────────────┴───────────────────────┴─────────────────────────┴─-
-─────────────────────────┬───────────────────────┬─────┬─-
sample_2 <joint_sample2> │ ?st_2_dod <varbit_ts> │ ... │
-─────────────────────────┴───────────────────────┴─────┴─-
-─────────────────────────┬───────────────────────┬──────────────────┐
sample_n <joint_sample2> │ ?st_n_dod <varbit_ts> │ padding <x bits> │
-─────────────────────────┴───────────────────────┴──────────────────┘
<joint_sample2>: Each sample starts with a variable-length control prefix that jointly encodes the dod and value change status:

| Control prefix | dod | Value encoding that follows |
|---|---|---|
| 0 | 0 | (none, value unchanged) |
| 10 | 0 | <varbit_xor2_nn> (value known non-zero and non-stale) |
| 110DDDDD DDDDDDDD | 13-bit signed [-4096, 4095] | <varbit_xor2> |
| 1110DDDD DDDDDDDD DDDDDDDD | 20-bit signed [-524288, 524287] | <varbit_xor2> |
| 11110 + 64-bit dod | exact | <varbit_xor2> |
| 11111 | 0 | (none, stale NaN, no value field) |

The 110 and 1110 cases pack the prefix and the most-significant dod bits into
the first byte, making the full dod field byte-aligned.
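The choice of control prefix in the table above can be sketched as follows. Two points are assumptions of this sketch and not confirmed by the text: that the signed dod is written in two's complement, and that the stale-NaN case takes precedence when dod is 0. The function names are hypothetical, and bits are returned as a string purely for readability.

```python
def twos_complement(value: int, bits: int) -> str:
    """Render a signed integer as a fixed-width two's complement bit string."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def joint_control_bits(dod: int, value_changed: bool, is_stale: bool = False) -> str:
    """Return the control prefix (including any dod bits) for one sample."""
    if dod == 0:
        if is_stale:
            return "11111"  # stale NaN, no value field follows
        return "10" if value_changed else "0"
    if -4096 <= dod <= 4095:
        return "110" + twos_complement(dod, 13)    # 16 bits in total
    if -524288 <= dod <= 524287:
        return "1110" + twos_complement(dod, 20)   # 24 bits in total
    return "11110" + twos_complement(dod, 64)


for dod in (0, 5, -1, 5000, 1 << 40):
    bits = joint_control_bits(dod, value_changed=True)
    print(dod, len(bits), bits)
```

The lengths printed for the 13-bit and 20-bit cases (16 and 24) show why the dod field ends on a byte boundary when the prefix starts on one.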
<varbit_xor2>: Used after the dod≠0 control prefixes. The XOR of the current and previous value is encoded as:

| Prefix | Meaning |
|---|---|
| 0 | XOR = 0 (value unchanged) |
| 10 | Reuse previous leading/trailing window; sigbits value bits follow |
| 110 + leading(5) + sigbits(6) + value(sigbits) | New leading/trailing window |
| 111 | Stale NaN marker (3 bits) |

<varbit_xor2_nn>: Used after the 10 control prefix (dod=0, value known to have changed and be non-stale).
The XOR = 0 check is skipped, saving one bit on the reuse path:

| Prefix | Meaning |
|---|---|
| 0 | Reuse previous leading/trailing window; sigbits value bits follow |
| 1 + leading(5) + sigbits(6) + value(sigbits) | New leading/trailing window |

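To make the terms leading, sigbits, and "window" more concrete, the sketch below derives them from the XOR of two float64 values. It rests on several assumptions: leading is clamped to 31 because the field is 5 bits wide, the reuse test follows the usual Gorilla-style rule (the new significant bits must fit inside the previous window), and the helper names are hypothetical. How the fields are packed into bits is left to the code.

```python
import struct


def float_bits(v: float) -> int:
    """Return the IEEE 754 bit pattern of a float64 as an integer."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


def xor_window(prev: float, cur: float):
    """Return (leading, sigbits, trailing) of prev XOR cur, or None if XOR = 0."""
    x = float_bits(prev) ^ float_bits(cur)
    if x == 0:
        return None
    leading = min(64 - x.bit_length(), 31)   # assumption: clamp to 5 bits
    trailing = (x & -x).bit_length() - 1     # number of trailing zero bits
    return leading, 64 - leading - trailing, trailing


def can_reuse(window, prev_window) -> bool:
    """Assumed rule: reuse if the new bits lie within the previous window."""
    leading, _, trailing = window
    prev_leading, _, prev_trailing = prev_window
    return leading >= prev_leading and trailing >= prev_trailing


first = xor_window(1.0, 2.0)
second = xor_window(2.0, 3.0)
print(first, second, can_reuse(second, first))
```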
We use st_i_dod and st_i interchangeably when i>1 in these notes.
st_header is one byte:
┌───────────────────────┬───────────────────────┐
│ first_st_known<1 bit> │ st_changed_on<7 bits> │
└───────────────────────┴───────────────────────┘
where the highest bit first_st_known indicates if st_0 is present or not.
If the lower 7 bits, st_changed_on, are 0, no st_i (i>0) is present.
Otherwise st_i (i>=st_changed_on) is present, while
st_i (0<i<st_changed_on) is not present.
Due to the 7 bit limitation, once a chunk has at least 127 samples,
st_changed_on is set to 127 (0x7F) and the 127th and further samples will
have st_i present.
st_0 is encoded as a varint if present.
st_1 is encoded as a varbit_ts delta from st_0 (or from 0 if st_0 is
not present).
st_i_dod aka st_i (i>1) is encoded as a varbit_ts "delta of delta" from
st_i-1 (or from 0 if st_i-1 is not present).
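The presence rules for st_i can be sketched directly from the st_header byte. The function names `parse_st_header` and `st_present` are hypothetical; the logic only restates the rules given above.

```python
def parse_st_header(header: int) -> tuple[bool, int]:
    """Split the header byte into (first_st_known, st_changed_on)."""
    first_st_known = bool(header >> 7)  # highest bit
    st_changed_on = header & 0x7F       # lower 7 bits
    return first_st_known, st_changed_on


def st_present(header: int, i: int) -> bool:
    """Report whether st_i is stored for sample i."""
    first_st_known, st_changed_on = parse_st_header(header)
    if i == 0:
        return first_st_known
    # st_changed_on == 0 means no st_i is present for any i > 0.
    return st_changed_on != 0 and i >= st_changed_on


header = 0x83  # first_st_known set, st_changed_on = 3
print([st_present(header, i) for i in range(5)])
# [True, False, False, True, True]
```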
┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬──────────────────────┬────────────────┬──────────────────┐
│ num_samples <uint16> │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema <varbit_int> │ pos_spans <data> │ neg_spans <data> │ custom_values <data> │ samples <data> │ padding <x bits> │
└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴──────────────────────┴────────────────┴──────────────────┘
┌─────────────────────────┬────────────────────────┬───────────────────────┬────────────────────────┬───────────────────────┬─────┬────────────────────────┬───────────────────────┐
│ num_spans <varbit_uint> │ length_0 <varbit_uint> │ offset_0 <varbit_int> │ length_1 <varbit_uint> │ offset_1 <varbit_int> │ ... │ length_n <varbit_uint> │ offset_n <varbit_int> │
└─────────────────────────┴────────────────────────┴───────────────────────┴────────────────────────┴───────────────────────┴─────┴────────────────────────┴───────────────────────┘
The custom_values data is currently only used for schema -53 (custom bucket boundaries). For other schemas, it is empty (length of zero).
┌──────────────────────────┬──────────────────┬──────────────────┬─────┬──────────────────┐
│ num_values <varbit_uint> │ value_0 <custom> │ value_1 <custom> │ ... │ value_n <custom> │
└──────────────────────────┴──────────────────┴──────────────────┴─────┴──────────────────┘
┌──────────────────────────┐
│ sample_0 <data> │
├──────────────────────────┤
│ sample_1 <data> │
├──────────────────────────┤
│ sample_2 <data> │
├──────────────────────────┤
│ ... │
├──────────────────────────┤
│ sample_n <data> │
└──────────────────────────┘
┌─────────────────┬─────────────────────┬──────────────────────────┬───────────────┬───────────────────────────┬─────┬───────────────────────────┬───────────────────────────┬─────┬───────────────────────────┐
│ ts <varbit_int> │ count <varbit_uint> │ zero_count <varbit_uint> │ sum <float64> │ pos_bucket_0 <varbit_int> │ ... │ pos_bucket_n <varbit_int> │ neg_bucket_0 <varbit_int> │ ... │ neg_bucket_n <varbit_int> │
└─────────────────┴─────────────────────┴──────────────────────────┴───────────────┴───────────────────────────┴─────┴───────────────────────────┴───────────────────────────┴─────┴───────────────────────────┘
┌───────────────────────┬──────────────────────────┬───────────────────────────────┬──────────────────────┬─────────────────────────────────┬─────┬─────────────────────────────────┬─────────────────────────────────┬─────┬─────────────────────────────────┐
│ ts_delta <varbit_int> │ count_delta <varbit_int> │ zero_count_delta <varbit_int> │ sum_xor <varbit_xor> │ pos_bucket_0_delta <varbit_int> │ ... │ pos_bucket_n_delta <varbit_int> │ neg_bucket_0_delta <varbit_int> │ ... │ neg_bucket_n_delta <varbit_int> │
└───────────────────────┴──────────────────────────┴───────────────────────────────┴──────────────────────┴─────────────────────────────────┴─────┴─────────────────────────────────┴─────────────────────────────────┴─────┴─────────────────────────────────┘
┌─────────────────────┬────────────────────────┬─────────────────────────────┬──────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┐
│ ts_dod <varbit_int> │ count_dod <varbit_int> │ zero_count_dod <varbit_int> │ sum_xor <varbit_xor> │ pos_bucket_0_dod <varbit_int> │ ... │ pos_bucket_n_dod <varbit_int> │ neg_bucket_0_dod <varbit_int> │ ... │ neg_bucket_n_dod <varbit_int> │
└─────────────────────┴────────────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┘
* histogram_flags is a byte of which currently only the first two bits are used:
  * 10: Counter reset between the previous chunk and this one.
  * 01: No counter reset between the previous chunk and this one.
  * 00: Counter reset status unknown.
  * 11: Chunk is part of a gauge histogram, no counter resets are happening.
* zero_threshold has a specific encoding of either 1 or 9 bytes. See code for details.
* schema is a specific value defined by the exposition format. Currently valid values are either -4 <= n <= 8 (standard exponential schemas) or -53 (custom bucket boundaries).
* <varbit_int> is a variable bitwidth encoding for signed integers, optimized for “delta of deltas” of bucket deltas. It has between 1 bit and 9 bytes. See code for details.
* <varbit_uint> is a variable bitwidth encoding for unsigned integers with the same bit-bucketing as <varbit_int>. See code for details.
* <varbit_xor> is a specific variable bitwidth encoding of the result of XORing the current and the previous value. It has between 1 bit and 77 bits. See code for details.
* padding of 0 to 7 bits so that the whole chunk data is byte-aligned.
* bucket_0 is an absolute count.

The <custom> encoding within the custom values data depends on the schema.
For schema -53 (custom bucket boundaries, currently the only use case for
custom values), the values to encode are bucket boundaries in the form of
floats. A given float value x is stored in one of two forms, referred to as (2) and (3) in the following: (2) as a <varbit_uint>, or (3) as a <float64>. See code for the exact conditions.

Note that values stored as per (2) will always start with a 1 bit, which allows decoders to distinguish this case from values stored as per (3), which always start with a 0 bit.
The rationale behind this encoding is that most custom bucket boundaries are set by humans as decimal numbers with only a few decimal places. In most cases, the encoding will therefore result in a short varbit representation. The upper bound of 33554430 is picked so that the varbit-encoded value will take at most 4 bytes.
Float histograms have the same layout as histograms apart from the encoding of samples.
┌──────────────────────────┐
│ sample_0 <data> │
├──────────────────────────┤
│ sample_1 <data> │
├──────────────────────────┤
│ sample_2 <data> │
├──────────────────────────┤
│ ... │
├──────────────────────────┤
│ sample_n <data> │
└──────────────────────────┘
┌─────────────────┬─────────────────┬──────────────────────┬───────────────┬────────────────────────┬─────┬────────────────────────┬────────────────────────┬─────┬────────────────────────┐
│ ts <varbit_int> │ count <float64> │ zero_count <float64> │ sum <float64> │ pos_bucket_0 <float64> │ ... │ pos_bucket_n <float64> │ neg_bucket_0 <float64> │ ... │ neg_bucket_n <float64> │
└─────────────────┴─────────────────┴──────────────────────┴───────────────┴────────────────────────┴─────┴────────────────────────┴────────────────────────┴─────┴────────────────────────┘
┌───────────────────────┬────────────────────────┬─────────────────────────────┬──────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┐
│ ts_delta <varbit_int> │ count_xor <varbit_xor> │ zero_count_xor <varbit_xor> │ sum_xor <varbit_xor> │ pos_bucket_0_xor <varbit_xor> │ ... │ pos_bucket_n_xor <varbit_xor> │ neg_bucket_0_xor <varbit_xor> │ ... │ neg_bucket_n_xor <varbit_xor> │
└───────────────────────┴────────────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┘
┌─────────────────────┬────────────────────────┬─────────────────────────────┬──────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┐
│ ts_dod <varbit_int> │ count_xor <varbit_xor> │ zero_count_xor <varbit_xor> │ sum_xor <varbit_xor> │ pos_bucket_0_xor <varbit_xor> │ ... │ pos_bucket_n_xor <varbit_xor> │ neg_bucket_0_xor <varbit_xor> │ ... │ neg_bucket_n_xor <varbit_xor> │
└─────────────────────┴────────────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┘