Back to Lance

BTree Index

docs/src/format/table/index/scalar/btree.md

4.0.13.0 KB
Original Source

BTree Index

The BTree index is a two-level structure that provides efficient range queries and sorted access. It strikes a balance between an expensive memory structure containing all values and an expensive disk structure that can't be efficiently searched.

The upper layers of the BTree are designed to be cached in memory and stored in a BTree structure (page_lookup.lance), while the leaves are searched using sub-indices (page_data.lance, currently just a flat file). This design enables efficient memory usage - for example, with 1 billion values, the index can store 256K leaves of size 4K each, requiring only a few MiB of memory (depending on data type) for the BTree metadata while narrowing any search to just 4K values.

Index Details

protobuf
%%% proto.message.BTreeIndexDetails %%%

Storage Layout

The BTree index consists of two files:

  1. page_lookup.lance - The BTree structure mapping value ranges to page numbers
  2. page_data.lance - The actual sub-indices (flat file) containing sorted values and row IDs

Page Lookup File Schema (BTree Structure)

ColumnTypeNullableDescription
min{DataType}trueMinimum value in the page (forms BTree keys)
max{DataType}trueMaximum value in the page (for range pruning)
null_countUInt32falseNumber of null values in the page
page_idxUInt32falsePage number pointing to the sub-index in page_data.lance

Schema Metadata

KeyTypeDescription
batch_sizeStringNumber of rows per page (default: "4096")

Page Data File Schema (Sub-indices)

ColumnTypeNullableDescription
values{DataType}trueSorted values from the indexed column (flat file)
idsUInt64falseRow IDs corresponding to each value

Accelerated Queries

The BTree index provides exact results for the following query types:

Query TypeDescriptionOperation
Equalscolumn = valueBTree lookup to find relevant pages, then search within sub-indices
Rangecolumn BETWEEN a AND bBTree traversal for pages overlapping the range, then search each sub-index
IsIncolumn IN (v1, v2, ...)Multiple BTree lookups, union results from all matching sub-indices
IsNullcolumn IS NULLReturns rows from all pages where null_count > 0