docs/en/sql-reference/functions/numeric-indexed-vector-functions.md
NumericIndexedVector is an abstract data structure that encapsulates a vector and implements vector aggregating and pointwise operations. Bit-Sliced Index is its storage method. For theoretical basis and usage scenarios, refer to the paper Large-Scale Metric Computation in Online Controlled Experiment Platform.
In the BSI (Bit-Sliced Index) storage method, the data is stored in Bit-Sliced Index and then compressed using Roaring Bitmap. Aggregating operations and pointwise operations are directly on the compressed data, which can significantly improve the efficiency of storage and query.
A vector contains indices and their corresponding values. The following are some characteristics and constraints of this data structure in BSI storage mode:
UInt8, UInt16, or UInt32. Note: Considering the performance of 64-bit implementation of Roaring Bitmap, BSI format does not support UInt64/Int64.Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, or Float64. Note: The value type does not automatically expand. For example, if you use UInt8 as the value type, any sum that exceeds the capacity of UInt8 will result in an overflow rather than being promoted to a higher type; similarly, operations on integers will yield integer results (e.g., division will not automatically convert to a floating-point result). Therefore, it is important to plan and design the value type ahead of time. In real-world scenarios, floating-point types (Float32/Float64) are commonly used.-State.There are two ways to create this structure: one is to use the aggregate function groupNumericIndexedVector with -State.
You can add suffix -if to accept an additional condition.
The aggregate function will only process the rows that trigger the condition.
The other is to build it from a map using numericIndexedVectorBuild.
The groupNumericIndexedVectorState function allows customization of the number of integer and fractional bits through parameters, while numericIndexedVectorBuild does not.
Constructs a NumericIndexedVector from two data columns and returns the sum of all values as a Float64 type. If the suffix State is added, it returns a NumericIndexedVector object.
Syntax
groupNumericIndexedVectorState(col1, col2)
groupNumericIndexedVectorState(type, integer_bit_num, fraction_bit_num)(col1, col2)
Parameters
type: String, optional. Specifies the storage format. Currently, only 'BSI' is supported.integer_bit_num: UInt32, optional. Effective under the 'BSI' storage format, this parameter indicates the number of bits used for the integer part. When the index type is an integer type, the default value corresponds to the number of bits used to store the index. For example, if the index type is UInt16, the default integer_bit_num is 16. For Float32 and Float64 index types, the default value of integer_bit_num is 40, so the integer part of the data that can be represented is in the range [-2^39, 2^39 - 1]. The legal range is [0, 64].fraction_bit_num: UInt32, optional. Effective under the 'BSI' storage format, this parameter indicates the number of bits used for the fractional part. When the value type is an integer, the default value is 0; when the value type is Float32 or Float64 types, the default value is 24. The valid range is [0, 24].col1: The index column. Supported types: UInt8/UInt16/UInt32/Int8/Int16/Int32.col2: The value column. Supported types: Int8/Int16/Int32/Int64/UInt8/UInt16/UInt32/UInt64/Float32/Float64.Return value
A Float64 value representing the sum of all values.
Example
Test data:
UserID PlayTime
1 10
2 20
3 30
Query & Result:
SELECT groupNumericIndexedVector(UserID, PlayTime) AS num FROM t;
┌─num─┐
│ 60 │
└─────┘
SELECT groupNumericIndexedVectorState(UserID, PlayTime) as res, toTypeName(res), numericIndexedVectorAllValueSum(res) FROM t;
┌─res─┬─toTypeName(res)─────────────────────────────────────────────┬─numericIndexedVectorAllValueSum(res)──┐
│ │ AggregateFunction(groupNumericIndexedVector, UInt8, UInt8) │ 60 │
└─────┴─────────────────────────────────────────────────────────────┴───────────────────────────────────────┘
SELECT groupNumericIndexedVectorStateIf(UserID, PlayTime, day = '2025-04-22') as res, toTypeName(res), numericIndexedVectorAllValueSum(res) FROM t;
┌─res─┬─toTypeName(res)────────────────────────────────────────────┬─numericIndexedVectorAllValueSum(res)──┐
│ │ AggregateFunction(groupNumericIndexedVector, UInt8, UInt8) │ 30 │
└─────┴────────────────────────────────────────────────────────────┴───────────────────────────────────────┘
SELECT groupNumericIndexedVectorStateIf('BSI', 32, 0)(UserID, PlayTime, day = '2025-04-22') as res, toTypeName(res), numericIndexedVectorAllValueSum(res) FROM t;
┌─res─┬─toTypeName(res)──────────────────────────────────────────────────────────┬─numericIndexedVectorAllValueSum(res)──┐
│ │ AggregateFunction('BSI', 32, 0)(groupNumericIndexedVector, UInt8, UInt8) │ 30 │
└─────┴──────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────┘
:::note
The documentation below is generated from the system.functions system table.
:::