documentation/query/datatypes/array.md
QuestDB supports N-dimensional arrays, modeled after NumPy's NDArray type.
Arrays are useful for storing vectors, matrices, and higher-dimensional data
directly in your database.
:::important
Arrays currently only support DOUBLE elements. Other element types are not yet
available.
:::
| Operation | Syntax | Example |
|---|---|---|
| Create 1D array | ARRAY[v1, v2, ...] | ARRAY[1.0, 2.0, 3.0] |
| Create 2D array | ARRAY[[...], [...]] | ARRAY[[1, 2], [3, 4]] |
| Create 3D array | ARRAY[[[...]], [[...]]] | ARRAY[[[1, 2], [3, 4]], [[5, 6], [7, 8]]] |
| Build dynamically | array_build() | array_build(1, 3, 0) returns [0.0, 0.0, 0.0] |
| Access element | arr[i, j, ...] or arr[i][j]... | arr[1, 2, 3] returns DOUBLE |
| Access sub-array | arr[i] | arr[1] returns lower-dimensional array |
| Slice (range) | arr[low:high] | arr[1:3] keeps dimensionality |
| Slice (open-ended) | arr[low:] | arr[2:] to end of dimension |
| Mixed access | arr[i, low:high] | arr[1, 2:4] |
| Compare arrays | =, <> | arr1 = arr2 (same shape and elements) |
Array types are written as DOUBLE[dim1][dim2]..., for example:
DOUBLE[3] — 1D array with 3 elementsDOUBLE[2][3] — 2D array (2 rows, 3 columns)DOUBLE[2][3][4] — 3D arrayIndexing is 1-based (SQL standard). See array functions for operations like array_sum, array_avg, matmul, etc.
You can create an array from scalar values using the ARRAY[...] syntax:
CREATE TABLE tango AS (SELECT ARRAY[
[ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ],
[ [10, 11, 12], [13, 14, 15], [16, 17, 18] ],
[ [19, 20, 21], [22, 23, 24], [25, 26, 27] ]
] arr from long_sequence(1));
Values can be any expressions that yield scalars, so you can construct the array from existing column data.
Values can also be arrays, creating a higher-dimensional array:
CREATE TABLE tango AS (SELECT ARRAY[1, 2] arr, ARRAY[3, 4] brr FROM long_sequence(1));
SELECT ARRAY[arr, brr] FROM tango;
| array_2d |
|---|
| [[1.0,2.0],[3.0,4.0]] |
When the array size or contents need to vary per row, or when you want to fill
an array with a computed value, use
array_build():
-- variable-length array filled with -1
SELECT array_build(1, x::int, -1) FROM long_sequence(3);
-- array filled with the max of an existing array (market_data is on demo.questdb.com)
SELECT array_build(1, bids[1], array_max(bids[1])) FROM market_data LIMIT 1;
Array access syntax is modeled on Python's NDArray, except
that QuestDB uses 1-based indexing (as is standard in SQL):
arr[<dim1-selector>, <dim2-selector>, ...]
Each dimN-selector can be one of two forms:
low:highAll the following examples use the 3D array named arr, of type
DOUBLE[3][3][3]:
CREATE TABLE tango AS (SELECT ARRAY[
[ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ],
[ [10, 11, 12], [13, 14, 15], [16, 17, 18] ],
[ [19, 20, 21], [22, 23, 24], [25, 26, 27] ]
] arr from long_sequence(1));
Using single integers you select individual array elements. An element of a 2D
array is a 1D sub-array, and an element of a 1D array is an individual scalar
value, like a DOUBLE. If you use a coordinate larger than the array's given
dimension length, the result will be NULL for scalars, and an empty array for
sub-arrays.
SELECT arr[1, 3, 2] elem FROM tango;
| elem |
|---|
| 8.0 |
This selected the DOUBLE number at the coordinates (1, 3, 2). Remember that
the coordinates are 1-based!
:::tip
The syntax arr[1, 3, 2] is interchangeable with arr[1][3][2]. The
performance of both styles is the same.
:::
SELECT arr[1, 3, 4] elem FROM tango;
| elem |
|---|
| NULL |
SELECT arr[1] subarr FROM tango;
| subarr |
|---|
| [[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]] |
This selected the first 2D sub-array in arr.
SELECT arr[4] subarr FROM tango;
| subarr |
|---|
| [] |
SELECT arr[1, 3] subarr FROM tango;
| subarr |
|---|
| [7.0,8.0,9.0] |
This selected the first 2D-subarray in arr, and then the 3rd 1D-subarray in
it.
A range of integers selects a slice of the array. You can think of slicing as
leaving the array intact, but constraining the range of numbers you can use for
a coordinate. The lowest valid coordinate remains 1, but it gets remapped to
the coordinate indicated by the lower bound of the slicing range.
The dimensionality of the result remains the same, even if the range contains just one number. The slice includes the lower bound, but excludes the upper bound.
You can omit the upper bound, like this: arr[2:]. The slice will then extend
to the end of the array in the corresponding dimension. The lower bound is
mandatory, due to syntax conflict with variable placeholders such as :a or
:2.
If the upper bound of the range exceeds the array's length, the result is the same as if the upper bound was left out — the result extends to the end of the array along that dimension.
arr by constraining the first dimensionSELECT arr[2:3] slice FROM tango;
| slice |
|---|
| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]]] |
This returned a DOUBLE[1][3][3], containing just the second sub-array of
arr.
arr with a right-open rangeSELECT arr[2:] slice FROM tango;
| slice |
|---|
| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]], |
| [[19.0,20.0,21.0],[22.0,23.0,24.0],[25.0,26.0,27.0]]] |
This returned a DOUBLE[2][3][3] and contains everything except the first
sub-array along the first dimension.
arr by constraining the first and second dimensionsSELECT arr[2:3, 3:4] slice FROM tango;
| slice |
|---|
| [[[16.0,17.0,18.0]]] |
Note that the returned array is still 3D.
arr with large upper boundsSELECT arr[2:100, 3:100] slice FROM tango;
| slice |
|---|
| [[[16.0,17.0,18.0]],[[25.0,26.0,27.0]]] |
The result is the same as if using arr[2:, 3:].
You can use both types of selectors within the same bracket expression.
arr, and slice itSELECT arr[1, 2:4] subarr FROM tango;
| subarr |
|---|
| [[4.0,5.0,6.0],[7.0,8.0,9.0]] |
This returned a DOUBLE[2][3]. The top dimension is gone because the first
selector took out a sub-array and not a one-element slice.
SELECT arr[1:, 3, 2] subarr FROM tango;
| subarr |
|---|
| [8.0,17.0,26.0] |
This left the top dimension unconstrained, then took the 3rd sub-array in each of the top-level sub-arrays, and then selected just the 2nd element in each of them.
Arrays can be compared using the = and <> operators. Two arrays are equal if
and only if:
SELECT
ARRAY[1, 2, 3] = ARRAY[1, 2, 3] as same,
ARRAY[1, 2, 3] = ARRAY[1, 2, 4] as different_value,
ARRAY[1, 2, 3] = ARRAY[[1, 2, 3]] as different_shape;
| same | different_value | different_shape |
|---|---|---|
| true | false | false |
NULL elements are handled consistently with scalar DOUBLE behavior: two
NULL values are considered equal when comparing arrays.
SELECT ARRAY[1, NULL, 3] = ARRAY[1, NULL, 3] as with_nulls;
| with_nulls |
|---|
| true |
Functions that operate on arrays are documented on a dedicated page. There are also some financial functions that operate on arrays.
This section explains the internal design of N-dimensional arrays. Understanding these concepts can help you write more efficient queries.
The physical layout of the N-dimensional array is a single memory block with values arranged (by default) in row-major order, where the coordinates of the adjacent elements differ in the rightmost coordinate first (much like the adjacent numbers differ in the rightmost digit first: 41, 42, 43, etc.)
Separately, there are two lists of integers that describe this block of values, and give it its N-dimensional appearance: shape and strides. Both have length equal to the number of dimensions.
Unlike NumPy, QuestDB expresses strides as element jumps, not byte count jumps. Strides are used to perform a number of operations (such as transpose, slicing, etc) without the need to copy the array data itself.
Also unlike NumPy, QuestDB's DOUBLE[] allows NULL elements, with the same
semantics as plain DOUBLE values: all non-finite floating point values (NaN,
Infinity, -Infinity) are seen as NULL.
Here's a visual example of a 3-dimensional array of type DOUBLE[2][3][2]:
dim 1: |. . . . . .|. . . . . .| -- stride = 6, len = 2
dim 2: |. .|. .|. .|. .|. .|. .| -- stride = 2, len = 3
dim 3: |.|.|.|.|.|.|.|.|.|.|.|.| -- stride = 1, len = 2
The dots are the individual values (DOUBLE numbers in our case). Each row
shows the whole array, but with different subdivisions according to the
dimension. So, in dim 1, the row is divided into two slots, the sub-arrays at
coordinates 1 and 2 along that dimension, and the distance between the start of
slot 1 and slot 2 is 6 (the stride for that dimension). In dim 3, each slot
contains an individual number, and the stride is 1.
The legal values for the coordinate in dim 3 are just 1 and 2, even though you
would be able to access any array element just by using a large-enough number.
This is how the flat array gets its 3-dimensional appearance: for each value,
there's a unique list of coordinates, [i, j, k], that addresses it.
QuestDB can perform all of the following operations cheaply, by editing just the
two small lists, shape and strides, and doing nothing to the potentially
huge block of memory holding the array values:
Slice: extract a 3-dimensional array that is just a part of the full one,
by constraining the range of legal coordinates at each dimension. Example:
array[1:2, 2:4, 1:2] will give us a view into the array with the shape
DOUBLE[1, 2, 1], covering just the ranges of coordinates indicated in the
expression.
Take a sub-array: constrain the coordinate at a given dimension to just one
choice, and then eliminate that dimension from the array. Example: array[2]
has the shape DOUBLE[3, 2] and consists of the second subarray in the 1st
dimension.
Flatten: remove a dimension from the array, flattening it into the
next-finer dimension. Example: flattening dim 2 gives us an array shape
DOUBLE[2, 6]. All elements are still available, but using just 2
coordinates.
Transpose: reverse the shape and the strides, changing the meaning of each
coordinate. Example: transposing our array changes the strides from
(6, 2, 1) to (1, 2, 6). What we used to access with the 3rd coordinate,
we now access with the 1st coordinate. On a 2D array, this would have the
effect of swapping rows and columns (transposing a matrix).
QuestDB stores the shape along with the array. However, it has no need to store strides: they can be calculated from the shape. Strides become relevant once you perform one of the mentioned array shape transformations. We say that an array whose shape hasn't been transformed (that is, it matches the physical arrangement of elements) is a vanilla array, and this has consequences for performance.
A "vanilla" array is just a plain, contiguous N-D array in row-major order. A vanilla array can be processed by optimized bulk operations that go over the entire block of memory, disregarding the shape and strides, whereas for most other arrays we have to step through all the coordinates one by one and calculate the position of each element.
There's a special case where the array is transformed just so that the new array can be found in vanilla shape within the memory block of the original array (a contiguous sub-array). Such a transformed array is also considered vanilla. Major examples are slicing the array at the top (leftmost) dimension, and taking a sub-array at the top dimension.
While performing a shape transformation is cheap on its own, whole-array operations, such as equality checks, adding/multiplying two arrays, etc., are expected to be faster on vanilla arrays.
QuestDB always stores arrays in vanilla form. If you transform an array's shape and then store it to the database, QuestDB will physically rearrange the elements, and store the new array in vanilla shape.