SQL Data Types

These tables define how Daft's DataType maps to the common SQL types and aliases.

!!! note "Note"

    In these tables, the **Type** column identifies the Daft [DataType](../api/datatypes/all_datatypes.md), whereas the **Name** and **Aliases** columns represent supported SQL syntax.

Boolean Type

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [bool][daft.datatype.DataType.bool] | BOOLEAN | BOOL | TRUE or FALSE |

Numeric Types

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [int8][daft.datatype.DataType.int8] | TINYINT | INT1 | 8-bit signed integer |
| [int16][daft.datatype.DataType.int16] | SMALLINT | INT2, INT16 | 16-bit signed integer |
| [int32][daft.datatype.DataType.int32] | INT, INTEGER | INT4, INT32 | 32-bit signed integer |
| [int64][daft.datatype.DataType.int64] | BIGINT | INT8, INT64 | 64-bit signed integer |
| [uint8][daft.datatype.DataType.uint8] | TINYINT UNSIGNED | UINT1 | 8-bit unsigned integer |
| [uint16][daft.datatype.DataType.uint16] | SMALLINT UNSIGNED | UINT2, UINT16 | 16-bit unsigned integer |
| [uint32][daft.datatype.DataType.uint32] | INT UNSIGNED | UINT4, UINT32 | 32-bit unsigned integer |
| [uint64][daft.datatype.DataType.uint64] | BIGINT UNSIGNED | UINT8, UINT64 | 64-bit unsigned integer |
| [float32][daft.datatype.DataType.float32] | REAL | FLOAT(P), FLOAT32 | 32-bit floating point |
| [float64][daft.datatype.DataType.float64] | DOUBLE [PRECISION] | FLOAT(P), FLOAT64, FLOAT | 64-bit floating point |
| [decimal128][daft.datatype.DataType.decimal128] | DEC(P,S), DECIMAL(P,S) | NUMERIC(P,S) | Fixed-point number |
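The byte-based aliases are easy to misread: `INT8` means 8 bytes (`int64`), not 8 bits. As a minimal sketch, the unparameterized rows of the table can be transcribed into a plain Python lookup. The names `NUMERIC_ALIASES`, `SQL_TO_DAFT`, and `resolve_numeric` are hypothetical helpers for illustration and are not part of Daft; parameterized forms such as `FLOAT(P)` and `DECIMAL(P,S)` are left out.

```python
# Illustrative lookup transcribed from the numeric table; this is not Daft's parser.
NUMERIC_ALIASES = {
    "int8": ["TINYINT", "INT1"],
    "int16": ["SMALLINT", "INT2", "INT16"],
    "int32": ["INT", "INTEGER", "INT4", "INT32"],
    "int64": ["BIGINT", "INT8", "INT64"],
    "uint8": ["TINYINT UNSIGNED", "UINT1"],
    "uint16": ["SMALLINT UNSIGNED", "UINT2", "UINT16"],
    "uint32": ["INT UNSIGNED", "UINT4", "UINT32"],
    "uint64": ["BIGINT UNSIGNED", "UINT8", "UINT64"],
    "float32": ["REAL", "FLOAT32"],
    "float64": ["DOUBLE", "DOUBLE PRECISION", "FLOAT64", "FLOAT"],
}

# Invert the table so each SQL spelling points at its Daft type name.
SQL_TO_DAFT = {
    sql: daft_name
    for daft_name, spellings in NUMERIC_ALIASES.items()
    for sql in spellings
}


def resolve_numeric(sql_name: str) -> str:
    """Return the Daft type name for an unparameterized SQL numeric type name."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(sql_name.upper().split())
    return SQL_TO_DAFT[key]
```

For example, `resolve_numeric("int8")` returns `"int64"`, while `resolve_numeric("tinyint")` returns `"int8"`.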

Text Types

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [string][daft.datatype.DataType.string] | VARCHAR | STRING, TEXT | Variable-length string (see warning) |

!!! warning "Warning"

    Daft strings are UTF-8 encoded, and there is no fixed-length character string type. To fix a size in bytes, use the fixed-size binary type.
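A size in bytes is not the same as a length in characters, because UTF-8 encodes some characters in more than one byte. The following plain-Python sketch (no Daft involved) shows the difference for a five-character word:

```python
# "naïve" written with an escape so the source is unambiguous about the code point.
text = "na\u00efve"

char_count = len(text)                  # number of Unicode code points
byte_count = len(text.encode("utf-8"))  # number of bytes after UTF-8 encoding

print(char_count, byte_count)
```

The byte count is larger than the character count here, so a fixed size in bytes should be chosen with the encoded length in mind.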

Binary Types

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [binary][daft.datatype.DataType.binary] | BINARY | BYTES | Variable-length byte string |
| [fixed_size_binary][daft.datatype.DataType.fixed_size_binary] | BINARY(N) | BYTES(N) | Fixed-length byte string |

!!! warning "Warning"

    SQL defines `BINARY` and `BINARY VARYING` for fixed-length and variable-length binary strings, respectively. Like PostgreSQL, Daft does not use the SQL "up-to length" semantic; when no length is given, the type is variable-length. These aliases are [subject to change](https://github.com/Eventual-Inc/Daft/issues/3955).
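The rule described here (a length selects the fixed-size type, no length selects the variable-length type) can be sketched with a small classifier. `classify_binary` is a hypothetical helper written for this illustration; it only mirrors the table and does not reflect how Daft parses SQL.

```python
import re

# Matches BINARY or BYTES, optionally followed by a parenthesized byte length.
_BINARY_RE = re.compile(r"^\s*(BINARY|BYTES)\s*(?:\(\s*(\d+)\s*\))?\s*$", re.IGNORECASE)


def classify_binary(sql_type: str):
    """Return (daft_type_name, size) for a SQL binary type string."""
    match = _BINARY_RE.match(sql_type)
    if match is None:
        raise ValueError(f"not a binary type: {sql_type!r}")
    size = match.group(2)
    if size is None:
        # No length given: variable-length byte string.
        return ("binary", None)
    # Length given: fixed-length byte string of exactly that many bytes.
    return ("fixed_size_binary", int(size))
```

Under this sketch, `classify_binary("BINARY")` yields `("binary", None)` and `classify_binary("BYTES(16)")` yields `("fixed_size_binary", 16)`.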

Datetime Types

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [date][daft.datatype.DataType.date] | DATE | | Date without Time |
| [time][daft.datatype.DataType.time] | TIME | | Time without Date |
| [timestamp][daft.datatype.DataType.timestamp] | TIMESTAMP | | Date with Time |
| [interval][daft.datatype.DataType.interval] | INTERVAL | | Time Interval (YD & DT) |
| [duration][daft.datatype.DataType.duration] | - | | Time Duration |

!!! warning "Warning"

    Daft does not currently support `TIME WITH TIMEZONE` or `TIMESTAMP WITH TIMEZONE`. See [GitHub #3957](https://github.com/Eventual-Inc/Daft/issues/3957) and feel free to bump the issue if you need it prioritized.

Array Types

| Type | Name | Aliases | Description |
|------|------|---------|-------------|
| [list][daft.datatype.DataType.list] | T ARRAY[] | T[] | Variable-length list of elements |
| [fixed_size_list][daft.datatype.DataType.fixed_size_list] | T ARRAY[N] | T[N] | Fixed-length list of elements |
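Both spellings put the element type first and the optional size in brackets. As a rough sketch of that shape, the hypothetical helper `classify_array` below splits a type string into its element type and size; it is an illustration of the syntax in the table, not Daft's parser.

```python
import re

# Element type, an optional ARRAY keyword, then brackets with an optional size.
_ARRAY_RE = re.compile(
    r"^\s*(?P<elem>.+?)\s*(?:ARRAY\s*)?\[\s*(?P<size>\d*)\s*\]\s*$",
    re.IGNORECASE,
)


def classify_array(sql_type: str):
    """Return (daft_type_name, element_type, size) for a SQL array type string."""
    match = _ARRAY_RE.match(sql_type)
    if match is None:
        raise ValueError(f"not an array type: {sql_type!r}")
    elem = match.group("elem")
    size = match.group("size")
    if size == "":
        # Empty brackets: variable-length list.
        return ("list", elem, None)
    # A size in the brackets: fixed-length list.
    return ("fixed_size_list", elem, int(size))
```

With this sketch, `INT ARRAY[]` and `INT[]` both classify as `list`, while `INT ARRAY[3]` and `INT[3]` classify as `fixed_size_list` with size 3.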

Map & Struct Types

| Type | Syntax | Example | Description |
|------|--------|---------|-------------|
| [map][daft.datatype.DataType.map] | MAP(K, V) | MAP(INT, BOOL) | Key-value pairs with the same types |
| [struct][daft.datatype.DataType.struct] | STRUCT(FIELDS...) | STRUCT(A INT, B BOOL) | Named values of varying types |
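The difference between the two is where the types live: a map has one key type and one value type for all entries, while a struct assigns a type to each named field. The plain-Python sketch below models both with dictionaries; `is_valid_map` and `is_valid_struct` are hypothetical helpers for illustration and use Python types as stand-ins for SQL types.

```python
def is_valid_map(value: dict, key_type: type, value_type: type) -> bool:
    """Check that every key has key_type and every value has value_type, as in MAP(K, V)."""
    return all(
        type(k) is key_type and type(v) is value_type
        for k, v in value.items()
    )


def is_valid_struct(value: dict, fields: dict) -> bool:
    """Check that the field names match exactly and each field has its own declared type."""
    if value.keys() != fields.keys():
        return False
    return all(type(value[name]) is declared for name, declared in fields.items())


# Analogue of MAP(INT, BOOL): arbitrary keys, uniform types.
flags = {1: True, 2: False}

# Analogue of STRUCT(A INT, B BOOL): fixed field names, per-field types.
record = {"A": 1, "B": True}
```

Here `is_valid_map(flags, int, bool)` and `is_valid_struct(record, {"A": int, "B": bool})` both hold, whereas a map containing a non-boolean value or a struct missing a field would fail the check.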

Complex Types

| Type | Syntax | Example | Description |
|------|--------|---------|-------------|
| [tensor][daft.datatype.DataType.tensor] | TENSOR(T, [shape...]) | TENSOR(INT) | Multi-dimensional array |
| [image][daft.datatype.DataType.image] | IMAGE(mode, [dimensions]) | IMAGE('RGB', 256, 256) | Image data array |
| [embedding][daft.datatype.DataType.embedding] | EMBEDDING(T, N) | EMBEDDING(INT, 100) | Fixed-length numeric array |

Tensor Type

The `TENSOR(T, [shape...])` type represents n-dimensional arrays whose elements have type `T` and, when a shape is given, all share that shape. The shape is an optional list of integers, one per dimension.

Examples

```sql
TENSOR(INT)
TENSOR(INT, 10, 10, 10)
```
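When a shape is declared, the number of elements in each tensor is the product of its dimensions; without a shape, that count is not known from the type alone. A minimal sketch in plain Python, where `elements_per_tensor` is a hypothetical helper and not a Daft function:

```python
from math import prod


def elements_per_tensor(shape=None):
    """Return the element count for a declared shape, or None when no shape is declared."""
    if not shape:
        # TENSOR(INT): the shape may differ from value to value.
        return None
    # TENSOR(INT, 10, 10, 10): every value has the same number of elements.
    return prod(shape)
```

For the second example above, `elements_per_tensor((10, 10, 10))` gives 1000.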

Image Type

The `IMAGE(mode, [dimensions])` type represents an array of pixels with a designated pixel type. The supported modes are covered in [ImageMode][daft.daft.ImageMode]. If the height, width, and mode are the same for all images in the array, specify them when constructing this type, since that allows Daft to create a more optimized physical representation of the image array.

Examples

```sql
IMAGE('RGB')
IMAGE('RGB', 256, 256)
```
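One way to see why a fixed mode and size help: if every image occupies the same number of bytes, the position of any image in a packed buffer follows from arithmetic alone. The sketch below illustrates that idea only; it is not a description of Daft's actual storage layout, and the `CHANNELS` table assumes 8-bit channels with the usual channel counts for these mode names.

```python
# Assumed channel counts for 8-bit modes; an illustration, not Daft's ImageMode definition.
CHANNELS = {"L": 1, "RGB": 3, "RGBA": 4}


def image_nbytes(mode: str, height: int, width: int) -> int:
    """Bytes needed for one image with 8-bit channels."""
    return CHANNELS[mode] * height * width


def image_offset(index: int, mode: str, height: int, width: int) -> int:
    """Byte offset of image `index` in a flat buffer of same-shaped images."""
    return index * image_nbytes(mode, height, width)
```

Under these assumptions, an `IMAGE('RGB', 256, 256)` value takes 196608 bytes, so the third image in a packed buffer starts at byte 393216. With variable-sized images, each image's size would have to be stored and looked up instead.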

Embedding Type

The `EMBEDDING(T, N)` type represents a fixed-length array of `N` elements of numeric type `T`.

Examples

```sql
EMBEDDING(INT16, 256)
EMBEDDING(INT32, 100)
```
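The two constraints in the definition, a fixed length and a numeric element type, can be expressed as a simple check. This is a plain-Python sketch with a hypothetical helper `is_embedding`; it does not check the specific numeric type `T` and is not part of Daft.

```python
from numbers import Real


def is_embedding(values, n: int) -> bool:
    """True when `values` holds exactly n real numbers."""
    if len(values) != n:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
```

For instance, a three-element list of floats passes `is_embedding(values, 3)`, while a list of the wrong length or one containing a string does not.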