docs/architecture/sql-data.md
The SQL data model represents user data in tables and rows. It is made up of data types and schemas,
in the sql::types
module.
toyDB supports four basic scalar data types as sql::types::DataType: booleans, integers, floats,
and strings.
Specific values are represented as sql::types::Value, using the corresponding Rust types. toyDB
also supports SQL NULL values, i.e. unknown values, following the rules of
three-valued logic.
The Value type provides basic formatting, conversion, and mathematical operations.
It also specifies comparison and ordering semantics, but these are subtly different from the SQL
semantics. For example, in Rust code Value::Null == Value::Null yields true, while in SQL
NULL = NULL yields NULL. This mismatch is necessary for the Rust code to properly detect and
process Null values, and the desired SQL semantics are implemented during expression evaluation
which we'll cover below.
During execution, a row of values is represented as sql::types::Row, with multiple rows emitted
via sql::types::Rows row iterators:
toyDB schemas only support tables. There are no named indexes or constraints, and there's only a single unnamed database.
Tables are represented by sql::types::Table:
A table is made up of a set of columns, represented by sql::types::Column. These support the data
types described above, along with unique constraints, foreign keys, and secondary indexes.
The table name serves as a unique identifier, and can't be changed later. In fact, tables schemas are entirely static: they can only be created or dropped (there are no schema changes).
Table schemas are stored in the catalog, represented by the sql::engine::Catalog trait. We'll
revisit the implementation of this trait in the SQL storage section.
Table schemas are validated when created via Table::validate(), which enforces invariants and
internal consistency. It uses the catalog to look up information about other tables, e.g. that
foreign key references point to a valid target column in a different table.
Table rows are validated via Table::validate_row(), which ensures that a sql::types::Row
conforms to the schema (e.g. that value types match the column data types). It uses a
sql::engine::Transaction to look up other rows in the database, e.g. to check for primary key
conflicts (we'll get back to this later).
During SQL execution, we also have to model expressions, such as 1 + 2 * 3. These are
represented as values and operations on them, and can be nested as a tree to represent compound
operations.
For example, the expression 1 + 2 * 3 (taking precedence
into account) is represented as:
// +
// / \
// 1 *
// / \
// 2 3
Expression::Add(
Expression::Constant(Value::Integer(1)),
Expression::Multiply(
Expression::Constant(Value::Integer(2)),
Expression::Constant(Value::Integer(3)),
),
)
An Expression can contain two kinds of values: constant values as
Expression::Constant(sql::types::Value), and dynamic values as Expression::Column(usize) column
references. The latter will fetch a sql::types::Value from a sql::types::Row at the specified
index during evaluation.
We'll see later how the SQL parser and planner transforms text expression like 1 + 2 * 3 into an
Expression, and how it resolves column names to row indexes like price * 0.25 to
row[3] * 0.25.
Expressions are evaluated recursively via Expression::evalute(), given a sql::types::Row with
input values for column references, and return a final sql::types::Value result:
Many of the comparison operations like == are implemented explicitly here instead of using
sql::types::Value comparisons. This is where we implement the SQL semantics of special values like
NULL, such that NULL = NULL yields NULL instead of TRUE.
For mathematical operations however, we generally dispatch to these methods on sql::types::Value:
Expression parsing and evaluation is tested via test scripts in
sql/testscripts/expression.