Back to Cockroach

Summary

docs/RFCS/20171201_cascade.md

26.1.358.9 KB
Original Source
  • Feature Name: Cascading Referential Integrity Constraints
  • Status: in-progress
  • Start Date: 2017-26-17
  • Authors: Bram Gruneir
  • RFC PR: #19570
  • Cockroach Issue: #14848, #17803, #20305

Summary

Being able to cascade deletes through foreign key references is a very common activity and part of the SQL standard. This RFC proposes to add in the ON DELETE action and ON UPDATE action keywords when defining the foreign key references on a CREATE TABLE or ALTER TABLE command.

This RFC proposes the addition of the following optional actions to referential integrity constraints within a CREATE TABLE's REFERENCES and FOREIGN KEY columns:

  • ON DELETE CASCADE
  • ON UPDATE CASCADE
  • ON DELETE SET DEFAULT
  • ON UPDATE SET DEFAULT
  • ON DELETE SET NULL
  • ON UPDATE SET NULL

While not essential, these actions, specifically ON DELETE CASCADE, have been requested multiple times by both the community and potential customers.

The addition of these optional actions will unlock more ORM functionality as cascade is a very common option when defining foreign key references and bring us closer to parity with Postgres.

Not Covered

Note that this RFC will only be tangentially discussing deferring or disabling the enforcement of foreign key constraints. So what is not covered includes all of the deferrable options on constraints: DEFERRED NOT DEFERRABLE, DEFERRED INITIALLY IMMEDIATE, DEFERRED INITIALLY DEFERRED. This rfc also excludes setting the transaction to defer certain constraints and not using the commands SET CONSTRAINTS ALL/some_named_constraint DEFERRED or SET CONSTRAINTS ALL/some_named_constraint IMMEDIATE.

Futhermore, this RFC will not be covering different techniques for matching composite foreign keys using the MATCH keyword. However, there will be some discussion around these concepts as they will have an impact in the future if we choose to implement them.

At this time there is no plan to allow multiple tables to be referenced by the same key while this is allowed in Postgres. An example of how this works is shared in the reference-level explanation.

Motivation

While cascading referential integrity constraints are in the SQL standard, they are not essential as they can be seen as shortcuts to writing longer SQL transactions. However, they have been an often requested feature and are quite often used in ORMs. What is clear, is that the addition of these options will unlock new customer opportunities for us.

The use of ON DELETE/UPDATE CASCADE in combination with interleaved tables is a great fit and can be used as a great introduction to interleaving.

Guide-level explanation

What is a table constraint?

A table constraint is a check on a table that ensures that all rows in that table must pass before being committed. These can include simple checks, such as ensuring a value is greater than zero, to more complex ones such as ensuring the value in a column is unique. See here for our docs on the subject. The constraints we're specifically interested in for this RFC are foreign keys references.

What is a referential integrity constraint?

A referential integrity constraint is the technical term for a foreign key constraint. They require that the key in a column is present in another table. Here are our docs about them.

How do cascading referential integrity constraints work?

When updating or deleting a row in a table that has been used as a foreign key in another table, it is also important to update the table that's referencing this key. When declaring a foreign key constraint, it is possible to specify what action should occur when the referenced key is altered or deleted.

What are ON DELETE and ON UPDATE actions?

There are optional actions that will occur when a referenced foreign key is removed (ON DELETE) or when a key is updated (ON UPDATE). All foreign key relationships already have default actions for both of these, and that is NO ACTION.

What are all the available actions?

NO ACTION

If the foreign key being removed is still referenced in another table, fail the transaction. NO ACTION is poorly named and should be read as the answer to: "What action should be taken when the reference is changed?". Take no action and fail the transaction.

RESTRICT

RESTRICT is exactly the same as NO ACTION in our current implementation. When used in Postgres, it always immediately checks and fails if there is any row referencing the foreign key that was updated or deleted. Unlike NO ACTION which waits until the end of the transaction if constraint checks have been deferred. This action has been added and stored in the ForeignKeyReference for future use for when we opt to add the ability to defer constraint checking.

CASCADE

This is the most common use case. When the CASCADE action is specified, if the referenced foreign key is deleted or altered, then respectively delete the row or alter the key in the referencing table. The name cascade comes from the fact that the change made on the referenced table cascades into the referencing table. It is not uncommon for cascading deletes to walk through numerous tables.

SET NULL

When the referencing key is deleted or altered this option sets the foreign key reference column to null. Note that this will not work on columns that also have the NOT NULL constraint. The creation of the constraint should fail for a column that has both NOT NULL and a SET NULL action.

SET DEFAULT

When the referencing key is deleted or altered this option sets the foreign key referencing column to the default value for the column. If the default for the column is null, then this is the equivalent of SET NULL. Note that if this default is not null, then there must still be a referenced key that the default must match or the transaction will fail.

Why would one want to defer the checking of constraints?

Deferring constraints delays the checking and enforcement of the constraints until the end of a transaction instead of immediately failing when violated. The most commonly cited use case is cyclical foreign keys references but there are a number of instances where this can be very helpful to defer constraint checking.

Here is an excellent source that explains the different ways that Postgres handles deferrable constraints, the most common use cases for deferring and the pros and cons of doing so.

While we are not going to be deferring constraints at this time, we need be aware that this will be something that we most likely will be adding in the near future.

What is a composite foreign key?

A composite foreign key is a referential integrity constraint that is created from more than one source column on another table. We do support composite foreign keys and all added operations will need to take foreign key matching into account.

What are the different matching options for composite foreign keys?

  • MATCH FULL considers a foreign key as valid if every foreign key column is null or if all foreign key columns are not null and match the referenced table.
  • MATCH PARTIAL is valid if at least one Foreign key column is null and the rest of the not null columns match those of the referenced tables or all foreign key columns are not null and match the referenced table.
  • MATCH SIMPLE considers a foreign key valid if all columns are not null and match the referenced key exactly or if one of the foreign key values is null.

We currently only offer MATCH FULL, while Postgres' and SQL Server's default is MATCH SIMPLE. There are no plans to support the other match types at this time. However, the addition of the other composite key matching types will affect when to trigger an ON DELETE actions.

Reference-level explanation

Survey of other databases

All major databases support the ON DELETE action and ON UPDATE action options. But there are some minor differences in which options they allow. This section also includes the differences for the MATCH options and DEFERRED/IMMEDIATE constraint checking as they may be useful in the future.

Postgres

Postgres is by far the most nuanced database allowing more options than the others. Postgres supports 5 different actions:

  • CASCADE: delete or update the reference when the referenced key is removed
  • SET NULL: set the foreign key to null if the key is deleted or updated.
  • SET DEFAULT: set the foreign key to its default value if the key is deleted or updated.
  • RESTRICT: immediately fail if there are any existing references to the key being updated or deleted.
  • NO ACTION: The default option. Fail if there are any existing references to the key being updated or deleted, but allow this check to be deferred until the end of the transaction.

Postgres also offers two options around matching:

  • MATCH FULL: No part of a foreign key can be null, unless they are all null.
  • MATCH SIMPLE: The default behaviour. Nulls are allowed.

Postgres also offers deferred constraint checking using the DEFERRED and IMMEDIATE keywords.

Here is the main reference for Postgres.

Transact-SQL (MS SQL Server)

MS SQL Server offers a similar set of options to Postgres, but does not offer the RESTRICT action. However, since they do not offer a DEFERRED at all, NO ACTION is the equivalent of RESTRICT. To get around this restrictive mode, the suggested work around is to allow the columns to be null.

Futhermore, the cascading of the referential integrity constraints is also limited to a single instance of each table and there can be no loops. See this article for more info.

MS SQL Server has full support for all types composite key matching.

Here is main reference for SQL Server.

MySQL

MySQL actually offers all the actions that postgres does, but NO ACTION is an alias for RESTRICT. There is no option to defer the checking of the constraints and the DEFERRED and IMMEDIATE keywords are not valid. If one wants to make adjustments and work around the constraints, the system variable foreign_key_checks can be set to 0 which turns off constraint checking within a session. But it is important to note that turning it back on by setting it to 1 does not then go back and check the constraints of the altered rows.

For composite keys, there is only a single matching option which is the equivalent of Postgres' MATCH SIMPLE. Including the MATCH keyword is accepted but based on the documentation broken and will potentially nullify any cascading action.

Here is the main reference for MySQL.

PL-SQL (Oracle)

Oracle only allows 3 options for cascading. The default value, which is specifically not specifying any action, is the equivalent to NO ACTION. It also has SET NULL and CASCADE. Oracle has a full suite of options for deferring constraint checking, including the keywords DEFERRED and IMMEDIATE. It also has the option to permanently disable a constraint using the DISABLE keyword. More details on disabling can be found here.

Oracle offer three levels of composite key matching:

  • MATCH FULL Partially null foreign keys are not permitted. Either all components of the foreign key must be null, or the combination of values contained in the foreign key must appear as the primary or unique key value of a single row of the referenced table.
  • MATCH PARTIAL Partially null composite foreign keys are permitted. Either all components of the foreign key must be null, or the combination of non-null values contained in the foreign key must appear in the corresponding portion of the primary or unique key value of a single row in the referenced table.
  • MATCH NONE Partially null composite foreign keys are permitted. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.

Here is the main reference for Oracle.

Google Spanner

Spanner offers ON DELETE CASCADE only for interleaved tables. They actually suggest using it just about every time. This is a great example and a perfect use case that also help explain how interleaved tables function. We should follow their lead here and suggest the same thing.

See here for their documentation on it. This is very good documentation on interleaved tables and we should produce a similarly robust page.

Foreign Keys References from Multiple Tables (unsupported)

We don't currently support foreign key references from multiple tables, but it would be possible to add this at some point in the future. Postgres does support them.

To further elucidate the point, instead of describing a foreign key reference from one table to another, it is possible to add a reference to the same keys in multiple tables. To see how this works, consider the following:

Table A has an id column. Table B contains a foreign key reference to table_a.id. Let's call it table_b.a_id. Table C contains a foreign key reference to table_a.id. Let's also call it table_c.a_id. Table D contains a foreign key reference to both table_b.a_id and table_c.a_id.

This means that to enter a value into table d, it must exist in both tables b and c. If it is only one, the constraint will be violated.

It should be noted, that this is the equivalent of having multiple foreign key references and an extra constraint that requires that the values of those referenced values are equal. However this will produce a column for each reference instead of a single column representing this complex relationship.

Detailed design

This RFC proposes to add in all the missing functionality and keywords from Postgres to allow for cascading referential integrity constraints.

Proposed Keywords

Here is a detailed list of all of the proposed foreign key reference actions:

  • ON DELETE CASCADE When a referenced foreign key is deleted, all rows referencing that key are deleted. If there are other alterations to the row, such as a SET NULL or SET DEFAULT, the delete with take precedence.

  • ON UPDATE CASCADE When a referenced foreign key is updated, update the columns of all rows referencing that key to the new value.

  • ON DELETE SET DEFAULT When a referenced foreign key is deleted, set the columns of all rows referencing that key to the default value for that column. If the default value for the column is null, this will have the same effect as ON DELETE SET NULL. The default value must still conform with all other constraints, such as UNIQUE.

  • ON UPDATE SET DEFAULT When a referenced foreign key is updated, set the columns of all rows referencing that key to the default value for that column. If the default value for the column is null, this will have the same effect as ON UPDATE SET NULL. The default value must still conform with all other constraints, such as UNIQUE.

  • ON DELETE SET NULL When a referenced foreign key is deleted, set the columns of all rows referencing that key to null. The column must allow nulls or this update will still fail.

  • ON UPDATE SET NULL When a referenced foreign key is updated, set the columns of all rows referencing that key to null. The column must allow nulls or this update will still fail.

Please note that all foreign key references can have both an ON DELETE and an ON UPDATE and that their actions are not required to be the same.

Examples

For all examples, the error codes generated are a combination of both Cockroach and Postgres errors. They will be refined during implementation.

Basic examples

For all of the following examples, assume we have the following setup:

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);

INSERT INTO a (id) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (100);

CREATE TABLE b (
  delete_restrict INT NOT NULL REFERENCES a ON DELETE RESTRICT
 ,update_restrict INT NOT NULL REFERENCES a ON UPDATE RESTRICT
 ,delete_cascade INT NOT NULL REFERENCES a ON DELETE CASCADE
 ,update_cascade INT NOT NULL REFERENCES a ON UPDATE CASCADE
 ,delete_null INT REFERENCES a ON DELETE SET NULL
 ,update_null INT REFERENCES a ON UPDATE SET NULL
 ,delete_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
 ,update_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
);

INSERT INTO b VALUES (1, 2, 3, 4, 5, 6, 7, 8);

And we expect:

SQL
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |              4 |           5 |           6 |              7 |              8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE NO ACTION or ON DELETE RESTRICT

When we try to delete a referenced key that does not specify a delete action or specifies it to RESTRICT, the operation fails.

SQL
DELETE FROM a WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "b"
ON UPDATE NO ACTION or ON UPDATE RESTRICT

When we try to update a referenced key that does not specify an update action or specifies it to RESTRICT, the operation fails.

SQL
UPDATE a SET id = 7 WHERE id = 2;
pq: foreign key violation: values [1] in columns [id] referenced in table "b"
ON DELETE CASCADE

When the reference key is deleted, we remove a referenced key and the referencing row is deleted.

SQL
DELETE FROM a WHERE id = 3;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(0 rows)
ON UPDATE CASCADE

When the referenced key is updated, the referencing key is updated to the same value.

SQL
UPDATE a SET id = 100 WHERE id = 4;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |            100 |           5 |           6 |              7 |              8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE SET NULL

When the referenced key is deleted, the referencing key is set to null.

SQL
DELETE FROM a WHERE id = 5;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |              4 | NULL        |           6 |              7 |              8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON UPDATE SET NULL

When the referenced key is updated, the referencing key is set to null.

SQL
UPDATE a SET id = 100 WHERE id = 6;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |              4 |           5 | NULL        |              7 |              8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE SET DEFAULT

When the referenced key is deleted, the referencing key is set to the default value for that column.

SQL
DELETE FROM a WHERE id = 7;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |              4 |           6 |           6 |            100 |              8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON UPDATE SET DEFAULT

When the referenced key is updated, the referencing key is set to the default value for that column.

SQL
UPDATE a SET id = 100 WHERE id = 8;

SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
|               1 |               2 |              3 |              4 |           5 |           6 |              7 |            100 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
Edge Case: ON UPDATE/DELETE SET DEFAULT default does not exist in referenced table

When the referenced key is updated or deleted and the referencing key's action is SET DEFAULT, the referencing key's value should bet set to its default value. However, there is no guarantee that that default value exists in the referenced table. When it does not the operation should fail.

SQL
DELETE FROM a WHERE id = 0;

DELETE FROM a where id = 7;
pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";

UPDATE a SET id = 100 WHERE id = 8;
pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";
Edge Case: ON UPDATE/DELETE SET NULL on a NOT NULL column

We should not allow a SET NULL action on a column that does not allow nulls. We do not currently allow the altering of constraints so this can be checked at the time of the creation of the constraint.

SQL
CREATE TABLE c (
  delete_not_nullable INT NOT NULL REFERENCES a ON DELETE SET NULL
);
pg: cannot create a constraint for column "delete_not_nullable" to ON DELETE SET NULL that also has a NOT NULL constraint.

CREATE TABLE c (
  update_not_nullable INT NOT NULL REFERENCES a ON UPDATE SET NULL
);
pg: cannot create a constraint for column "update_not_nullable" to ON UPDATE SET NULL that also has a NOT NULL constraint.
Edge Case: ON UPDATE/DELETE SET DEFAULT with no default value

Similarly, we should not allow a SET DEFAULT action on a column that does not have a default value. Again, we do not currently allow the altering of constraints so this can be checked at the time of the creation of the constraint.

SQL
CREATE TABLE c (
  delete_no_default INT REFERENCES a ON DELETE SET DEFAULT
);
pg: cannot create a constraint for column "delete_no_default" to ON DELETE SET DEFAULT that has no default value.

CREATE TABLE c (
  update_no_default INT REFERENCES a ON update SET DEFAULT
);
pg: cannot create a constraint for column "update_no_default" to ON UPDATE SET DEFAULT that has no default value.
Edge Case: ON UPDATE CASCADE violates another constraint

In this example, the second constraint on table c, the check that update_check is less than 100, means that it is possible to bump up against that new constraint when cascading an update from table a. In essence, by adding a constraint on the referencing column in table c that has an ON UPDATE CASCADE action, it is also adding that same constraint on table a, but only for any keys of a that exist in table c.

SQL
CREATE TABLE c (
  update_check INT REFERENCES a ON UPDATE CASCADE
 ,CONSTRAINT update_check CHECK (update_check < 100)
);
INSERT INTO c VALUES (1);
SELECT * FROM c;
+--------------+
| update_check |
+--------------+
|            1 |
+--------------+
(1 row)

UPDATE a SET id = 100 WHERE id = 1;
pg:  update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
detail:  Key (id)=(1) is still referenced from table "b".
Edge Case: ON UPDATE/DELETE SET NULL violates another constraint

In a similar case, the ON UPDATE SET NULL or ON DELETE SET NULL can block updates to original table.

SQL
CREATE TABLE c (
  update_unique INT UNIQUE REFERENCES a ON UPDATE SET NULL
 ,delete_unique INT UNIQUE REFERENCES a ON DELETE SET NULL
);
INSERT INTO c VALUES (1,1), (null, null);
SELECT * FROM c;
+---------------+---------------+
| update_unique | delete_unique +
+---------------+---------------+
|             1 |             1 |
+---------------+---------------+
| NULL          | NULL          |
+---------------+---------------+
(2 rows)

UPDATE a SET id = 100 WHERE id = 1;
pg:  update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
detail:  Key (id)=(1) is still referenced from table "b".

DELETE FROM a WHERE id = 1;
pg:  update or delete on table "a" violates foreign key constraint "c_update_unique_fkey" on table "c"
detail:  Key (id)=(1) is still referenced from table "c".

Cascading Examples

Here are the two canonical examples of cascading referential integrity constraints.

Cascading ON DELETE CASCADE

When a row in table a is deleted, all rows referencing that row in table b are deleted. Then all rows referencing the deleted row in table b are deleted.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);

CREATE TABLE b (
  id INT PRIMARY KEY
 ,a_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1,1);

CREATE TABLE c (
  b_id INT REFERENCES b ON DELETE CASCADE
);
INSERT INTO c VALUES (1);

SELECT * FROM a;
+----+
| id |
+----+
|  1 |
+----+
(1 row)

SELECT * FROM b;
+----+------+
| id | a_id |
+----+------+
|  1 |    1 |
+----+------+
(1 row)

SELECT * FROM c;
+------+
| b_id |
+------+
|    1 |
+------+
(1 row)

DELETE FROM a WHERE id = 1;

SELECT * FROM a;
+----+
| id |
+----+
+----+
(0 rows)

SELECT * FROM b;
+----+------+
| id | a_id |
+----+------+
+----+------+
(0 rows)

SELECT * FROM c;
+--------+
| b_a_id |
+--------+
+--------+
(0 rows)
Cascading ON DELETE CASCADE to a RESTRICT

When a row in table a is deleted, all rows referencing that row in table b are deleted. Then all rows referencing the deleted row in table b can fail due to the deeper RESTRICT action.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);

CREATE TABLE b (
  id INT PRIMARY KEY
 ,a_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1,1);

CREATE TABLE c (
  b_id INT REFERENCES b ON DELETE RESTRICT
);
INSERT INTO c VALUES (1);

DELETE FROM a WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "c"
Cascading ON UPDATE CASCADE

When the key is updated in table a, its new value is cascaded to table b, which in turn gets cascaded to table c.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);

CREATE TABLE b (
 a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
);
INSERT INTO b VALUES (1);

CREATE TABLE c (
  b_a_id INT REFERENCES b ON UPDATE CASCADE
);
INSERT INTO c VALUES (1);

SELECT * FROM a
  INNER JOIN b ON a.id = b.a_id
  INNER JOIN c ON b.a_id = c.b_a_id;
+----+------+--------+
| id | a_id | b_a_id |
+----+------+--------+
|  1 |    1 |      1 |
+----+------+--------+
(1 row)

UPDATE a SET id = 2 WHERE id = 1;

SELECT * FROM a
  INNER JOIN b ON a.id = b.a_id
  INNER JOIN c ON b.a_id = c.b_a_id;
+----+------+--------+
| id | a_id | b_a_id |
+----+------+--------+
|  2 |    2 |      2 |
+----+------+--------+
(1 row)
Cascading ON UPDATE CASCADE to a RESTRICT

When the key is updated in table a, its new value is cascaded to table b, which in turn fails due to the RESTRICT action on table c.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);

CREATE TABLE b (
 a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
);
INSERT INTO b VALUES (1);

CREATE TABLE c (
  b_a_id INT REFERENCES b ON UPDATE RESTRICT
);
INSERT INTO c VALUES (1);

UPDATE a SET id = 2 WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "c"

DELETE competes with another action

This is a bit of a contrived example, but it is possible to have two competing foreign key constraints that affect the same row. In the case where a DELETE CASCADE competes with any other actions, the delete always takes precedence.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
);
INSERT INTO a VALUES (1), (2);
CREATE TABLE b (
  a_id INT PRIMARY KEY REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1);
CREATE TABLE c (
  a_id INT PRIMARY KEY DEFAULT 2 REFERENCES a ON DELETE SET DEFAULT
);
INSERT INTO c VALUES (1);
CREATE TABLE d (
  b_a_id INT REFERENCES b ON DELETE CASCADE
 ,c_a_id INT REFERENCES c ON UPDATE CASCADE
);
INSERT INTO d VALUES (1,1);

SELECT * FROM a;
+----+
| id |
+----+
|  1 |
+----+
|  2 |
+----+
(2 rows)

SELECT * FROM b;
+------+
| a_id |
+------+
|    1 |
+------+
(1 row)

SELECT * FROM c;
+------+
| a_id |
+------+
|    1 |
+------+
(1 row)

SELECT * FROM d;
+--------+--------+
| b_a_id | c_a_id |
+--------+--------+
|      1 |      1 |
+--------+--------+
(1 row)

DELETE FROM a WHERE id = 1;

SELECT * FROM a;
+----+
| id |
+----+
|  2 |
+----+
(1 row)

SELECT * FROM b;
+------+
| a_id |
+------+
+------+
(0 rows)

SELECT * FROM c;
+------+
| a_id |
+------+
|    2 |
+------+
(1 row)

And in table d, the DELETE CASCADE takes precedence and the row is removed.

SQL
SELECT * FROM d;
+--------+--------+
| b_a_id | c_a_id |
+--------+--------+
+--------+--------+
(0 rows)

Self-Referential Cascading Deletes

A self-referential table that can cascade deletes between rows.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
 ,other_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 1);

SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
|  1 | NULL     |
|  2 |        1 |
|  3 |        2 |
|  4 |        3 |
|  5 |        1 |
+----+----------+
(5 rows)

DELETE FROM a WHERE id = 1;

SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
+----+----------+
(0 rows)

Self-Referential Cascading Deletes with a Cycle

Even without deferrable constraints, it is possible to create a cycle using an UPDATE. When a cycle's references are all DELETE CASCADE, all rows should be deleted.

SQL
CREATE TABLE a (
  id INT PRIMARY KEY
 ,other_id INT REFERENCES a ON DELETE CASCADE
);

INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3);
UPDATE a SET other_id = 4 WHERE id = 1;


SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
|  1 |        4 |
|  2 |        1 |
|  3 |        2 |
|  4 |        3 |
+----+----------+
(4 rows)

Deleting any single row in this cycle should remove all entires.

SQL
DELETE FROM a WHERE id = 1;

SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
+----+----------+
(0 rows)

Cascading Deletes between Two Tables with a Cycle

It's also possible to have two or more tables that reference each other and form a cycle. To create them, a constraint has to be added after both tables are created using the ALTER TABLE ADD CONSTRAINT command. And to create the cycle, again the 'UPDATE' command needs to be used. These should also be handled correctly and all rows should be deleted.

SQL
CREATE TABLE loop_a (
  id INT PRIMARY KEY
 ,b_id INT
 ,INDEX(b_id)
);

CREATE TABLE loop_b (
  id INT PRIMARY KEY
 ,a_id INT REFERENCES loop_a ON DELETE CASCADE
);

ALTER TABLE loop_a ADD CONSTRAINT b_id_delete_constraint
  FOREIGN KEY (b_id) REFERENCE loop_b (id) ON DELETE CASCADE;

INSERT INTO loop_a (id, b_id) VALUES (1, NULL);
INSERT INTO loop_b (id, a_id) VALUES (1, 1);
INSERT INTO loop_a (id, b_id) VALUES (2, 1);
INSERT INTO loop_b (id, a_id) VALUES (2, 2);
INSERT INTO loop_a (id, b_id) VALUES (3, 2);
INSERT INTO loop_b (id, a_id) VALUES (3, 3);
UPDATE loop_a SET b_id = 3 WHERE id = 1;

SELECT * FROM loop_a;
+----+------+
| id | b_id |
+----+------+
|  1 |    3 |
+----+------+
|  2 |    1 |
+----+------+
|  3 |    2 |
+----+------+
(3 rows)

SELECT * FROM loop_b;
+----+------+
| id | a_id |
+----+------+
|  1 |    1 |
+----+------+
|  2 |    2 |
+----+------+
|  3 |    3 |
+----+------+
(3 rows)

Here, any single delete in the cycle should delete the full cycle.

SQL
DELETE FROM loop_a WHERE id = 1;

SELECT * FROM loop_a;
+----+------+
| id | b_id |
+----+------+
+----+------+
(0 rows)

SELECT * FROM loop_b;
+----+------+
| id | a_id |
+----+------+
+----+------+
(0 rows)

Double Self-Referential Cascading Deletes

Similar to the self-referential case above, this one has two self references complicating matters. But these should be handled correctly. In this case, x is the primary key, y references x and z references y. So a delete to x, requires a delete to any values of y matching x's value and then again to delete any values of z that also match.

SQL
CREATE TABLE self_x2 (
  x INT PRIMARY KEY
 ,y INT UNIQUE REFERENCES self_x2(x) ON DELETE CASCADE
 ,z INT REFERENCES self_x2(y) ON DELETE CASCADE
);

INSERT INTO self_x2 (x, y, z) VALUES ('1', NULL, NULL);
INSERT INTO self_x2 (x, y, z) VALUES ('2', '1', NULL);
INSERT INTO self_x2 (x, y, z) VALUES ('3', '2', '1');

SELECT * FROM self_x2;
+---+------+------+
| x |    y |    z |
+---+------+------+
| 1 | NULL | NULL |
+---+------+------+
| 2 |    1 | NULL |
+---+------+------+
| 3 |    2 |    1 |
+---+------+------+
(3 rows)

DELETE FROM self_x2 WHERE x = '1';

SELECT * FROM self_x2;
+---+------+------+
| x |    y |    z |
+---+------+------+
+---+------+------+
(0 rows)

Cascading Delete Race

Dealing with a cycle where there are two competing deletes to a single row should be handle gracefully.

TEXT
       a
      / \
     b   c
     |   |
     |   d
      \ /
       e
SQL
CREATE TABLE race_a (
  id STRING PRIMARY KEY
);
CREATE TABLE race_b (
  id STRING PRIMARY KEY
 ,a_id STRING REFERENCES race_a ON DELETE CASCADE
);
CREATE TABLE race_c (
  id STRING PRIMARY KEY
 ,a_id STRING REFERENCES race_a ON DELETE CASCADE
);
CREATE TABLE race_d (
  id STRING PRIMARY KEY
 ,c_id STRING REFERENCES race_c ON DELETE CASCADE
);
CREATE TABLE race_e (
  id STRING PRIMARY KEY
 ,b_id STRING REFERENCES race_b ON DELETE CASCADE
 ,d_id STRING REFERENCES race_d ON DELETE CASCADE
);

INSERT INTO race_a (id) VALUES ('a1');
INSERT INTO race_b (id, a_id) VALUES ('b1', 'a1');
INSERT INTO race_c (id, a_id) VALUES ('c1', 'a1');
INSERT INTO race_d (id, c_id) VALUES ('d1', 'c1');
INSERT INTO race_e (id, b_id, d_id) VALUES ('e1', 'b1', 'd1');

SELECT * FROM race_a;
+----+
| id |
+----+
| a1 |
+----+
(1 row)

SELECT * FROM race_b;
+----+------+
| id | a_id |
+----+------+
| b1 | a1   |
+----+------+
(1 row)

SELECT * FROM race_c;
+----+------+
| id | a_id |
+----+------+
| c1 | a1   |
+----+------+
(1 row)

SELECT * FROM race_d;
+----+------+
| id | c_id |
+----+------+
| d1 | c1   |
+----+------+
(1 row)

SELECT * FROM race_e;
+----+------+------+
| id | b_id | d_id |
+----+------+------+
| e1 | b1   | d1   |
+----+------+------+
(1 row)

DELETE FROM race_a WHERE id = 'a1';

SELECT * FROM race_a;
+----+
| id |
+----+
+----+
(0 rows)

SELECT * FROM race_e;
+----+------+------+
| id | b_id | d_id |
+----+------+------+
+----+------+------+
(0 rows)

Changes required

Almost all of the changes will be in fk.go, rowwriter.go and a new file, cascader.go, and will need to take into consideration the possibility that we will be adding in deferrable constraints in the future. This does break the general pragmatic rule of code for what you need today not for what you will need tomorrow. So the suggestion here is not to alter the design, but to be cognizant of the implications of deferrable.

One tradeoff to using cascading actions is that they will always be somewhat slow. This is due to the fact that it is not possible to know beforehand if a cascade is actually required. An index row fetch must be performed on all referencing tables to see if the value being updated/deleted exists. For example, a self-referencing table with an ON DELETE CASCADE will have to perform a row fetch for each row being deleted; then again for each row that is cascading the delete; then yet again for each of those new cascading deletes, etc.

Dependency Walking Algorithm

The algorithm to determine which tables and rows are affected by any operation is a graph walk. Each table can be seen as a node in the graph and each foreign key relationship as an edge. What complicates this walk is that the different actions for both ON UPDATE and ON DELETE determine different stopping conditions. This algorithm will be used twice in practice. The first time during planning to determine which table descriptors will need to be fetched, and then again, during execution, when performing the actual operations. The difference between the two, is that when fetching the table descriptors, one must assume that any relationships may be affected, while when actually performing the operations, only a subset of those tables will actually require changes. The main goal of this algorithm is twofold. First to ensure no table is unaccounted for that could potentially produce an orphaned reference. And secondly, to not go too far so such that no table descriptors are fetched which will never be needed for the operation and thus would have no chance to create orphaned rows.

To describe the algorithm, we're going to concentrate on a single step of the recursion, performing a lookup of exactly a single table with an incoming operation. Let's first assume we have the following setup:

text
'NA 'DC  'DS  'UC 'US
  \   \   |   /   /
    \  \  |  /  /
      \ \ | / /
        \\|//
          X
        //|\\
      / / | \ \
    /  /  |  \  \
  /   /   |   \   \
 NA  DC   DS   UC  US

Each element in this is a table. Table X is the table in which the operation will be occurring. The other tables are labeled according to what type of dependency it has to X, or X has to it. All dependencies flow from the top down, so table X references a column in table 'NA. And table NA references a column in table X. All the tables on the first row, the ones that start with a ' have a column that X is referencing and all tables on the 3rd row that don't start with the ' are referencing table X.

All the tables besides X are named based on the type of reference they have to X. Note that in practice, there are two options for each relationship, ON DELETE and ON UPDATE, but for simplicity, we assume that the other operation is a NO ACTION or RESTRICT. The tables are labeled accordingly:

  • NA, 'NA - NO ACTION or RESTRICT
  • DC, 'DC - DELETE CASCADE
  • DS, 'DS - DELETE SET NULL or DELETE SET DEFAULT
  • UC, 'UC - UPDATE CASCADE
  • US, 'US - UPDATE SET NULL or UPDATE SET DEFAULT

So table X has a DELETE CASCADE reference to 'DC and DC has a DELETE CASCADE reference to X. So if a row was deleted in 'DC, X would possibly delete a row and then DC would possibly delete a row.

There are 3 operations that can occur on a value in a table and we will be looking at each one individually.

Cycle detection is important and required here as the referential integrity actions can form circular dependencies. How this is accomplished will be described in the sections describing where this algorithm is applied.

Insert

When inserting a row into table X, all the tables that it references must be fetched. This can be done without worrying what type of relationship it has and the walk will not cascade from those tables.

So we require the fetching of tables 'NA, 'DC, 'DS, 'UC and 'US and we don't required them to be walked any further.

For tables that reference X, inserting will not affect them and their table descriptors do not need to be fetched. Inserts never cascade.

Delete

When deleting a row in table X, the tables that it references do not need to be fetched, as it won't be able to orphan any columns there.

For the tables that reference X, a delete requires the following:

  • NA should be fetched and not walked any further.
  • DC should be fetched and walked as a DELETE operation.
  • DS should be fetched and walked as an UPDATE operation.
  • UC should be fetched and not walked any further.
  • US should be fetched and not walked any further.
Update

When updating a row in X, similar to the insert case, all the tables that table X references are required. Also similar to the insert case, there is also no need for those to be walked any further. So 'NA, 'DC, 'DS, 'UC and 'US will be fetched and not walked.

For the tables that reference X, an update requires the following:

  • NA should be fetched and not walked any further.
  • DC should be fetched and not walked any further.
  • DS should be fetched and not walked any further.
  • UC should be fetched and walked as an UPDATE operation.
  • US should be fetched and walked as an UPDATE operation.

New and Updated Data Structures

Cascader

A new Cascader struct will be added that will coordinate all cascading operations. The Cascader will also act as a cache for RowDeleters, RowUpdaters and RowFetchers. It will contain the following:

  • a map from table IDs to RowDeleters - for cascading deletes - there should only ever be at most one per table accessed during the cascade that requires deletes
  • a map from table IDs to RowFetchers - to be used with RowDeleters - there should be at most one per table accessed during the cascade that requires deletes
  • a map from table IDs to RowUpdaters - for all none deleting cascades - there should only ever be at most one per table accessed during the cascade that requires updates
  • a map from table IDs to RowFetchers - to be used with RowUpdaters - there should be at most one per table accessed during the cascade that requires updates
  • a double map from table IDs to index IDs to RowFetchers - to be used when looking for foreign key references that need to be cascaded - there should be at most one per table per foreign key index accessed during the cascade
  • a map from table IDs to an array of Datums - for deleted row values that need to be checked to ensure no orphaned rows are created - the map will contain one array per table, and the array will contain Datums for every row that was deleted
  • a map from table IDs to an array of Datums - for updated row values that need to be checked to ensure no orphaned rows are created - the map will contain one array per table, and the array will contain Datums for every row that was updated

The in memory size of this table is clearly related to three factors:

  1. the number of tables being cascaded to
  2. the number rows that have been deleted
  3. the number rows that have been updated

Even with the caching all of the RowDeleters, RowUpdaters and RowFetchers, there is still the possibility of running out of memory. To prevent such an eventuality, the transaction's memory monitor will be used for all cached values in the cascader. The most likely source of out of memory issues will be from the list of values that require foreign key checks after the cascade is completed.

Even with these guardrails in place, it is still important to test for both very broad (e.g. 1,000,000 tables all referencing one) and very long (e.g. 10,000,000 self-referential rows in a giant linked list) setups.

RowDeleter and RowUpdater

Both the RowDeleter and RowUpdater will be modified to allow for the addition of a Cascader. The Cascader will only be added on the first deleter or updater that is created and all the internal deleters and updaters that are created as part of the cascading operations will not require one. As there is only one Cascader and it will be included as a pointer, this should have negligible size impact on the row writers.

Algorithmic Changes

Planning
TablesNeededForFKs

fk.go's TablesNeededForFKs function will have to be expanded. Currently, it only fetches the table IDs that need to be checked when an update, delete or insert operation occurs. With the addition of cascading referential constraints, this is no longer sufficient. Instead of just fetching IDs, this function will now also support optionally retrieving the table descriptors themselves. Without the actual table descriptors, it would not be possible to walk the dependency graph and thus impossible to know if there may be yet another cascading action. This will be done iteratively and should be relatively quick. Care will be taken to not slow down non-cascading examples as this codepath is used with all basic operations. It's important to note that in most operations (i.e. non-backfill ones), immediately after calling TablesNeededForFKs lookup all the required table descriptors. The real change here is that more table descriptors will have to be fetched for each operation.

The only augmentation to the function call itself will be to add a passed in function that will perform the actual table descriptor lookup.

The table walk will be accomplished iteratively using a queue and perform the dependency walking algorithm described above. As the final output is a map of table IDs to table descriptors we already have a data structure that can be used for cycle detection. However, we also have to be conscious of the fact that there may be a need to iterate over the same table twice, once with an update and once later with a delete (or vice versa) due to potential dependency cycles within the graph. This adds a slight complication to the map but can be accounted for.

Both memory and runtime of this function will be bounded based on the number of tables that may potentially be part of a series of cascade operations.

Execution
DeleteRow and UpdateRow

Both of rowwriter.go's DeleteRow and UpdateRow will require a small split so that there are different call sites for when a cascade is desired or not. This will have no memory impact but will obviously, have a performance impact due to the cascading call itself.

Cascader's CascadeAll

This is where the bulk of the new functionality will exist. When an operation, be it a delete or an update are executed, they will follow the normal path in their respective row writer. But they will call out to the Cascader. Internally, this means that the first call to DeleteRow or UpdateRow functions will perform their original write operation and then call CascadeAll. However, the Cascader may make additional calls to DeleteRow or UpdateRow on either the same or on new row writers. These additional new writes will not call CascadeAll again. This maintains the separation of concerns such that all foreign key relationship checking will still be done in fk.go; all row writes will still be performed in rowwriter.go; and the coordination of the cascading operations will now be done in cascade.go.

Checking for cascading actions requires a number of steps but can be summarized thusly:

  1. While the queue is not empty:
    1. Pop a table and its changed rows off of the queue.
    2. For each on delete and on update action (these will be two slightly different code paths):
      1. Check the referencing table's foreign key index to see if any rows depend on the keys that were updated or deleted and generate a list of primary keys.
      2. If any do exist, fetch the required columns needed for deleting or updating the rows that have been affected.
      3. If one isn't cached, create a new RowDeleter/RowUpdater specifically without creating a new cascader.
      4. Add each row that has been affected to the map of table IDs to rows to check later for referential integrity violations.
      5. Call DeleteRow/UpdateRow for each change (without triggering another cascade call).
      6. Enqueue the deleted/updated rows and table ID.
  2. Call each table's RowDeleter and RowUpdater foreign key constraint checker for all rows that have been updated to ensure no orphans were created.

Of course, there are memory concerns that must be addressed. The innermost loop may return too many rows and could cause out of memory errors. This can be alleviated by chunking the results of the first row fetcher and also using the transaction's memory monitor for all values being returned from that loop. Also, the memory monitor will be used for the queue in the CascadeAll function. This will prevent the queue from growing too large, which can be caused by a graph that is both deep and wide.

It's important to note that there are two main operations that can occur. A Delete can only be triggered from a DELETE CASCADE action. All other actions will perform an Update instead.

Cycle detection is done by performing all delete and update operations in a batch and by delaying the checking for foreign key constraints violations until after all operations are complete. If there is a cycle, the first delete or update, which would leave an orphaned row in isolation, is performed blindly, as are all subsequent cascading operations. Once all the operations are complete, all referential integrity constraints that were touched are then checked. This has the disadvantage that a long running cascade operation that ultimately fails will leave around a large number of intents due to the failed transaction. But it does leverage our transaction mechanism itself to store these changes and does not require large amounts of memory.

The runtime of this function is heavily dependent on the number of cascading operations that will be performed. An further still on each of those operations' runtimes. Each non-cascading operation, one that doesn't return rows to cascade to, requires one row fetcher operation. Each operation that does perform a cascading operations requires two row fetchers, the addition time required to performing the actual delete or update and finally the checking for orphaned rows.

Future Optimizations

These optimizations are optional and will be explored after all basic operations are working.

Interleaved Tables

Since one of our primary use cases will be interleaved tables, special attention should be paid to optimizing cascading deletes and updates. Interleaved child tables that are isolated, as in none of the child tables have any external referential dependencies, and complete, all child tables reference the parent table, can be done as a batch delete and should be quick. Similarly, there may be a possibility for similar batching when upgrading.

Self-Referencing Tables

There may be some quick optimizations for self-referencing tables. Similar to interleaved tables, it may be possible to perform batch deletes or updates. These will be explored after all basic functionality is working.

Plan of Engineering

To begin with, the pre-fetching of all required table descriptors will be added to TablesNeededForFKs. After this is complete, each of the new actions will be implemented in order of importance, which is roughly:

  1. ON DELETE CASCADE
  2. ON UPDATE CASCADE
  3. ON DELETE SET NULL
  4. ON UPDATE SET NULL
  5. ON DELETE SET DEFAULT
  6. ON UPDATE SET DEFAULT

The following rationale were used to determine the order of work:

  • ON DELETE CASCADE is the most common and should help with ORM compatibility and is the most requested so it is placed at the top of the list.
  • ON DELETE SET NULL ON UPDATE CASCADE is the default setting for Sequelize see #17803.
  • Cascading is extremely useful with interleaved tables, so ON UPDATE CASCADE should be high on this list.
  • SET DEFAULT adds more failure modes than SET NULL and is less common in other SQL implementations, so SET NULL will be implemented next.
  • If complications arise during implementation or other business needs preempt further work on this feature, SET DEFAULT is the best candidate for cutting from the next release.

Testing

All of the above examples and some more complex ones will be added in a logic test style manner. Furthermore, the dependency walking algorithm, when used for both fetching tables and actually performing operations will be tested in isolation using Go's unit testing framework directly.

It might also be useful to create a new type of test that specifically address testing these new relationships. This would be in addition to logic tests and use a new DSL. This testing is in scope for the implementation of RFC and will be added early on to facilitate better testing of all operations. However, the details of the tests themselves will be determined during the implementation.

It will also be useful and informative to test the limits of our cascading operations and compare us to Postgres. Can we handle 1,000,000 cascades? What if there were in a single table or multiple tables? Do we need to artificially limit the size of the cascades or will the our intent limit do that for us? These tests should focus on these 3 variables:

  1. the number of rows referencing each foreign key
  2. the number of tables referencing each foreign key
  3. the number of chained references

These tests will be performed using some basic scripts that will be checked in for repeatability.

Benchmarks

It will be important to see if the addition of cascading actions affect any of our current update and delete benchmarks.

Furthermore, a few new benchmarks will also be added to monitor normal use cases:

  • 2, 3, 5, 10 table with n rows per table on delete cascade
  • 2, 3, 5, 10 table with n rows per table on update cascade
  • 2, 3, 5, 10 interleaved tables with n rows per table on delete cascade

Rollout

This is a high level, and subject to change, list of the main PRs

  1. ON DELETE CASCADE
  2. Add relationship tester. All subsequent updates will include new relationship tests.
  3. ON UPDATE CASCADE
  4. ON DELETE SET NULL
  5. ON UPDATE SET NULL
  6. ON DELETE SET DEFAULT
  7. ON UPDATE SET DEFAULT
  8. Scripts to test our limits, try to force OOM errors
  9. Scripts to compare us to Postgres
  10. (optional) Interleaved table optimizations

Drawbacks

Without adding in deferrable at this time, there will be some work that must be done when deferrable constraints and transaction settings are added.

Future Work

  • A SHOW RELATIONS sql command that would output the full graph of relations for a table would be quite helpful.

Rationale and Alternatives

Not supporting cascade is not really an option, but supporting Postgres' full range of cascading is not a requirement. The most common use cases do not require self-referencing and circular cascading of the constraints. There is no requirement to add deferrable constraints and we may never do so.

So the alternatives here are to follow MySQL or SQL Server's style of dealing with cascading referential integrity constraints.

MySQL

If we implement MySQL's deferred style, this would greatly simplify the engineering by not having to ever defer constraint enforcement, but instead turn it off temporarily. This seems compelling but would allow the database state to become quite broken. MySQL gets around this by never checking constraints that aren't updated but this seems lazy and may cause strange failures in the long run. If we do choose this alternative, it would be imperative that we also add a command to check the constraints after turning the constraint checking back on. And we would also have to have some type of warning if we were to allow turning off constraint checking outside the scope of a session.

SQL Server

A different approach that would also not include deferring of constraints and would actually scope this project down significantly would be to follow SQL Server's rule that all cascading constraints must form a tree of tables and not a graph. So there can be no loop backs and does not allow self-referential tables. The self-referencing example above would no longer be valid.

However, in lieu of the deferring of constraints, it is commonly suggested that the use of common table expressions (CTE) should be used in its place. And that they are sufficient when dealing with self-referencing or circular dependencies. CTEs are a significantly larger project than adding in deferrable constraints and thus it may be a long time before we are able to support those interesting by still rare cases.

Unresolved questions

  • Do we want to limit the amount of cascading (assuming unlimited memory)? The TxnCoordSender already limits the number of intents in a tractions, but this might not be enough on its own. This will require some testing.
  • Should we implement deferring of constraints? When would be the right time to do so?
  • Is there a reasonable use case for constraints from multiple tables that can't be covered with other constraints?
  • Should the default for interleaved tables be ON DELETE CASCADE and/or ON UPDATE CASCADE.