docs/RFCS/20171201_cascade.md
Being able to cascade deletes through foreign key references is a very common
activity and part of the SQL standard. This RFC proposes to add in the ON DELETE action and ON UPDATE action keywords when defining the foreign key
references on a CREATE TABLE or ALTER TABLE command.
This RFC proposes the addition of the following optional actions to referential
integrity constraints within a CREATE TABLE's REFERENCES and FOREIGN KEY
columns:
ON DELETE CASCADEON UPDATE CASCADEON DELETE SET DEFAULTON UPDATE SET DEFAULTON DELETE SET NULLON UPDATE SET NULLWhile not essential, these actions, specifically ON DELETE CASCADE, have been
requested multiple times by both the community and potential customers.
The addition of these optional actions will unlock more ORM functionality as cascade is a very common option when defining foreign key references and bring us closer to parity with Postgres.
Note that this RFC will only be tangentially discussing deferring or disabling
the enforcement of foreign key constraints. So what is not covered includes all
of the deferrable options on constraints: DEFERRED NOT DEFERRABLE, DEFERRED INITIALLY IMMEDIATE, DEFERRED INITIALLY DEFERRED. This rfc also excludes
setting the transaction to defer certain constraints and not using the commands
SET CONSTRAINTS ALL/some_named_constraint DEFERRED or SET CONSTRAINTS ALL/some_named_constraint IMMEDIATE.
Futhermore, this RFC will not be covering different techniques for matching
composite foreign keys using the MATCH keyword. However, there will be some
discussion around these concepts as they will have an impact in the future if we
choose to implement them.
At this time there is no plan to allow multiple tables to be referenced by the same key while this is allowed in Postgres. An example of how this works is shared in the reference-level explanation.
While cascading referential integrity constraints are in the SQL standard, they are not essential as they can be seen as shortcuts to writing longer SQL transactions. However, they have been an often requested feature and are quite often used in ORMs. What is clear, is that the addition of these options will unlock new customer opportunities for us.
The use of ON DELETE/UPDATE CASCADE in combination with interleaved tables is
a great fit and can be used as a great introduction to interleaving.
A table constraint is a check on a table that ensures that all rows in that table must pass before being committed. These can include simple checks, such as ensuring a value is greater than zero, to more complex ones such as ensuring the value in a column is unique. See here for our docs on the subject. The constraints we're specifically interested in for this RFC are foreign keys references.
A referential integrity constraint is the technical term for a foreign key constraint. They require that the key in a column is present in another table. Here are our docs about them.
When updating or deleting a row in a table that has been used as a foreign key in another table, it is also important to update the table that's referencing this key. When declaring a foreign key constraint, it is possible to specify what action should occur when the referenced key is altered or deleted.
ON DELETE and ON UPDATE actions?There are optional actions that will occur when a referenced foreign key is
removed (ON DELETE) or when a key is updated (ON UPDATE). All foreign key
relationships already have default actions for both of these, and that is NO ACTION.
NO ACTIONIf the foreign key being removed is still referenced in another table, fail the
transaction. NO ACTION is poorly named and should be read as the answer to:
"What action should be taken when the reference is changed?". Take no action
and fail the transaction.
RESTRICTRESTRICT is exactly the same as NO ACTION in our current implementation.
When used in Postgres, it always immediately checks and fails if there is any
row referencing the foreign key that was updated or deleted. Unlike NO ACTION
which waits until the end of the transaction if constraint checks have been
deferred. This action has been added and stored in the ForeignKeyReference for
future use for when we opt to add the ability to defer constraint checking.
CASCADEThis is the most common use case. When the CASCADE action is specified, if the
referenced foreign key is deleted or altered, then respectively delete the row
or alter the key in the referencing table. The name cascade comes from the fact
that the change made on the referenced table cascades into the referencing
table. It is not uncommon for cascading deletes to walk through numerous tables.
SET NULLWhen the referencing key is deleted or altered this option sets the foreign key
reference column to null. Note that this will not work on columns that also have
the NOT NULL constraint. The creation of the constraint should fail for a
column that has both NOT NULL and a SET NULL action.
SET DEFAULTWhen the referencing key is deleted or altered this option sets the foreign key
referencing column to the default value for the column. If the default for the
column is null, then this is the equivalent of SET NULL. Note that if this
default is not null, then there must still be a referenced key that the default
must match or the transaction will fail.
Deferring constraints delays the checking and enforcement of the constraints until the end of a transaction instead of immediately failing when violated. The most commonly cited use case is cyclical foreign keys references but there are a number of instances where this can be very helpful to defer constraint checking.
Here is an excellent source that explains the different ways that Postgres handles deferrable constraints, the most common use cases for deferring and the pros and cons of doing so.
While we are not going to be deferring constraints at this time, we need be aware that this will be something that we most likely will be adding in the near future.
A composite foreign key is a referential integrity constraint that is created from more than one source column on another table. We do support composite foreign keys and all added operations will need to take foreign key matching into account.
MATCH FULL considers a foreign key as valid if every foreign key column is
null or if all foreign key columns are not null and match the referenced
table.MATCH PARTIAL is valid if at least one Foreign key column is null and the
rest of the not null columns match those of the referenced tables or all
foreign key columns are not null and match the referenced table.MATCH SIMPLE considers a foreign key valid if all columns are not null and
match the referenced key exactly or if one of the foreign key values is null.We currently only offer MATCH FULL, while Postgres' and SQL Server's default
is MATCH SIMPLE. There are no plans to support the other match types at this
time. However, the addition of the other composite key matching types will
affect when to trigger an ON DELETE actions.
All major databases support the ON DELETE action and ON UPDATE action
options. But there are some minor differences in which options they allow. This
section also includes the differences for the MATCH options and
DEFERRED/IMMEDIATE constraint checking as they may be useful in the future.
Postgres is by far the most nuanced database allowing more options than the others. Postgres supports 5 different actions:
CASCADE: delete or update the reference when the referenced key is removedSET NULL: set the foreign key to null if the key is deleted or updated.SET DEFAULT: set the foreign key to its default value if the key is deleted
or updated.RESTRICT: immediately fail if there are any existing references to the key
being updated or deleted.NO ACTION: The default option. Fail if there are any existing references to
the key being updated or deleted, but allow this check to be deferred until
the end of the transaction.Postgres also offers two options around matching:
MATCH FULL: No part of a foreign key can be null, unless they are all null.MATCH SIMPLE: The default behaviour. Nulls are allowed.Postgres also offers deferred constraint checking using the DEFERRED and
IMMEDIATE keywords.
Here is the main reference for Postgres.
MS SQL Server offers a similar set of options to Postgres, but does not offer
the RESTRICT action. However, since they do not offer a DEFERRED at all, NO ACTION is the equivalent of RESTRICT. To get around this restrictive mode,
the suggested work around is to allow the columns to be null.
Futhermore, the cascading of the referential integrity constraints is also limited to a single instance of each table and there can be no loops. See this article for more info.
MS SQL Server has full support for all types composite key matching.
Here is main reference for SQL Server.
MySQL actually offers all the actions that postgres does, but NO ACTION is an
alias for RESTRICT. There is no option to defer the checking of the
constraints and the DEFERRED and IMMEDIATE keywords are not valid. If one
wants to make adjustments and work around the constraints, the system variable
foreign_key_checks
can be set to 0 which turns off constraint checking within a session. But it
is important to note that turning it back on by setting it to 1 does not then
go back and check the constraints of the altered rows.
For composite keys, there is only a single matching option which is the
equivalent of Postgres' MATCH SIMPLE. Including the MATCH keyword is
accepted but based on the documentation broken and will potentially nullify any
cascading action.
Here is the main reference for MySQL.
Oracle only allows 3 options for cascading. The default value, which is
specifically not specifying any action, is the equivalent to NO ACTION. It
also has SET NULL and CASCADE. Oracle has a full suite of options for
deferring constraint checking, including the keywords DEFERRED and
IMMEDIATE. It also has the option to permanently disable a constraint using
the DISABLE keyword. More details on disabling can be found
here.
Oracle offer three levels of composite key matching:
MATCH FULL Partially null foreign keys are not permitted. Either all
components of the foreign key must be null, or the combination of values
contained in the foreign key must appear as the primary or unique key value of
a single row of the referenced table.MATCH PARTIAL Partially null composite foreign keys are permitted. Either
all components of the foreign key must be null, or the combination of non-null
values contained in the foreign key must appear in the corresponding portion
of the primary or unique key value of a single row in the referenced table.MATCH NONE Partially null composite foreign keys are permitted. If any
column of a composite foreign key is null, then the non-null portions of the
key do not have to match any corresponding portion of a parent key.Here is the main reference for Oracle.
Spanner offers ON DELETE CASCADE only for interleaved tables. They actually
suggest using it just about every time. This is a great example and a perfect
use case that also help explain how interleaved tables function. We should
follow their lead here and suggest the same thing.
See here for their documentation on it. This is very good documentation on interleaved tables and we should produce a similarly robust page.
We don't currently support foreign key references from multiple tables, but it would be possible to add this at some point in the future. Postgres does support them.
To further elucidate the point, instead of describing a foreign key reference from one table to another, it is possible to add a reference to the same keys in multiple tables. To see how this works, consider the following:
Table A has an id column. Table B contains a foreign key reference to table_a.id. Let's call it table_b.a_id. Table C contains a foreign key reference to table_a.id. Let's also call it table_c.a_id. Table D contains a foreign key reference to both table_b.a_id and table_c.a_id.
This means that to enter a value into table d, it must exist in both tables b and c. If it is only one, the constraint will be violated.
It should be noted, that this is the equivalent of having multiple foreign key references and an extra constraint that requires that the values of those referenced values are equal. However this will produce a column for each reference instead of a single column representing this complex relationship.
This RFC proposes to add in all the missing functionality and keywords from Postgres to allow for cascading referential integrity constraints.
Here is a detailed list of all of the proposed foreign key reference actions:
ON DELETE CASCADE When a referenced foreign key is deleted, all rows
referencing that key are deleted. If there are other alterations to the row,
such as a SET NULL or SET DEFAULT, the delete with take precedence.
ON UPDATE CASCADE When a referenced foreign key is updated, update the
columns of all rows referencing that key to the new value.
ON DELETE SET DEFAULT When a referenced foreign key is deleted, set the
columns of all rows referencing that key to the default value for that column.
If the default value for the column is null, this will have the same effect as
ON DELETE SET NULL. The default value must still conform with all other
constraints, such as UNIQUE.
ON UPDATE SET DEFAULT When a referenced foreign key is updated, set the
columns of all rows referencing that key to the default value for that column.
If the default value for the column is null, this will have the same effect as
ON UPDATE SET NULL. The default value must still conform with all other
constraints, such as UNIQUE.
ON DELETE SET NULL When a referenced foreign key is deleted, set the columns
of all rows referencing that key to null. The column must allow nulls or this
update will still fail.
ON UPDATE SET NULL When a referenced foreign key is updated, set the columns
of all rows referencing that key to null. The column must allow nulls or this
update will still fail.
Please note that all foreign key references can have both an ON DELETE and an
ON UPDATE and that their actions are not required to be the same.
For all examples, the error codes generated are a combination of both Cockroach and Postgres errors. They will be refined during implementation.
For all of the following examples, assume we have the following setup:
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a (id) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (100);
CREATE TABLE b (
delete_restrict INT NOT NULL REFERENCES a ON DELETE RESTRICT
,update_restrict INT NOT NULL REFERENCES a ON UPDATE RESTRICT
,delete_cascade INT NOT NULL REFERENCES a ON DELETE CASCADE
,update_cascade INT NOT NULL REFERENCES a ON UPDATE CASCADE
,delete_null INT REFERENCES a ON DELETE SET NULL
,update_null INT REFERENCES a ON UPDATE SET NULL
,delete_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
,update_default INT DEFAULT 100 REFERENCES a ON DELETE SET DEFAULT
);
INSERT INTO b VALUES (1, 2, 3, 4, 5, 6, 7, 8);
And we expect:
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE NO ACTION or ON DELETE RESTRICTWhen we try to delete a referenced key that does not specify a delete action or
specifies it to RESTRICT, the operation fails.
DELETE FROM a WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "b"
ON UPDATE NO ACTION or ON UPDATE RESTRICTWhen we try to update a referenced key that does not specify an update action or
specifies it to RESTRICT, the operation fails.
UPDATE a SET id = 7 WHERE id = 2;
pq: foreign key violation: values [1] in columns [id] referenced in table "b"
ON DELETE CASCADEWhen the reference key is deleted, we remove a referenced key and the referencing row is deleted.
DELETE FROM a WHERE id = 3;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(0 rows)
ON UPDATE CASCADEWhen the referenced key is updated, the referencing key is updated to the same value.
UPDATE a SET id = 100 WHERE id = 4;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 100 | 5 | 6 | 7 | 8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE SET NULLWhen the referenced key is deleted, the referencing key is set to null.
DELETE FROM a WHERE id = 5;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 4 | NULL | 6 | 7 | 8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON UPDATE SET NULLWhen the referenced key is updated, the referencing key is set to null.
UPDATE a SET id = 100 WHERE id = 6;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 4 | 5 | NULL | 7 | 8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON DELETE SET DEFAULTWhen the referenced key is deleted, the referencing key is set to the default value for that column.
DELETE FROM a WHERE id = 7;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 4 | 6 | 6 | 100 | 8 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON UPDATE SET DEFAULTWhen the referenced key is updated, the referencing key is set to the default value for that column.
UPDATE a SET id = 100 WHERE id = 8;
SELECT * FROM b;
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| delete_restrict | update_restrict | delete_cascade | update_cascade | delete_null | update_null | delete_default | update_default |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 100 |
+-----------------+-----------------+----------------+----------------+-------------+-------------+----------------+----------------+
(1 row)
ON UPDATE/DELETE SET DEFAULT default does not exist in referenced tableWhen the referenced key is updated or deleted and the referencing key's action
is SET DEFAULT, the referencing key's value should bet set to its default
value. However, there is no guarantee that that default value exists in the
referenced table. When it does not the operation should fail.
DELETE FROM a WHERE id = 0;
DELETE FROM a where id = 7;
pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";
UPDATE a SET id = 100 WHERE id = 8;
pq: foreign key violation: Key (delete_default)=(0) is not present in table "a";
ON UPDATE/DELETE SET NULL on a NOT NULL columnWe should not allow a SET NULL action on a column that does not allow nulls.
We do not currently allow the altering of constraints so this can be checked at
the time of the creation of the constraint.
CREATE TABLE c (
delete_not_nullable INT NOT NULL REFERENCES a ON DELETE SET NULL
);
pg: cannot create a constraint for column "delete_not_nullable" to ON DELETE SET NULL that also has a NOT NULL constraint.
CREATE TABLE c (
update_not_nullable INT NOT NULL REFERENCES a ON UPDATE SET NULL
);
pg: cannot create a constraint for column "update_not_nullable" to ON UPDATE SET NULL that also has a NOT NULL constraint.
ON UPDATE/DELETE SET DEFAULT with no default valueSimilarly, we should not allow a SET DEFAULT action on a column that does not
have a default value. Again, we do not currently allow the altering of
constraints so this can be checked at the time of the creation of the
constraint.
CREATE TABLE c (
delete_no_default INT REFERENCES a ON DELETE SET DEFAULT
);
pg: cannot create a constraint for column "delete_no_default" to ON DELETE SET DEFAULT that has no default value.
CREATE TABLE c (
update_no_default INT REFERENCES a ON update SET DEFAULT
);
pg: cannot create a constraint for column "update_no_default" to ON UPDATE SET DEFAULT that has no default value.
ON UPDATE CASCADE violates another constraintIn this example, the second constraint on table c, the check that update_check
is less than 100, means that it is possible to bump up against that new
constraint when cascading an update from table a. In essence, by adding a
constraint on the referencing column in table c that has an ON UPDATE CASCADE
action, it is also adding that same constraint on table a, but only for any keys
of a that exist in table c.
CREATE TABLE c (
update_check INT REFERENCES a ON UPDATE CASCADE
,CONSTRAINT update_check CHECK (update_check < 100)
);
INSERT INTO c VALUES (1);
SELECT * FROM c;
+--------------+
| update_check |
+--------------+
| 1 |
+--------------+
(1 row)
UPDATE a SET id = 100 WHERE id = 1;
pg: update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
detail: Key (id)=(1) is still referenced from table "b".
ON UPDATE/DELETE SET NULL violates another constraintIn a similar case, the ON UPDATE SET NULL or ON DELETE SET NULL can block
updates to original table.
CREATE TABLE c (
update_unique INT UNIQUE REFERENCES a ON UPDATE SET NULL
,delete_unique INT UNIQUE REFERENCES a ON DELETE SET NULL
);
INSERT INTO c VALUES (1,1), (null, null);
SELECT * FROM c;
+---------------+---------------+
| update_unique | delete_unique +
+---------------+---------------+
| 1 | 1 |
+---------------+---------------+
| NULL | NULL |
+---------------+---------------+
(2 rows)
UPDATE a SET id = 100 WHERE id = 1;
pg: update or delete on table "a" violates foreign key constraint "b_delete_restrict_fkey" on table "b"
detail: Key (id)=(1) is still referenced from table "b".
DELETE FROM a WHERE id = 1;
pg: update or delete on table "a" violates foreign key constraint "c_update_unique_fkey" on table "c"
detail: Key (id)=(1) is still referenced from table "c".
Here are the two canonical examples of cascading referential integrity constraints.
ON DELETE CASCADEWhen a row in table a is deleted, all rows referencing that row in table b are deleted. Then all rows referencing the deleted row in table b are deleted.
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);
CREATE TABLE b (
id INT PRIMARY KEY
,a_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1,1);
CREATE TABLE c (
b_id INT REFERENCES b ON DELETE CASCADE
);
INSERT INTO c VALUES (1);
SELECT * FROM a;
+----+
| id |
+----+
| 1 |
+----+
(1 row)
SELECT * FROM b;
+----+------+
| id | a_id |
+----+------+
| 1 | 1 |
+----+------+
(1 row)
SELECT * FROM c;
+------+
| b_id |
+------+
| 1 |
+------+
(1 row)
DELETE FROM a WHERE id = 1;
SELECT * FROM a;
+----+
| id |
+----+
+----+
(0 rows)
SELECT * FROM b;
+----+------+
| id | a_id |
+----+------+
+----+------+
(0 rows)
SELECT * FROM c;
+--------+
| b_a_id |
+--------+
+--------+
(0 rows)
ON DELETE CASCADE to a RESTRICTWhen a row in table a is deleted, all rows referencing that row in table b are
deleted. Then all rows referencing the deleted row in table b can fail due to
the deeper RESTRICT action.
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);
CREATE TABLE b (
id INT PRIMARY KEY
,a_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1,1);
CREATE TABLE c (
b_id INT REFERENCES b ON DELETE RESTRICT
);
INSERT INTO c VALUES (1);
DELETE FROM a WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "c"
ON UPDATE CASCADEWhen the key is updated in table a, its new value is cascaded to table b, which in turn gets cascaded to table c.
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);
CREATE TABLE b (
a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
);
INSERT INTO b VALUES (1);
CREATE TABLE c (
b_a_id INT REFERENCES b ON UPDATE CASCADE
);
INSERT INTO c VALUES (1);
SELECT * FROM a
INNER JOIN b ON a.id = b.a_id
INNER JOIN c ON b.a_id = c.b_a_id;
+----+------+--------+
| id | a_id | b_a_id |
+----+------+--------+
| 1 | 1 | 1 |
+----+------+--------+
(1 row)
UPDATE a SET id = 2 WHERE id = 1;
SELECT * FROM a
INNER JOIN b ON a.id = b.a_id
INNER JOIN c ON b.a_id = c.b_a_id;
+----+------+--------+
| id | a_id | b_a_id |
+----+------+--------+
| 2 | 2 | 2 |
+----+------+--------+
(1 row)
ON UPDATE CASCADE to a RESTRICTWhen the key is updated in table a, its new value is cascaded to table b, which
in turn fails due to the RESTRICT action on table c.
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a VALUES (1);
CREATE TABLE b (
a_id INT PRIMARY KEY REFERENCES a ON UPDATE CASCADE
);
INSERT INTO b VALUES (1);
CREATE TABLE c (
b_a_id INT REFERENCES b ON UPDATE RESTRICT
);
INSERT INTO c VALUES (1);
UPDATE a SET id = 2 WHERE id = 1;
pq: foreign key violation: values [1] in columns [id] referenced in table "c"
DELETE competes with another actionThis is a bit of a contrived example, but it is possible to have two competing
foreign key constraints that affect the same row. In the case where a DELETE CASCADE competes with any other actions, the delete always takes precedence.
CREATE TABLE a (
id INT PRIMARY KEY
);
INSERT INTO a VALUES (1), (2);
CREATE TABLE b (
a_id INT PRIMARY KEY REFERENCES a ON DELETE CASCADE
);
INSERT INTO b VALUES (1);
CREATE TABLE c (
a_id INT PRIMARY KEY DEFAULT 2 REFERENCES a ON DELETE SET DEFAULT
);
INSERT INTO c VALUES (1);
CREATE TABLE d (
b_a_id INT REFERENCES b ON DELETE CASCADE
,c_a_id INT REFERENCES c ON UPDATE CASCADE
);
INSERT INTO d VALUES (1,1);
SELECT * FROM a;
+----+
| id |
+----+
| 1 |
+----+
| 2 |
+----+
(2 rows)
SELECT * FROM b;
+------+
| a_id |
+------+
| 1 |
+------+
(1 row)
SELECT * FROM c;
+------+
| a_id |
+------+
| 1 |
+------+
(1 row)
SELECT * FROM d;
+--------+--------+
| b_a_id | c_a_id |
+--------+--------+
| 1 | 1 |
+--------+--------+
(1 row)
DELETE FROM a WHERE id = 1;
SELECT * FROM a;
+----+
| id |
+----+
| 2 |
+----+
(1 row)
SELECT * FROM b;
+------+
| a_id |
+------+
+------+
(0 rows)
SELECT * FROM c;
+------+
| a_id |
+------+
| 2 |
+------+
(1 row)
And in table d, the DELETE CASCADE takes precedence and the row is removed.
SELECT * FROM d;
+--------+--------+
| b_a_id | c_a_id |
+--------+--------+
+--------+--------+
(0 rows)
A self-referential table that can cascade deletes between rows.
CREATE TABLE a (
id INT PRIMARY KEY
,other_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 1);
SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+----+----------+
(5 rows)
DELETE FROM a WHERE id = 1;
SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
+----+----------+
(0 rows)
Even without deferrable constraints, it is possible to create a cycle using an
UPDATE. When a cycle's references are all DELETE CASCADE, all rows should be
deleted.
CREATE TABLE a (
id INT PRIMARY KEY
,other_id INT REFERENCES a ON DELETE CASCADE
);
INSERT INTO a VALUES (1, NULL), (2, 1), (3, 2), (4, 3);
UPDATE a SET other_id = 4 WHERE id = 1;
SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
| 1 | 4 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
+----+----------+
(4 rows)
Deleting any single row in this cycle should remove all entires.
DELETE FROM a WHERE id = 1;
SELECT * FROM a;
+----+----------+
| id | other_id |
+----+----------+
+----+----------+
(0 rows)
It's also possible to have two or more tables that reference each other and form
a cycle. To create them, a constraint has to be added after both tables are
created using the ALTER TABLE ADD CONSTRAINT command. And to create the cycle,
again the 'UPDATE' command needs to be used. These should also be handled
correctly and all rows should be deleted.
CREATE TABLE loop_a (
id INT PRIMARY KEY
,b_id INT
,INDEX(b_id)
);
CREATE TABLE loop_b (
id INT PRIMARY KEY
,a_id INT REFERENCES loop_a ON DELETE CASCADE
);
ALTER TABLE loop_a ADD CONSTRAINT b_id_delete_constraint
FOREIGN KEY (b_id) REFERENCE loop_b (id) ON DELETE CASCADE;
INSERT INTO loop_a (id, b_id) VALUES (1, NULL);
INSERT INTO loop_b (id, a_id) VALUES (1, 1);
INSERT INTO loop_a (id, b_id) VALUES (2, 1);
INSERT INTO loop_b (id, a_id) VALUES (2, 2);
INSERT INTO loop_a (id, b_id) VALUES (3, 2);
INSERT INTO loop_b (id, a_id) VALUES (3, 3);
UPDATE loop_a SET b_id = 3 WHERE id = 1;
SELECT * FROM loop_a;
+----+------+
| id | b_id |
+----+------+
| 1 | 3 |
+----+------+
| 2 | 1 |
+----+------+
| 3 | 2 |
+----+------+
(3 rows)
SELECT * FROM loop_b;
+----+------+
| id | a_id |
+----+------+
| 1 | 1 |
+----+------+
| 2 | 2 |
+----+------+
| 3 | 3 |
+----+------+
(3 rows)
Here, any single delete in the cycle should delete the full cycle.
DELETE FROM loop_a WHERE id = 1;
SELECT * FROM loop_a;
+----+------+
| id | b_id |
+----+------+
+----+------+
(0 rows)
SELECT * FROM loop_b;
+----+------+
| id | a_id |
+----+------+
+----+------+
(0 rows)
Similar to the self-referential case above, this one has two self references
complicating matters. But these should be handled correctly. In this case, x
is the primary key, y references x and z references y. So a delete to
x, requires a delete to any values of y matching x's value and then again
to delete any values of z that also match.
CREATE TABLE self_x2 (
x INT PRIMARY KEY
,y INT UNIQUE REFERENCES self_x2(x) ON DELETE CASCADE
,z INT REFERENCES self_x2(y) ON DELETE CASCADE
);
INSERT INTO self_x2 (x, y, z) VALUES ('1', NULL, NULL);
INSERT INTO self_x2 (x, y, z) VALUES ('2', '1', NULL);
INSERT INTO self_x2 (x, y, z) VALUES ('3', '2', '1');
SELECT * FROM self_x2;
+---+------+------+
| x | y | z |
+---+------+------+
| 1 | NULL | NULL |
+---+------+------+
| 2 | 1 | NULL |
+---+------+------+
| 3 | 2 | 1 |
+---+------+------+
(3 rows)
DELETE FROM self_x2 WHERE x = '1';
SELECT * FROM self_x2;
+---+------+------+
| x | y | z |
+---+------+------+
+---+------+------+
(0 rows)
Dealing with a cycle where there are two competing deletes to a single row should be handle gracefully.
a
/ \
b c
| |
| d
\ /
e
CREATE TABLE race_a (
id STRING PRIMARY KEY
);
CREATE TABLE race_b (
id STRING PRIMARY KEY
,a_id STRING REFERENCES race_a ON DELETE CASCADE
);
CREATE TABLE race_c (
id STRING PRIMARY KEY
,a_id STRING REFERENCES race_a ON DELETE CASCADE
);
CREATE TABLE race_d (
id STRING PRIMARY KEY
,c_id STRING REFERENCES race_c ON DELETE CASCADE
);
CREATE TABLE race_e (
id STRING PRIMARY KEY
,b_id STRING REFERENCES race_b ON DELETE CASCADE
,d_id STRING REFERENCES race_d ON DELETE CASCADE
);
INSERT INTO race_a (id) VALUES ('a1');
INSERT INTO race_b (id, a_id) VALUES ('b1', 'a1');
INSERT INTO race_c (id, a_id) VALUES ('c1', 'a1');
INSERT INTO race_d (id, c_id) VALUES ('d1', 'c1');
INSERT INTO race_e (id, b_id, d_id) VALUES ('e1', 'b1', 'd1');
SELECT * FROM race_a;
+----+
| id |
+----+
| a1 |
+----+
(1 row)
SELECT * FROM race_b;
+----+------+
| id | a_id |
+----+------+
| b1 | a1 |
+----+------+
(1 row)
SELECT * FROM race_c;
+----+------+
| id | a_id |
+----+------+
| c1 | a1 |
+----+------+
(1 row)
SELECT * FROM race_d;
+----+------+
| id | c_id |
+----+------+
| d1 | c1 |
+----+------+
(1 row)
SELECT * FROM race_e;
+----+------+------+
| id | b_id | d_id |
+----+------+------+
| e1 | b1 | d1 |
+----+------+------+
(1 row)
DELETE FROM race_a WHERE id = 'a1';
SELECT * FROM race_a;
+----+
| id |
+----+
+----+
(0 rows)
SELECT * FROM race_e;
+----+------+------+
| id | b_id | d_id |
+----+------+------+
+----+------+------+
(0 rows)
Almost all of the changes will be in fk.go, rowwriter.go and a new file,
cascader.go, and will need to take into consideration the possibility that we
will be adding in deferrable constraints in the future. This does break the
general pragmatic rule of code for what you need today not for what you will
need tomorrow. So the suggestion here is not to alter the design, but to be
cognizant of the implications of deferrable.
One tradeoff to using cascading actions is that they will always be somewhat
slow. This is due to the fact that it is not possible to know beforehand if a
cascade is actually required. An index row fetch must be performed on all
referencing tables to see if the value being updated/deleted exists. For
example, a self-referencing table with an ON DELETE CASCADE will have to
perform a row fetch for each row being deleted; then again for each row that is
cascading the delete; then yet again for each of those new cascading deletes,
etc.
The algorithm to determine which tables and rows are affected by any operation
is a graph walk. Each table can be seen as a node in the graph and each foreign
key relationship as an edge. What complicates this walk is that the different
actions for both ON UPDATE and ON DELETE determine different stopping
conditions. This algorithm will be used twice in practice. The first time during
planning to determine which table descriptors will need to be fetched, and then
again, during execution, when performing the actual operations. The difference
between the two, is that when fetching the table descriptors, one must assume
that any relationships may be affected, while when actually performing the
operations, only a subset of those tables will actually require changes. The
main goal of this algorithm is twofold. First to ensure no table is unaccounted
for that could potentially produce an orphaned reference. And secondly, to not
go too far so such that no table descriptors are fetched which will never be
needed for the operation and thus would have no chance to create orphaned rows.
To describe the algorithm, we're going to concentrate on a single step of the recursion, performing a lookup of exactly a single table with an incoming operation. Let's first assume we have the following setup:
'NA 'DC 'DS 'UC 'US
\ \ | / /
\ \ | / /
\ \ | / /
\\|//
X
//|\\
/ / | \ \
/ / | \ \
/ / | \ \
NA DC DS UC US
Each element in this is a table. Table X is the table in which the operation
will be occurring. The other tables are labeled according to what type of
dependency it has to X, or X has to it. All dependencies flow from the top
down, so table X references a column in table 'NA. And table NA references
a column in table X. All the tables on the first row, the ones that start with
a ' have a column that X is referencing and all tables on the 3rd row that
don't start with the ' are referencing table X.
All the tables besides X are named based on the type of reference they have to
X. Note that in practice, there are two options for each relationship, ON DELETE and ON UPDATE, but for simplicity, we assume that the other operation
is a NO ACTION or RESTRICT. The tables are labeled accordingly:
NA, 'NA - NO ACTION or RESTRICTDC, 'DC - DELETE CASCADEDS, 'DS - DELETE SET NULL or DELETE SET DEFAULTUC, 'UC - UPDATE CASCADEUS, 'US - UPDATE SET NULL or UPDATE SET DEFAULTSo table X has a DELETE CASCADE reference to 'DC and DC has a DELETE CASCADE reference to X. So if a row was deleted in 'DC, X would possibly
delete a row and then DC would possibly delete a row.
There are 3 operations that can occur on a value in a table and we will be looking at each one individually.
Cycle detection is important and required here as the referential integrity actions can form circular dependencies. How this is accomplished will be described in the sections describing where this algorithm is applied.
When inserting a row into table X, all the tables that it references must be
fetched. This can be done without worrying what type of relationship it has and
the walk will not cascade from those tables.
So we require the fetching of tables 'NA, 'DC, 'DS, 'UC and 'US and we
don't required them to be walked any further.
For tables that reference X, inserting will not affect them and their table
descriptors do not need to be fetched. Inserts never cascade.
When deleting a row in table X, the tables that it references do not need to
be fetched, as it won't be able to orphan any columns there.
For the tables that reference X, a delete requires the following:
NA should be fetched and not walked any further.DC should be fetched and walked as a DELETE operation.DS should be fetched and walked as an UPDATE operation.UC should be fetched and not walked any further.US should be fetched and not walked any further.When updating a row in X, similar to the insert case, all the tables that
table X references are required. Also similar to the insert case, there is
also no need for those to be walked any further. So 'NA, 'DC, 'DS, 'UC
and 'US will be fetched and not walked.
For the tables that reference X, an update requires the following:
NA should be fetched and not walked any further.DC should be fetched and not walked any further.DS should be fetched and not walked any further.UC should be fetched and walked as an UPDATE operation.US should be fetched and walked as an UPDATE operation.CascaderA new Cascader struct will be added that will coordinate all cascading
operations. The Cascader will also act as a cache for RowDeleters,
RowUpdaters and RowFetchers. It will contain the following:
RowDeleters - for cascading deletes - there should
only ever be at most one per table accessed during the cascade that requires
deletesRowFetchers - to be used with RowDeleters -
there should be at most one per table accessed during the cascade that
requires deletesRowUpdaters - for all none deleting cascades - there
should only ever be at most one per table accessed during the cascade that
requires updatesRowFetchers - to be used with RowUpdaters -
there should be at most one per table accessed during the cascade that
requires updatesRowFetchers - to be used
when looking for foreign key references that need to be cascaded - there
should be at most one per table per foreign key index accessed during the
cascadeDatums - for deleted row values that
need to be checked to ensure no orphaned rows are created - the map will
contain one array per table, and the array will contain Datums for every row
that was deletedDatums - for updated row values that
need to be checked to ensure no orphaned rows are created - the map will
contain one array per table, and the array will contain Datums for every row
that was updatedThe in memory size of this table is clearly related to three factors:
Even with the caching all of the RowDeleters, RowUpdaters and
RowFetchers, there is still the possibility of running out of memory. To
prevent such an eventuality, the transaction's memory monitor will be used for
all cached values in the cascader. The most likely source of out of memory
issues will be from the list of values that require foreign key checks after the
cascade is completed.
Even with these guardrails in place, it is still important to test for both very broad (e.g. 1,000,000 tables all referencing one) and very long (e.g. 10,000,000 self-referential rows in a giant linked list) setups.
RowDeleter and RowUpdaterBoth the RowDeleter and RowUpdater will be modified to allow for the
addition of a Cascader. The Cascader will only be added on the first deleter
or updater that is created and all the internal deleters and updaters that are
created as part of the cascading operations will not require one. As there is
only one Cascader and it will be included as a pointer, this should have
negligible size impact on the row writers.
TablesNeededForFKsfk.go's TablesNeededForFKs function will have to be expanded. Currently, it
only fetches the table IDs that need to be checked when an update, delete or
insert operation occurs. With the addition of cascading referential constraints,
this is no longer sufficient. Instead of just fetching IDs, this function will
now also support optionally retrieving the table descriptors themselves. Without
the actual table descriptors, it would not be possible to walk the dependency
graph and thus impossible to know if there may be yet another cascading action.
This will be done iteratively and should be relatively quick. Care will be taken
to not slow down non-cascading examples as this codepath is used with all basic
operations. It's important to note that in most operations (i.e. non-backfill
ones), immediately after calling TablesNeededForFKs lookup all the required
table descriptors. The real change here is that more table descriptors will have
to be fetched for each operation.
The only augmentation to the function call itself will be to add a passed in function that will perform the actual table descriptor lookup.
The table walk will be accomplished iteratively using a queue and perform the
dependency walking algorithm described above. As the final output is a map of
table IDs to table descriptors we already have a data structure that can be used
for cycle detection. However, we also have to be conscious of the fact that
there may be a need to iterate over the same table twice, once with an update
and once later with a delete (or vice versa) due to potential dependency
cycles within the graph. This adds a slight complication to the map but can be
accounted for.
Both memory and runtime of this function will be bounded based on the number of tables that may potentially be part of a series of cascade operations.
DeleteRow and UpdateRowBoth of rowwriter.go's DeleteRow and UpdateRow will require a small split
so that there are different call sites for when a cascade is desired or not.
This will have no memory impact but will obviously, have a performance impact
due to the cascading call itself.
Cascader's CascadeAllThis is where the bulk of the new functionality will exist. When an operation,
be it a delete or an update are executed, they will follow the normal path in
their respective row writer. But they will call out to the Cascader.
Internally, this means that the first call to DeleteRow or UpdateRow
functions will perform their original write operation and then call
CascadeAll. However, the Cascader may make additional calls to DeleteRow
or UpdateRow on either the same or on new row writers. These additional new
writes will not call CascadeAll again. This maintains the separation of
concerns such that all foreign key relationship checking will still be done in
fk.go; all row writes will still be performed in rowwriter.go; and the
coordination of the cascading operations will now be done in cascade.go.
Checking for cascading actions requires a number of steps but can be summarized thusly:
RowDeleter/RowUpdater
specifically without creating a new cascader.DeleteRow/UpdateRow for each change (without triggering
another cascade call).RowDeleter and RowUpdater foreign key constraint
checker for all rows that have been updated to ensure no orphans were
created.Of course, there are memory concerns that must be addressed. The innermost loop
may return too many rows and could cause out of memory errors. This can be
alleviated by chunking the results of the first row fetcher and also using the
transaction's memory monitor for all values being returned from that loop. Also,
the memory monitor will be used for the queue in the CascadeAll function. This
will prevent the queue from growing too large, which can be caused by a graph
that is both deep and wide.
It's important to note that there are two main operations that can occur. A
Delete can only be triggered from a DELETE CASCADE action. All other actions
will perform an Update instead.
Cycle detection is done by performing all delete and update operations in a batch and by delaying the checking for foreign key constraints violations until after all operations are complete. If there is a cycle, the first delete or update, which would leave an orphaned row in isolation, is performed blindly, as are all subsequent cascading operations. Once all the operations are complete, all referential integrity constraints that were touched are then checked. This has the disadvantage that a long running cascade operation that ultimately fails will leave around a large number of intents due to the failed transaction. But it does leverage our transaction mechanism itself to store these changes and does not require large amounts of memory.
The runtime of this function is heavily dependent on the number of cascading operations that will be performed. An further still on each of those operations' runtimes. Each non-cascading operation, one that doesn't return rows to cascade to, requires one row fetcher operation. Each operation that does perform a cascading operations requires two row fetchers, the addition time required to performing the actual delete or update and finally the checking for orphaned rows.
These optimizations are optional and will be explored after all basic operations are working.
Since one of our primary use cases will be interleaved tables, special attention should be paid to optimizing cascading deletes and updates. Interleaved child tables that are isolated, as in none of the child tables have any external referential dependencies, and complete, all child tables reference the parent table, can be done as a batch delete and should be quick. Similarly, there may be a possibility for similar batching when upgrading.
There may be some quick optimizations for self-referencing tables. Similar to interleaved tables, it may be possible to perform batch deletes or updates. These will be explored after all basic functionality is working.
To begin with, the pre-fetching of all required table descriptors will be added
to TablesNeededForFKs. After this is complete, each of the new actions will be
implemented in order of importance, which is roughly:
ON DELETE CASCADEON UPDATE CASCADEON DELETE SET NULLON UPDATE SET NULLON DELETE SET DEFAULTON UPDATE SET DEFAULTThe following rationale were used to determine the order of work:
ON DELETE CASCADE is the most common and should help with ORM compatibility
and is the most requested so it is placed at the top of the list.ON DELETE SET NULL ON UPDATE CASCADE is the default setting for Sequelize
see #17803.ON UPDATE CASCADE
should be high on this list.SET DEFAULT adds more failure modes than SET NULL and is less common in
other SQL implementations, so SET NULL will be implemented next.SET DEFAULT is the best candidate for cutting
from the next release.All of the above examples and some more complex ones will be added in a logic test style manner. Furthermore, the dependency walking algorithm, when used for both fetching tables and actually performing operations will be tested in isolation using Go's unit testing framework directly.
It might also be useful to create a new type of test that specifically address testing these new relationships. This would be in addition to logic tests and use a new DSL. This testing is in scope for the implementation of RFC and will be added early on to facilitate better testing of all operations. However, the details of the tests themselves will be determined during the implementation.
It will also be useful and informative to test the limits of our cascading operations and compare us to Postgres. Can we handle 1,000,000 cascades? What if there were in a single table or multiple tables? Do we need to artificially limit the size of the cascades or will the our intent limit do that for us? These tests should focus on these 3 variables:
These tests will be performed using some basic scripts that will be checked in for repeatability.
It will be important to see if the addition of cascading actions affect any of our current update and delete benchmarks.
Furthermore, a few new benchmarks will also be added to monitor normal use cases:
This is a high level, and subject to change, list of the main PRs
ON DELETE CASCADEON UPDATE CASCADEON DELETE SET NULLON UPDATE SET NULLON DELETE SET DEFAULTON UPDATE SET DEFAULTWithout adding in deferrable at this time, there will be some work that must be done when deferrable constraints and transaction settings are added.
SHOW RELATIONS sql command that would output the full graph of relations
for a table would be quite helpful.Not supporting cascade is not really an option, but supporting Postgres' full range of cascading is not a requirement. The most common use cases do not require self-referencing and circular cascading of the constraints. There is no requirement to add deferrable constraints and we may never do so.
So the alternatives here are to follow MySQL or SQL Server's style of dealing with cascading referential integrity constraints.
If we implement MySQL's deferred style, this would greatly simplify the engineering by not having to ever defer constraint enforcement, but instead turn it off temporarily. This seems compelling but would allow the database state to become quite broken. MySQL gets around this by never checking constraints that aren't updated but this seems lazy and may cause strange failures in the long run. If we do choose this alternative, it would be imperative that we also add a command to check the constraints after turning the constraint checking back on. And we would also have to have some type of warning if we were to allow turning off constraint checking outside the scope of a session.
A different approach that would also not include deferring of constraints and would actually scope this project down significantly would be to follow SQL Server's rule that all cascading constraints must form a tree of tables and not a graph. So there can be no loop backs and does not allow self-referential tables. The self-referencing example above would no longer be valid.
However, in lieu of the deferring of constraints, it is commonly suggested that the use of common table expressions (CTE) should be used in its place. And that they are sufficient when dealing with self-referencing or circular dependencies. CTEs are a significantly larger project than adding in deferrable constraints and thus it may be a long time before we are able to support those interesting by still rare cases.
ON DELETE CASCADE and/or ON UPDATE CASCADE.