docs/RFCS/20171025_scrub_sql_consistency_check_command.md
This RFC outlines the interface for a set of SQL statements that will check the validity of the table data, including schemas, foreign keys, indexes, and encoding.
These checks will be accessible through SQL statements. Users who want to run them on a regular schedule can do so with an external process. The checks may also be used in logic tests to further verify correctness.
Until this point, CockroachDB has had no way to reliably check the consistency of table data. Often, errors are only detected when a problem manifests on access. Various consistency errors have popped up during development, appearing as mismatched index entries (#18705, #18533) and unchecked foreign keys (#17626, #17690). A significant factor behind the rise in these types of errors is new features that manipulate tables outside of transactions (schema changes, backup, CSV import).
In summary, aspects that can be checked include:

- Secondary indexes, for missing or dangling entries
- SQL constraints (CHECK, NOT NULL, UNIQUE, FOREIGN KEY)
- Data encodings, including the composite encodings used by the DECIMAL and collated string types

Schemas can also be checked for the following:
What follows is detailed user-level documentation of the feature.
The SCRUB command is for checking the validity of different aspects of
CockroachDB. Optionally, SCRUB will also repair any errors if
possible. In order to run SCRUB, the user needs to have root
permissions.
When a SCRUB is run, a job is created that will be visible through the
Admin UI and the jobs table (SHOW JOBS). The SCRUB job is
cancelable.
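Since a SCRUB execution is an ordinary job, it can be managed with the existing job statements. A sketch (the job ID below is a placeholder taken from hypothetical SHOW JOBS output):

```sql
-- List running jobs, including any SCRUB jobs.
SHOW JOBS;

-- Cancel a scrub using the job ID reported by SHOW JOBS
-- (12345 is a placeholder).
CANCEL JOB 12345;
```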
The SCRUB statements are as follows:
| STATEMENT | Purpose |
|---|---|
| SCRUB TABLE | Run either all checks or the provided checks on a specific table. |
| SCRUB DATABASE | Run either all checks or the provided checks on a specific database. |
| SCRUB ALL | Run either all checks or the provided checks on all databases. |
If one or more rows are returned when the SCRUB command completes,
errors were found in the data. Each row output refers to a distinct error.
When running SCRUB, the UUID can be found in the event log along with any
rows returned. The UUID associated with an execution of SCRUB can be
used to look up failures in the system table system.scrub_errors.
Additionally, any errors found will be reported to Cockroach Labs through Sentry, but with details stripped so as not to expose any of the data.
Both the rows returned and the system table have the following schema:
| Column | Type | Description |
|---|---|---|
| job_uuid | UUID | A UUID for the execution of a scrub. The job_uuid can be used to later lookup failures from the scrub errors system table. |
| error_type | STRING | Type of error. See the "Error types" section for the possible types. |
| database | STRING | Database containing the error. |
| table | STRING | Table containing the error. |
| primary_key | STRING | Primary key of the row with the error, cast to a string. If no primary key is involved, or retrievable, this will be NULL. |
| timestamp | TIMESTAMP | The timestamp associated with the data that has the error. If present, this will be the last time the data was updated. |
| repaired | BOOL | Whether or not the row has been automatically repaired. |
| details | STRING | A free-form JSON string with additional details about the error. This also includes internal details, such as the raw key, raw secondary key, and table ID. |
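As a sketch of how this schema might be used (assuming system.scrub_errors lands with the columns above), a DBA could list the unrepaired errors for a particular table:

```sql
-- Hypothetical query against the proposed system.scrub_errors schema.
SELECT job_uuid, error_type, primary_key, details
  FROM system.scrub_errors
 WHERE database = 'mydatabase'
   AND "table" = 'mytable'
   AND repaired = FALSE;
```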
The SCRUB TABLE, SCRUB DATABASE, and SCRUB ALL statements run a
scrub on the target (either all databases or the specified
table(s)/database(s)) to detect any errors in the data. Options can
also be provided for checking only specific aspects. By default, all
checks are run.
The syntax for the statements is as follows:
SCRUB TABLE <table>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
SCRUB DATABASE <database>... [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
SCRUB ALL [AS OF SYSTEM TIME <expr>] [WITH OPTIONS <option>...]
Note that the options INDEX and CONSTRAINT are not permitted with
SCRUB ALL as the specified indexes or constraints may be ambiguous.
| Option | Meaning | Options |
|---|---|---|
| INDEX | Scan through all secondary index entries, checking if there are any missing or incorrect. | Either INDEX ALL can be used to check all indexes, or checks can be restricted only to specified indexes, e.g. INDEX (<index_name>...) |
| CONSTRAINT | Force constraint validation checks on all SQL constraints in the table(s) or database(s). | Either CONSTRAINT ALL can be used to check all constraints, or the check can be restricted only to specified constraints, e.g. CONSTRAINT (<constraint_name>...) |
| PHYSICAL | Scan all the rows to make sure that the data encoding and organization of data is correct. | N/A |
| SYSTEM | Checks all of the system tables associated with the table(s) or database(s). | N/A |
To run all checks on a table, use the following statement. Note that
job_uuid has been truncated for brevity.
SCRUB TABLE mytable
| job_uuid | error_type | database | table | primary_key | timestamp | repaired | details |
|---|---|---|---|---|---|---|---|
| '63616665...' | 'invalid_encoding' | 'mydatabase' | 'mytable' | '123' | '2016-03-26 10:10:10' | FALSE | '{"key": ...}' |
| '63616665...' | 'constraint' | 'mydatabase' | 'mytable' | '1' | '2016-03-26 10:10:10' | FALSE | '{"constraint": "acol_not_null", "columns": ["acolumn"]}' |
The following query outputs the same rows as found above, and can be run at any time after the check has returned:
SELECT * FROM system.scrub_errors WHERE job_uuid = '63616665...';
To run a physical check and an index check on only the indexes
name_idx and other_idx:
SCRUB DATABASE current_db WITH OPTIONS PHYSICAL, INDEX (name_idx, other_idx)
To run all checks on a database with data as of a system time:
SCRUB DATABASE current_db AS OF SYSTEM TIME '2017-11-13'
Scrub also supports the AS OF SYSTEM TIME <expr> clause in order to
run all checks on historical values. For example, this would be useful
for verifying the integrity of indexes backfilled by schema changes,
if schema changes backfilled historical values (they currently do
not).
The following table has all of the possible error_types found during
checking. It is sorted by the corresponding check options that may
return the error.
| Failure type | Corresponding checks | Description |
|---|---|---|
| constraint | CONSTRAINT | A row was violating an SQL constraint. |
| missing_secondary_index | INDEX | A secondary index entry was missing for a row. |
| dangling_secondary_index | INDEX | A secondary index entry was found where the primary row data referenced is missing. |
| invalid_composite_encoding | PHYSICAL | Data was found with an invalid composite encoding. |
| invalid_encoding | PHYSICAL | Data was found with an invalid encoding, and could not be decoded. |
| noncanonical_encoding | PHYSICAL | Data was found with a non-canonical encoding. |
| internal | SYSTEM | Something happened to a system table... Oops. (TODO: No internal check errors have been considered yet) |
Errors with the failure type constraint will include the keys
constraint and columns in the details JSON column; columns lists
all of the columns involved in the constraint. For example, a details
value might look like {"constraint": "acol_not_null", "columns":
["acolumn"]}. If the SQL constraint being checked is a CHECK
constraint, then the list of columns will be empty.
Errors with the failure type missing_secondary_index or
dangling_secondary_index will include the keys index and
columns in the details JSON column; index is the name of the
index and columns lists all of the columns involved in the index.
There are countless names you could give the check command. Scrub was derived from its use in memory and filesystem checking. Considering the same statement will be used for both checking and repairing, the name is a verb that reasonably fits both.
Fun fact: ZFS' use of scrubbing is distinct from the traditional fsck, as it doesn't require unmounting a disk and taking it offline to do error correction. One could interpret this distinction as intentional, to change the perception that error correction requires downtime. Source
An alternative suggestion for concise verbs to express the statement was
CHECK ... for running checks, and CHECK AND CORRECT ... for running
repairs alongside the checks.
Use of an acronym such as DBCC has value -- if people are coming
from MSSQL, they'll recognize the command and its capabilities. To the
unsuspecting person, though, it is less clear what the command may do.
Below is a list of how other databases provide interfaces for checking the consistency of data.
Microsoft SQL Server has a command DBCC. Documentation can be found
here. They have dozens of sub-commands which they put into 4
categories.
| Command category | What it performs |
|---|---|
| Maintenance | Maintenance tasks on a database, index, or filegroup. |
| Miscellaneous | Miscellaneous tasks such as enabling trace flags or removing a DLL from memory. |
| Informational | Tasks that gather and display various types of information. |
| Validation | Validation operations on a database, table, index, catalog, filegroup, or allocation of database pages. |
One of the closest commands to ours, DBCC CHECKTABLE, is under
"Validation". It has a few options you can provide, such as
PHYSICAL_ONLY, which checks the "physical structure of the page, record
headers and the physical structure of B-trees", and DATA_PURITY, which
"check[s] the table for column values that are not valid or out-of-range.
For example ... detects columns with date and time values that are
larger than or less than the acceptable range."
They also have a DBCC CHECKDB which does the same and takes the same
options on a database level.
The last relevant one for us is DBCC CHECKCONSTRAINTS, which can take a
constraint name or ALL_CONSTRAINTS. This is more similar to our ALTER TABLE ... VALIDATE than the check command, as its purpose is to check a
disabled constraint.
Oracle has a VALIDATE command that is used for several different
things, such as validating the database or a backup. Documentation can
be found here. This command specifically checks for
"physical corruption only", but it also has an option
to check for logical corruption. In both cases, failures are logged and
can be viewed with a LIST FAILURE command.
Use the VALIDATE command to check for corrupt blocks and missing files, or to determine whether a backup set can be restored.
If VALIDATE detects a problem during validation, then RMAN displays it and triggers execution of a failure assessment. If a failure is detected, then RMAN logs it into the Automated Diagnostic Repository. You can use LIST FAILURE to view the failures.
In particular, there is an option which instead makes the database to
check logical corruption, CHECK LOGICAL:
Tests data and index blocks in the files that pass physical corruption checks for logical corruption, for example, corruption of a row piece or index entry. If RMAN finds logical corruption, then it logs the block in the alert log and server session trace file.
MySQL has an operation CHECK TABLE that is very similar to ours
functionally. Documentation can be found here. They also
have ANALYZE, REPAIR, and OPTIMIZE queries. (REPAIR seems
to not do much except for with specific setups.)
CHECK TABLE tbl_name [, tbl_name] ... [option] ...
option = {
FOR UPGRADE
| QUICK
| FAST
| MEDIUM
| EXTENDED
| CHANGED
}
Any of the options can be used together. This is actually
very confusing, because each has its own distinct meaning, so
you can write CHECK TABLE ... QUICK FAST.
| Option | Meaning |
|---|---|
| QUICK | Do not scan the rows to check for incorrect links. |
| FAST | Check only tables that have not been closed properly. |
| CHANGED | Check only tables that have been changed since the last check or that have not been closed properly. |
| MEDIUM | Scan rows to verify that deleted links are valid. This also calculates a key checksum for the rows and verifies this with a calculated checksum for the keys. |
| EXTENDED | Do a full key lookup for all keys for each row. This ensures that the table is 100% consistent, but takes a long time. |
These are ideas that are not the primary focus of this RFC and are explicitly deemed out of scope, but they are future additions that have been discussed or would be good improvements to the checks.
As with system.jobs, we will need to clean up the system.scrub_errors
table. This is currently an unresolved question/future work in the
system jobs RFC as well.
Outside of checking that the user data is consistent, which largely tests the correctness of the SQL layer, system tables can also be checked. A few system table checks include:

- system.zones has a default entry, i.e. an entry for RootNamespaceID.
- Each system.zones entry corresponds to a descriptor and namespace. (We should really just put a foreign key on the table, but that might be hard.)
- system.lease entries refer to a valid descriptor and namespace.
- system.users

If checks are run automatically in any manner, this raises the problem of how to prevent them from impacting foreground traffic. The checks can be run manually at the discretion of a DBA as desired, with the DBA fully understanding the impact.
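The system table checks listed above could largely be expressed as ordinary SQL. For example, a sketch of the descriptor check for system.zones (the join columns are assumptions about the internal schemas):

```sql
-- Zone config entries whose ID has no matching descriptor.
-- Excludes the default entry (RootNamespaceID, id 0).
SELECT z.id
  FROM system.zones AS z
  LEFT JOIN system.descriptor AS d ON z.id = d.id
 WHERE d.id IS NULL AND z.id != 0;
```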
If run automatically, the checks could be run as a job so that they appear in the admin UI and can also be canceled/paused/resumed from there.
As currently envisioned, a simple goroutine run with a timer to initiate
the job will suffice. Alternatively, it was suggested to create a cron-like
scheduler for CockroachDB in order to manage scheduled jobs, which is
deserving of its own RFC.
Reporting goes in tandem with scheduled checks, as it only becomes relevant when there is no foreground process to receive errors. If we add a scheduled job to do checks, or do them through other background processes, we need a way to alert the user about failures.
For all approaches, whether the command runs in the foreground or starts a background process, we can:
In addition, a few things we may want to consider:
See the Repairing SQL Errors RFC