docs/design/2025-11-05-active-active.md
The key requirement is to enable an “Active-Active Deployment” capability in TiDB. The goal is to build a global database solution that supports multiple TiDB clusters distributed across regions, with the ability to handle conflicting writes.
In this setup, an application can connect to any TiDB cluster to perform both read and write operations on any data. Each transaction is processed locally in the TiDB cluster to which the application connects. TiCDC is then responsible for replicating the data changes to other TiDB clusters.
This design provides eventual consistency, and any write conflicts are resolved using the Last Write Wins (LWW) strategy.
The following example illustrates multiple TiDB clusters forming an active-active deployment, referred to as an “active-active group.”
Since a transaction happens only inside a TiDB cluster, and the commit_tss are generated from local PD node which has no direct connection to other TiDB clusters' PD nodes, the solution does't provide global transactional consistency. A simple example: A client connects to cluster A and sets value to x, then connects to cluster B and sets the same value to y in a very short time (before the update is replicated to B by TiCDC). The system does not guarantee the final result of value is y. It only guarantees the final value of each cluster is the same, no matter x or y, i.e. eventual consistency. To guarantee eventual consistency, the system must address the write conflicts between multiple TiDB clusters. E.g. The same PK rows are modified in more than one cluster simultaneously. The written conflicts are resolved by a strategy named "Last Write Wins (LWW)". An example of how LWW addresses conflict:
This section introduces the detailed design in two major parts: Soft-delete and Active-Active. Soft-delete is a feature enabling the deleted rows to be recovered in a retention duration. Active-Active describes how we guarantee eventual consistency between multiple TiDB clusters with TiCDC.
In TiDB, after a row is deleted by DELETESQL, it is no longer accessible unless by historical read. After the tidb_gc_life_time passed, it is no longer accessible or recoverable by any means because the MVCC versions are cleaned up by GC jobs . But some businesses would like to have a much longer retention duration, e.g. 2 weeks. It is the initial intention of intorducting soft-delete feature.
A row is soft deleted means it is marked tombstoned but not marked deleted yet (i.e. no delete mark in MVCC list). The MVCC GC process will never clean it up until it is officially hard deleted (i.e. append a delete mark in MVCC list) after the soft delete retention expired.
Another requirement for soft delete feature comes from the LWW strategy. During LWW, we need to compare the DELETE operations with INSERT and UPDATE operations by SQLs, as TiCDC replicates changes by pure SQLs. The issue is that after a row is hard deleted, normal SQL cannot retrieve the commit_ts of DELETE operation any more. And soft delete can help to resolve the issue: the row is not actually hard deleted, so we can make SQLs to retrieve the soft deleted row to do commit_ts comparsions. You can find more details in the CDC sections below.
In the design, we use the SQL rewrite approach to implement the soft-delete. A hidden _tidb_softdelete_time column is added to soft-delete tables, and we rewrite the DML SQLs on the soft-delete table.
First, one needs to create a softdelete table. It is either inherited from db options automatically:
CREATE DATABASE t SOFTDELETE RETENTION 7 DAY;
USE t;
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
);
Or specify options directly for CREATE TABLE:
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
) SOFTDELETE RETENTION 7 DAY;
Additionally, one can also specify clean up job intervals:
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
) SOFTDELETE RETENTION 7 DAY SOFTDELETE_JOB_INTERVAL '24h' SOFTDELETE_JOB_ENABLE 'ON';
ALTER TABLE message SOFTDELETE_JOB_INTERVAL '24h';
ALTER TABLE message SOFTDELETE_JOB_ENABLE 'ON';
Then, one needs to enable softdelete synmatics if it is not:
set @@session.tidb_translate_softdelete_sql=on;
set @@global.tidb_translate_softdelete_sql=on; -- it is default on
With tidb_translate_softdelete_sql=on, DELETE statements will be rewroten internally to mark rows as deleted by modifying an internal hidden column. And SELECT statements will filter those rows. INSERT/UPDATE will treat softdeleted rows as deleted. And you can recover softdeleted rows before the expiration by the following new syntax.
RECOVER VALUES FROM message WHERE id = "1001";
Softdeleted rows will be deleted after the expiration, i.e. hard deleted by the cleanup jobs.
Internally, if a table is created as softedelete table, TiDB creates an additional internal hidden column named _tidb_softdelete_time for it, to store the expired timestamp now().
All sqls in session with tidb_translate_softdelete_sql=on will be translated automatically, including business TTL of TiDB. These change are visible in explain ... statement.
For example, SELECT * from message WHERE ... may be translated to SELECT * FROM message WHERE _tidb_softdelete_time is NULL AND (...).
explain SELECT * FROM message WHERE text like '%43%';
+---------------------+----------+-----------+---------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------+----------+-----------+---------------+-------------------------------------+
| TableReader_7 | 8000.00 | root | | data:Selection_6 |
| └─Selection_6 | 8000.00 | cop[tikv] | | like(test.message.text, "%43%", 92), isnull(test.message._tidb_softdelete_time) |
| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:message | keep order:false, stats:pseudo |
+---------------------+----------+-----------+---------------+-------------------------------------+
For mysql compatibility, the result of SHOW CREATE TABLE uses comment format:
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
) /*T![softdelete] SOFTDELETE RETENTION 7 DAY*/;
And you can disable the cleanup job by:
ALTER TABLE message SOFTDELETE JOB INTERVAL 0;
You can turn off the softdelete status on database level, and the following created tables in the database are not softdelete table automatically. While turning off softdelete status at table level is not supported.
ALTER DATABASE t SOFTDELETE = 'OFF';
ALTER TABLE t SOFTDELETE = 'OFF'; -- Phase 1 not supported
And a new system var tidb_softdelete_job_enable is introduced to control the soft delete cleanup jobs running or not.
Because tombstone rows remain in the cluster for a period of time, they may impact query performance when the system needs to skip over them. For example:
CREATE TABLE t(id int primary, v int) SOFTDELETE RETENTION 7 DAY;
-- insert 10000 rows with id (1 .. 10000)
INSERT INTO t VALUE (1, 1), (2, 2) ...;
-- DELETE some rows
DELETE FROM t WHERE id < 5000;
-- Will be translated to SELECT * FROM t WHERE id > 0 AND _tidb_tombstone_time IS NULL LIMIT 1;
EXPLAIN SELECT * FROM t WHERE id > 0 LIMIT 1;
+---------------------------+---------+-----------+---------------+--------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+---------------+--------------------------------------------------+
| Limit_9 | 1.00 | root | | offset:0, count:1 |
| └─TableReader_16 | 1.00 | root | | data:Limit_15 |
| └─Limit_15 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─Selection_14 | 1.00 | cop[tikv] | | isnull(test.t._tidb_tombstone_time) |
| └─TableRangeScan_13 | 5000.00 | cop[tikv] | table:ta | range:(2,+inf], keep order:false, stats:pseudo |
+---------------------------+---------+-----------+---------------+------------------------------------------------+
The SELECT statement above is rewritten to include the condition _tidb_tombstone_time IS NULL in order to skip tombstone rows. However, even with a LIMIT 1 clause, TiDB still needs to scan and skip all 5,000 tombstone rows.
To improve scan efficiency, you can create an index as follows:
ALTER TABLE t ADD INDEX i1(_tidb_tombstone_time, id);
With this index, the query can use i1 to scan efficiently without reading all the tombstone rows:
EXPLAIN SELECT * FROM t WHERE id > 0 LIMIT 1;
+----------------------------+---------+-----------+---------------------------------------------+----------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+---------------------------------------------+----------------------------------------------------------+
| IndexLookUp_20 | 1.00 | root | | limit embedded(offset:0, count:1) |
| ├─Limit_19(Build) | 1.00 | cop[tikv] | | offset:0, count:1 |
| │ └─IndexRangeScan_17 | 1.00 | cop[tikv] | table:t, index:i1(_tidb_tombstone_time, id) | range:(NULL 0,NULL +inf], keep order:false, stats:pseudo |
| └─TableRowIDScan_18(Probe) | 1.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------+---------+-----------+---------------------------------------------+----------------------------------------------------------+
Please note that TiDB does not create any indexes automatically—you need to create them manually when appropriate. The reason is that the optimal index depends on your specific business workload.
For example, if you frequently query rows where v > 1000 and want to optimize such queries, you can create an additional index like this:
ALTER TABLE t ADD INDEX i2(_tidb_tombstone_time, v);
Most applications should keep tidb_translate_softdelete_sql set to ON when working with soft-delete tables, allowing TiDB to automatically manage tombstone rows. However, there are certain scenarios where you may want to disable it, such as:
_tidb_tombstone_time to an earlier value to accelerate cleanup of specific rows, or to a later value to preserve them longer.This section explores introducing explicit DML to control hard vs soft deletes. The key idea being that rather than relying on a session variable, we should be explicit with the DML for the most common cases. This is an supplement to the use of @@session.tidb_translate_softdelete_sql to control the soft delete behavior explicitly. It can coexist with that session variable if desired, but a more explicit DML is used to show the express intent of the user.
The current MySQL/TiDB DELETE syntax is:
DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM tbl_name [[AS] tbl_alias]
[PARTITION (partition_name [, partition_name] ...)]
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
The proposed change would be to add a HARD key word to the DELETE statement:
DELETE [HARD] [LOW_PRIORITY] [QUICK] [IGNORE] FROM tbl_name [[AS] tbl_alias]
[PARTITION (partition_name [, partition_name] ...)]
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
This would only have an impact on tables where the soft delete capability is enabled. For all other tables, it would be ignored and treated as a normal delete. For tables where soft delete is enabled, then this statement would perform an actual delete of the data (not a soft delete) and so would not be rewritten as a regular DELETE would be.
tidb_translate_softdelete_sql affects the final executing SQLs, the modification to the value of the var affects Virtual View' result and plan cache.tidb_translate_softdelete_sqltidb_ttl_xxx from here. E.g. tidb_ttl_delete_rate_limit, tidb_ttl_delete_batch_size, tidb_ttl_delete_worker_count, etctidb_ttl_job_enable and tidb_softdelete_job_enable controls TTL and soft delete jobs separately.Prometheus metrics:
softdelete_queries: num of issued softdelete queriessoftdelete_processed_rows: rows that are processed by softdelete, labeled with deleted or not.softdelete_query_duration: duration of issued softdelete queries.ttl_job_status,ttl_phase_time,ttl_insert_rows,ttl_event_count will be shared between TTL and softdelete. Or this can be separated.There wil the following grafana dashboards:
Introduce a new system table tidb_softdelete_table_stats in information_schema. The table definition is as follows:
CREATE TABLE tidb_softdelete_table_stats (
db_name varchar(64),
table_name varchar(64),
partition_name varchar(64),
estimated_total_row_count bigint unsigned,
estimated_softdeleted_row_count bigint unsigned
)
The user can access the row count information from this system table.
Internally, the information would be from optimizer statistics. When the user access this system table, tidb access the corresponding optimize statistics from the cache in tidb memory. If the corresponding statistics are collected by not loaded into memory yet, tidb will try to load it from storage.
If the statistics for a table are not collected, or if it's not a softdelete table, tidb can't display information for this table.
If there are valid statistics for the queried softdelete table, tidb would use the statistics on the _tidb_softdelete_time column to calculate the total_row_count and soft_deleted_row_count. Since _tidb_softdelete_time IS NULL means it's not softdeleted, we can get soft_deleted_row_count by subtracting the NULL count from the total count.
Since we rely on optimizer statistics, there are some limitations:
-- The below statements are not allowed for soft-delete table
DELETE t,t1 FROM t JOIN t1 ON t.id=t1.id WHERE t.id=1; -- not allowed
The main reason is that for DELETE statements involving multiple tables, the translation logic would be much more complicated.
A simpler case is that if we have a DELETE stmt with multiple soft-delete tables, we need to check which of them needs to be updated.For example, we need to rewrite it similar to the logic below:
DELETE t1, t2 FROM t1 JOIN t2 ON t1.a = t2.a;
DELETE t1 FROM t1 JOIN t2 ON t1.a = t2.a;
-- translates to
UPDATE t1 JOIN t2 ON t1.a = t2.a SET t1._tidb_softdelete_time = NOW(), t2._tidb_softdelete_time = NOW();
UPDATE t1 JOIN t2 ON t1.a = t2.a SET t1._tidb_softdelete_time = NOW();
If we have a DELETE stmt with soft-delete tables and non-soft-delete tables, it would be harder to implement because that means we need the executor to handle Update logic and Delete logic at the same time.
_tidb_softdelete_time : (Currently unsupported and not planned.)_tidb_softdelete_timeThis part also applies to SELECT statements in DML statements, like INSERT ... SELECT statements.
According to the soft-delete semantic, the rows with a non-null _tidb_softdelete_time is deleted and should not be output when the user queries this table, so we should explicitly filter these rows internally. Besides, this filtering behavior can be disabled by turning off tidb_translate_softdelete_sql.
If tidb_translate_softdelete_sql is enabled, in PredicatePushDown() for DataSource, we will check if it's a soft-delete-enabled table, if it is, then add an extra _tidb_softdelete_time IS NULL filter to the predicates for this DataSource.
Internally, the new hidden column _tidb_softdelete_time is a normal public column in its nature. This column should not be output for any queries (even when tidb_translate_softdelete_sql is disabled), unless the user explicitly specifies this column in the select fields.
For implementation, the only thing we need to consider is the behavior of the SELECT *. Specifically, we should not contain the _tidb_softdelete_time column in the select fields when we are unfolding the * in the select fields. We don't need restrictions or behavior changes for anything else.
In unfoldWildStar(), we should check the column name (since the user can't create a column with this name, so we don't need any extra flags and checking the name is enough), and skip it if it's _tidb_softdelete_time.
When tidb_translate_softdelete_sql is enabled or disabled, the semantic of the SQL is changed and the execution plan can't be reused between the two cases, so we need special handling in the plan cache module.
If there are soft-delete enabled table in a query, we will include the value of the tidb_translate_softdelete_sql in the plan cache key, so that it will have different entries in the plan cache when tidb_translate_softdelete_sql is enabled or disabled.
The insert SQL (without ON DUPLICATE) should check the old row's tombstone to determine whether to perform an insert or a replace, or return a dupkey error:
EXPLAIN INSERT INTO t(id, a) VALUES(1, 2); -- original SQL
+----------+---------+------+---------------+----------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------+---------+------+---------------+----------------------------------------------------------+
| Insert_1 | N/A | root | | replace-conflict-if:not(isnull((t._tidb_softdelete_time) |
+----------+---------+------+---------------+----------------------------------------------------------+
Notice the replace-conflict-if:not(isnull((t._tidb_softdelete_time) in the explain result. During execution, TiDB attempts to locate duplicate rows and checks whether the existing row’s _tidb_softdelete_time value is not NULL. The behavior can be divided into the following cases:
_tidb_softdelete_time is not NULL, delete those rows and insert the new one (i.e., replace)._tidb_softdelete_time is NULL, return a duplicate key error, or skip updating that row if IGNORE is specified.To achieve the above goal, we should add some new fields for InsertExec
type InsertExec struct {
*InsertValues
+ replaceConflictIf expression.Expression
...
}
The replaceConflictIf field is an expression that indicates whether an old row should be replaced if it meets the condition. By default, it is nil, preserving the original behavior. For soft delete tables, however, it is set to the expression _tidb_softdelete_time is not NULL.
If replaceConflictIf is set, InsertExec performs the following steps:
replaceConflictIf condition.replaceConflictIf, return a conflict error or append a warning if INSERT IGNORE is specified.replaceConflictIf, delete the old row and insert the new one.Similar to normal insert, the upsert case also uses the replaceConflictIf to check whether a row is a tombstone or not. But upsert should handle the update cases if the non-soft-delete conflict happens.
It performs the following steps:
replaceConflictIf condition.replaceConflictIf, update the conflict row as the specified value. Because currently changing a primary key is not allowed and there is no other unique indexes, the update phase should have no conflict.replaceConflictIf, delete the old row and insert the new one.We only need to rewrite the SQL to make sure the update will skip updating the tombstone rows, for example:
-- original SQL
UPDATE t SET v = v + 1 WHERE id = 1;
-- rewrite SQL for soft-delete table
UPDATE t SET v = v + 1 WHERE id = 1 and _tidb_softdelete_time IS NULL;
Currently, updating a soft-delete table will not meet any conflict because:
The behavior of a REPLACE statement on a soft-delete-enabled table will be the same as a normal one, which is:
If tidb_translate_softdelete_sql is enabled, a DELETE statement translates to an UPDATE with the same query part plus SET _tidb_softdelete_time = NOW() internally.
For example,
-- User queries:
DELETE FROM t WHERE col_a = 10;
DELETE FROM t WHERE col_a = 10 ORDER BY col_b LIMIT 1;
-- internally translates to:
UPDATE t SET _tidb_softdelete_time = NOW() WHERE col_a = 10;
UPDATE t SET _tidb_softdelete_time = NOW() WHERE col_a = 10 ORDER BY col_b LIMIT 1;
The privilege check is conducted as if it's a normal DELETE statement, i.e., we check the DELETE privilege instead of the UPDATE privilege.
Introduce a new SQL syntax RECOVER VALUES FROM <table_name> WHERE <expr>
Internally, this SQL translates to UPDATE <table_name> SET _tidb_softdelete_time = NULL WHERE <expr> (without further applying the rewriting for normal UPDATE statements mentioned above). The privilege check and query result will be the same as that.
There is a minimum for the job interval, 1h. A background task will scan all tables with softdelete enabled, and see if SOFTDELETE_JOB_INTERVAL is met since the TiDB startup.
New fields will be added to mysql.tidb_ttl_table_status and mysql.tidb_ttl_tasks to reuse the existing distributed TTL framework. One can tweak the concurrency, schedule time like TTL, using the same set of variables.
The whole process is two-phased: dispatch, scan&delete. Similar to TTL, one instance will dispatch up to 64 tasks per table, and store that information into tidb_ttl_tasks table. And scan&deletion tasks will be distributed to the whole cluster, reporting to tidb_ttl_table_status.
Note that the dispatch manager will either issue softdelete cleanup jobs, or TTL jobs. This is a technical limitation, and may be solved in the future.
When collaborating with active-active synchronization and TiCDC, i.e. with ACTIVE_ACTIVE=ON on the table meta, cleanup worker should also query the minimum checkpoint_ts from ticdcProgresTable. Every hard deletion requires min(current_tso, checkpoint_ts) >= IFNULL(_tidb_origin_ts, _tidb_commit_ts). Otherwise there may be inconsistency.
(Phase 1 not supported, might support in the future phases)
One may want to migrate existing tablesto softdelete for active-active usage or so.
It could be done by:
// enable softdelete for random tables
ALTER TABLE t ENABLE SOFTDELETE SOFTDELETE_RETENTION '7d';
ALTER TABLE ENABLE statements will try to create the hidden columns(or indexes) automatically if it does not exist. And will do nothing if it exists.
And one can also disable it anytime:
// disable it
ALTER TABLE t SOFTDELETE='OFF';
ALTER TABLE t SOFTDELETE='OFF' statements will not delete hidden columns or indexes, but just disable the functionality on this table.
That said, to completely disable softdelete, one needs to manually drop columns and indexes using admin:
ALTER TABLE t DROP COLUMN _tidb_softdelete_time;
You could also amend the table manually to correct any error/inconsistency.
_tidb_softdelete_time. So the query performance on tombstoned rows is guaranteed.In this section we introduce how to build active-active synchronization in a TiDB cluster group. TiDB already supports CDC bi-directional to replicate changes between two TiDB clusters. The main challenge is how to achieve eventual consistency by "Last Write Wins(LWW)" strategy.
The basic idea of this design is that before TiCDC applying a change from upstream TiDB cluster to a target TiDB cluster, TiCDC compares the commit_ts of the change to the latest existing MVCC version's commit_ts and only applies it when the former commit_ts is larger. And since we cannot directly use the commit_ts from upstream TiDB cluster as the target MVCC version's commit_ts(It breaks TiDB's transaction assumption and might cause transactional inconsistency), the design adds a new hidden column tidb_origin_ts to store the original commit_ts for further comparison.
You can use the table option ACTIVE_ACTIVE to create a table that supports active-active synchronization. Note that the SOFTDELETE option is also required when defining an active-active table:
-- Create a table that supports active-active sync
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
) ACTIVE_ACTIVE = "ON" SOFTDELETE RETENTION = 7 DAY;
-- Option ACTIVE_ACTIVE requires SOFTDELETE enable.
-- The below statement will fail because SOFTDELETE is not enabled:
CREATE TABLE message ( -- returns ERROR!
id int PRIMARY KEY,
text varchar(10)
) ACTIVE_ACTIVE = "ON";
The output /*T![active-active] ACTIVE_ACTIVE='ON' */ /*T![soft-delete] SOFTDELETE RETENTION=7 DAY */ of SHOW CREATE DATABASE is in comment format, which is also legal.
Another option is to add the option ACTIVE_ACTIVE to the database. The newly created table will then inherit the ACTIVE_ACTIVE automatically:
-- Create a database with active-active option
CREATE DATABASE test ACTIVE_ACTIVE = 'ON' SOFTDELETE RETENTION = 7 DAY;
SHOW CREATE DATABASE test;
+----------+------------------------------------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+------------------------------------------------------------------------------------------------------------------+
| test | CREATE DATABASE `test` /*T![active-active] ACTIVE_ACTIVE='ON' */ /*T![soft-delete] SOFTDELETE RETENTION=7 DAY */ |
+----------+------------------------------------------------------------------------------------------------------------------+
-- Create a new table under the database test
USE test;
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
);
-- The newly created table should inherit the active-active option automatically
SHOW CREATE TABLE message;
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| message | CREATE TABLE `message` ( |
| | `id` int NOT NULL, |
| | `text` varchar(10) DEFAULT NULL, |
| | PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */ |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![active-active] ACTIVE_ACTIVE='ON' */ /*T![soft-delete] SOFTDELETE RETENTION=7 DAY */ |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
You can also use ALTER DATABSE statement to change the active-active option. Please note that it only affects the newly created table under this database.
-- Create a normal database
CREATE DATABASE test;
USE test;
CREATE TABLE message (
id int PRIMARY KEY,
text varchar(10)
);
-- Alter database with active-active option
ALTER DATABASE ACTIVE_ACTIVE='ON' SOFTDELETE RETENTION=7 DAY;
-- The created table before ALTER should not be affected
SHOW CREATE TABLE message;
+---------+-------------------------------------------------------------+
| Table | Create Table |
+---------+-------------------------------------------------------------+
| message | CREATE TABLE `message` ( |
| | `id` int NOT NULL, |
| | `text` varchar(10) DEFAULT NULL, |
| | PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */ |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+-------------------------------------------------------------+
-- Only the newly created table should be affected
CREATE TABLE message2 (
id int PRIMARY KEY,
text varchar(10)
);
SHOW CREATE TABLE message2;
+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| message2 | CREATE TABLE `message2` ( |
| | `id` int NOT NULL, |
| | `text` varchar(10) DEFAULT NULL, |
| | PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */ |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![active-active] ACTIVE_ACTIVE='ON' */ /*T![soft-delete] SOFTDELETE RETENTION=7 DAY */ |
+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
Only the database level ACTIVE_ACTIVE option is allowed to be altered:
-- Alter database to add or drop ACTIVE_ACTIVE options is allowed
ALTER DATABASE test ACTIVE_ACTIVE='ON' SOFTDELETE RETENTION=7 DAY;
ALTER DATABASE test ACTIVE_ACTIVE='OFF';
-- Alter the table to add or drop ACTIVE_ACTIVE options is not allowed
ALTER TABLE test ACTIVE_ACTIVE='ON' SOFTDELETE RETENTION=7 DAY; -- returns ERROR!
ALTER TABLE test ACTIVE_ACTIVE='OFF'; -- returns ERROR!
_tidb_origin_ts and _tidb_commit_tsIf a table is created with ACTIVE_ACTIVE, a new hidden column _tidb_origin_ts will be added. The column _tidb_origin_ts indicates the original commit ts if this row is synced from the upstream, and it has the following definition:
`_tidb_origin_ts` BIGINT DEFAULT NULL
_tidb_origin_ts is NULL, indicating that the row was last inserted or updated by a local transaction._tidb_origin_ts is not NULL, it means the row was synchronized from an upstream cluster via CDC. In this case, the value of _tidb_origin_ts represents the commit timestamp of the original row in the upstream cluster.We also propose a read-only column, _tidb_commit_ts, to represent the row’s commit timestamp in the local cluster. The _tidb_commit_ts column has the same definition as _tidb_origin_ts:
`_tidb_commit_ts` BIGINT DEFAULT NULL
_tidb_commit_ts shows the row’s commit timestamp in the local cluster._tidb_commit_ts is NULL, it indicates that the row has not yet been committed. It happens only inside TiDB as temporary transaction status.The two columns above are hidden by default, but you can still query them by explicitly specifying their column names:
-- The `_tidb_origin_ts` and `_tidb_commit_ts` is hidden by default.
> SELECT * FROM message;
+----+-------+
| id | text |
+----+-------+
| 1 | hello |
+----+-------+
-- You can query the hidden columns by specifying the column name:
> SELECT id, _tidb_origin_ts, _tidb_commit_ts FROM message;
+----+-------------------+--------------------+
| id | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------------+--------------------+
| 1 | NULL | 460711508282966017 |
+----+-------------------+--------------------+
After introducing the columns _tidb_commit_ts and _tidb_origin_ts, the "actual commit timestamp" of a row can be represented as follows:
IFNULL(`_tidb_origin_ts`, `_tidb_commit_ts`)
-- Suppose an old row is from an upstream cluster
> SELECT id, message, _tidb_origin_ts, _tidb_commit_ts FROM message;
+----+-------+--------------------+--------------------+
| id | text | _tidb_origin_ts | _tidb_commit_ts |
+----+-------+--------------------+--------------------+
| 1 | hello | 461617167984492544 | 461617258860118016 |
+----+-------+--------------------+--------------------+
-- Local transaction will always overwrite the old row.
> UPDATE message SET text = "world" WHERE id = 1;
-- The new row should have:
-- `_tidb_origin_ts` is NULL
-- `_tidb_commit_ts` > old `_tidb_origin_ts`
> SELECT id, message, _tidb_origin_ts, _tidb_commit_ts FROM message;
+----+-------+--------------------+--------------------+
| id | text | _tidb_origin_ts | _tidb_commit_ts |
+----+-------+--------------------+--------------------+
| 1 | world | NULL | 461617262510997504 |
+----+-------+--------------------+--------------------+
-- Set _tidb_origin_ts explictly is allowed
-- DO NOT do it only if you know what you are doing, as it may cause the final result to be inconsistent between clusters!
> UPDATE message SET _tidb_origin_ts = 461620485322702848 WHERE id = 1;
> SELECT id, message, _tidb_origin_ts, _tidb_commit_ts FROM message;
+----+-------+--------------------+--------------------+
| id | text | _tidb_origin_ts | _tidb_commit_ts |
+----+-------+--------------------+--------------------+
| 1 | world | 461620485322702848 | 461617262510997504 |
+----+-------+--------------------+--------------------+
These two rules together ensure the Last Writer Wins (LWW) policy—that is, the write with the greater timestamp always overwrites the older value.
-- Suppose a table with an extra unique index
CREATE TABLE t(id int primary key, u int unique);
-- ClusterA
INSERT INTO t VALUES(1, 10); -- commit_ts: ts1
-- ClusterB
INSERT INTO t VALUE(1, 20); -- commit_ts: ts2
-- ClusterA
INSERT INTO t VALUES(2, 20); -- commit_ts: ts3
-- Suppose ts1 < ts2 < ts3.
-- We have a dilemma when syncing (1, 20) from cluster B to cluster A:
-- A should overwrite (1, 10) because ts2 > ts1, and we should remove the conflict row (2, 20) first
-- But we cannot remove (2, 20) because (2, 20) has a newer version (ts2 < ts3)
_tidb_origin_ts and _tidb_commit_ts can not be modified with DDL. (Currently unsupported and not planned to alter these hidden columns)Active-Active enable status after creation is not supported. (Currently unsupported and not planned.)The values of _tidb_origin_ts and _tidb_commit_ts are both TSO allocated by PD. However, since each cluster has its own PD instance, we must ensure that timestamps generated by different PDs are comparable. To achieve this, each PD instance must allocate globally unique timestamp values.
To support this, two configuration parameters are introduced: tso-max-index and tso-unique-index, which define the number of active-active clusters and their identities.
tso-unique-index identifies each cluster in an active-active setup and must be unique across clusters.tso-max-index specifies the maximum possible value for tso-unique-index.Usually, a TSO value is generated according to the following rule:
TSO = (physical_timestamp << 18) + logical_num
logical_num = tso_unique_index + (tso_max_index * N)
// N >= 0 && tso_unique_index <= tso_max_index && logical_num < (1 << 18)
The TSO generation algorithm can be seen below:
// last_physical_timestamp is the physical _imestamp for the last TSO request
var last_physical_timestamp
// last_logical_num is the logical_num for the last TSO request
var last_logical_num
for each TSO request:
current_physical_timestamp = get_current_physical_timestamp()
if current_physical_timestamp != last_physical_timestamp
// for each new physical timestamp, the logical_num is initialized from tso_unique_index
current_logical_num = tso_unique_index
else
// The logical_num should increase by tso_max_index for each request
current_logical_num += tso_max_index
last_physical_timestamp = current_physical_timestamp
last_logical_num = current_logical_num
response(current_physical_timestamp << 18 + current_logical_num)
Because each PD produces distinct logical_num values, the resulting TSO values are guaranteed to be unique across clusters.
A frequent TSO allocation may cause the logical_num to grow quickly. Generally, it is better to ensure tso-max-index is less than 10 to avoid the logical_num being exhausted before physical timestamp advances.
As mentioned above, a local transaction must ensure that its commit timestamp is greater than the old row’s _tidb_origin_ts. To achieve this, the internal implementation should allow DML operations to set a "minimum commit timestamp" for the current transaction.
A new transaction option, MinCommitTS, is proposed. DML operations can use SetOption to enforce that the commit timestamp is greater than a specified value:
// When found the old row's `_tidb_origin_ts` is not NULL
// while executing a local DML.
txn.SetOption(kv.MinCommitTS, oldRow.TiDBOriginalTS)
If kv.MinCommitTS is set, during transaction commit, if PD returns a TSO that is not greater than the specified value, the transaction will wait briefly and retry fetching a new TSO until the returned timestamp meets the requirement.
In most cases, the clock drift between clusters should be minimal. However, if PD returns a TSO that is significantly smaller than the old row’s original timestamp (for example, by more than 500 ms), the commit should fail to prevent excessively long waiting times. In this case, there must be a serious clock drift between PD instances and the system administrator has to fix the issue before proceeding new updates to databases.
We will add a new changefeed configuration item, enable-active-active, with a default value of false. When enable-active-active is true, CDC will synchronize data downstream in LWW mode.
Users need to ensure that all tables that need to be synchronized in this changefeed are created upstream in ACTIVE_ACTIVE mode. enable-active-active mode only supports tidb downstream now.
For Active-Active scenarios, TiCDC uniformly transforms upstream data changes into SQL statements with LWW conflict resolution capabilities for writing to the downstream TiDB clusters.
All upstream changes, including native INSERT and UPDATE statements, as well as UPDATE statements resulting from the internal Soft Delete mechanism, will be uniformly translated into the INSERT ... ON DUPLICATE KEY UPDATE pattern by TiCDC.
Key LWW Logic: TiCDC embeds the logic to compare _tidb_origin_ts, _tidb_softdelete_time and commit_ts within the statement. An update is only executed if the _tidb_origin_ts (or commit_ts) of the incoming change is greater than or equal to the _tidb_origin_ts or commit_ts of the existing row in the downstream. This strictly enforces the LWW strategy.
Additionally, considering that _tidb_softdelete_time might be directly modified in the statement generated by TiCDC. in LWW mode, we will set @@session.tidb_translate_softdelete_sql=off; when writing data downstream.
Example LWW SQL:
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (....) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF(@cond, VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF(@cond, VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF(@cond, VALUES(_tidb_origin_ts), test._tidb_origin_ts);
For physical DELETE statements generated by the TiDB Soft Delete TTL job after the Soft Delete retention period expires:
To mitigate the Corner Case where high TiCDC latency (e.g., during changefeed stops or abnormal conditions) allows a downstream cluster to perform a premature Hard Delete, leading to data inconsistency, we introduce a Hard Delete Safety Watermark mechanism.
The risk is that Cluster B executes a Hard Delete on a row based on local Soft Delete TTL before it receives a prior UPDATE (not a Soft Delete update) from Cluster A, which occurred before the Soft Delete that B is currently processing. This results in data inconsistency between the two clusters.
Therefore, we must ensure that a row is Hard Deleted only when the synchronization progress for that table in the current cluster, from all upstream cluster, is larger then the CommitTS of when the row was Soft Delete.
ticdcProgressTable)To provide this safety watermark, TiCDC, in Active-Active mode, will write to a special table in the downstream TiDB cluster: ticdcProgressTable.
checkpoint_ts) for every table, in every changefeed, from every upstream cluster.checkpoint_ts is: All data changes up to and including this timestamp have been safely synchronized to the downstream.CREATE TABLE ticdcProgressTable (
changefeed_id VARCHAR(255) NOT NULL COMMENT 'Unique identifier for the changefeed synchronization task',
upstreamID VARCHAR(255) NOT NULL COMMENT 'Unique identifier for the upstream cluster',
database_name VARCHAR(255) NOT NULL COMMENT 'Name of the upstream database',
table_name VARCHAR(255) NOT NULL COMMENT 'Name of the upstream table',
checkpoint_ts BIGINT UNSIGNED NOT NULL COMMENT 'Safe watermark CommitTS indicating the data has been synchronized',
PRIMARY KEY (changefeed_id, upstreamID, database_name, table_name)
) COMMENT='TiCDC synchronization progress table for HardDelete safety check';
ticdcProgressTable to find the minimum checkpoint_ts across all active upstream changefeeds for that table.To avoid placing additional read/write burden on the downstream TiDB cluster, TiCDC will update the ticdcProgressTable in a sparse manner, such as once every 30 minutes for all tables. Given that Soft Delete retention periods are typically measured in days, this sparse update frequency will not significantly impact the timeliness of the TTL cleanup.
Executing DDL in LWW mode must comply with the DDL requirements(ticdc-bidirectional-replication) for bidirectional synchronization.
We have two following options:
When an LWW table needs to be synchronized to external downstreams (such as data warehouses), TiCDC must revert the internal LWW representation back to standard events.
How to Enable Cross-Table Transaction Consistency in LWW Mode
enable-transaction-atomic. Once enabled, all data for that Changefeed will be written downstream at transaction granularity. We only support TiDB Downstream now.pause → update filter → resume sequence to safely modify the tables included in the Changefeed.To support Cross-Table Transactions Consistency, we added a transaction combinator layer between the dispatcher layer and the sink to combine transactions. The transaction combinator receives all data from the dispatcher layer, restores it to transaction granularity, and then writes it to various sink modules to generate the corresponding output format for downstream.
The Transaction Combinator consists of two core components: Event Storage, Transaction Processing.
Event Storage
Transaction Processing
DDL and DML events never share a single transaction. The Transaction Processing module must strictly maintain DDL sequence during transaction reassembly:
The Transaction Sink's restart mechanism ensures data consistency and preserves LWW correctness.
To guarantee Cross-Table Transaction Atomicity, it is mandatory that all tables associated with a single Changefeed are synchronized on the same TiCDC node.
Currently, active-active mode does not support secondary unique indexes until a well-defined mechanism is introduced to handle cross-row conflicts. For example:
-- Suppose a table with an extra unique index
CREATE TABLE t(id int primary key, u int unique);
-- ClusterA
INSERT INTO t VALUES(1, 10); -- commit_ts: ts1
-- ClusterB
INSERT INTO t VALUE(1, 20); -- commit_ts: ts2
-- ClusterA
INSERT INTO t VALUES(2, 20); -- commit_ts: ts3
-- Suppose ts1 < ts2 < ts3.
-- We have a dilemma when syncing (1, 20) from cluster B to cluster A:
-- A should overwrite (1, 10) because ts2 > ts1, and we should remove the conflict row (2, 20) first
-- But we cannot remove (2, 20) because (2, 20) has a newer version (ts2 < ts3)
I believe the resolution strategy should be tailored to the specific business logic. But if a defined behavior is required, we may introduce additional rules as follows:
In the example above, ts2 should be skipped because the column u conflicts with the existing row (2, 20), which has a greater commit timestamp.
Unfortunately, the above rule cannot guarantee eventual consistency across clusters. For example:
create table t(id int primary key, u int unique);
-- clusterA
insert into t values(1, 20); -- commit_ts1
--clusterB
insert into t values(1, 10); -- commit_ts2
--clusterC
insert into t values(2, 10); -- commit_ts3
-- suppose commit_ts1 < commit_ts2 < commit_ts3
-- ClusterA has the apply sequence: (1, 20), (1, 10), (2, 10)
-- (1, 10) will overwrite (1, 20)
-- (2, 10) will overwrite (1, 10)
-- The final result is: (2, 10)
-- ClusterC has the apply sequence: (2, 10), (1, 20), (1, 10)
-- (1, 20) be inserted because no conflict
-- (1, 10) will be skipped because its commit_ts is less than (2, 10)
-- The final result is (2, 10), (1, 20)
This issue can still occur even with only two clusters:
-- clusterA
insert into t values(1, 20); -- commit_ts1
--clusterB
insert into t values(1, 10); -- commit_ts2
replace into t values(2, 10); -- commit_ts3
-- supporse commit_ts1 < commit_ts2 < commit_ts3
-- A apply sequence (A: 1, 20) -> (B: 1, 10) -> (B: 2, 10)
-- The final result is: (2, 10)
-- B apply sequence (B: 1, 10) -> (B: 2, 10) -> (A: 1, 20),
-- The final result is: (1, 20), (2, 10)
In this section, we will explain in detail the behavior of each case in active-to-active mode. We use this table to explain in the following cases for simplify.
Note: All values in the following SQL cases are just for demonstration, instead of the actual values, especially the ts-related values.
CREATE TABLE IF NOT EXISTS test (
id INT NOT NULL PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
);
--Region A
INSERT INTO test (id, first_name) VALUES(1, "Ben");
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Ben | NULL | NULL | 1 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region B
INSERT INTO test (id, first_name) VALUES(1, "Alice");
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Ben", NULL, 1, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region B to Region A
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Alice", NULL, 2, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
--pre State
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region A
UPDATE test SET first_name = "Mary" where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Mary | NULL | NULL | 4 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region B
UPDATE test SET last_name = "Smith" where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | Smith | NULL | 5 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Mary", NULL, 4, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | Smith | NULL | 5 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region B to Region A
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Alice", "Smith", 5, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | Smith | 5 | 6 |
+----+-------------+------------+-----------------+------------------+
--pre State
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | 2 | 3 |
| 2 | Alice | NULL | 2 | 3 |
| 3 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
| 2 | Alice | NULL | NULL | 2 |
| 3 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region A
BEGIN;
UPDATE test SET first_name = "Mary" where id = 1;
UPDATE test SET first_name = "Mary" where id = 2;
COMMIT;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Mary | NULL | NULL | 4 |
| 2 | Mary | NULL | NULL | 4 |
| 3 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region B
BEGIN;
UPDATE test SET first_name = "John" where id = 2;
UPDATE test SET first_name = "John" where id = 3;
COMMIT;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
| 2 | John | NULL | NULL | 5 |
| 3 | John | NULL | NULL | 5 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
-- TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Mary", NULL, 4, NULL), (2, "Mary", NULL, 4, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
-- Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Mary | NULL | 4 | 6 |
| 2 | John | NULL | NULL | 5 |
| 3 | John | NULL | NULL | 5 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
-- TiCDC Sync from Region B to Region A
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (2, "John", NULL, 5, NULL), (3, "John", NULL, 5, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
-- Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Mary | NULL | NULL | 4 |
| 2 | John | NULL | 5 | 7 |
| 3 | John | NULL | 5 | 7 |
+----+-------------+------------+-----------------+------------------+
-- Region A
INSERT INTO test (id, first_name) VALUES(1, "Mary");
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Mary | NULL | NULL | 4 |
+----+-------------+------------+-----------------+------------------+
UPDATE test SET first_name = "John" where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | John | NULL | NULL | 5 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
-- TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Mary", NULL, 4, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "John", NULL, 5, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
-- Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | John | NULL | 5 | 7 |
+----+-------------+------------+-----------------+------------------+
This case is just about TiDB transactions. TiDB behavior defined by the transaction mode:
This will never happen in our design; TiDB will guarantee the Region A to generate T1 that always has a greater value than T2.
--pre State
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region A
DELETE FROM test where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts, _tidb_softdelete_time from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | Mary | NULL | NULL | 4 | 4 |
+----+-------------+------------+-----------------+------------------+---------------------+
------------------------------------------------------------------------------
--Region B
UPDATE test SET first_name = "John", last_name = "Smith" where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts, _tidb_softdelete_time from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | John | Smith | NULL | 5 | NULL |
+----+-------------+------------+-----------------+------------------+----------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Mary", NULL, 4, 4) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | John | Smith | NULL | 5 | NULL |
+----+-------------+------------+-----------------+------------------+----------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region B to Region A
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "John", NULL, 5, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | John | Smith | 5 | 6 | NULL |
+----+-------------+------------+-----------------+------------------+----------------------+
--pre State
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | 2 | 3 |
+----+-------------+------------+-----------------+------------------+
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts |
+----+-------------+------------+-----------------+------------------+
| 1 | Alice | NULL | NULL | 2 |
+----+-------------+------------+-----------------+------------------+
------------------------------------------------------------------------------
--Region A
UPDATE test SET first_name = "John", last_name = "Smith" where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts, _tidb_softdelete_time from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | John | Smith | NULL | 4 | NULL |
+----+-------------+------------+-----------------+------------------+----------------------+
------------------------------------------------------------------------------
--Region B
DELETE FROM test where id = 1;
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts, _tidb_softdelete_time from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | Mary | NULL | NULL | 5 | 5 |
+----+-------------+------------+-----------------+------------------+---------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region A to Region B
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "John", "Smith", 4, NULL) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region B
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | Mary | NULL | NULL | 5 | 5 |
+----+-------------+------------+-----------------+------------------+---------------------+
------------------------------------------------------------------------------
--TiCDC Sync from Region B to Region A
INSERT INTO test (id, first_name, last_name, _tidb_origin_ts, _tidb_softdelete_time) VALUES (1, "Mary", NULL, 5, 5) ON DUPLICATE KEY
UPDATE first_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(first_name), test.first_name),
last_name = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(last_name), test.last_name),
_tidb_softdelete_time = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_softdelete_time), test._tidb_softdelete_time),
_tidb_origin_ts = IF((@cond := (IFNULL(test._tidb_origin_ts, test._tidb_commit_ts)<= VALUES(_tidb_origin_ts))), VALUES(_tidb_origin_ts), test._tidb_origin_ts);
--Region A
select id, first_name, last_name, _tidb_origin_ts, _tidb_commit_ts from test;
+----+-------------+------------+-----------------+------------------+ ---------------------+
| id | first_name | last_name | _tidb_origin_ts | _tidb_commit_ts | _tidb_softdelete_time
+----+-------------+------------+-----------------+------------------+ ---------------------+
| 1 | Mary | NULL | 5 | 6 | 5 |
+----+-------------+------------+-----------------+------------------+---------------------+
The data type has no impact to the final result in Active-Active scenario.
This section describes how to build an active-active group with multiple TiDB clusters based on the ability of phase 1. Since it only supports creating active-active tables when they are created at phase 1, a user don't have to do data migration between TiDB clusters.
tso-unique-index and tso-max-index for each of them.active-active enabled in all clusters.enable-active-active option turned on.This section describes how to switch some tables that are not in active-active mode to active-active mode. You can refer to the following steps.
Note: It's not supported in Phase 1, and it might be supported in the future phases.
--startTs as the TSO that start backup(changefeed-cli-parameters).-- It's not supported in Phase 1
ALTER TABLE <> ACTIVE_ACTIVE='ON' SOFTDELETE RETENTION=7 DAY;
This section describes how to switch some tables that are in active-active mode to non-active-active mode. You can refer to the following steps.
Note: It's not supported in Phase 1, and it might be supported in the future phases.
-- It's not supported in Phase 1
ALTER TABLE <> ACTIVE_ACTIVE='OFF';
After an active-active table is altered to a normal table, the hidden column _tidb_origin_ts still exists. If the customer think _tidb_origin_ts is not needed anymore, he can drop this column by ALTER TABLE xxx DROP COLUMN _tidb_origin_ts.
Notice we still have the limitations for DDL operations on _tidb_origin_ts
active-active one._tidb_origin_ts cannot be created manually without the option ACTIVE_ACTIVE. If a non-active-active table has this column, it means the table is altered from an active-active table before. If you alter this table to active-active mode again, this column will be reused with all previous origin ts values.The active-active mode does not provide serializable, linearizability, or causal consistency across clusters. For example, it can lead to a “lost update” scenario when two clusters concurrently update the same row:
-- A counter table
CREATE TABLE counter (id INT PRIMARY KEY, count INT);
-- Suppose both clusters initially contain the same row:
-- (1, 10)
-- Cluster 1 increments the counter
UPDATE counter SET count = count + 1 WHERE id = 1; -- Result: (1, 11)
-- Cluster 2 increments the counter
UPDATE counter SET count = count + 100 WHERE id = 1; -- Result: (1, 110)
-- The final count value will be either 11 if Cluster 1’s commit has the greater timestamp,
-- or 110 if Cluster 2’s commit has the greater timestamp.
-- However, the correct result required by the business should be 111 — meaning one of the updates is lost.
You should not use active-active replication in financial scenarios such as account balance management. The following example illustrates why:
-- The balance table
CREATE TABLE balance (
user_id INT PRIMARY KEY,
balance INT
);
-- Suppose user1 and user2 initially both have 100 dollars.
-- Both clusters initially contain the same rows:
-- (1, 100), (2, 100)
-- In cluster1, user1 transfers 50 to user2
BEGIN;
UPDATE balance SET balance = balance - 50 WHERE user_id = 1;
UPDATE balance SET balance = balance + 50 WHERE user_id = 2;
COMMIT;
-- In cluster2, user2 transfers 49 to user1
BEGIN;
UPDATE balance SET balance = balance + 49 WHERE user_id = 1;
UPDATE balance SET balance = balance - 49 WHERE user_id = 2;
COMMIT;
-- The expected result when executed serially should be (1, 99), (2, 101)
-- But the possible results under Active-Active with LWW are either
-- (1, 50), (2, 150) or (1, 149), (2, 51)