docs/design/2018-10-08-online-DDL.md
This article describes the architecture of the online DDL implementation in TiDB.
Many databases lock tables when performing DDL operations. During this time, transactions that involve the table are blocked (some databases still allow read operations, but at the cost of consuming a large amount of memory). The larger the table, the longer transactions are affected.
To simplify the design, the entire system allows only one node to make schema changes at the same time.
To dive deep into DDL processing, you need to understand the overall architecture of the DDL module. This module runs on the TiDB server, but it also involves two other components of the TiDB cluster (the PD server and the TiKV server).
A DDL job goes through multiple state changes during processing. At any moment, the TiDB servers in a cluster hold at most two of these states; the cluster enters this scenario right after the owner writes a new state to TiKV. Before moving on to the next state change, the owner must make sure that all currently available TiDB servers have synchronized to the current state. You can simply think of each TiDB server as needing two modules to ensure that this condition is met.
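The version-wait condition above can be sketched as follows. `SchemaSyncer`, its fields, and the reporting flow are hypothetical simplifications (the real implementation tracks versions through PD), shown only to illustrate the "wait until every available server reaches the current version" check:

```go
package main

import "fmt"

// SchemaSyncer is a hypothetical sketch of the version-wait step:
// after the owner bumps the schema version, it blocks until every
// live TiDB server has reported that version (or newer).
type SchemaSyncer struct {
	latestVersion int64
	// reported maps a server ID to the newest schema version it loaded.
	reported map[string]int64
}

// UpdateVersion is called by the owner after each state change.
func (s *SchemaSyncer) UpdateVersion(v int64) { s.latestVersion = v }

// Report is called when a TiDB server finishes loading a schema version.
func (s *SchemaSyncer) Report(server string, v int64) { s.reported[server] = v }

// AllSynced tells the owner whether it may move to the next state change.
func (s *SchemaSyncer) AllSynced() bool {
	for _, v := range s.reported {
		if v < s.latestVersion {
			return false
		}
	}
	return true
}

func main() {
	s := &SchemaSyncer{reported: map[string]int64{}}
	s.Report("tidb-1", 5)
	s.Report("tidb-2", 5)
	s.UpdateVersion(6)
	fmt.Println(s.AllSynced()) // servers still on version 5
	s.Report("tidb-1", 6)
	s.Report("tidb-2", 6)
	fmt.Println(s.AllSynced()) // all servers synchronized
}
```

This is why at most two states coexist: the owner never advances until every server has caught up to the version it just published.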
<font size=1 face="黑体"><center>Figure 1: Structure flow chart</center></font>
The TiDB server that starts first creates the latest-version information path in PD, which stores the latest schema version number. All TiDB servers need to do the following when starting up:
Each DDL job requires multiple state changes, for example, create table: none -> public; add index: none -> delete only -> write only -> write reorg -> public. The schema version is modified after each state change, and each schema version must wait for the other TiDB servers to synchronize the schema. The flow of this process on the owner is shown in the opt section of Figure 2. The main logic of this part can be divided into two parts:
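The add index state chain described above can be sketched as a small state machine. The state names follow the document, while the Go types and the `nextAddIndexState` helper are illustrative assumptions:

```go
package main

import "fmt"

// SchemaState mirrors the states a DDL job walks through; the names
// follow the document, the Go representation is illustrative only.
type SchemaState int

const (
	StateNone SchemaState = iota
	StateDeleteOnly
	StateWriteOnly
	StateWriteReorg
	StatePublic
)

// nextAddIndexState returns the state that follows cur in the add index
// chain: none -> delete only -> write only -> write reorg -> public.
func nextAddIndexState(cur SchemaState) SchemaState {
	switch cur {
	case StateNone:
		return StateDeleteOnly
	case StateDeleteOnly:
		return StateWriteOnly
	case StateWriteOnly:
		return StateWriteReorg
	default:
		return StatePublic
	}
}

func main() {
	// Each step below would bump the schema version and wait for all
	// TiDB servers to synchronize before continuing.
	st, version := StateNone, int64(0)
	for st != StatePublic {
		st = nextAddIndexState(st)
		version++ // the schema version is modified after each state change
	}
	fmt.Println(st == StatePublic, version)
}
```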
Many DDL operations need to delete data, such as a drop table request. The table may have hundreds of millions of rows, so such an operation would take a long time to process, and we need some optimizations for this type of DDL operation. The current optimizations in TiDB cover Drop Index, Drop Table, Drop Database, and Truncate Table. In implementation, these operations carry one extra step compared with other DDL operations: after the normal schema change, we store the information needed to delete the data into the gc_delete_range table, and defer the actual deletion to the GC stage. This logic can be divided into two parts:
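As a rough illustration of the deferred-deletion idea, the sketch below enqueues a key range instead of deleting rows inline. `DeleteRangeTask`, the slice standing in for the gc_delete_range table, and the key encoding are all hypothetical:

```go
package main

import "fmt"

// DeleteRangeTask records a key range whose data will be removed later
// by GC; the field names are illustrative, but the idea matches writing
// a row into the gc_delete_range table after the schema change.
type DeleteRangeTask struct {
	JobID    int64
	StartKey string
	EndKey   string
}

// gcDeleteRange stands in for the real gc_delete_range table.
var gcDeleteRange []DeleteRangeTask

// finishDropTable completes a drop table job: the schema change has
// already hidden the table, so we only enqueue its key range for GC
// instead of deleting hundreds of millions of rows inline.
func finishDropTable(jobID int64, tablePrefix string) {
	gcDeleteRange = append(gcDeleteRange, DeleteRangeTask{
		JobID:    jobID,
		StartKey: tablePrefix,
		EndKey:   tablePrefix + "\xff",
	})
}

func main() {
	finishDropTable(1, "t_100_")
	// The DDL job finishes immediately; the data waits for the GC stage.
	fmt.Println(len(gcDeleteRange))
}
```

The design choice here is that the DDL job's latency no longer depends on the table size: the expensive range deletion is handled asynchronously by GC.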
Currently, only the add index operation in TiDB takes a long time to execute, so an add index operation may block other DDL operations. As a preliminary and stable solution to this problem, we have implemented parallelism between general operations and add index operations across tables. That is, on different tables, a general operation and an add index operation can run in parallel.
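A toy model of this two-queue arrangement follows; the `job` struct and the queue helpers are assumptions made for illustration, and only the rule that jobs on different tables may run in parallel comes from the document:

```go
package main

import "fmt"

// job is a hypothetical DDL job record for this sketch.
type job struct {
	id      int64
	tableID int64
	tp      string // "add index" or a general DDL operation
}

// enqueue routes add index jobs to their own queue so a long-running
// add index does not block general DDL operations on other tables.
func enqueue(general, addIndex []job, j job) ([]job, []job) {
	if j.tp == "add index" {
		return general, append(addIndex, j)
	}
	return append(general, j), addIndex
}

// canRunInParallel reflects the rule above: the heads of the two
// queues may run concurrently only when they touch different tables.
func canRunInParallel(a, b job) bool { return a.tableID != b.tableID }

func main() {
	var general, addIndex []job
	general, addIndex = enqueue(general, addIndex, job{1, 10, "create table"})
	general, addIndex = enqueue(general, addIndex, job{2, 20, "add index"})
	fmt.Println(canRunInParallel(general[0], addIndex[0]))
}
```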
<font size=1 face="黑体"><center>Figure 2: Owner detailed flow chart</center></font>
To speed up this operation, we do not actually backfill the column data. The operation is independent of the number of rows in the table and only needs 5 state changes in total. The specific implementation is as follows: save the default value of the newly added column in a field named original default value; in subsequent read operations, when TiKV reads this column and finds that its value is null while this field is not empty, TiKV fills the field's value into the response before returning.
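The read-time fill-in can be sketched as below; `column`, `readValue`, and the string-valued data model are simplifying assumptions, not TiKV's actual representation:

```go
package main

import "fmt"

// column carries the "original default value" the document describes:
// data is never backfilled, so missing values are patched at read time.
type column struct {
	name            string
	originalDefault *string // nil means no stored default
}

// readValue sketches the fill-in on the read path: if the stored value
// is null but the original default value field is set, return that
// default in the response instead.
func readValue(stored *string, col column) string {
	if stored == nil && col.originalDefault != nil {
		return *col.originalDefault
	}
	if stored == nil {
		return "NULL"
	}
	return *stored
}

func main() {
	def := "0"
	col := column{name: "c1", originalDefault: &def}
	fmt.Println(readValue(nil, col)) // row predates the column: default
	v := "7"
	fmt.Println(readValue(&v, col)) // real stored data wins
}
```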
To reduce the execution time of this operation and simplify it, only conversions between integer types are supported at present, and the original type can only be widened. In addition, there are restrictions on modifying the type definition and other aspects; for details, please refer to the compatibility section. The implementation directly changes the column's field type. If the column has an index, the corresponding index field type is also modified, but the original data does not need to be changed. This method only supports conversions that keep the same codec format, because different formats encode data differently.
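A sketch of the widening check follows; the width table and the `modifiable` function are hypothetical, and the real restrictions are broader (see the compatibility section):

```go
package main

import "fmt"

// intWidth is an illustrative byte-width table for integer types.
var intWidth = map[string]int{
	"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8,
}

// modifiable reflects the restriction described above: both types must
// be integers sharing the same codec format, and the target type must
// be at least as wide as the source, so no row data needs rewriting.
func modifiable(from, to string) bool {
	fw, fromOK := intWidth[from]
	tw, toOK := intWidth[to]
	return fromOK && toOK && tw >= fw // only lengthening is allowed
}

func main() {
	fmt.Println(modifiable("int", "bigint"))  // widening: allowed
	fmt.Println(modifiable("bigint", "int"))  // narrowing: rejected
	fmt.Println(modifiable("int", "varchar")) // different codec: rejected
}
```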
This operation requires writing the corresponding column data of the table into the new index records. The function is divided into two parts: the first part reads the data from the corresponding columns, and the second part backfills the data into the index records. To speed up this operation, we process it in concurrent batches. The simple steps are as follows:
To reduce the impact on other read and write operations, its default priority is low.
For specific compatibility details, see the MySQL compatibility documentation.