docs/design/2023-04-11-pause-user-ddl-when-upgrading.md
This document describes a feature that allows users to pause the execution of their Data Definition Language (DDL) statements during the upgrade process of TiDB. During the upgrade process, the system stops the execution of user DDL statements, completes the system DDL operation, and finally resumes the execution of the original user DDL statements. This feature is available for upgrades from v7.1 and later versions of TiDB, and it does not support upgrades from versions prior to v7.1.
A cluster upgrade may itself need to run DDL statements, and user DDL statements may still be in progress when the upgrade starts. In addition, the DDL implementation framework sometimes needs to change between versions, so some versions cannot be upgraded with a rolling upgrade. Although these scenarios only arise in some upgrade cases, it is not easy to describe them one by one. Therefore, when upgrading a TiDB cluster, users first need to confirm that no DDL statements are being executed; otherwise, unexpected behavior may occur.
The TiDB nodes in the cluster play the following roles (assuming an upgrade from v7.1 to v7.2):
Specific upgrade process:
- /tidb/server/global_state, use this path later) to the current TiDB version. In addition, in order to prevent multiple TiDB nodes from upgrading at the same time, it will check whether the value on the /tidb/server/global_state path is empty.
- /tidb/server/global_state value and whether it needs to enter upgrading mode.
- /tidb/server/global_state, and when it receives an upgrade status notification, it will enter upgrading mode.
- /tidb/server/global_state notification
- /tidb/server/global_state and the owner value to normal state.
- /tidb/server/global_state, receive this notification, and enter normal mode.

The plan distinguishes between DDL operation types: a DDL operation is treated as a system DDL operation if it is performed on a system table, and as a user DDL operation otherwise. Therefore, users must not perform DDL operations on system tables during an upgrade.
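The classification rule above can be illustrated with a minimal Go sketch. It is not TiDB's implementation: the `ddlJob` type and the `isSystemDDL` and `runnableJobs` helpers are hypothetical names, and the sketch assumes that system tables are exactly the tables in the `mysql` schema. The sample queries are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// systemSchema is an assumption of this sketch: system tables are taken to
// live in the "mysql" schema.
const systemSchema = "mysql"

// ddlJob is a hypothetical, simplified stand-in for a queued DDL job.
type ddlJob struct {
	ID     int64
	Schema string // schema of the table the DDL operates on
	Query  string
}

// isSystemDDL reports whether the job operates on a system table.
func isSystemDDL(j ddlJob) bool {
	return strings.EqualFold(j.Schema, systemSchema)
}

// runnableJobs returns the jobs that may execute in the current mode.
// In upgrading mode only system DDL jobs run; user DDL jobs are skipped
// (paused) until the cluster returns to normal mode.
func runnableJobs(jobs []ddlJob, upgrading bool) []ddlJob {
	var out []ddlJob
	for _, j := range jobs {
		if upgrading && !isSystemDDL(j) {
			continue // user DDL is paused during the upgrade
		}
		out = append(out, j)
	}
	return out
}

func main() {
	jobs := []ddlJob{
		{ID: 1, Schema: "test", Query: "ALTER TABLE t ADD INDEX idx(a)"},
		{ID: 2, Schema: "mysql", Query: "ALTER TABLE mysql.example ADD COLUMN c INT"},
	}
	for _, j := range runnableJobs(jobs, true) {
		fmt.Printf("upgrading mode runs job %d: %s\n", j.ID, j.Query)
	}
	fmt.Println("normal mode runs", len(runnableJobs(jobs, false)), "jobs")
}
```

Because the rule keys only on the target table, a user DDL statement against a system table would be misclassified as a system DDL operation, which is why such statements are disallowed during an upgrade.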
Regarding risk 2 mentioned above, there are two main issues that we need to deal with later:
The following schemes differ mainly in how they notify the cluster of the upgrading status and how they return the cluster to the normal state. The specific schemes are as follows:
/tidb/server/global_state to null.

Difference from Scheme 1: whether all TiDBs need to be notified to enter upgrading mode.
/tidb/server/global_state, receives the notification, and withdraws from the owner campaign by setting tidb_enable_ddl.