docs/design/2022-09-22-global-memory-control.md
This document proposes a design for controlling the global memory usage of a TiDB instance.
Currently, TiDB has a query-level memory control strategy mem-quota-query, which triggers Cancel when the memory usage of a single SQL exceeds mem-quota-query. However, there is currently no global memory control strategy.
When TiDB runs many SQL statements that each stay below mem-quota-query, or when memory tracking is inaccurate, overall memory usage can grow high enough to cause OOM.
Therefore, we need an observer that checks whether the memory usage of the current system is normal. When it is not, the observer should try to stop TiDB's memory from growing further, reducing the risk of a process crash.
New system variables:
- tidb_server_memory_limit: TiDB keeps its overall memory usage within tidb_server_memory_limit.
- tidb_server_memory_limit_gc_trigger: When TiDB memory usage reaches this percentage of tidb_server_memory_limit, TiDB proactively triggers Golang GC to release memory.
- tidb_server_memory_limit_sess_min_size: The minimum memory usage a session must have before TiDB can kill it.

We need to implement the following three functions to control the memory usage of TiDB:
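1. Kill the SQL with the largest memory usage when HeapInuse is larger than tidb_server_memory_limit.
2. Proactively trigger runtime.GC() when HeapInuse is larger than tidb_server_memory_limit * tidb_server_memory_limit_gc_trigger.
3. Introduce memory tables that display the current system memory usage and historical operations.

New variables: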
- MemUsageTop1Tracker atomic.Pointer[Tracker]: Indicates the Tracker with the largest memory usage.
- NeedKill atomic.Bool in the structure Tracker: Indicates whether the SQL for the current Tracker needs to be killed.
- SessionID int64 in the structure Tracker: Indicates the session ID corresponding to the current Tracker.

Implementation:
When Tracker.Consume() is called, check the following conditions. If all are satisfied, update MemUsageTop1Tracker.
- NeedKill is false, to avoid canceling the current SQL twice.
- The memory usage exceeds tidb_server_memory_limit_sess_min_size (default 128 MB, can be dynamically adjusted), so the Tracker can be a candidate for MemUsageTop1Tracker.
- The memory usage is greater than that of the current MemUsageTop1Tracker.

The SQL tracked by MemUsageTop1Tracker is then canceled as follows:
1. Call the ReadMemStat interface in a 100 ms cycle. (Gets the memory usage of the current TiDB instance)
2. If the heapInuse of the current instance is greater than tidb_server_memory_limit, set MemUsageTop1Tracker's NeedKill flag. (Sends a Kill signal)
3. When the SQL calls Tracker.Consume(), check its own NeedKill flag. If it is true, trigger a panic and exit. (Terminates the execution of the SQL)
4. Get the SessionID from the tracker and continuously query its status, waiting for it to exit completely. When the SQL has exited, explicitly trigger Golang GC to release memory. (Waits for the SQL to exit completely and releases memory)

The inspiration for this design comes from uber-go-gc-tuner:
- Use the SetMemoryLimit feature to set the soft limit to tidb_server_memory_limit * tidb_server_memory_limit_gc_trigger, ensuring that GC is triggered when memory usage reaches that threshold.
- ... heapInUse being larger than the soft limit can be avoided.

Introduce performance_schema.memory_usage and performance_schema.memory_usage_ops_history to display the current system memory usage and historical operations.
This can be implemented by maintaining a set of global data, and reading and outputting directly from the global data when querying.
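
As a rough illustration of the kill path described earlier (updating MemUsageTop1Tracker in Tracker.Consume(), setting NeedKill from the observer, and panicking on the next Consume()), the following Go sketch may help. It is not TiDB's actual implementation. The names serverMemoryLimit, sessMinSize, checkMemory, observe and the bytes field are hypothetical. The final step (waiting for the session to exit and triggering GC) is omitted, and clearing the top-1 pointer after flagging is an assumption of this sketch. It requires Go 1.19 or later for the typed atomics.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// Hypothetical stand-ins for the system variables (values in bytes).
var (
	serverMemoryLimit atomic.Int64 // tidb_server_memory_limit
	sessMinSize       atomic.Int64 // tidb_server_memory_limit_sess_min_size
)

// Tracker is a simplified session-level memory tracker.
type Tracker struct {
	SessionID int64
	NeedKill  atomic.Bool
	bytes     atomic.Int64 // hypothetical field: bytes currently tracked
}

// MemUsageTop1Tracker points at the tracker with the largest memory usage.
var MemUsageTop1Tracker atomic.Pointer[Tracker]

// Consume records delta bytes. It panics if the tracker was flagged for
// kill; otherwise it tries to register the tracker as the top-1 consumer.
func (t *Tracker) Consume(delta int64) {
	used := t.bytes.Add(delta)
	if t.NeedKill.Load() {
		panic(fmt.Sprintf("session %d killed: server memory limit exceeded", t.SessionID))
	}
	if used < sessMinSize.Load() {
		return // too small to be a kill candidate
	}
	for {
		top := MemUsageTop1Tracker.Load()
		if top == t || (top != nil && top.bytes.Load() >= used) {
			return
		}
		if MemUsageTop1Tracker.CompareAndSwap(top, t) {
			return
		}
	}
}

// checkMemory is one observer tick: when heapInuse exceeds the limit it
// flags the current top-1 tracker and returns it, otherwise it returns nil.
func checkMemory(heapInuse uint64) *Tracker {
	limit := serverMemoryLimit.Load()
	if limit <= 0 || heapInuse <= uint64(limit) {
		return nil
	}
	top := MemUsageTop1Tracker.Load()
	if top == nil {
		return nil
	}
	top.NeedKill.Store(true)
	// Assumption of this sketch: free the slot for the next candidate.
	MemUsageTop1Tracker.CompareAndSwap(top, nil)
	return top
}

// observe polls the Go runtime every 100 ms until stop is closed.
func observe(stop <-chan struct{}) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			checkMemory(m.HeapInuse)
		}
	}
}

func main() {
	serverMemoryLimit.Store(1 << 30) // 1 GiB
	sessMinSize.Store(128 << 20)     // 128 MiB

	small := &Tracker{SessionID: 1}
	big := &Tracker{SessionID: 2}
	small.Consume(64 << 20) // below the minimum size, never a candidate
	big.Consume(512 << 20)  // becomes the top-1 tracker

	killed := checkMemory(2 << 30) // pretend HeapInuse is 2 GiB
	fmt.Println("flagged session:", killed.SessionID)

	defer func() { fmt.Println("recovered:", recover()) }()
	big.Consume(1) // the flagged SQL panics on its next Consume
}
```

In this sketch the observer only sets a flag, and the SQL terminates itself on its next Consume() call, so the observer never has to interrupt a running goroutine directly.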