docs/design/2022-08-02-stats-lru-cache.md
This proposes a design of how to maintain stats cache by lru according to memory usage
Previously, tidb maintained the all the indices' stats and some columns' stats which is needed during query. As the maintained stats grows, the total memory usage of the stats will increase and makes tidb server OOM. Thus we use lru to maintain stats cache in order to keep memory safe.
We will provide a Stats Cache Interface which is implemented by LRU Cache and Map Cache. If the tidb server didn't enable Stats LRU Cache, it will use Map Cache by default. Also, we will provide config and global session variable to control whether enable Stats LRU Cache and the capacity of it.
// statsCacheInner is the interface to manage the statsCache, it can be implemented by map, lru cache or other structures.
type statsCacheInner interface {
Get(int64) (*statistics.Table, bool)
Put(int64, *statistics.Table)
Del(int64)
Cost() int64
Len() int
SetCapacity(int64)
}
For column or index stats, we maintained following data structure for stats:
HistogramTopNCMSketchAnd we will also maintain status for each column and index stats in order to indicate its stats loading status like following:
AllLoadedOnlyCMSEvcitedOnlyHistRemainedAllEvictedWhen the column or index stats load all data structures into memory, the status will be AllLoaded.
When the Stats LRU Cache memory usage exceeds the quota, the LRU Cache will select one column or index stats to evict the data structures by following rules to reduce the memory usage:
AllLoaded, it will discard the CMSketch and the status will be changed into OnlyCMSEvcitedOnlyCMSEvcited, it will discard the TopN and the status will be changed into OnlyHistRemainedOnlyHistRemained, it will discard the Histogram and the status will be changed into AllEvictedPreviously tidb server has Sync Stats and asynchronously Loading Histograms in order to load column stats into memory during query. As Stats LRU Cache Policy may evict index stats in memory, we also need Sync Stats and asynchronously Loading Histograms to support loading index stats according to its loading status to keep compatible. During the query optimization, tidb server will collect the used columns and indices, if their stats are not fully loaded, tidb-server will try to load their stats back into the memory.