import CacheStats from '../../_assets/commonMarkdown/_cache_stats.mdx'
From v3.1.7 and v3.2.3 onwards, StarRocks uses Data Cache to accelerate queries in shared-data clusters, replacing the File Cache used in earlier versions. Data Cache loads data from remote storage on demand in blocks (on the order of MBs), whereas File Cache loads entire data files in the background each time, regardless of how many rows are actually needed.
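To make the difference between block-level and file-level loading concrete, the following Bash sketch computes which blocks a single read touches. The 1 MiB block size and the `datacache_blocks` function are hypothetical and only illustrate the idea. They do not describe StarRocks internals or any StarRocks setting.

```Bash
#!/usr/bin/env bash
# Sketch: which fixed-size blocks does a read of LEN bytes at OFFSET touch?
# BLOCK_SIZE is a hypothetical value for illustration, not a StarRocks setting.
BLOCK_SIZE=$((1024 * 1024))  # assume 1 MiB blocks

# Prints "<first_block> <last_block>" for the byte range [offset, offset + len).
# len must be at least 1.
datacache_blocks() {
  local offset=$1 len=$2
  local first=$(( offset / BLOCK_SIZE ))
  local last=$(( (offset + len - 1) / BLOCK_SIZE ))
  echo "$first $last"
}

# Usage: datacache_blocks <offset> <len>
```

With these assumptions, `datacache_blocks $(( 2 * 1024 * 1024 + 5 )) $(( 2 * 1024 * 1024 ))` prints `2 4`: a 2 MiB read only needs blocks 2 through 4, not the whole data file.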
Compared to File Cache, Data Cache has the following advantages:
From v3.4.0 onwards, StarRocks uses a unified Data Cache instance for queries against external catalogs and cloud-native tables (in shared-data clusters).
You can configure Data Cache using the following CN (BE) configuration items:
Execute the following statement to view the root path that stores the cached data:

```SQL
SELECT * FROM information_schema.be_configs
WHERE NAME LIKE "%storage_root_path%";
```
Usually, the cached data is stored under the sub-path `datacache/` of your `storage_root_path`.
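To check how much space the cached data takes on each disk, a small Bash helper can walk the storage roots. `show_datacache_usage` is a hypothetical helper, not a StarRocks tool. It assumes the roots are plain directories separated by semicolons (for example, `/data/disk1;/data/disk2`) and that the cache lives in the `datacache/` sub-path.

```Bash
#!/usr/bin/env bash
# Sketch: report the size of the datacache/ sub-path under each storage root.
# Assumes storage_root_path lists plain directories separated by ';'.
show_datacache_usage() {
  local roots=$1 root
  local IFS=';'          # split the root list on semicolons only
  for root in $roots; do
    if [ -d "$root/datacache" ]; then
      du -sh "$root/datacache"
    else
      echo "no datacache/ directory under $root" >&2
    fi
  done
}

# Usage: show_datacache_usage "/data/disk1;/data/disk2"
```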
Execute the following statements to view the disk usage limit of Data Cache in the `DataCacheMetrics` field:

```SQL
-- For BE nodes
SHOW BACKENDS;

-- For CN nodes
SHOW COMPUTE NODES;
```
StarRocks provides various metrics that monitor Data Cache.
You can download the following Grafana Dashboard templates based on your StarRocks environment:
Records the read latency of Data Cache.
Records the write latency of Data Cache.
Records the estimated memory usage of Data Cache.
Records the actual disk usage of Data Cache.
To disable Data Cache, add the following configuration items to the CN configuration file `cn.conf` and restart the CN nodes:

```Properties
datacache_enable = false
storage_root_path =
```
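If you manage `cn.conf` with scripts, the edit can be made idempotent with a helper like the one below. `set_conf_item` is hypothetical, and the installation paths in the usage comments (`./be/conf/cn.conf`, `./be/bin/stop_cn.sh`, `./be/bin/start_cn.sh --daemon`) are assumptions about a typical deployment layout that may differ from yours.

```Bash
#!/usr/bin/env bash
# Sketch: set "key = value" in a CN configuration file, rewriting an existing
# uncommented entry for the key, or appending one if the key is absent.
# Values containing '|' are not supported by this sketch.
set_conf_item() {
  local conf=$1 key=$2 value=$3
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
    # Rewrite the existing line in place (a .bak copy of the file is kept).
    sed -i.bak -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf"
  else
    echo "${key} = ${value}" >> "$conf"
  fi
}

# Usage (assumed layout; run from the StarRocks installation directory):
#   set_conf_item ./be/conf/cn.conf datacache_enable false
#   set_conf_item ./be/conf/cn.conf storage_root_path ""
#   ./be/bin/stop_cn.sh && ./be/bin/start_cn.sh --daemon
```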
You can clear the cached data in case of emergencies. This will not affect the original data in your remote storage.
Follow these steps to clear the cached data on a CN node:

1. Remove the sub-directory that stores the cached data.

   Example:

   ```Bash
   # Suppose `storage_root_path = /data/disk1;/data/disk2`
   rm -rf /data/disk1/datacache/
   rm -rf /data/disk2/datacache/
   ```

2. Restart the CN node.
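For nodes with many disks, the removal step can be scripted instead of typed per disk. `clear_datacache` is a hypothetical helper that assumes semicolon-separated absolute root paths, such as `/data/disk1;/data/disk2`. It only removes the `datacache/` sub-directory under each root; you still restart the CN node afterwards.

```Bash
#!/usr/bin/env bash
# Sketch: remove the datacache/ sub-directory under every storage root of a CN node.
# Assumes the roots are absolute paths separated by ';'.
clear_datacache() {
  local roots=$1 root
  local IFS=';'
  for root in $roots; do
    case "$root" in
      # Only act on absolute paths other than "/" itself.
      /?*) rm -rf -- "${root%/}/datacache/" ;;
      # Refuse empty or relative entries instead of guessing what they mean.
      *) echo "skipping invalid root: '$root'" >&2 ;;
    esac
  done
}

# Usage: clear_datacache "/data/disk1;/data/disk2"
```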
- If the `datacache.enable` property is set to `false` for a cloud-native table, Data Cache will not be enabled for the table.
- If the `datacache.partition_duration` property is set to a specific time range, data beyond the time range will not be cached.

If `fslib star cache meta memory size` and `fslib star cache data memory size` take up a significant proportion of the total memory usage of the CN node, it might indicate this issue.

… `${storage_root_path}/starlet_cache/star_cache/meta` of the CN nodes, and restart the nodes.

… `du` and `ls` commands) than the actual size of cached data?

The disk space occupied by Data Cache represents the historical peak usage and is unrelated to the current size of the cached data. For example, suppose 100 GB of data is cached and grows to 200 GB after compaction. After garbage collection (GC), the data is reduced back to 100 GB. However, the disk space occupied by Data Cache remains at its peak of 200 GB, even though the data actually cached in it is only 100 GB.
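The peak-usage behavior can be pictured with a simplified model in which the on-disk footprint is the running maximum of the cached data size. `datacache_footprint` is a hypothetical illustration of that model, not StarRocks code.

```Bash
#!/usr/bin/env bash
# Simplified model: the disk footprint of Data Cache follows the historical
# peak of the cached data size rather than its current size.
# Each argument is the cached data size (in GB) at one point in time.
datacache_footprint() {
  local peak=0 size
  for size in "$@"; do
    if (( size > peak )); then
      peak=$size
    fi
    echo "cached=${size}GB footprint=${peak}GB"
  done
}

# Usage: datacache_footprint 100 200 100
```

With the figures from the example, `datacache_footprint 100 200 100` reports footprints of 100 GB, 200 GB, and then 200 GB again, even though the cached data drops back to 100 GB.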
No. Data Cache evicts data only when its disk usage reaches the limit (80% of the disk space by default). Eviction does not delete the data; it only marks the disk space that holds the old cache as free, and new cache data then overwrites the old cache. Therefore, disk usage does not decrease after eviction, and this does not affect actual usage.
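To estimate where that eviction threshold sits on a given disk, you can apply the percentage to the size of the file system. `datacache_disk_limit_bytes` is a hypothetical helper. It assumes the limit is a plain percentage of the total size of the file system that holds the path (80 by default, matching the text). If you changed the limit, for example through `starlet_star_cache_disk_size_percent`, pass your own value as the second argument.

```Bash
#!/usr/bin/env bash
# Sketch: estimate the Data Cache disk usage limit, in bytes, for the file
# system that holds PATH, as PERCENT of its total size (default: 80).
datacache_disk_limit_bytes() {
  local path=$1 percent=${2:-80}
  local total_kb
  # POSIX df output: line 2, column 2 is the total size in 1024-byte blocks.
  total_kb=$(df -P -k "$path" | awk 'NR == 2 { print $2 }')
  [ -n "$total_kb" ] || return 1
  echo $(( total_kb * 1024 * percent / 100 ))
}

# Usage: datacache_disk_limit_bytes /data/disk1       # default 80%
#        datacache_disk_limit_bytes /data/disk1 60    # a custom percentage
```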
Refer to Q2. The Data Cache eviction mechanism does not delete cached data but marks the old data as overwritable. Hence, disk usage does not decrease.
Dropping a table does not trigger the deletion of data in the Data Cache. The cache of the deleted table will gradually be evicted over time based on Data Cache's LRU (Least Recently Used) logic, and this does not affect actual usage.
The disk usage of Data Cache is tracked accurately and will not exceed the configured limit. Excessive disk usage may instead be caused by:
… `${storage_root_path}/persist/`).

You can run `du -h . -d 1` in the disk root directory or its sub-directories to identify which directories take up the space, and then delete the unexpected data. You can also lower the disk capacity limit of Data Cache by configuring `starlet_star_cache_disk_size_percent`.
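The `du` check can be wrapped in a helper that sorts the result so the largest directories stand out. `largest_dirs` is hypothetical. It uses `du -k` instead of `-h` so that a plain `sort -n` orders the entries on both GNU and BSD systems. The largest directories appear last, and the final line is the total for the root itself.

```Bash
#!/usr/bin/env bash
# Sketch: list first-level directories under a disk root by size (KB),
# smallest first, so the biggest consumers (and the root total) come last.
largest_dirs() {
  local root=${1:-.}
  # Hide permission errors so they do not clutter the listing.
  du -k -d 1 "$root" 2>/dev/null | sort -n
}

# Usage: largest_dirs /data/disk1
```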
It is impossible to ensure consistent cache usage across nodes due to the inherent limitations of single-node caching. As long as it does not affect query latency, differences in cache usage are acceptable. This discrepancy may be caused by:
In summary, the differences in cache usage are influenced by multiple factors.