Cache and storage subsystem metrics

docs/content/v2025.1/launch-and-manage/monitor-and-alert/metrics/cache-storage.md


RocksDB storage subsystem metrics

Storage layer IOPS

DocDB uses a modified version of RocksDB (an LSM-based key-value store consisting of multiple logical levels, with data in each level sorted by key) as its storage layer. This storage layer performs seek, next, and prev operations.

The following table describes key throughput and latency metrics for the storage (RocksDB) layer.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_number_db_next` | keys | counter | The number of NEXT operations performed by RocksDB to look up a key when a tuple is read or updated by the database. Each database operation can make multiple requests to RocksDB. |
| `rocksdb_number_db_prev` | keys | counter | The number of PREV operations performed by RocksDB to look up a key when a tuple is read or updated by the database. |
| `rocksdb_number_db_seek` | keys | counter | The number of SEEK operations performed by RocksDB to look up a key when a tuple is read or updated by the database. |
| `rocksdb_db_write_micros` | microseconds | counter | The time spent by RocksDB (in microseconds) writing data. |
| `rocksdb_db_get_micros` | microseconds | counter | The time spent by RocksDB (in microseconds) retrieving data matching a value. |
| `rocksdb_db_seek_micros` | microseconds | counter | The time spent by RocksDB (in microseconds) retrieving data in a range query. |

These metrics can be aggregated across the entire cluster using appropriate aggregations.
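Because these metrics are monotonic counters, they are most useful when converted into per-second rates. The following is a minimal sketch of turning two successive scrapes into a rate; the `rate` helper and the sample values are illustrative, not part of any YugabyteDB API.

```python
def rate(prev_value, curr_value, interval_seconds):
    """Per-second rate between two samples of a monotonic counter.

    A negative delta means the counter was reset (for example, after a
    server restart); in that case the current value is the best estimate
    of activity since the reset.
    """
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value
    return delta / interval_seconds

# Two scrapes of rocksdb_number_db_seek taken 15 seconds apart
# (values are made up for illustration).
seeks_per_sec = rate(120_000, 165_000, 15)
print(seeks_per_sec)  # 3000.0
```

Monitoring systems such as Prometheus perform this computation automatically (for example, with its `rate()` function), so this helper is only needed when sampling the metrics endpoint directly.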

Block cache

When data requested by the YSQL layer resides in an SST file, it is cached in the RocksDB block cache. This is the fundamental cache, and it sits in RocksDB rather than in the YSQL layer. A block requires multiple touches before it is added to the multi-touch (hot) portion of the cache.

The following table describes key cache metrics for the storage (RocksDB) layer.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_block_cache_hit` | blocks | counter | The total number of block cache hits (cache index + cache filter + cache data). |
| `rocksdb_block_cache_miss` | blocks | counter | The total number of block cache misses (cache index + cache filter + cache data). |
| `block_cache_single_touch_usage` | bytes | counter | The size (in bytes) of cache usage by blocks having a single touch. Blocks cached and read once by the YSQL layer are classified in the single-touch portion of the cache. |
| `block_cache_multi_touch_usage` | bytes | counter | The size (in bytes) of cache usage by blocks having multiple touches. Blocks cached and read more than once by the YSQL layer are classified in the multi-touch portion of the cache. |

These metrics can be aggregated across the entire cluster using appropriate aggregations.
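A common derived signal from these counters is the block cache hit ratio. A minimal sketch, assuming values sampled from `rocksdb_block_cache_hit` and `rocksdb_block_cache_miss` (the helper name and numbers are illustrative):

```python
def block_cache_hit_ratio(hits, misses):
    """Fraction of block lookups served from the RocksDB block cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet; avoid division by zero
    return hits / total

# Illustrative counter values.
print(block_cache_hit_ratio(950, 50))  # 0.95
```

A persistently low hit ratio suggests the working set does not fit in the block cache; a sustained drop after a deployment can indicate a change in access patterns.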

Bloom filters

Bloom filters are probabilistic data structures used to determine whether a given SSTable might contain the data for a query looking for a particular value.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_bloom_filter_checked` | blocks | counter | The number of times the bloom filter has been checked. |
| `rocksdb_bloom_filter_useful` | blocks | counter | The number of times the bloom filter check has avoided a file read (saving IOPS). |

These metrics can be aggregated across the entire cluster using appropriate aggregations.
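Dividing the two counters gives the fraction of bloom filter checks that actually saved a disk read. A minimal sketch, with illustrative values:

```python
def bloom_filter_usefulness(checked, useful):
    """Fraction of bloom filter checks that avoided an SST file read.

    Computed from rocksdb_bloom_filter_checked and
    rocksdb_bloom_filter_useful; the values below are illustrative.
    """
    if checked == 0:
        return 0.0  # filter never consulted; avoid division by zero
    return useful / checked

print(bloom_filter_usefulness(10_000, 9_200))  # 0.92
```

A low usefulness ratio means most checks still fall through to file reads, which can happen when queried keys usually do exist in the SSTables being checked.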

SST files

RocksDB LSM-trees buffer incoming data in a memory buffer that, when full, is sorted, and flushed to disk in the form of a sorted run. When a sorted run is flushed to disk, it may be iteratively merged with existing runs of the same size. Overall, as a result of such iterative merges, the sorted runs on disk (also called Sorted-String Table or SST files) form a collection of levels of exponentially increasing size with potentially overlapping key ranges across the levels.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_current_version_sst_files_size` | bytes | counter | The aggregate size of all SST files. |
| `rocksdb_current_version_num_sst_files` | files | counter | The number of SST files. |
| `ts_active_data_size` | bytes | gauge | The amount of data in active data directories (excluding snapshots) across all non-hidden tablets. Hidden tablets (retained by a snapshot schedule) are excluded, so this gives the size of the data in the cluster as if PITR were off and no snapshots were taken for the databases. |
| `ts_data_size` | bytes | gauge | The amount of data in data directories (including snapshots) across all tablets. This gives the total size of the data directories, including snapshots. To calculate the overhead of snapshots, subtract `ts_active_data_size` from `ts_data_size`. |

These metrics can be aggregated across the entire cluster using appropriate aggregations.
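The snapshot overhead calculation described above is a simple subtraction. A minimal sketch, with illustrative sizes:

```python
def snapshot_overhead_bytes(ts_data_size, ts_active_data_size):
    """Bytes retained by snapshots: total data size (including
    snapshots) minus the size of active, non-snapshot data."""
    return ts_data_size - ts_active_data_size

# Illustrative gauge values, in bytes.
total = 120 * 1024**3    # ts_data_size: 120 GiB
active = 100 * 1024**3   # ts_active_data_size: 100 GiB
print(snapshot_overhead_bytes(total, active) / 1024**3)  # 20.0 (GiB)
```

If this overhead grows without bound, it may be worth reviewing the snapshot schedule and PITR retention settings.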

Compaction

To make reads more performant over time, RocksDB periodically reduces the number of logical levels by running compaction (sorted-merge) on the SST files in the background, where part or multiple logical levels are merged into one. In other words, RocksDB uses compactions to balance write, space, and read amplifications.

A description of key metrics in this category is listed in the following table:

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_compact_read_bytes` | bytes | counter | The number of bytes read to do compaction. |
| `rocksdb_compact_write_bytes` | bytes | counter | The number of bytes written to do compaction. |
| `rocksdb_compaction_times_micros` | microseconds | counter | The time for the compaction process to complete. |
| `rocksdb_numfiles_in_singlecompaction` | files | counter | The number of files in any single compaction. |

Memtable

The memtable is the first level of data storage, where data is written when you start inserting. When a memtable is full, it is made immutable and flushed to disk as an SST file. The memtable also provides statistics about reading documents, which are essentially columns in the table.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `rocksdb_memtable_compaction_micros` | microseconds | counter | The total time to compact a set of SST files. |
| `rocksdb_memtable_hit` | keys | counter | The number of memtable hits. |
| `rocksdb_memtable_miss` | keys | counter | The number of memtable misses. |

These metrics are available per tablet and can be aggregated across the entire cluster using appropriate aggregations.

Write-Ahead-Logging (WAL)

The Write Ahead Log (or WAL) is used to write and persist updates to disk on each tablet. The following table describes metrics for observing the performance of the WAL component.

| Metric | Unit | Type | Description |
| :----- | :--- | :--- | :---------- |
| `log_sync_latency` | microseconds | counter | The time spent flushing (fsync) WAL entries to disk. |
| `log_append_latency` | microseconds | counter | The time spent appending a batch of values to the WAL. |
| `log_group_commit_latency` | microseconds | counter | The time spent committing an entire group. |
| `log_bytes_logged` | bytes | counter | The number of bytes written to the WAL after the tablet starts. |
| `log_reader_bytes_read` | bytes | counter | The number of bytes read from the WAL after the tablet starts. |

These metrics are available per tablet and can be aggregated across the entire cluster using appropriate aggregations.

YSQL cache metrics

Catalog cache misses

During YSQL query processing, system catalog (pg_catalog) tables that live on the YB-Master are cached locally by each YSQL backend process. Misses on this cache can make initial queries, or queries after a DDL change, slow until the corresponding cache is warmed up. The following table describes the metric reported for each pg_catalog table that was not found in the cache and required a YB-Master lookup. You can preload these tables using the `ysql_catalog_preload_additional_table_list` YB-TServer flag; see Customize preloading of YSQL catalog caches.

This metric is a counter, and its unit is misses.

| Metric | Description |
| :----- | :---------- |
| `handler_latency_yb_ysqlserver_SQLProcessor_CatalogCacheTableMisses_count` | The count of catalog cache misses for this pg_catalog table or an associated index. |
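If you scrape this counter directly from a server's Prometheus-format metrics endpoint rather than through a monitoring system, you can pick it out of the text exposition format with a small parser. This is a minimal, illustrative sketch: the `find_metric` helper, the sample label `table_name`, and the sample values are assumptions for demonstration, not output copied from a real server.

```python
def find_metric(exposition_text, metric_name):
    """Extract values for one metric from Prometheus text exposition format.

    Returns a list of (label_string, value) tuples. Minimal parser for
    illustration only; real deployments typically use a Prometheus server
    or a client library instead.
    """
    results = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line.startswith(metric_name):
            continue
        rest = line[len(metric_name):]
        if rest.startswith("{"):
            labels, _, tail = rest[1:].partition("}")
            value = float(tail.split()[0])
        elif rest[:1].isspace():
            labels = ""
            value = float(rest.split()[0])
        else:
            continue  # a different metric sharing this name as a prefix
        results.append((labels, value))
    return results

# Hypothetical scrape output; label names and values are made up.
sample = """
handler_latency_yb_ysqlserver_SQLProcessor_CatalogCacheTableMisses_count{table_name="pg_class"} 12
handler_latency_yb_ysqlserver_SQLProcessor_CatalogCacheTableMisses_count{table_name="pg_type"} 3
"""
misses = find_metric(
    sample,
    "handler_latency_yb_ysqlserver_SQLProcessor_CatalogCacheTableMisses_count")
print(misses)
```

A steadily climbing miss count for a particular table is a candidate for preloading via the `ysql_catalog_preload_additional_table_list` flag described above.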