docs/RFCS/20230531_query_sst_metrics.md
Storage Team engineers are often involved in customer support escalations that require inspecting SSTable-level statistics. Currently, these statistics are difficult to obtain: the customer and support teams must find the appropriate files, pull them from the filesystem, and send them to us. This RFC therefore outlines how we will add the ability for operators to query SSTable metrics, which is useful for debugging storage issues pertaining to a specific key range. This will be implemented as a set-returning function (SRF) and used as follows:
SELECT * FROM crdb_internal.sstable_metrics('start-key', 'end-key')
or
SELECT * FROM crdb_internal.sstable_metrics(node-id, store-id, 'start-key', 'end-key')
Audience: CockroachDB team members
The proposed solution is to create a new SRF, which will be added to the
existing built-in generators.
This SRF will have two overloads (for the two variants above). The latter one
only talks to one node, while the other will need to send an RPC to each node
in order to retrieve all the relevant SSTables. This can be achieved by calling
Dial for each separate node inside of a function that is part of evalCtx. This
function will call out to the StorageEngineClient and be handled in the stores
server (see Pebble side).
The SRF will be structured similarly to json_populate_record, i.e., it will use a generator to return each output row.
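As a rough illustration of the generator pattern, the sketch below yields one row per call. The type and method names (`sstableRow`, `sstableMetricsGenerator`, `Next`, `Values`) and the `FileNum` and `Level` fields are assumptions made for this example; the signatures are deliberately simplified and are not CockroachDB's actual generator interface.

```go
package main

// sstableRow is a hypothetical output row; the real column set is defined
// elsewhere in this RFC.
type sstableRow struct {
	FileNum uint64
	Level   int
}

// sstableMetricsGenerator hands out pre-fetched rows one at a time.
type sstableMetricsGenerator struct {
	rows []sstableRow
	idx  int
}

// newSSTableMetricsGenerator positions the cursor before the first row.
func newSSTableMetricsGenerator(rows []sstableRow) *sstableMetricsGenerator {
	return &sstableMetricsGenerator{rows: rows, idx: -1}
}

// Next advances the cursor and reports whether a row is available.
func (g *sstableMetricsGenerator) Next() bool {
	g.idx++
	return g.idx < len(g.rows)
}

// Values returns the current row. It must only be called after Next
// has returned true.
func (g *sstableMetricsGenerator) Values() sstableRow {
	return g.rows[g.idx]
}
```

A caller drives it with `for g.Next() { row := g.Values(); ... }`, which is how each SSTable's metrics would become one output row of the SRF.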
Columns to display:
Pebble will be used inside the store's server code, specifically DB.SSTables.
When calling DB.SSTables we will need to specify an SSTableOption, which will
be a function allowing us to filter SSTables for the key range specified by the
user. Note that filtering can be performed based on the FileMetadata alone,
which allows us to skip unnecessary getTableProperties calls (which can read
metadata from storage and affect caches).
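The metadata-only filter amounts to an interval-overlap check on each table's key bounds. The sketch below assumes an inclusive `[Smallest, Largest]` bound per table, an exclusive end key for the query, and plain bytewise comparison; `fileMeta`, `overlaps`, and `filterByKeyRange` are hypothetical names, and a real implementation would use Pebble's FileMetadata and the store's configured key comparer.

```go
package main

import "bytes"

// fileMeta is a hypothetical, simplified stand-in for per-SSTable metadata.
type fileMeta struct {
	FileNum  uint64
	Smallest []byte // smallest user key in the table (inclusive)
	Largest  []byte // largest user key in the table (inclusive)
}

// overlaps reports whether the table's [Smallest, Largest] bounds intersect
// the query range [start, end). Only metadata is consulted, so no table
// properties need to be read from storage.
func overlaps(m fileMeta, start, end []byte) bool {
	return bytes.Compare(m.Smallest, end) < 0 && bytes.Compare(m.Largest, start) >= 0
}

// filterByKeyRange keeps only the tables that overlap [start, end).
func filterByKeyRange(files []fileMeta, start, end []byte) []fileMeta {
	var out []fileMeta
	for _, f := range files {
		if overlaps(f, start, end) {
			out = append(out, f)
		}
	}
	return out
}
```

Table properties would then be loaded only for the tables that survive the filter.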
Audience: all participants to the RFC review.
One option is to support only global keys in the variant in which the user does not specify a node.