.. _cephfs_metrics:

==============
CephFS Metrics
==============

CephFS uses :ref:`Perf Counters <perf counters>` to track metrics. The
counters can be labeled (:ref:`Labeled Perf Counters <labeled_perf_counters>`).
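
The available labeled counters and their metadata can be listed with the
``counter schema`` command. In the sketch below, ``mds.a:0`` stands in for an
actual file system name and rank::

    ceph tell mds.a:0 counter schema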

Client Metrics
==============

CephFS exports client metrics as :ref:`Labeled Perf Counters
<labeled_perf_counters>`, which can be used to monitor client performance.
CephFS exports the following client metrics:

.. list-table:: Client Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - cap_hits
     - Gauge
     - Percentage of file capability hits over the total number of caps
   * - cap_miss
     - Gauge
     - Percentage of file capability misses over the total number of caps
   * - avg_read_latency
     - Gauge
     - Mean latency of read operations
   * - avg_write_latency
     - Gauge
     - Mean latency of write operations
   * - avg_metadata_latency
     - Gauge
     - Mean latency of metadata operations
   * - dentry_lease_hits
     - Gauge
     - Percentage of dentry lease hits over the total number of dentry
       lease requests
   * - dentry_lease_miss
     - Gauge
     - Percentage of dentry lease misses over the total number of dentry
       lease requests
   * - opened_files
     - Gauge
     - Number of opened files
   * - opened_inodes
     - Gauge
     - Number of opened inodes
   * - pinned_icaps
     - Gauge
     - Number of pinned inode caps
   * - total_inodes
     - Gauge
     - Total number of inodes
   * - total_read_ops
     - Gauge
     - Total number of read operations
   * - total_read_size
     - Gauge
     - Total number of bytes read
   * - total_write_ops
     - Gauge
     - Total number of write operations
   * - total_write_size
     - Gauge
     - Total number of bytes written

MDS Rank Metrics
================

Per-MDS-rank metrics are also exported as :ref:`Labeled Perf Counters
<labeled_perf_counters>` with a ``rank`` label. These describe the MDS
daemon process itself, not any particular client or subvolume.

.. list-table:: MDS Rank Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - cpu_usage
     - Gauge
     - CPU usage of the MDS process, in percent (100 == one fully
       saturated core; values can exceed 100 on multi-core systems)
   * - open_requests
     - Gauge
     - Number of client requests currently open on this rank

Subvolume Metrics
=================

CephFS exports subvolume metrics as :ref:`Labeled Perf Counters
<labeled_perf_counters>`, which can be used to monitor subvolume performance
and utilization.

I/O Performance Metrics
-----------------------

I/O performance metrics (IOPS, throughput, latency) are aggregated within a
sliding window of 30 seconds by default. This interval is configurable via
the ``subv_metrics_window_interval`` parameter (see the
:doc:`MDS config reference <mds-config-ref>`). In large clusters with tens of
thousands of subvolumes, this parameter also controls when stale metrics are
evicted: once the sliding window becomes empty (no I/O activity), the metrics
entry is removed rather than reporting zeros, which reduces memory usage and
computational overhead.
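
For example, the window can be widened to 60 seconds on all MDS daemons with
a runtime configuration change (the 60-second value below is arbitrary,
chosen only for illustration)::

    ceph config set mds subv_metrics_window_interval 60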

.. important::

   Metadata operations do NOT trigger metric updates. Only actual data
   I/O (reads and writes to file contents) updates the sliding window and
   keeps the subvolume metrics entry active. Metadata-only operations such
   as ``mkdir``, ``rmdir``, ``unlink``, ``rename``, ``chmod``, ``chown``,
   ``setxattr``, ``stat``, and ``ls`` do not generate I/O metrics.

This means:

* If files are deleted through metadata-only operations, the ``used_bytes``
  value will not immediately reflect the freed space until either new data
  I/O occurs or the MDS broadcasts updated quota information (a workaround
  is sketched below).
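
If an up-to-date reading is needed after bulk deletions, one option is to
issue a small amount of real data I/O into the subvolume. The mount point
and file names below are illustrative::

    # Metadata-only: frees space but does not update the sliding window
    rm -rf /mnt/cephfs/volumes/_nogroup/test_subvolume/old_data

    # Data I/O: writes file contents, keeping the metrics entry active
    dd if=/dev/zero of=/mnt/cephfs/volumes/_nogroup/test_subvolume/probe \
       bs=4K count=1 oflag=direct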

Utilization Metrics
-------------------

In addition to I/O performance, subvolume metrics include utilization
counters:

* ``quota_bytes``: the configured quota limit for the subvolume
  (0 if unlimited).
* ``used_bytes``: the current space usage, based on the inode's recursive
  statistics (``rstat.rbytes``).

These values are updated when the MDS broadcasts quota information to
clients. ``used_bytes`` reflects the recursive byte count of the subvolume
root inode, which is maintained by the MDS as files are created, modified,
or deleted. However, since metric reporting depends on I/O activity to keep
entries alive, the utilization values are reported only while the subvolume
has active I/O within the sliding window.
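
For comparison, the same values can be read directly from the subvolume root
through CephFS virtual extended attributes. The sketch below assumes the
file system is mounted at ``/mnt/cephfs``::

    # Configured quota limit (quota_bytes)
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/_nogroup/test_subvolume

    # Recursive byte count maintained by the MDS (used_bytes)
    getfattr -n ceph.dir.rbytes /mnt/cephfs/volumes/_nogroup/test_subvolume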

.. list-table:: Subvolume Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - avg_read_iops
     - Gauge
     - Average read operations per second over the sliding window
   * - avg_read_tp_Bps
     - Gauge
     - Average read throughput in bytes per second over the sliding window
   * - avg_read_lat_msec
     - Gauge
     - Average read latency in milliseconds over the sliding window
   * - avg_write_iops
     - Gauge
     - Average write operations per second over the sliding window
   * - avg_write_tp_Bps
     - Gauge
     - Average write throughput in bytes per second over the sliding window
   * - avg_write_lat_msec
     - Gauge
     - Average write latency in milliseconds over the sliding window
   * - quota_bytes
     - Gauge
     - Configured quota limit for the subvolume in bytes (0 if unlimited)
   * - used_bytes
     - Gauge
     - Current space usage of the subvolume in bytes

The metrics can be scraped from the MDS admin socket, as well as by using
the ``tell`` interface; example commands follow the sample output below. The
``mds_client_metrics-<fsname>`` section in the output of the ``counter dump``
command displays the metrics for each client::
"mds_client_metrics": [
{
"labels": {
"fs_name": "<fsname>",
"id": "14213"
},
"counters": {
"num_clients": 2
}
}
],
"mds_client_metrics-<fsname>": [
{
"labels": {
"client": "client.0",
"rank": "0"
},
"counters": {
"cap_hits": 5149,
"cap_miss": 1,
"avg_read_latency": 0.000000000,
"avg_write_latency": 0.000000000,
"avg_metadata_latency": 0.000000000,
"dentry_lease_hits": 0,
"dentry_lease_miss": 0,
"opened_files": 1,
"opened_inodes": 2,
"pinned_icaps": 2,
"total_inodes": 2,
"total_read_ops": 0,
"total_read_size": 0,
"total_write_ops": 4836,
"total_write_size": 633864192
}
},
{
"labels": {
"client": "client.1",
"rank": "0"
},
"counters": {
"cap_hits": 3375,
"cap_miss": 8,
"avg_read_latency": 0.000000000,
"avg_write_latency": 0.000000000,
"avg_metadata_latency": 0.000000000,
"dentry_lease_hits": 0,
"dentry_lease_miss": 0,
"opened_files": 1,
"opened_inodes": 2,
"pinned_icaps": 2,
"total_inodes": 2,
"total_read_ops": 0,
"total_read_size": 0,
"total_write_ops": 3169,
"total_write_size": 415367168
}
}
]
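
The dump shown above can be produced through either interface. Daemon and
file system names below are illustrative::

    # Via the tell interface, addressing rank 0 of file system "a"
    ceph tell mds.a:0 counter dump

    # Via the admin socket on the node running the MDS daemon
    ceph daemon mds.<daemon-name> counter dump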

The subvolume metrics are dumped as part of the same command. The
``mds_subvolume_metrics`` section in the output of the ``counter dump``
command displays the metrics for each subvolume, as shown below::
"mds_subvolume_metrics": [
{
"labels": {
"fs_name": "a",
"subvolume_path": "/volumes/_nogroup/test_subvolume"
},
"counters": {
"avg_read_iops": 0,
"avg_read_tp_Bps": 11,
"avg_read_lat_msec": 0,
"avg_write_iops": 1564,
"avg_write_tp_Bps": 6408316,
"avg_write_lat_msec": 338,
"quota_bytes": 10737418240,
"used_bytes": 1073741824
}
}
]
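
A single subvolume's counters can be extracted from the dump with standard
tools, for example with ``jq`` (assuming it is installed)::

    ceph tell mds.a:0 counter dump | \
        jq '.mds_subvolume_metrics[] |
            select(.labels.subvolume_path == "/volumes/_nogroup/test_subvolume")'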