.. _cephfs_metrics:

=======
Metrics
=======

CephFS uses :ref:`Perf Counters <perf counters>` to track metrics. The counters can be labeled (:ref:`Labeled Perf Counters <labeled perf counters>`).

Client Metrics
==============

CephFS exports client metrics as :ref:`Labeled Perf Counters <labeled perf counters>`, which can be used to monitor client performance. The following client metrics are exported:

.. list-table:: Client Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - num_clients
     - Gauge
     - Number of client sessions
   * - cap_hits
     - Gauge
     - Percentage of file capability hits over total number of caps
   * - cap_miss
     - Gauge
     - Percentage of file capability misses over total number of caps
   * - avg_read_latency
     - Gauge
     - Mean value of the read latencies
   * - avg_write_latency
     - Gauge
     - Mean value of the write latencies
   * - avg_metadata_latency
     - Gauge
     - Mean value of the metadata latencies
   * - dentry_lease_hits
     - Gauge
     - Percentage of dentry lease hits handed out over the total dentry lease requests
   * - dentry_lease_miss
     - Gauge
     - Percentage of dentry lease misses handed out over the total dentry lease requests
   * - opened_files
     - Gauge
     - Number of opened files
   * - opened_inodes
     - Gauge
     - Number of opened inodes
   * - pinned_icaps
     - Gauge
     - Number of pinned Inode Caps
   * - total_inodes
     - Gauge
     - Total number of Inodes
   * - total_read_ops
     - Gauge
     - Total number of read operations generated by all processes
   * - total_read_size
     - Gauge
     - Number of bytes read in input/output operations generated by all processes
   * - total_write_ops
     - Gauge
     - Total number of write operations generated by all processes
   * - total_write_size
     - Gauge
     - Number of bytes written in input/output operations generated by all processes

MDS Rank Metrics
================

Per-MDS-rank metrics are also exported as :ref:`Labeled Perf Counters <labeled perf counters>` with a ``rank`` label. These describe the MDS daemon process itself, not any particular client or subvolume.

.. list-table:: MDS Rank Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - cpu_usage
     - Gauge
     - Sum of per-core CPU utilization for the MDS process (100 == one fully saturated core; values can exceed 100 on multi-core systems).
   * - open_requests
     - Gauge
     - Number of metadata requests currently in flight for this MDS rank.

Subvolume Metrics
=================

CephFS exports subvolume metrics as :ref:`Labeled Perf Counters <labeled perf counters>`, which can be used to monitor subvolume performance and utilization.

I/O Performance Metrics
-----------------------

I/O performance metrics (IOPS, throughput, latency) are aggregated within a sliding window of 30 seconds by default. This interval is configurable via the ``subv_metrics_window_interval`` parameter (see the MDS configuration reference). In large clusters with tens of thousands of subvolumes, this parameter also controls when stale metrics are evicted: once the sliding window becomes empty (no I/O activity), the metrics entry is removed rather than reporting zeros, which reduces memory usage and computational overhead.

.. important::

   Metadata operations do NOT trigger metric updates. Only actual data I/O
   (reads and writes to file contents) updates the sliding window and keeps
   the subvolume metrics entry active. Metadata-only operations such as
   ``mkdir``, ``rmdir``, ``unlink``, ``rename``, ``chmod``, ``chown``,
   ``setxattr``, ``stat``, and ``ls`` do not generate I/O metrics.

This means:

* If a subvolume has only metadata activity (e.g., creating/deleting files
  without writing data), its I/O metrics will show zeros or the entry may be
  evicted after the window expires.
* After deleting files, the ``used_bytes`` value will not immediately reflect
  the freed space until either new data I/O occurs or the MDS broadcasts
  updated quota information.
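The window behavior described above can be modelled with a short sketch. This is illustrative Python only, not the MDS implementation; the class name and the way the averages are derived are assumptions:

```python
import time
from collections import deque


class SlidingWindowMetrics:
    """Toy model of per-subvolume I/O aggregation over a sliding window."""

    def __init__(self, window_sec=30):
        self.window_sec = window_sec
        self.samples = deque()  # (timestamp, bytes, latency_msec)

    def record_io(self, nbytes, latency_msec, now=None):
        # Only data I/O calls this; metadata-only operations never do,
        # so they cannot keep the window populated.
        now = time.monotonic() if now is None else now
        self.samples.append((now, nbytes, latency_msec))

    def _evict_old(self, now):
        cutoff = now - self.window_sec
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def averages(self, now=None):
        """Return (iops, throughput_Bps, avg_lat_msec) over the window,
        or None when the window is empty."""
        now = time.monotonic() if now is None else now
        self._evict_old(now)
        if not self.samples:
            return None  # entry would be evicted, not reported as zeros
        n = len(self.samples)
        total_bytes = sum(s[1] for s in self.samples)
        avg_lat = sum(s[2] for s in self.samples) / n
        return (n / self.window_sec, total_bytes / self.window_sec, avg_lat)
```

An entry whose window has gone idle returns ``None`` here, standing in for the MDS removing the stale metrics entry instead of publishing zeros.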

Utilization Metrics
-------------------

In addition to I/O performance, subvolume metrics include utilization counters:

* ``quota_bytes``: The configured quota limit for the subvolume (0 if unlimited).
* ``used_bytes``: Current space usage based on the inode's recursive statistics (``rstat.rbytes``).

These values are updated when the MDS broadcasts quota information to clients. ``used_bytes`` reflects the recursive byte count of the subvolume root inode, which the MDS maintains as files are created, modified, or deleted. However, since metric reporting depends on I/O activity to keep entries alive, the utilization values are reported only while the subvolume has active I/O within the sliding window.
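As an illustration, quota utilization can be derived from these two counters. The helper below is hypothetical, not part of any Ceph API:

```python
def subvolume_utilization(quota_bytes, used_bytes):
    """Return the used fraction of the quota, or None when no quota is
    set (quota_bytes == 0 means unlimited)."""
    if quota_bytes == 0:
        return None
    return used_bytes / quota_bytes
```

With the values from the sample dump in this document (a 10 GiB quota, 1 GiB used), this yields a utilization of 0.1.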

.. list-table:: Subvolume Metrics
   :widths: 25 25 75
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - avg_read_iops
     - Gauge
     - Average read IOPS (input/output operations per second) over the sliding window.
   * - avg_read_tp_Bps
     - Gauge
     - Average read throughput in bytes per second.
   * - avg_read_lat_msec
     - Gauge
     - Average read latency in milliseconds.
   * - avg_write_iops
     - Gauge
     - Average write IOPS over the sliding window.
   * - avg_write_tp_Bps
     - Gauge
     - Average write throughput in bytes per second.
   * - avg_write_lat_msec
     - Gauge
     - Average write latency in milliseconds.
   * - quota_bytes
     - Gauge
     - Configured quota limit in bytes (0 if no quota/unlimited).
   * - used_bytes
     - Gauge
     - Current space usage in bytes (recursive byte count of the subvolume root).

Getting Metrics
===============

The metrics can be scraped from the MDS admin socket, as well as by using the ``tell`` interface. The ``mds_client_metrics-<fsname>`` section in the output of the ``counter dump`` command displays the metrics for each client, as shown below::

"mds_client_metrics": [
    {
        "labels": {
            "fs_name": "<fsname>",
            "id": "14213"
        },
        "counters": {
            "num_clients": 2
        }
    }
],
"mds_client_metrics-<fsname>": [
    {
        "labels": {
            "client": "client.0",
            "rank": "0"
        },
        "counters": {
            "cap_hits": 5149,
            "cap_miss": 1,
            "avg_read_latency": 0.000000000,
            "avg_write_latency": 0.000000000,
            "avg_metadata_latency": 0.000000000,
            "dentry_lease_hits": 0,
            "dentry_lease_miss": 0,
            "opened_files": 1,
            "opened_inodes": 2,
            "pinned_icaps": 2,
            "total_inodes": 2,
            "total_read_ops": 0,
            "total_read_size": 0,
            "total_write_ops": 4836,
            "total_write_size": 633864192
        }
    },
    {
        "labels": {
            "client": "client.1",
            "rank": "0"
        },
        "counters": {
            "cap_hits": 3375,
            "cap_miss": 8,
            "avg_read_latency": 0.000000000,
            "avg_write_latency": 0.000000000,
            "avg_metadata_latency": 0.000000000,
            "dentry_lease_hits": 0,
            "dentry_lease_miss": 0,
            "opened_files": 1,
            "opened_inodes": 2,
            "pinned_icaps": 2,
            "total_inodes": 2,
            "total_read_ops": 0,
            "total_read_size": 0,
            "total_write_ops": 3169,
            "total_write_size": 415367168
        }
    }
]
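Since the dump is plain JSON, it can be post-processed with ordinary tooling. The sketch below (hypothetical Python; the function name and sample data are illustrative only) sums ``total_write_size`` across all clients of one file system, assuming output shaped like the example above:

```python
import json


def total_write_bytes(dump_json, fs_name):
    """Sum total_write_size over all clients of one file system in the
    output of a 'counter dump' command."""
    dump = json.loads(dump_json)
    section = dump.get(f"mds_client_metrics-{fs_name}", [])
    return sum(entry["counters"]["total_write_size"] for entry in section)


# Trimmed-down sample resembling the dump shown above.
sample = """{
  "mds_client_metrics-a": [
    {"labels": {"client": "client.0", "rank": "0"},
     "counters": {"total_write_size": 633864192}},
    {"labels": {"client": "client.1", "rank": "0"},
     "counters": {"total_write_size": 415367168}}
  ]
}"""
```

The same pattern applies to any of the labeled sections, since each is a list of ``labels``/``counters`` pairs.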

The subvolume metrics are dumped as part of the same command. The ``mds_subvolume_metrics`` section in the output of the ``counter dump`` command displays the metrics for each subvolume, as shown below::

"mds_subvolume_metrics": [
    {
        "labels": {
            "fs_name": "a",
            "subvolume_path": "/volumes/_nogroup/test_subvolume"
        },
        "counters": {
            "avg_read_iops": 0,
            "avg_read_tp_Bps": 11,
            "avg_read_lat_msec": 0,
            "avg_write_iops": 1564,
            "avg_write_tp_Bps": 6408316,
            "avg_write_lat_msec": 338,
            "quota_bytes": 10737418240,
            "used_bytes": 1073741824
        }
    }