.. _monitoring:

=================== Monitoring Overview

This document explains the Ceph monitoring stack and a number of important Ceph metrics.

Ceph admins can explore the rich observability stack deployed by Ceph, and can leverage Prometheus, Alertmanager, Grafana, and scripting to create customized monitoring tools.

Ceph Monitoring Stack

Ceph deploys an integrated monitoring stack as described in the :ref:Monitoring Services <mgr-cephadm-monitoring> section of the cephadm documentation. Deployments with external fleet-wide monitoring and observability systems using these or other tools may choose to disable the stack that Ceph deploys by default.

Ceph Metrics

Many Ceph metrics are gathered from the performance counters exposed by each Ceph daemon. These :ref:Perf Counters are native Ceph metrics.

Performance counters are rendered into standard Prometheus metrics by the ceph_exporter daemon. This daemon runs on every Ceph cluster host and exposes an endpoint where performance counters exposed by Ceph daemons running on that host are presented in the form of Prometheus metrics.

In addition to the ceph_exporter service, the Ceph Manager prometheus module exposes metrics relating to the Ceph cluster as a whole.

Ceph provides a Prometheus endpoint from which one can obtain the complete list of available metrics, or against which admins, Grafana, and Alertmanager can execute queries.

Prometheus (and related systems) accept data queries formatted as PromQL expressions. Expansive documentation of PromQL can be viewed in the PromQL documentation_ and several excellent books can be found at the usual sources of digital and print books.

We will explore a number of PromQL queries below. Use the following command to obtain the Prometheus endpoint for your cluster:

Example:

.. prompt:: bash # auto

ceph orch ps --service_name prometheus

NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID prometheus.cephtest-node-00 cephtest-node-00.cephlab.com *:9095 running (103m) 50s ago 5w 142M - 2.33.4 514e6a882f6e efe3cbc2e521

With this information you can connect to http://cephtest-node-00.cephlab.com:9095 to access the Prometheus server interface, which includes a list of targets, an expression browser, and metrics related to the Prometheus service itself.

The complete list of metrics (with descriptions) is available at the URL of the below form:

http://cephtest-node-00.cephlab.com:9095/api/v1/targets/metadata

The Ceph Dashboard provides a rich set of graphs and other panels that display the most important cluster and service metrics. Many of the examples in this document are taken from Dashboard graphics or extrapolated from metrics exposed by the Ceph Dashboard.

Ceph Daemon Health Metrics

The ceph_exporter service provides a metric named ceph_daemon_socket_up that indicates the health status of a Ceph daemon based on its ability to respond via the admin socket, where a value of 1 means healthy, and 0 means unhealthy. Although a Ceph daemon might still be "alive" when it reports ceph_daemon_socket_up=0, this status indicates a significant issue in its functionality. As such, this metric serves as an excellent means of detecting problems in any of the main Ceph daemons.

The ceph_daemon_socket_up Prometheus metrics also have labels as described below:

ceph_daemon: Identifier of the Ceph daemon exposing an admin socket on the host.
hostname: Name of the host where the Ceph daemon is running.

Example:

.. code-block:: bash

ceph_daemon_socket_up{ceph_daemon="mds.a",hostname="testhost"} 1 ceph_daemon_socket_up{ceph_daemon="osd.1",hostname="testhost"} 0

To identify any Ceph daemons that were not responsive at any point in the last 12 hours, you can use the following PromQL expression:

.. code-block:: bash

ceph_daemon_socket_up == 0 or min_over_time(ceph_daemon_socket_up[12h]) == 0

Performance Metrics

Below we explore a number of metrics that indicate Ceph cluster performance.

All of these metrics have the following labels:

ceph_daemon: Identifier of the Ceph daemon from which the metric was harvested
instance: The IP address of the exporter instance exposing the metric
job: Prometheus scrape job name

Below is an example Prometheus query result showing these labels:

.. code-block:: bash

ceph_osd_op_r{ceph_daemon="osd.0", instance="192.168.122.7:9283", job="ceph"} = 73981

Cluster throughput: Query ceph_osd_op_r_out_bytes and ceph_osd_op_w_in_bytes to obtain cluster client throughput:

Example:

.. code-block:: bash

Writes (B/s)

sum(irate(ceph_osd_op_w_in_bytes[1m]))

Reads (B/s)

sum(irate(ceph_osd_op_r_out_bytes[1m]))

Cluster I/O (operations): Query ceph_osd_op_r, ceph_osd_op_w to obtain the rates of client operations (IOPS):

Example:

.. code-block:: bash

Writes (ops/s)

sum(irate(ceph_osd_op_w[1m]))

Reads (ops/s)

sum(irate(ceph_osd_op_r[1m]))

Latency: Query ceph_osd_op_latency_sum to measure the delay before OSD transfers of data begins in response to client requests:

Example:

.. code-block:: bash

sum(irate(ceph_osd_op_latency_sum[1m]))

OSD Performance

The cluster performance metrics described above are gathered from OSD metrics. By specifying an appropriate label value or regular expression we can retrieve performance metrics for one or a subset of the cluster's OSDs:

Examples:

.. code-block:: bash

OSD 0 read latency

irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"osd.0"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])

OSD 0 write IOPS

irate(ceph_osd_op_w{ceph_daemon=~"osd.0"}[1m])

OSD 0 write throughput (bytes)

irate(ceph_osd_op_w_in_bytes{ceph_daemon=~"osd.0"}[1m])

OSD.0 total raw capacity available

ceph_osd_stat_bytes{ceph_daemon="osd.0", instance="cephtest-node-00.cephlab.com:9283", job="ceph"} = 536451481

Physical Storage Drive Performance

By combining Prometheus node_exporter metrics with Ceph cluster metrics we can derive performance information for physical storage media backing Ceph OSDs.

Example:

.. code-block:: bash

Read latency of device used by osd.0

label_replace(irate(node_disk_read_time_seconds_total[1m]) / irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

Write latency of device used by osd.0

label_replace(irate(node_disk_write_time_seconds_total[1m]) / irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

IOPS of device used by osd.0

reads:

label_replace(irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

writes:

label_replace(irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

Throughput for device used by osd.0

reads:

label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

writes:

label_replace(irate(node_disk_written_bytes_total[1m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

Physical drive utilization (%) for osd.0 in the last 5 minutes. Note that this value has limited mean for SSDs

label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]).") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.)"), "instance", "$1", "instance", "([^:.]).*")

Pool Metrics

Ceph pool metrics have the following labels:

instance: The IP address of the exporter providing the metric
pool_id: Numeric identifier of the Ceph pool
job: Prometheus scrape job name

Pool-specific metrics include:

ceph_pool_metadata: Information about the pool that can be used together with other metrics to provide more information in query results and graphs. In addition to the above three common labels, this metric provides the following:
- compression_mode: Compression type enabled for the pool. Values are lz4, snappy, zlib, zstd, and none`). Example: compression_mode="none"``
- description: Brief description of the pool data protection strategy including replica number or EC profile. Example: description="replica:3"
- name: Name of the pool. Example: name=".mgr"
- type: Data protection strategy, replicated or EC. Example: type="replicated"
- ceph_pool_bytes_used: Total raw capacity (after replication or EC) consumed by user data and metadata
- ceph_pool_stored: Total client data stored in the pool (before data protection)
- ceph_pool_compress_under_bytes: Data eligible to be compressed in the pool
- ceph_pool_compress_bytes_used: Data compressed in the pool
- ceph_pool_rd: Client read operations per pool (reads per second)
- ceph_pool_rd_bytes: Client read operations in bytes per pool
- ceph_pool_wr: Client write operations per pool (writes per second)
- ceph_pool_wr_bytes: Client write operation in bytes per pool

Useful Queries

.. code-block:: bash

Total raw capacity available in the cluster

sum(ceph_osd_stat_bytes)

Total raw capacity consumed in the cluster (including metadata + redundancy)

sum(ceph_pool_bytes_used)

Total client data stored in the cluster

sum(ceph_pool_stored)

Compression savings

sum(ceph_pool_compress_under_bytes - ceph_pool_compress_bytes_used)

Client IOPS for a specific pool

reads:

irate(ceph_pool_rd[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}

writes:

irate(ceph_pool_wr[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}

Client throughput for a specific pool

reads:

irate(ceph_pool_rd_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}

writes:

irate(ceph_pool_wr_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}

RGW Metrics

These metrics have the following labels:

instance: The IP address of the exporter providing the metric
instance_id: Identifier of the RGW daemon instance
job: Prometheus scrape job name

Example:

.. code-block:: bash

ceph_rgw_req{instance="192.168.122.7:9283", instance_id="154247", job="ceph"} = 12345

Generic Metrics

ceph_rgw_metadata: Provides generic information about an RGW daemon. This can be used together with other metrics to provide contextual information in query results and graphs. In addition to the three common labels, this metric provides the following:
- ceph_daemon: Name of the RGW daemon instance. Example: ceph_daemon="rgw.rgwtest.cephtest-node-00.sxizyq"
- ceph_version: Version of the RGW daemon. Example: ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"
- hostname: Name of the host where the daemon runs. Example: hostname="cephtest-node-00.cephlab.com"
- ceph_rgw_req: Number of requests processed by the daemon (GET + PUT + DELETE). Useful for detecting bottlenecks and optimizing load distribution.
- ceph_rgw_qlen: Operations queue length for the daemon. Useful for detecting bottlenecks and optimizing load distribution.
- ceph_rgw_failed_req: Aborted requests. Useful for detecting daemon errors.

GET Operation Metrics

ceph_rgw_op_global_get_obj_lat_count: Number of GET requests
ceph_rgw_op_global_get_obj_lat_sum: Total latency for GET requests
ceph_rgw_op_global_get_obj_ops: Total number of GET requests
ceph_rgw_op_global_get_obj_bytes: Total bytes transferred for GET requests

PUT Operation Metrics

ceph_rgw_op_global_put_obj_lat_count: Number of PUT operations
ceph_rgw_op_global_put_obj_lat_sum: Total latency time for PUT operations
ceph_rgw_op_global_put_obj_ops: Total number of PUT operations
ceph_rgw_op_global_get_obj_bytes: Total bytes transferred in PUT operations

Additional Useful Queries

.. code-block:: bash

Average GET latency

rate(ceph_rgw_op_global_get_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_get_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

Average PUT latency

rate(ceph_rgw_op_global_put_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_put_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

Requests per second

rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

Total number of "other" operations (`LIST`, `DELETE`, etc)

rate(ceph_rgw_req[30s]) - (rate(ceph_rgw_op_global_get_obj_ops[30s]) + rate(ceph_rgw_op_global_put_obj_ops[30s]))

GET latency per RGW instance

rate(ceph_rgw_op_global_get_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_get_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

PUT latency per RGW instance

rate(ceph_rgw_op_global_put_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_put_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

Bandwidth consumed by GET operations

sum(rate(ceph_rgw_op_global_get_obj_bytes[30s]))

Bandwidth consumed by PUT operations

sum(rate(ceph_rgw_op_global_put_obj_bytes[30s]))

Bandwidth consumed by RGW instance (PUTs + GETs)

sum by (instance_id) (rate(ceph_rgw_op_global_get_obj_bytes[30s]) + rate(ceph_rgw_op_global_put_obj_bytes[30s])) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

HTTP errors and other request failures

rate(ceph_rgw_failed_req[30s])

CephFS Metrics

These metrics have the following labels:

ceph_daemon: The name of the MDS daemon
instance: The IP address and port of the exporter exposing the metric
job: Prometheus scrape job name

Example:

.. code-block:: bash

ceph_mds_request{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", instance="192.168.122.7:9283", job="ceph"} = 1452

Important Metrics

ceph_mds_metadata: Provides general information about the MDS daemon. It can be used together with other metrics to provide contextual information in query results and graphs. The following extra labels are populated:
- ceph_version: MDS daemon version
- fs_id: CephFS filesystem ID
- hostname: Name of the host where the MDS daemon runs
- public_addr: Public address of the host where the MDS daemon runs
- rank: Rank of the MDS daemon

Example:

.. code-block:: bash

ceph_mds_metadata{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", fs_id="-1", hostname="cephtest-node-00.cephlab.com", instance="cephtest-node-00.cephlab.com:9283", job="ceph", public_addr="192.168.122.145:6801/118896446", rank="-1"}

ceph_mds_request: Total number of requests for the MDS
ceph_mds_reply_latency_sum: Reply latency total
ceph_mds_reply_latency_count: Reply latency count
ceph_mds_server_handle_client_request: Number of client requests
ceph_mds_sessions_session_count: Session count
ceph_mds_sessions_total_load: Total load
ceph_mds_sessions_sessions_open: Sessions currently open
ceph_mds_sessions_sessions_stale: Sessions currently stale
ceph_objecter_op_r: Number of read operations
ceph_objecter_op_w: Number of write operations
ceph_mds_root_rbytes: Total number of bytes managed by the daemon
ceph_mds_root_rfiles: Total number of files managed by the daemon

Useful Queries

.. code-block:: bash

Total MDS read workload

sum(rate(ceph_objecter_op_r[1m]))

Total MDS daemons workload

sum(rate(ceph_objecter_op_w[1m]))

Read workload for a specific MDS

sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))

Write workload for a specific MDS

sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))

Average reply latency

rate(ceph_mds_reply_latency_sum[30s]) / rate(ceph_mds_reply_latency_count[30s])

Total requests per second

rate(ceph_mds_request[30s]) * on (instance) group_right (ceph_daemon) ceph_mds_metadata

Block Metrics

By default RBD metrics for images are not gathered, as their cardinality may be high. This helps ensure the performance of the Manager's prometheus module.

To produce metrics for RBD images, configure the Manager option :confval:mgr/prometheus/rbd_stats_pools. For more information, see :ref:prometheus-rbd-io-statistics.

These metrics have the following labels:

image: Name of the image (volume)
instance: Node where the exporter runs
job: Name of the Prometheus scrape job
pool: RBD pool name

Example:

.. code-block:: bash

ceph_rbd_read_bytes{image="test2", instance="cephtest-node-00.cephlab.com:9283", job="ceph", pool="testrbdpool"}

Important Metrics

ceph_rbd_read_bytes: RBD bytes read
ceph_rbd_write_bytes: RBD image bytes written
ceph_rbd_read_latency_count: RBD read operation latency count
ceph_rbd_read_latency_sum: RBD read operation latency total time
ceph_rbd_read_ops: RBD read operation count
ceph_rbd_write_ops: RBD write operation count
ceph_rbd_write_latency_count: RBD write operation latency count
ceph_rbd_write_latency_sum: RBD write operation latency total

Useful Queries

.. code-block:: bash

Average read latency

rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s]) * on (instance) group_left (ceph_daemon) ceph_rgw_metadata

Hardware Monitoring

See :ref:hardware-monitoring.

.. _PromQL documentation: https://prometheus.io/docs/prometheus/latest/querying/basics/

Monitoring Overview

=================== Monitoring Overview

Ceph Monitoring Stack

Ceph Metrics

ceph orch ps --service_name prometheus

Ceph Daemon Health Metrics

Performance Metrics

Writes (B/s)

Reads (B/s)

Writes (ops/s)

Reads (ops/s)

OSD Performance

OSD 0 read latency

OSD 0 write IOPS

OSD 0 write throughput (bytes)

OSD.0 total raw capacity available

Physical Storage Drive Performance

Read latency of device used by osd.0

Write latency of device used by osd.0

IOPS of device used by osd.0

reads:

writes:

Throughput for device used by osd.0

reads:

writes:

Physical drive utilization (%) for osd.0 in the last 5 minutes. Note that this value has limited mean for SSDs

Pool Metrics

Useful Queries

Total raw capacity available in the cluster

Total raw capacity consumed in the cluster (including metadata + redundancy)

Total client data stored in the cluster

Compression savings

Client IOPS for a specific pool

reads:

writes:

Client throughput for a specific pool

reads:

writes:

RGW Metrics

Generic Metrics

GET Operation Metrics

PUT Operation Metrics

Additional Useful Queries

Average GET latency

Average PUT latency

Requests per second

Total number of "other" operations (LIST, DELETE, etc)

GET latency per RGW instance

PUT latency per RGW instance

Bandwidth consumed by GET operations

Bandwidth consumed by PUT operations

Bandwidth consumed by RGW instance (PUTs + GETs)

HTTP errors and other request failures

CephFS Metrics

Important Metrics

Useful Queries

Total MDS read workload

Total MDS daemons workload

Read workload for a specific MDS

Write workload for a specific MDS

Average reply latency

Total requests per second

Block Metrics

Important Metrics

Useful Queries

Average read latency

Hardware Monitoring

Total number of "other" operations (`LIST`, `DELETE`, etc)