.. _monitoring:

===================
Monitoring overview
===================

This document explains the Ceph monitoring stack and a number of important
Ceph metrics. Ceph admins can explore the rich observability stack deployed
by Ceph, and can leverage Prometheus, Alertmanager, Grafana, and scripting to
create customized monitoring tools.
Ceph monitoring stack
=====================

Ceph deploys an integrated monitoring stack as described in the
:ref:`Monitoring Services <mgr-cephadm-monitoring>` section of the cephadm
documentation. Deployments with external fleet-wide monitoring and
observability systems, whether built on these or other tools, may choose to
disable the stack that Ceph deploys by default.
Ceph metrics
============

Many Ceph metrics are gathered from the performance counters exposed by each
Ceph daemon. These :ref:`Perf Counters` are native Ceph metrics.
Performance counters are rendered into standard Prometheus metrics by the
``ceph_exporter`` daemon. This daemon runs on every Ceph cluster host and
exposes an endpoint where the performance counters of the Ceph daemons
running on that host are presented as Prometheus metrics.

In addition to the ``ceph_exporter`` service, the Ceph Manager ``prometheus``
module exposes metrics that describe the Ceph cluster as a whole.
Ceph provides a Prometheus endpoint from which one can obtain the complete list of available metrics, or against which admins, Grafana, and Alertmanager can execute queries.
Prometheus (and related systems) accept data queries formatted as PromQL
expressions. Expansive documentation of PromQL can be found in the `PromQL
documentation`_, and several excellent books can be found at the usual
sources of digital and print books.
We will explore a number of PromQL queries below. Use the following command
to find the Prometheus endpoint for your cluster:

.. prompt:: bash #

   ceph orch ps --service_name prometheus

Example output::

   NAME                         HOST                          PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
   prometheus.cephtest-node-00  cephtest-node-00.cephlab.com  *:9095  running (103m)     50s ago   5w     142M        -  2.33.4   514e6a882f6e  efe3cbc2e521
With this information you can connect to
http://cephtest-node-00.cephlab.com:9095 to access the Prometheus server
interface, which includes a list of targets, an expression browser, and metrics
related to the Prometheus service itself.
The complete list of metrics (with descriptions) is available at a URL of the
following form:
::
http://cephtest-node-00.cephlab.com:9095/api/v1/targets/metadata
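The same information is available through the Prometheus HTTP API. Below is a
minimal sketch, assuming the example hostname above and that ``jq`` is
installed; both endpoints are standard Prometheus API routes:

.. code-block:: bash

   # List every metric name known to this Prometheus instance
   curl -s 'http://cephtest-node-00.cephlab.com:9095/api/v1/label/__name__/values' | jq -r '.data[]'

   # Evaluate a single PromQL expression from the command line
   curl -s 'http://cephtest-node-00.cephlab.com:9095/api/v1/query' --data-urlencode 'query=ceph_osd_up' | jq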
The Ceph Dashboard provides a rich set of graphs and other panels that
display the most important cluster and service metrics. Many of the examples
in this document are taken from Dashboard graphs or derived from the metrics
that drive them.
The ``ceph_exporter`` service provides a metric named
``ceph_daemon_socket_up`` that indicates the health status of a Ceph daemon
based on its ability to respond via the admin socket, where a value of 1
means healthy and 0 means unhealthy. Although a Ceph daemon might still be
"alive" when it reports ``ceph_daemon_socket_up=0``, this status indicates a
significant issue in its functionality. As such, this metric serves as an
excellent means of detecting problems in any of the main Ceph daemons.
The ``ceph_daemon_socket_up`` metric also has the following labels:

- ``ceph_daemon``: Identifier of the Ceph daemon exposing an admin socket on
  the host
- ``hostname``: Name of the host where the Ceph daemon is running

Example:

.. code-block:: bash

   ceph_daemon_socket_up{ceph_daemon="mds.a",hostname="testhost"} 1
   ceph_daemon_socket_up{ceph_daemon="osd.1",hostname="testhost"} 0
To identify any Ceph daemons that were not responsive at any point in the last 12 hours, you can use the following PromQL expression:
.. code-block:: bash
ceph_daemon_socket_up == 0 or min_over_time(ceph_daemon_socket_up[12h]) == 0
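Building on this, here are two hedged fleet-level sketches: the first counts
currently unresponsive daemons per host, and the second computes the fraction
of daemons that are currently responsive:

.. code-block:: bash

   # Unresponsive daemons per host
   count by (hostname) (ceph_daemon_socket_up == 0)

   # Fraction of daemons currently responsive (1.0 means all healthy)
   sum(ceph_daemon_socket_up) / count(ceph_daemon_socket_up)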
Performance metrics
===================

Below we explore a number of metrics that indicate Ceph cluster performance.
All of these metrics have the following labels:

- ``ceph_daemon``: Identifier of the Ceph daemon from which the metric was
  harvested
- ``instance``: The IP address of the exporter instance exposing the metric
- ``job``: Prometheus scrape job name

Below is an example Prometheus query result showing these labels:
.. code-block:: bash
ceph_osd_op_r{ceph_daemon="osd.0", instance="192.168.122.7:9283", job="ceph"} = 73981
Cluster throughput:

Query ``ceph_osd_op_r_out_bytes`` and ``ceph_osd_op_w_in_bytes`` to obtain
cluster client throughput:
Examples:
.. code-block:: bash
sum(irate(ceph_osd_op_w_in_bytes[1m]))
sum(irate(ceph_osd_op_r_out_bytes[1m]))
Cluster I/O (operations):

Query ``ceph_osd_op_r`` and ``ceph_osd_op_w`` to obtain the rate of client
operations (IOPS):
Examples:
.. code-block:: bash
sum(irate(ceph_osd_op_w[1m]))
sum(irate(ceph_osd_op_r[1m]))
Latency:

Query ``ceph_osd_op_latency_sum`` to measure the delay before an OSD begins
transferring data in response to a client request:
Example:
.. code-block:: bash
sum(irate(ceph_osd_op_latency_sum[1m]))
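Note that ``ceph_osd_op_latency_sum`` on its own accumulates latency-seconds.
To derive an average per-operation latency, divide by the matching count
series; a hedged sketch, assuming the paired ``ceph_osd_op_latency_count``
counter is present:

.. code-block:: bash

   # Cluster-wide average operation latency (assumes ceph_osd_op_latency_count exists)
   sum(irate(ceph_osd_op_latency_sum[1m])) / sum(irate(ceph_osd_op_latency_count[1m]))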
The cluster performance metrics described above are gathered from OSD metrics. By specifying an appropriate label value or regular expression we can retrieve performance metrics for one or a subset of the cluster's OSDs:
Examples:
.. code-block:: bash
   # Average read latency for osd.0
   irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"osd.0"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])

   # Write IOPS for osd.0
   irate(ceph_osd_op_w{ceph_daemon=~"osd.0"}[1m])

   # Write throughput for osd.0
   irate(ceph_osd_op_w_in_bytes{ceph_daemon=~"osd.0"}[1m])

   # Example result: raw capacity of osd.0
   ceph_osd_stat_bytes{ceph_daemon="osd.0", instance="cephtest-node-00.cephlab.com:9283", job="ceph"} = 536451481
By combining Prometheus node_exporter metrics with Ceph cluster metrics we can
derive performance information for physical storage media backing Ceph OSDs.
Examples:
.. code-block:: bash
   # Read latency of the physical device backing osd.0
   label_replace(irate(node_disk_read_time_seconds_total[1m]) / irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Write latency of the physical device backing osd.0
   label_replace(irate(node_disk_write_time_seconds_total[1m]) / irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Read IOPS of the physical device backing osd.0
   label_replace(irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Write IOPS of the physical device backing osd.0
   label_replace(irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Read throughput of the physical device backing osd.0
   label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Write throughput of the physical device backing osd.0
   label_replace(irate(node_disk_written_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")

   # Utilization of the physical device backing osd.0
   label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
Pool metrics
============

Ceph pool metrics have the following labels:

- ``instance``: The IP address of the exporter providing the metric
- ``pool_id``: Numeric identifier of the Ceph pool
- ``job``: Prometheus scrape job name

Pool-specific metrics include:
- ``ceph_pool_metadata``: Information about the pool that can be used
  together with other metrics to provide more information in query results
  and graphs. In addition to the above three common labels, this metric
  provides the following:

  - ``compression_mode``: Compression algorithm enabled for the pool. Values
    are ``lz4``, ``snappy``, ``zlib``, ``zstd``, and ``none``. Example:
    ``compression_mode="none"``
  - ``description``: Brief description of the pool data protection strategy,
    including the replica count or EC profile. Example:
    ``description="replica:3"``
  - ``name``: Name of the pool. Example: ``name=".mgr"``
  - ``type``: Data protection strategy, replicated or EC. Example:
    ``type="replicated"``

- ``ceph_pool_bytes_used``: Total raw capacity (after replication or EC)
  consumed by user data and metadata
- ``ceph_pool_stored``: Total client data stored in the pool (before data
  protection)
- ``ceph_pool_compress_under_bytes``: Data eligible to be compressed in the
  pool
- ``ceph_pool_compress_bytes_used``: Data compressed in the pool
- ``ceph_pool_rd``: Client read operations per pool (reads per second)
- ``ceph_pool_rd_bytes``: Client read bytes per pool
- ``ceph_pool_wr``: Client write operations per pool (writes per second)
- ``ceph_pool_wr_bytes``: Client write bytes per pool

Examples:

.. code-block:: bash
   # Total raw capacity in the cluster
   sum(ceph_osd_stat_bytes)

   # Raw capacity consumed across all pools (after replication or EC)
   sum(ceph_pool_bytes_used)

   # Client data stored across all pools (before data protection)
   sum(ceph_pool_stored)

   # Capacity saved by compression
   sum(ceph_pool_compress_under_bytes - ceph_pool_compress_bytes_used)

   # Client IOPS and throughput for the pool named "testrbdpool"
   irate(ceph_pool_rd[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
   irate(ceph_pool_wr[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
   irate(ceph_pool_rd_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
   irate(ceph_pool_wr_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
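A hedged sketch of pool fullness, assuming the ``ceph_pool_max_avail`` metric
exposed by the Manager ``prometheus`` module is present; this reports the
fraction of each pool's effective capacity that is consumed, labeled by pool
name:

.. code-block:: bash

   # Fraction of each pool's capacity used (assumes ceph_pool_max_avail exists)
   (ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail)) * on (pool_id) group_left(name) ceph_pool_metadata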
Object (RGW) metrics
====================

These metrics have the following labels:

- ``instance``: The IP address of the exporter providing the metric
- ``instance_id``: Identifier of the RGW daemon instance
- ``job``: Prometheus scrape job name

Example:
.. code-block:: bash
ceph_rgw_req{instance="192.168.122.7:9283", instance_id="154247", job="ceph"} = 12345
- ``ceph_rgw_metadata``: Provides generic information about an RGW daemon.
  This can be used together with other metrics to provide contextual
  information in query results and graphs. In addition to the three common
  labels, this metric provides the following:

  - ``ceph_daemon``: Name of the RGW daemon instance. Example:
    ``ceph_daemon="rgw.rgwtest.cephtest-node-00.sxizyq"``
  - ``ceph_version``: Version of the RGW daemon. Example:
    ``ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"``
  - ``hostname``: Name of the host where the daemon runs. Example:
    ``hostname="cephtest-node-00.cephlab.com"``

- ``ceph_rgw_req``: Number of requests processed by the daemon
  (GET + PUT + DELETE). Useful for detecting bottlenecks and optimizing load
  distribution.
- ``ceph_rgw_qlen``: Operations queue length for the daemon. Useful for
  detecting bottlenecks and optimizing load distribution.
- ``ceph_rgw_failed_req``: Aborted requests. Useful for detecting daemon
  errors.
- ``ceph_rgw_op_global_get_obj_lat_count``: Number of GET requests
- ``ceph_rgw_op_global_get_obj_lat_sum``: Total latency time for GET requests
- ``ceph_rgw_op_global_get_obj_ops``: Total number of GET requests
- ``ceph_rgw_op_global_get_obj_bytes``: Total bytes transferred in GET
  requests
- ``ceph_rgw_op_global_put_obj_lat_count``: Number of PUT operations
- ``ceph_rgw_op_global_put_obj_lat_sum``: Total latency time for PUT
  operations
- ``ceph_rgw_op_global_put_obj_ops``: Total number of PUT operations
- ``ceph_rgw_op_global_put_obj_bytes``: Total bytes transferred in PUT
  operations

Examples:

.. code-block:: bash
   # Average GET latency per RGW daemon
   rate(ceph_rgw_op_global_get_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_get_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

   # Average PUT latency per RGW daemon
   rate(ceph_rgw_op_global_put_obj_lat_sum[30s]) / rate(ceph_rgw_op_global_put_obj_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

   # Request rate per RGW daemon
   rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

   # Other requests (LIST, DELETE, etc.)
   rate(ceph_rgw_req[30s]) - (rate(ceph_rgw_op_global_get_obj_ops[30s]) + rate(ceph_rgw_op_global_put_obj_ops[30s]))

   # GET throughput across all RGW daemons
   sum(rate(ceph_rgw_op_global_get_obj_bytes[30s]))

   # PUT throughput across all RGW daemons
   sum(rate(ceph_rgw_op_global_put_obj_bytes[30s]))

   # Total byte throughput per RGW daemon
   sum by (instance_id) (rate(ceph_rgw_op_global_get_obj_bytes[30s]) + rate(ceph_rgw_op_global_put_obj_bytes[30s])) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata

   # Failed (aborted) requests
   rate(ceph_rgw_failed_req[30s])
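As a hedged sketch, the failure rate can be normalized by the total request
rate to yield a per-daemon error ratio:

.. code-block:: bash

   # Fraction of requests that fail, per RGW daemon
   rate(ceph_rgw_failed_req[30s]) / rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata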
Filesystem (MDS) metrics
========================

These metrics have the following labels:

- ``ceph_daemon``: The name of the MDS daemon
- ``instance``: The IP address and port of the exporter exposing the metric
- ``job``: Prometheus scrape job name

Example:
.. code-block:: bash
ceph_mds_request{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", instance="192.168.122.7:9283", job="ceph"} = 1452
- ``ceph_mds_metadata``: Provides general information about the MDS daemon.
  It can be used together with other metrics to provide contextual
  information in query results and graphs. The following extra labels are
  populated:

  - ``ceph_version``: MDS daemon version
  - ``fs_id``: CephFS file system ID
  - ``hostname``: Name of the host where the MDS daemon runs
  - ``public_addr``: Public address of the host where the MDS daemon runs
  - ``rank``: Rank of the MDS daemon

  Example:

  .. code-block:: bash

     ceph_mds_metadata{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", fs_id="-1", hostname="cephtest-node-00.cephlab.com", instance="cephtest-node-00.cephlab.com:9283", job="ceph", public_addr="192.168.122.145:6801/118896446", rank="-1"}

- ``ceph_mds_request``: Total number of requests for the MDS
- ``ceph_mds_reply_latency_sum``: Reply latency total
- ``ceph_mds_reply_latency_count``: Reply latency count
- ``ceph_mds_server_handle_client_request``: Number of client requests
- ``ceph_mds_sessions_session_count``: Session count
- ``ceph_mds_sessions_total_load``: Total load
- ``ceph_mds_sessions_sessions_open``: Sessions currently open
- ``ceph_mds_sessions_sessions_stale``: Sessions currently stale
- ``ceph_objecter_op_r``: Number of read operations
- ``ceph_objecter_op_w``: Number of write operations
- ``ceph_mds_root_rbytes``: Total number of bytes managed by the daemon
- ``ceph_mds_root_rfiles``: Total number of files managed by the daemon

Examples:

.. code-block:: bash
   # Read and write operations issued by MDS daemons to the RADOS cluster
   sum(rate(ceph_objecter_op_r[1m]))
   sum(rate(ceph_objecter_op_w[1m]))

   # The same, restricted to a specific MDS daemon
   sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))
   sum(rate(ceph_objecter_op_w{ceph_daemon=~"mdstest"}[1m]))

   # Average reply latency
   rate(ceph_mds_reply_latency_sum[30s]) / rate(ceph_mds_reply_latency_count[30s])

   # Request rate
   rate(ceph_mds_request[30s]) * on (instance) group_right (ceph_daemon) ceph_mds_metadata
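Session counts can reveal client load imbalance across MDS daemons. A minimal
sketch using the session metrics listed above:

.. code-block:: bash

   # Open client sessions per MDS daemon
   sum by (ceph_daemon) (ceph_mds_sessions_session_count)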
Block (RBD) metrics
===================

By default, per-image RBD metrics are not gathered, as their cardinality may
be high. This helps ensure the performance of the Manager's ``prometheus``
module. To produce metrics for RBD images, configure the Manager option
:confval:`mgr/prometheus/rbd_stats_pools`. For more information, see
:ref:`prometheus-rbd-io-statistics`.
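As a minimal sketch, the pools to monitor (here the placeholder pool names
``rbdpool1`` and ``rbdpool2``) can be configured like so:

.. code-block:: bash

   # "rbdpool1" and "rbdpool2" are placeholder pool names
   ceph config set mgr mgr/prometheus/rbd_stats_pools "rbdpool1,rbdpool2"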
These metrics have the following labels:

- ``image``: Name of the image (volume)
- ``instance``: Node where the exporter runs
- ``job``: Name of the Prometheus scrape job
- ``pool``: RBD pool name

Example:
.. code-block:: bash
ceph_rbd_read_bytes{image="test2", instance="cephtest-node-00.cephlab.com:9283", job="ceph", pool="testrbdpool"}
- ``ceph_rbd_read_bytes``: RBD image bytes read
- ``ceph_rbd_write_bytes``: RBD image bytes written
- ``ceph_rbd_read_latency_count``: RBD read operation latency count
- ``ceph_rbd_read_latency_sum``: RBD read operation latency total time
- ``ceph_rbd_read_ops``: RBD read operation count
- ``ceph_rbd_write_ops``: RBD write operation count
- ``ceph_rbd_write_latency_count``: RBD write operation latency count
- ``ceph_rbd_write_latency_sum``: RBD write operation latency total

Example:

.. code-block:: bash

   # Average RBD read latency per pool and image
   rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s])
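Two further hedged sketches for per-image workload, using the metrics listed
above:

.. code-block:: bash

   # Write IOPS per pool and image
   sum by (pool, image) (rate(ceph_rbd_write_ops[30s]))

   # Read throughput per pool and image
   sum by (pool, image) (rate(ceph_rbd_read_bytes[30s]))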
Hardware monitoring
===================

See :ref:`hardware-monitoring`.
.. _PromQL documentation: https://prometheus.io/docs/prometheus/latest/querying/basics/