doc/hardware-monitoring/index.rst
.. _hardware-monitoring:
node-proxy is the internal name of the agent which inventories a machine's
hardware, provides different statuses, and enables the operator to perform
some actions.
It gathers details from the Redfish API that is often provided by an
out-of-band management interface present on server systems. The data is then
processed and pushed to an agent endpoint in the Ceph Manager.
.. graphviz::
digraph G {
node [shape=record];
mgr [label="{<mgr> ceph manager}"];
dashboard [label="<dashboard> ceph dashboard"];
agent [label="<agent> agent"];
redfish [label="<redfish> redfish"];
agent -> redfish [label=" 1." color=green];
agent -> mgr [label=" 2." color=orange];
dashboard:dashboard -> mgr [label=" 3." color=lightgreen];
node [shape=plaintext];
legend [label=<<table border="0" cellborder="1" cellspacing="0">
<tr><td bgcolor="lightgrey">Legend</td></tr>
<tr><td align="center">1. Collects data from Redfish API</td></tr>
<tr><td align="left">2. Pushes data to Ceph Manager</td></tr>
<tr><td align="left">3. Queries Ceph Manager</td></tr>
</table>>];
}
For the time being, the node-proxy agent relies on the Redfish API.
This implies that both node-proxy agent and the Ceph Manager need to be
able to access the out-of-band network.
The first step is to provide the out-of-band management tool IP address and
credentials. This can be done when adding the host with a :ref:service spec <orchestrator-cli-service-spec> file:
.. prompt:: bash # auto
service_type: host hostname: node-10 addr: 10.10.10.10 oob: addr: 20.20.20.10 username: admin password: p@ssword
Apply the spec:
.. prompt:: bash # auto
Added host 'node-10' with addr '10.10.10.10'
Deploy the agent:
.. prompt:: bash # auto
| orch hardware status [hostname] [--category CATEGORY] [--format plain | json]
Supported categories are:
Hardware Health Status Summary
.. prompt:: bash # auto
+------------+---------+-----+-----+--------+-------+------+ | HOST | STORAGE | CPU | NET | MEMORY | POWER | FANS | +------------+---------+-----+-----+--------+-------+------+ | node-10 | ok | ok | ok | ok | ok | ok | +------------+---------+-----+-----+--------+-------+------+
Storage Devices Report
.. prompt:: bash # auto
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+ | HOST | NAME | MODEL | SIZE | PROTOCOL | SN | STATUS | STATE | +------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+ | node-10 | Disk 8 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QLL | OK | Enabled | | node-10 | Disk 10 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZYX | OK | Enabled | | node-10 | Disk 11 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZWB | OK | Enabled | | node-10 | Disk 9 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZC9 | OK | Enabled | | node-10 | Disk 3 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9903Y | OK | Enabled | | node-10 | Disk 1 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9901E | OK | Enabled | | node-10 | Disk 7 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZQJ | OK | Enabled | | node-10 | Disk 2 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PA2 | OK | Enabled | | node-10 | Disk 4 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PFG | OK | Enabled | | node-10 | Disk 0 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800539 | OK | Enabled | | node-10 | Disk 1 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800554 | OK | Enabled | | node-10 | Disk 6 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZER | OK | Enabled | | node-10 | Disk 0 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZEJ | OK | Enabled | | node-10 | Disk 5 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QMH | OK | Enabled | | node-10 | Disk 0 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0F8 | OK | Enabled | | node-10 | Disk 1 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0D5 | OK | Enabled | +------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
Firmware Details
.. prompt:: bash # auto
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+ | HOST | COMPONENT | NAME | DATE | VERSION | STATUS | +------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+ | node-10 | current-107649-7.03__raid.backplane.firmware.0 | Backplane 0 | 2022-12-05T00:00:00Z | 7.03 | OK |
... omitted output ...
| node-10 | previous-25227-6.10.30.20__idrac.embedded.1-1 | Integrated Remote Access Controller | 00:00:00Z | 6.10.30.20 | OK | +------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
Hardware Critical Warnings Report
.. prompt:: bash # auto
+------------+-----------+------------+----------+-----------------+ | HOST | COMPONENT | NAME | STATUS | STATE | +------------+-----------+------------+----------+-----------------+ | node-10 | power | PS2 Status | critical | unplugged | +------------+-----------+------------+----------+-----------------+
.. py:currentmodule:: cephadm.agent .. autoclass:: NodeProxyEndpoint .. automethod:: NodeProxyEndpoint.init .. automethod:: NodeProxyEndpoint.oob .. automethod:: NodeProxyEndpoint.data .. automethod:: NodeProxyEndpoint.fullreport .. automethod:: NodeProxyEndpoint.summary .. automethod:: NodeProxyEndpoint.criticals .. automethod:: NodeProxyEndpoint.memory .. automethod:: NodeProxyEndpoint.storage .. automethod:: NodeProxyEndpoint.network .. automethod:: NodeProxyEndpoint.power .. automethod:: NodeProxyEndpoint.processors .. automethod:: NodeProxyEndpoint.fans .. automethod:: NodeProxyEndpoint.firmwares .. automethod:: NodeProxyEndpoint.led