.. SPDX-License-Identifier: GPL-2.0

This feature provides a generic infrastructure within the Linux kernel to track
and log recoverable hardware errors. These are hardware errors that the system
recovers from and that are visible to the kernel. They might not cause an
immediate panic, but they may affect system health, mainly because recovering
from them causes the kernel to execute new code paths.

By recording counts and timestamps of recoverable errors into the vmcoreinfo
crash dump notes, this infrastructure helps post-mortem crash analysis tools
correlate hardware events with kernel failures. This enables faster triage and
a better understanding of root causes, especially in large-scale cloud
environments where hardware issues are common.

The tracked error data is stored in the ``hwerror_data`` array, categorized by
error source types like CPU, memory, PCI, CXL, and others. It can be read from
a crash dump using tools like ``crash``, ``drgn``, or other kernel crash
analysis utilities.

Typical usage example (in drgn REPL):

.. code-block:: python

    >>> prog['hwerror_data']
    (struct hwerror_info[HWERR_RECOV_MAX]){
        {
            .count = (int)844,
            .timestamp = (time64_t)1752852018,
        },
        ...
    }
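
The raw ``time64_t`` value is hard to read at a glance. As a minimal sketch of
how the fields shown above might be post-processed, the helper below turns
``(index, count, timestamp)`` tuples into one readable line per error source.
The helper name ``summarize_hwerrors`` and the tuple layout are hypothetical
and not part of the kernel or drgn, and the timestamp is assumed to hold
seconds since the Unix epoch:

.. code-block:: python

    from datetime import datetime, timezone


    def summarize_hwerrors(entries):
        """Return one summary line per slot that recorded at least one error.

        ``entries`` is an iterable of (index, count, timestamp) tuples.
        ``timestamp`` is assumed to be seconds since the Unix epoch.
        """
        lines = []
        for index, count, timestamp in entries:
            if count == 0:
                continue  # nothing recorded for this error source
            when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            lines.append(
                f"hwerror_data[{index}]: count={count} last={when.isoformat()}"
            )
        return lines


    # First tuple: values from the drgn output above.
    # Second tuple: a made-up empty slot, included only to show the filtering.
    # In a drgn session the tuples could presumably be built with something
    # like (untested assumption):
    #   [(i, e.count.value_(), e.timestamp.value_())
    #    for i, e in enumerate(prog['hwerror_data'])]
    sample = [(0, 844, 1752852018), (1, 0, 0)]
    for line in summarize_hwerrors(sample):
        print(line)

With the sample values, this prints a single line reporting 844 errors, last
seen at ``2025-07-18T15:20:18+00:00``. Slots with a zero count are skipped so
that only error sources that actually fired appear in the summary.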