Documentation/gpu/drm-ras.rst
.. SPDX-License-Identifier: GPL-2.0+
The DRM RAS (Reliability, Availability, Serviceability) interface provides a standardized way for GPU/accelerator drivers to expose error counters and other reliability nodes to user space via Generic Netlink. This allows diagnostic tools, monitoring daemons, or test infrastructure to query hardware health in a uniform way across different DRM drivers.
Key Goals:
Nodes are logical abstractions representing an error type or error source within the device. Currently, only error counter nodes are supported.
Drivers are responsible for registering and unregistering nodes via the
drm_ras_node_register() and drm_ras_node_unregister() APIs.

.. kernel-doc:: drivers/gpu/drm/drm_ras.c
   :doc: DRM RAS Node Management

.. kernel-doc:: drivers/gpu/drm/drm_ras.c
   :internal:


The interface is implemented as a Generic Netlink family named ``drm-ras``.
User space tools can:

* List the registered nodes with the ``list-nodes`` command.
* List all error counters of a node with the ``get-error-counter`` command,
  using ``node-id`` as a parameter.
* Query a specific error counter with the ``get-error-counter`` command,
  using both ``node-id`` and ``error-id`` as parameters.

The interface is described in the YAML specification
``Documentation/netlink/specs/drm_ras.yaml``.
This YAML is used to auto-generate user space bindings via
``tools/net/ynl/pyynl/ynl_gen_c.py``, and drives the structure of the Netlink
attributes and operations.

Example: List nodes using ynl

.. code-block:: bash

    sudo ynl --family drm_ras --dump list-nodes
    [{'device-name': '0000:03:00.0',
      'node-id': 0,
      'node-name': 'correctable-errors',
      'node-type': 'error-counter'},
     {'device-name': '0000:03:00.0',
      'node-id': 1,
      'node-name': 'uncorrectable-errors',
      'node-type': 'error-counter'}]

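
User space needs a ``node-id`` before it can issue ``get-error-counter``. The
following is a minimal sketch, not part of the drm-ras tooling, that looks up
the ``node-id`` for a given ``node-name``. It assumes that ``ynl`` prints the
Python-style listing shown above; the ``YNL`` variable and the
``drm_ras_node_id_by_name`` helper are hypothetical names used only for
illustration.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: map a drm-ras node-name to its node-id.
    # Assumes ynl prints the Python-style list of dicts shown in this document.
    # YNL is an illustrative override, e.g. YNL=ynl when already running as root.
    YNL=${YNL:-"sudo ynl"}

    # Usage: drm_ras_node_id_by_name NODE_NAME
    # Prints the node-id of every node whose node-name matches exactly
    # (one line per match); prints nothing if no node matches.
    drm_ras_node_id_by_name() {
        # Join the pretty-printed output into one line, split it into one
        # node record per line at '}', keep the record with this exact name
        # and extract its node-id.
        $YNL --family drm_ras --dump list-nodes |
            tr -d '\n' |
            tr '}' '\n' |
            grep -F "'node-name': '$1'" |
            grep -o "'node-id': [0-9]*" |
            awk '{ print $2 }'
    }

With the listing above, ``drm_ras_node_id_by_name correctable-errors`` would
print ``0``. Scraping pretty-printed output is fragile, so treat this as a
starting point rather than a stable parser.
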

Example: List all error counters of a node using ynl

.. code-block:: bash

    sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":0}'
    [{'error-id': 1, 'error-name': 'error_name1', 'error-value': 0},
     {'error-id': 2, 'error-name': 'error_name2', 'error-value': 0}]

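
Combining ``list-nodes`` with the per-node ``get-error-counter`` dump gives a
snapshot of every counter exposed through drm-ras. A minimal sketch, assuming
the same Python-style ``ynl`` output as above; ``YNL``, ``drm_ras_node_ids``
and ``drm_ras_dump_all`` are hypothetical names chosen for this example.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helpers: dump every error counter of every drm-ras node.
    # Assumes ynl prints the Python-style output shown in this document.
    # YNL is an illustrative override, e.g. YNL=ynl when already running as root.
    YNL=${YNL:-"sudo ynl"}

    # Print the node-id of every registered node, one per line.
    drm_ras_node_ids() {
        $YNL --family drm_ras --dump list-nodes |
            grep -o "'node-id': [0-9]*" |
            awk '{ print $2 }'
    }

    # For each node reported by list-nodes, dump all of its error counters.
    drm_ras_dump_all() {
        local id
        for id in $(drm_ras_node_ids); do
            echo "node $id:"
            $YNL --family drm_ras --dump get-error-counter --json "{\"node-id\":$id}"
        done
    }

Enumerating nodes first, rather than hard-coding IDs, keeps the script
independent of how many nodes a given driver registers.
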

Example: Query an error counter for a given node

.. code-block:: bash

    sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
    {'error-id': 1, 'error-name': 'error_name1', 'error-value': 0}

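
A monitoring script usually needs only the ``error-value`` of one counter. The
sketch below extracts it from the ``--do`` reply shown above and reports a
non-zero count through the function's return status. It assumes the same
Python-style ``ynl`` output; ``YNL``, ``drm_ras_error_value`` and
``drm_ras_check_zero`` are hypothetical names, and treating any non-zero count
as a failure is an illustrative policy, not something the interface defines.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helpers for a periodic health check of one drm-ras counter.
    # Assumes ynl prints the Python-style dict shown in this document.
    # YNL is an illustrative override, e.g. YNL=ynl when already running as root.
    YNL=${YNL:-"sudo ynl"}

    # Usage: drm_ras_error_value NODE_ID ERROR_ID
    # Prints only the error-value of the requested counter.
    drm_ras_error_value() {
        $YNL --family drm_ras --do get-error-counter \
            --json "{\"node-id\":$1, \"error-id\":$2}" |
            grep -o "'error-value': [0-9]*" |
            awk '{ print $2 }'
    }

    # Usage: drm_ras_check_zero NODE_ID ERROR_ID
    # Returns 0 if the counter is 0, 1 (with a warning) if it is non-zero,
    # and 2 if no value could be read.
    drm_ras_check_zero() {
        local value
        value=$(drm_ras_error_value "$1" "$2")
        if [ -z "$value" ]; then
            echo "node $1, error $2: no value returned" >&2
            return 2
        fi
        if [ "$value" -ne 0 ]; then
            echo "node $1, error $2: count is $value" >&2
            return 1
        fi
        return 0
    }

For example, ``drm_ras_check_zero 0 1 || echo "node 0 needs attention"`` could
run from a cron job or a monitoring daemon's shell hook.
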