src/collectors/proc.plugin/integrations/memory_modules_dimms.md
Plugin: proc.plugin Module: /sys/devices/system/edac/mc
The Error Detection and Correction (EDAC) subsystem detects and reports errors in the system's memory, primarily ECC (Error-Correcting Code) memory errors.
The collector provides data for:
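- Per memory controller (MC): correctable and uncorrectable errors, either attributed to a DIMM or reported without DIMM information (the `correctable_noinfo` and `uncorrectable_noinfo` dimensions, "unknown DIMM slot" in the alerts).
- Per memory DIMM: correctable and uncorrectable errors, reported per DIMM or per rank depending on the memory controller.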
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
This integration doesn't support auto-detection.
The default configuration for this integration does not impose any limits on data collection.
The default configuration for this integration is not expected to impose a significant performance impact on the system.
No action required.
There are no configuration options.
There is no configuration file.
There are no configuration examples.
The following alerts are available:
| Alert name | On metric | Description |
|---|---|---|
| ecc_memory_mc_noinfo_correctable | mem.edac_mc_errors | memory controller ${label:controller} ECC correctable errors (unknown DIMM slot) |
| ecc_memory_mc_noinfo_uncorrectable | mem.edac_mc_errors | memory controller ${label:controller} ECC uncorrectable errors (unknown DIMM slot) |
| ecc_memory_dimm_correctable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC correctable errors |
| ecc_memory_dimm_uncorrectable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC uncorrectable errors |
Metrics grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
These metrics refer to the memory controller.
Labels:
| Label | Description |
|---|---|
| controller | mcX directory name of this memory controller. |
| mc_name | Memory controller type. |
| size_mb | The amount of memory in megabytes that this memory controller manages. |
| max_location | Last available memory slot in this memory controller. |
Metrics:
| Metric | Dimensions | Unit |
|---|---|---|
| mem.edac_mc_errors | correctable, uncorrectable, correctable_noinfo, uncorrectable_noinfo | errors |
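As an illustration of where these numbers come from, the sketch below reads the per-controller counters that the Linux EDAC subsystem exposes under `/sys/devices/system/edac/mc`. The mapping from dimension names to the kernel files `ce_count`, `ue_count`, `ce_noinfo_count`, and `ue_noinfo_count` is an assumption made for this example, and the function names are hypothetical; this is not the collector's actual implementation.

```python
from pathlib import Path

# Assumed mapping from the dimension names in the table above
# to the counter files the kernel exposes in each mcX directory.
MC_COUNTER_FILES = {
    "correctable": "ce_count",
    "uncorrectable": "ue_count",
    "correctable_noinfo": "ce_noinfo_count",
    "uncorrectable_noinfo": "ue_noinfo_count",
}


def read_mc_errors(mc_dir):
    """Return {dimension: count} for one mcX directory.

    Files that are missing or do not contain an integer are skipped.
    """
    counters = {}
    for dimension, filename in MC_COUNTER_FILES.items():
        try:
            raw = (Path(mc_dir) / filename).read_text().strip()
            counters[dimension] = int(raw)
        except (OSError, ValueError):
            continue
    return counters


def read_all_mc_errors(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {dimension: count}} for every mcX directory."""
    root = Path(edac_root)
    if not root.is_dir():
        # No EDAC driver loaded, or not a Linux system.
        return {}
    return {
        d.name: read_mc_errors(d)
        for d in sorted(root.glob("mc[0-9]*"))
        if d.is_dir()
    }


if __name__ == "__main__":
    for controller, errors in read_all_mc_errors().items():
        print(controller, errors)
```

On a host without an EDAC driver loaded the directory does not exist, and the sketch returns an empty dictionary rather than raising.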
These metrics refer to the memory module (or rank, depending on the memory controller).
Labels:
| Label | Description |
|---|---|
| controller | mcX directory name of this memory controller. |
| dimm | dimmX or rankX directory name of this memory module. |
| dimm_dev_type | Type of DRAM device used in this memory module. For example, x1, x2, x4, x8. |
| dimm_edac_mode | Type of error detection and correction in use. For example, S4ECD4ED means Chipkill with x4 DRAM. |
| dimm_label | Label assigned to this memory module. |
| dimm_location | Location of the memory module. |
| dimm_mem_type | Type of the memory module. |
| size | The amount of memory in megabytes that this memory module manages. |
Metrics:
| Metric | Dimensions | Unit |
|---|---|---|
| mem.edac_mc_dimm_errors | correctable, uncorrectable | errors |
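The per-module labels and counters can be gathered in a similar way. The sketch below walks the `dimmX` and `rankX` directories inside one `mcX` directory. It assumes that each label is read from a sysfs file of the same name and that the two dimensions come from the kernel's `dimm_ce_count` and `dimm_ue_count` files. The function name `read_dimms` and the returned structure are hypothetical, chosen only for this example.

```python
from pathlib import Path

# Assumption: each label in the table above is read from a file of the same name.
DIMM_LABEL_FILES = (
    "dimm_dev_type",
    "dimm_edac_mode",
    "dimm_label",
    "dimm_location",
    "dimm_mem_type",
    "size",
)

# Assumed mapping from dimension names to the kernel's per-module counter files.
DIMM_COUNTER_FILES = {
    "correctable": "dimm_ce_count",
    "uncorrectable": "dimm_ue_count",
}


def _read(path):
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_dimms(mc_dir):
    """Return one {"labels": ..., "errors": ...} entry per dimmX/rankX directory."""
    mc_dir = Path(mc_dir)
    entries = sorted(
        list(mc_dir.glob("dimm[0-9]*")) + list(mc_dir.glob("rank[0-9]*"))
    )
    modules = []
    for entry in entries:
        if not entry.is_dir():
            continue
        labels = {"controller": mc_dir.name, "dimm": entry.name}
        for name in DIMM_LABEL_FILES:
            value = _read(entry / name)
            if value is not None:
                labels[name] = value
        errors = {}
        for dimension, filename in DIMM_COUNTER_FILES.items():
            raw = _read(entry / filename)
            if raw is not None and raw.isdigit():
                errors[dimension] = int(raw)
        modules.append({"labels": labels, "errors": errors})
    return modules


if __name__ == "__main__":
    for module in read_dimms("/sys/devices/system/edac/mc/mc0"):
        print(module["labels"], module["errors"])
```

Labels whose files are absent are simply omitted, since not every memory controller driver exposes every attribute.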