modules/apm/NAMING.md
This guide covers conventions to ensure reliable, consistent and discoverable metrics and attributes. In particular, it covers naming conventions for metrics and for attributes.
The conventions described here follow OpenTelemetry guidelines for the most part and are adapted to ES where necessary.
Metric names are composed of hierarchical segments split by a separator and prefixed with es.:
es(.<segment>)+.<suffix>
- Segments use an underscore (_) as word separator where necessary (e.g. blob_cache);
- Segments are separated by a dot (.).

Always use es. as the root segment to easily discover ES metrics and avoid the possibility of name clashes.
Follow with a module name, team or area of code, e.g. snapshot, repositories, indices, threadpool, using existing terminology (whether singular or plural).
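The naming pattern above can be checked mechanically. The following is a minimal sketch and not the validator used in ES: the helper name is hypothetical, and it assumes that segments are lowercase alphanumerics with single underscores between words.

```python
import re

# Assumption: a segment is lowercase, starts with a letter, and uses single
# underscores between words (e.g. blob_cache).
SEGMENT = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"

# es(.<segment>)+.<suffix>: the es root, at least one segment, then a suffix.
METRIC_NAME = re.compile(rf"es(?:\.{SEGMENT})+\.{SEGMENT}")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the name matches es(.<segment>)+.<suffix>."""
    return METRIC_NAME.fullmatch(name) is not None


print(is_valid_metric_name("es.indices.docs.deleted.total"))  # True
print(is_valid_metric_name("indices.docs.deleted.total"))     # False, no es. root
print(is_valid_metric_name("es.BlobCache.miss.total"))        # False, not lowercase
```

A name such as es.total is rejected as well, because the pattern requires at least one segment between the root and the suffix.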
The hierarchy of segments should be built by putting "more common" segments at the beginning. This facilitates the creation of new metrics under a common namespace. Each element in the metric name specializes or describes the prefix that precedes it. Rule of thumb: you could truncate the name at any segment, and what you're left with is something that makes sense by itself.
Example: Prefer es.indices.docs.deleted.total over es.indices.deleted.docs.total so es.indices.docs.ingested.total could be added later.
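The truncation rule of thumb can be made concrete with a small sketch. Both helper functions are hypothetical and only illustrate the idea: every prefix of a well-built name should be meaningful, and related metrics should share as long a namespace as possible.

```python
def name_prefixes(metric_name: str) -> list[str]:
    """List every name obtained by truncating at a segment boundary."""
    segments = metric_name.split(".")
    return [".".join(segments[:i]) for i in range(1, len(segments) + 1)]


def common_namespace(first: str, second: str) -> str:
    """Return the longest run of leading segments shared by two names."""
    shared = []
    for a, b in zip(first.split("."), second.split(".")):
        if a != b:
            break
        shared.append(a)
    return ".".join(shared)


print(name_prefixes("es.indices.docs.deleted.total"))
# ['es', 'es.indices', 'es.indices.docs', 'es.indices.docs.deleted',
#  'es.indices.docs.deleted.total']

# The preferred name shares a deeper namespace with a later docs metric.
print(common_namespace("es.indices.docs.deleted.total",
                       "es.indices.docs.ingested.total"))  # es.indices.docs
print(common_namespace("es.indices.deleted.docs.total",
                       "es.indices.docs.ingested.total"))  # es.indices
```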
Note, to better highlight key differences, examples below starting with . omit the es. prefix and initial segments.
Recommendations:
- Reuse existing terminology, e.g. use .error.total if already in use rather than introducing .failures.total.
- Avoid sub-metrics of an existing metric, e.g. do not add .disk.usage.status when .disk.usage exists. This might lead to mapping issues in Elasticsearch.

The metric suffix is essential to describe the semantics of a metric and guide consumers on how to interpret and use a metric appropriately. If multiple suffixes are applicable, choose the most specific one.

- total: a monotonic metric (an always increasing counter), e.g. es.indices.docs.deleted.total
- current: a general non-monotonic metric (like gauges, upDownCounters)
  - current vs total:
- usage: a non-monotonic metric representing the absolute amount used of some resource (with a limit of size)
- size: the overall size of the resource
- utilization: ratio of usage and overall size of the resource
- ratio: ratio of two measures with identical unit (other than usage and size) or a fraction in the range [0, 1]
- status: enum like gauges
  - use 1/0 to represent true/false
- histogram: a histogram metric
- time: to represent passage of time

Do not include units of measurement in metric names.
Instead, use a suffix that describes the physical quantity being measured (e.g. time, size, usage, etc.) as described above.
Units are configured at registration time of the metric.
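The suffix list, the rule about units, and the earlier recommendation against sub-metrics lend themselves to simple checks. The sketch below is illustrative only. The suffix set is taken from the list above, while the unit words, the helper names, and the metric names in the usage lines are assumptions made for this example.

```python
# Suffixes described in this guide.
KNOWN_SUFFIXES = {
    "total", "current", "usage", "size", "utilization",
    "ratio", "status", "histogram", "time",
}

# Assumption: a small, non-exhaustive sample of unit words to flag.
UNIT_WORDS = {"bytes", "millis", "seconds", "percent"}


def suffix_problems(name: str) -> list[str]:
    """Report an unknown suffix and any unit words found in the name."""
    problems = []
    segments = name.split(".")
    if segments[-1] not in KNOWN_SUFFIXES:
        problems.append(f"unknown suffix: {segments[-1]}")
    for segment in segments:
        for word in segment.split("_"):
            if word in UNIT_WORDS:
                problems.append(f"unit in name: {word}")
    return problems


def sub_metric_conflicts(new_name: str, existing: set[str]) -> list[str]:
    """Find existing metrics that are a parent or a child of new_name."""
    return sorted(
        other
        for other in existing
        if new_name.startswith(other + ".") or other.startswith(new_name + ".")
    )


print(suffix_problems("es.indices.docs.deleted.total"))  # []
print(suffix_problems("es.blob_cache.read_bytes.total"))  # ['unit in name: bytes']
print(sub_metric_conflicts("es.disk.usage.status", {"es.disk.usage", "es.disk.size"}))
# ['es.disk.usage']
```

The conflict check compares whole segments by appending a dot before matching, so es.disk.usage and es.disk.usage_ratio would not be reported as parent and child.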
WARNING Do not use high cardinality attributes / dimensions. This might result in the APM Java agent dropping events.
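To get a feel for why cardinality matters, it can help to estimate how many distinct attribute combinations a metric could produce. The sketch below computes an upper bound as the product of the number of distinct values per attribute. The function name, attribute names and counts are hypothetical, and the point at which the agent starts dropping events is not modelled here.

```python
from math import prod


def max_attribute_combinations(distinct_values: dict[str, int]) -> int:
    """Upper bound on distinct attribute combinations for a single metric."""
    return prod(distinct_values.values())


# Two low cardinality attributes stay manageable.
low = {"es_thread_pool_name": 20, "es_node_role": 5}
print(max_attribute_combinations(low))  # 100

# Adding one high cardinality attribute multiplies the bound.
high = {**low, "es_index_name": 10_000}
print(max_attribute_combinations(high))  # 1000000
```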
It is not always straightforward to decide if something should be part of the metric name or an attribute (dimension) of that metric. As a rule of thumb:
Naming conventions for attributes are very similar to metric names.
Always use es_ as the root segment to easily discover ES attributes.
Follow with a module name, team or area of code, e.g. snapshot, repositories, indices, threadpool, using existing terminology (whether singular or plural).
es(_<segment>)+
- Segments are separated by an underscore (_).

Attributes that represent an entity should be named in singular. If the attribute value represents a collection, it should be named in plural, e.g.

- es_security_realm_type: singular, a single entity
- es_rest_request_headers: plural, a collection of headers

Unfortunately, validation of attribute names was only introduced retrospectively. Existing attributes that do not comply with the naming conventions are tracked in a skip list so that they do not fail validation. However, their usage will be logged when running with assertions enabled.
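The combination of a naming pattern and a skip list can be sketched as follows. This is not the validation code used in ES: the function name, the logger name and the skip list entries are hypothetical, and segments are assumed to be lowercase alphanumerics.

```python
import logging
import re

logger = logging.getLogger("attribute_names")

# es(_<segment>)+ with lowercase alphanumeric segments (an assumption).
ATTRIBUTE_NAME = re.compile(r"es(?:_[a-z][a-z0-9]*)+")

# Hypothetical entries standing in for pre-existing, non-compliant attributes.
SKIP_LIST = {"index_name", "realmType"}


def check_attribute_name(name: str) -> str:
    """Classify a name as 'valid', 'skipped' (tolerated but logged) or 'invalid'."""
    if ATTRIBUTE_NAME.fullmatch(name):
        return "valid"
    if name in SKIP_LIST:
        logger.warning("attribute %s does not follow naming conventions", name)
        return "skipped"
    return "invalid"


print(check_attribute_name("es_security_realm_type"))  # valid
print(check_attribute_name("index_name"))              # skipped
print(check_attribute_name("request_headers"))         # invalid
```

Keeping the skip list separate from the pattern means an entry can be removed once its usages have been migrated, at which point the old name is reported as invalid.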
Please migrate usage of such attributes following these steps:
To inspect currently registered metrics, run ES using:
./gradlew run -Dtests.es.logger.org.elasticsearch.telemetry.apm=debug
Use this as guidance where your new metric could fit in and name accordingly.