rfcs/2020-09-02-3684-metric-namespaces.md
This RFC proposes making namespace a first-class field on the internal
Metric type to allow it to be set in sources, manipulated in transforms, and
used by sinks.
This RFC will cover:
- Moving the namespace of metrics into a separate field on Metric

As we add metric sources like apache_metrics and postgresql_metrics that set
their own namespaces (defaulting to apache and postgresql), it is becoming
clearer that we may want to maintain the namespace separately from the metric
name to allow for:

- sinks that take the namespace as a separate field (e.g. aws_cloudwatch_metrics)
- sinks that render the namespace differently (e.g. namespace.name for metrics)

I believe the current implementation proposals for these metrics sources will
simply prefix the name as the prometheus and statsd sinks do, but this will
be difficult to use with the aws_cloudwatch_metrics sink, which requires the
namespace as a separate field for the AWS API calls.
Additionally, I think separating it will allow it to be more useful in transforms (users could currently emulate this by prefix matches of the metric name).
Add namespace to Metric:

```rust
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>, // added
    pub timestamp: Option<DateTime<Utc>>,
    pub tags: Option<BTreeMap<String, String>>,
    pub kind: MetricKind,
    #[serde(flatten)]
    pub value: MetricValue,
}
```
Metric sources can then optionally assign a namespace for the metric.
For example, the upcoming MongoDB
source would set this to
mongodb.
Sinks can then decide what to do with this namespace. For example, the
prometheus sink would simply prefix the metric name with it, but
aws_cloudwatch_metrics would use it as the Namespace field in
PutMetricData
requests.
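The two sink behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the actual sink code: the Metric struct here is a cut-down stand-in that keeps only name and namespace, and the helper names prometheus_name and cloudwatch_fields are hypothetical.

```rust
/// Cut-down stand-in for the internal Metric type (assumption: only the
/// fields relevant to namespacing are kept).
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>,
}

/// Prometheus-style handling (hypothetical helper): prefix the metric name
/// with the namespace, joined by an underscore. Metrics without a
/// namespace, or with an empty one, keep their name unchanged.
pub fn prometheus_name(metric: &Metric) -> String {
    match metric.namespace.as_deref() {
        Some(ns) if !ns.is_empty() => format!("{}_{}", ns, metric.name),
        _ => metric.name.clone(),
    }
}

/// CloudWatch-style handling (hypothetical helper): keep the namespace and
/// the name apart so the namespace can be sent as its own request field.
/// Returns (namespace, metric name).
pub fn cloudwatch_fields(metric: &Metric) -> (Option<&str>, &str) {
    (metric.namespace.as_deref(), metric.name.as_str())
}
```

With a metric named requests_total in the apache namespace, the first helper yields apache_requests_total, while the second leaves the name untouched and hands back apache separately.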
A pipeline might look like:
```toml
[sources.my_source_id]
type = "apache_metrics"
endpoints = ["http://localhost/server-status?auto"]
namespace = "apache"

[transforms.my_transform_id]
# General
type = "lua" # required
inputs = ["my_source_id"] # required
version = "2" # required

# Hooks
hooks.process = """
function (event, emit)
  if event.metric.namespace == "apache" then
    -- do something
  end
  emit(event)
end
"""

[sinks.prometheus]
type = "prometheus"
inputs = ["my_transform_id"]
address = "0.0.0.0:9598"
namespace = ""

[sinks.cloudwatch]
type = "aws_cloudwatch_metrics"
inputs = ["my_transform_id"]
namespace = ""
region = "us-east-1"
```
Where the prometheus sink would simply output metrics with name prefixed by
apache_ and aws_cloudwatch_metrics would use it as the separate Namespace
field in AWS API calls.
Once #3609 (Make the namespace option on metrics sinks optional) is done, the
sinks could look something like:
```toml
[sinks.my_sink_id]
type = "prometheus"
inputs = ["my_transform_id"]
address = "0.0.0.0:9598"
default_namespace = "unknown"
```
Where a namespace could be set for any metrics that do not already have one.
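One way the fallback could work, as a sketch only: the sink prefers the namespace carried by the metric and uses its configured default_namespace only when the metric has none. The Metric struct is again a cut-down stand-in and effective_namespace is a hypothetical helper name.

```rust
/// Cut-down stand-in for the internal Metric type (assumption: only the
/// fields relevant to namespacing are kept).
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>,
}

/// Hypothetical helper: pick the namespace a sink should use. A namespace
/// set on the metric wins; the sink's default_namespace applies only when
/// the metric has none. Returns None when neither is set.
pub fn effective_namespace<'a>(
    metric: &'a Metric,
    default_namespace: Option<&'a str>,
) -> Option<&'a str> {
    metric.namespace.as_deref().or(default_namespace)
}
```

Because the namespace is an Option on the metric itself, the sink can tell "already namespaced" apart from "not namespaced" without inspecting the metric name.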
Currently, I don't think there is a way to tell if a metric already has a namespace, which would be needed to avoid setting an additional one in sinks that require it.
- Telegraf models a metric as having a number of fields, where the name is
  what we would call the namespace and fields is what we would generate
  individual metrics for.

- Prefixing the metric name with the namespace, as prometheus does, would be a
  reasonable default.

We could opt to model metrics closer to how Telegraf does it, where we would encode all of the metrics for a given source as one metric with a set of fields.
I didn't closely consider this option given that the proposed option seems reasonable and is a smaller change to the data model.
- Should the prometheus source parse the namespace out of metrics it scrapes?
  The naming conventions suggest that all metrics should start with one word
  describing the domain (or namespace) followed by a _, but there is no
  requirement that prometheus endpoints satisfy this. We could make it an
  optional directive on the source to control parsing metric namespaces.

Incremental steps that execute this change. Generally this is in the form of:

- namespace modeled as a first-class field

None
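Returning to the open question about the prometheus source: if namespace parsing were made an opt-in directive, it might look roughly like the sketch below. The function name split_namespace and the parse_namespace flag are hypothetical, and the rule (split at the first underscore) is only the naming-convention heuristic discussed above, which endpoints are not required to follow.

```rust
/// Hypothetical helper for the prometheus source: split a scraped metric
/// name into (namespace, name) at the first underscore.
///
/// `parse_namespace` stands in for an optional source directive; when it is
/// false the name is passed through untouched. Names with no underscore, or
/// with an empty part on either side of it, are also left as-is.
pub fn split_namespace(scraped_name: &str, parse_namespace: bool) -> (Option<&str>, &str) {
    if !parse_namespace {
        return (None, scraped_name);
    }
    match scraped_name.split_once('_') {
        Some((ns, rest)) if !ns.is_empty() && !rest.is_empty() => (Some(ns), rest),
        _ => (None, scraped_name),
    }
}
```

Keeping the directive off by default would preserve current behaviour for endpoints whose metric names do not follow the convention.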