# RFC 3640 - 2020-08-31 - Nginx Metrics Source
This RFC is to introduce a new metrics source to consume metrics from the Nginx HTTP Server. The high-level plan is to implement a scraper, similar to the existing `prometheus` source, that will scrape the Nginx HTTP Server stats endpoint (provided by `stub_status`) on an interval and publish metrics to the defined pipeline.
This RFC will cover:
This RFC will not cover:
Users running Nginx want to collect, transform, and forward metrics to better observe how their web servers are performing.
I expect to largely copy the existing `prometheus` source and modify it to parse the output of the Nginx stub status page, which looks like:

```text
Active connections: 1
server accepts handled requests
 1 1 1
Reading: 0 Writing: 1 Waiting: 0
```
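As a sketch of how the scraper might turn this body into values (the struct and function names here are illustrative, not Vector's actual internals), note that the seven counts are the only runs of digits in the response and always appear in a fixed order:

```rust
/// Illustrative container for the seven stub_status values; not Vector's real type.
#[derive(Debug, PartialEq)]
struct NginxStubStatus {
    active: u64,
    accepts: u64,
    handled: u64,
    requests: u64,
    reading: u64,
    writing: u64,
    waiting: u64,
}

/// Parse a stub_status response body by collecting every integer run in
/// order of appearance: active, accepts, handled, requests, reading,
/// writing, waiting. Returns None if the body doesn't contain exactly seven.
fn parse_stub_status(body: &str) -> Option<NginxStubStatus> {
    let nums: Vec<u64> = body
        .split(|c: char| !c.is_ascii_digit())
        .filter(|s| !s.is_empty())
        .filter_map(|s| s.parse().ok())
        .collect();
    if nums.len() != 7 {
        return None;
    }
    Some(NginxStubStatus {
        active: nums[0],
        accepts: nums[1],
        handled: nums[2],
        requests: nums[3],
        reading: nums[4],
        writing: nums[5],
        waiting: nums[6],
    })
}

fn main() {
    let body = "Active connections: 1 \nserver accepts handled requests\n 1 1 1 \nReading: 0 Writing: 1 Waiting: 0 \n";
    let status = parse_stub_status(body).expect("parse failed");
    assert_eq!(status.active, 1);
    assert_eq!(status.requests, 1);
    assert_eq!(status.writing, 1);
}
```

A positional parse like this is deliberately tolerant of whitespace changes, though a real implementation would likely anchor on the field labels to fail loudly on unexpected output.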
The breakdown of this output:

- `Active connections`: the current number of active client connections, including `Waiting` connections
- `accepts`: the total number of accepted client connections
- `handled`: the total number of handled connections
- `requests`: the total number of client requests
- `Reading`: connections where Nginx is reading the request header
- `Writing`: connections where Nginx is writing the response back to the client
- `Waiting`: idle client connections waiting for a request
We'll use this to generate the following metrics:
- `nginx_up` (gauge)
- `nginx_connections_active` (gauge)
- `nginx_connections_accepted_total` (counter)
- `nginx_connections_reading` (gauge)
- `nginx_connections_waiting` (gauge)
- `nginx_connections_writing` (gauge)
- `nginx_http_requests_total` (counter)

Metrics will be labeled with:
- `endpoint`: the full endpoint (sans any basic authentication credentials)
- `host`: the host name and port portions of the endpoint

Users will be instructed to set up `stub_status`.
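For reference, enabling `stub_status` takes only a small addition to the Nginx configuration; a minimal sketch, with the location path chosen to match the default endpoint used in this RFC (on Nginx 1.7.5+ the directive takes no argument; older versions use `stub_status on;`):

```nginx
server {
    listen 80;

    location /basic_status {
        # Exposes the plain-text status page this source will scrape.
        # Requires Nginx built with ngx_http_stub_status_module.
        stub_status;
    }
}
```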
The following additional source configuration will be added:
```toml
[sources.my_source_id]
type = "nginx_metrics" # required
endpoints = ["http://localhost/basic_status"] # required, default
scrape_interval_secs = 15 # optional, default, seconds
namespace = "nginx" # optional, default, namespace to put metrics under
```
Some possible configuration improvements we could add in the future would be:

- `response_timeout`: to cap request lengths
- `tls`: settings to allow setting specific chains of trust and client certs
- `basic_auth`: to set username/password for use with HTTP basic auth; we'll allow this to be set in the URL too, which will work for now

The `host` key will be set to the host parsed out of the endpoint.
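Until a dedicated `basic_auth` option exists, credentials could be embedded in the endpoint URL; a hypothetical example (`user`/`pass` are placeholders):

```toml
[sources.my_source_id]
type = "nginx_metrics"
# Credentials embedded in the URL; note they are stripped from the
# `endpoint` label before metrics are emitted.
endpoints = ["http://user:pass@localhost/basic_status"]
```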
Nginx HTTP server is a common web server. If we do not support ingesting metrics from it, it is likely to push people to use another tool to forward metrics from Nginx to the desired sink.
As part of Vector's vision to be the "one tool" for ingesting and shipping observability data, it makes sense to add as many sources as possible to reduce the likelihood that a user will not be able to ingest metrics from their tools.
We could choose not to add the source directly to Vector and instead instruct users to run Telegraf and point Vector at the exposed Prometheus scrape endpoint. This would leverage the already supported Telegraf Nginx input plugin.
Alternatively, someone could run the Prometheus Nginx exporter directly and scrape it with the `prometheus` source.
We decided against this as it would be in contrast with one of the listed principles of Vector:
> One Tool. All Data. - One simple tool gets your logs, metrics, and traces (coming soon) from A to B.
On the same page, it is mentioned that Vector should be a replacement for Telegraf.
> You SHOULD use Vector to replace Logstash, Fluent*, Telegraf, Beats, or similar tools.
If users are already running Telegraf though, they could opt for this path.
Incremental steps that execute this change. Generally this is in the form of:
I think one thing that would make sense would be to refactor the sources based on HTTP scraping to share a base, similar to how our sinks that rely on HTTP are factored (`splunk_hec`, `http`, `loki`, etc.). This would allow them to share common configuration options for their behavior.
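One way such a shared base could be shaped is a trait that each scraping source implements for its response format, leaving scheduling and HTTP concerns to common driver code. A minimal sketch, with all names illustrative rather than Vector's actual internals:

```rust
/// Hypothetical shared interface for HTTP-scraping sources; each source
/// only supplies response parsing, and the driver owns intervals, HTTP,
/// and shared configuration.
trait HttpScrapeSource {
    /// Turn one scraped response body into (metric name, value) pairs.
    fn parse_response(&self, body: &str) -> Vec<(String, f64)>;
}

struct NginxMetrics;

impl HttpScrapeSource for NginxMetrics {
    fn parse_response(&self, body: &str) -> Vec<(String, f64)> {
        // Minimal stand-in: extract only the "Active connections" count.
        body.lines()
            .filter_map(|line| line.strip_prefix("Active connections:"))
            .filter_map(|rest| rest.trim().parse::<f64>().ok())
            .map(|v| ("nginx_connections_active".to_string(), v))
            .collect()
    }
}

/// A shared driver would run any source on its scrape interval; here we
/// invoke a single scrape to show the seam between driver and source.
fn scrape_once(source: &dyn HttpScrapeSource, body: &str) -> Vec<(String, f64)> {
    source.parse_response(body)
}

fn main() {
    let metrics = scrape_once(&NginxMetrics, "Active connections: 3\n");
    assert_eq!(
        metrics,
        vec![("nginx_connections_active".to_string(), 3.0)]
    );
}
```

The payoff of this factoring is that options like `scrape_interval_secs`, timeouts, and TLS settings would live in one place instead of being re-implemented per source.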