Metrics

Mountpoint for Amazon S3 can export metrics using the OpenTelemetry Protocol (OTLP), giving visibility into FUSE requests, S3 API calls, Mountpoint throughput, and more. The CloudWatch Agent or any other OTLP-compatible collector can receive these metrics and publish them to an observability backend for monitoring.

Metrics can also be logged to files using the --log-metrics option (see LOGGING.md).

Enabling metrics

To export Mountpoint metrics using OTLP, configure Mountpoint with an OTLP endpoint:

```bash
mount-s3 --otlp-endpoint http://localhost:4318 --otlp-export-interval 60 <BUCKET> <MOUNT_PATH>
```

Replace http://localhost:4318 with the actual endpoint of your OTLP collector. By default, Mountpoint exports metrics every 60 seconds. Use --otlp-export-interval to change this interval.

Publishing metrics to observability backends

Mountpoint exports metrics using OTLP over HTTP in binary format. It uses exponential histograms and delta temporality.

Not all observability backends natively support these features. Here are a few ways to publish Mountpoint metrics:

CloudWatch

We recommend using the CloudWatch Agent to export Mountpoint metrics to CloudWatch. This requires CloudWatch Agent v1.300060.0 or later for exponential histogram support.

Here is a minimal CloudWatch Agent configuration to receive OTLP metrics from Mountpoint. The http_endpoint must use the same host and port as the --otlp-endpoint passed to Mountpoint:

```json
{
  "metrics": {
    "metrics_collected": {
      "otlp": {
        "http_endpoint": "127.0.0.1:4318"
      }
    }
  }
}
```
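
When the agent configuration is generated by a script, deriving http_endpoint from the same URL that is passed to --otlp-endpoint keeps the two in sync. The Python sketch below illustrates the idea. The function name agent_config_for is hypothetical, and the sketch assumes the agent accepts the endpoint as host:port, as in the configuration above.

```python
import json
from urllib.parse import urlsplit


def agent_config_for(otlp_endpoint: str) -> dict:
    """Build a minimal CloudWatch Agent config for a given --otlp-endpoint URL.

    Hypothetical helper: only the JSON layout comes from the configuration above.
    """
    parts = urlsplit(otlp_endpoint)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected a URL with host and port, got {otlp_endpoint!r}")
    return {
        "metrics": {
            "metrics_collected": {
                # The agent endpoint is written as host:port, without the scheme.
                "otlp": {"http_endpoint": f"{parts.hostname}:{parts.port}"}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(agent_config_for("http://127.0.0.1:4318"), indent=2))
```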

If using an OpenTelemetry Collector to export to CloudWatch via the EMF exporter, note that exponential histogram data is reduced to min/max/sum/count, reducing the accuracy of percentile metrics.
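
To see why this reduction matters: an OTLP exponential histogram at a given scale places positive values in buckets bounded by powers of base = 2^(2^-scale), where bucket index i covers (base^i, base^(i+1)]. A percentile can therefore be bounded by walking the bucket counts, which is not possible once only min/max/sum/count remain. The Python sketch below is illustrative only. The function names are hypothetical, it handles positive buckets only, and it does not represent how any particular backend computes percentiles.

```python
import math


def bucket_upper_bound(index: int, scale: int) -> float:
    """Upper bound of bucket `index`, i.e. base**(index + 1) with base = 2**(2**-scale)."""
    base = 2.0 ** (2.0 ** -scale)
    return base ** (index + 1)


def percentile_upper_bound(offset: int, counts: list[int], scale: int, q: float) -> float:
    """Return the upper bound of the bucket that contains the q-th quantile.

    `offset` is the index of the first bucket and `counts` holds one count
    per consecutive bucket, mirroring the positive-bucket layout in OTLP.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no positive observations")
    rank = max(1, math.ceil(q * total))  # 1-based rank of the target observation
    seen = 0
    for i, count in enumerate(counts):
        seen += count
        if seen >= rank:
            return bucket_upper_bound(offset + i, scale)
    raise ValueError("q must be in the range [0, 1]")
```

For example, with scale 0 (base 2) and counts [90, 9, 1] starting at index 0, the median is bounded by 2 and the largest observation by 8. The min/max/sum/count summary of the same data cannot show that 90% of the observations fall in the lowest bucket.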

For more details on CloudWatch Agent setup, refer to CloudWatch OTLP metrics documentation.

Prometheus

We recommend using Prometheus v3.0 or later to publish Mountpoint metrics directly via its OTLP receiver. Prometheus must be started with the following feature flags to support OTLP exponential histograms and delta temporality used by Mountpoint:

```bash
prometheus \
  --config.file=prometheus.yml \
  --web.listen-address=:9090 \
  --web.enable-otlp-receiver \
  --enable-feature=native-histograms,otlp-deltatocumulative
```

Without these feature flags, histogram data may be dropped or misinterpreted. For more details on feature flags and configuration, see the Prometheus documentation.
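
Delta temporality means each export carries only the change since the previous export, whereas Prometheus stores cumulative values. Conceptually, the conversion is a running sum per time series, as in this simplified Python sketch. The function name is hypothetical, and a real converter must also track each series separately and cope with restarts and out-of-order data.

```python
from itertools import accumulate


def deltas_to_cumulative(deltas: list[float]) -> list[float]:
    """Turn per-interval delta values into a cumulative series (running sum)."""
    return list(accumulate(deltas))


if __name__ == "__main__":
    # Three export intervals reporting 5, 0, and 3 new requests.
    print(deltas_to_cumulative([5, 0, 3]))
```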

Prometheus converts metric names to underscore-separated format and may append units. For example, fuse.request_latency becomes fuse_request_latency_microseconds.
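
When writing queries, it helps to predict the translated name. The Python sketch below approximates the renaming described above. It is not Prometheus's actual implementation, which applies additional rules. The function name and the unit table are assumptions for illustration, and the unit code "us" for fuse.request_latency is inferred from the resulting name rather than stated in this document.

```python
import re

# Illustrative subset of unit codes and the suffixes they map to (assumption).
UNIT_SUFFIXES = {
    "us": "microseconds",
    "ms": "milliseconds",
    "s": "seconds",
    "By": "bytes",
}


def prometheus_name(otel_name: str, unit: str = "") -> str:
    """Approximate the Prometheus name for an OTLP metric name and unit."""
    # Replace every character that is not valid in a Prometheus metric name.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    suffix = UNIT_SUFFIXES.get(unit, "")
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    return name


if __name__ == "__main__":
    print(prometheus_name("fuse.request_latency", "us"))
```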

OpenTelemetry Collector

If you want to enrich, filter, or route metrics to multiple destinations, you can place an OpenTelemetry Collector between Mountpoint and the observability backend. See the OpenTelemetry Collector documentation.

Here is an example OpenTelemetry Collector configuration for routing to Prometheus:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 127.0.0.1:4318

exporters:
  otlphttp:
    endpoint: http://prometheus:9090/api/v1/otlp

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```

Available metrics

Mountpoint emits the following metrics:

| Metric | Type | Dimensions | Description |
| --- | --- | --- | --- |
| fuse.io_size | Histogram | fuse_request (read, write) | Bytes transferred per FUSE request |
| fuse.request_errors | Counter | fuse_request (read, write, etc.) | Number of FUSE request errors |
| fuse.request_latency | Histogram | fuse_request (read, write, etc.) | Time to process a FUSE request |
| process.memory_usage | Gauge | | Memory usage (RSS) of the Mountpoint process |
| s3.request_count | Counter | s3_request (GetObject, PutObject, etc.) | Number of S3 requests |
| s3.request_errors | Counter | s3_request (GetObject, PutObject, etc.); http_status (403, 404, etc.) | Number of S3 request errors |
| s3.request_first_byte_latency | Histogram | s3_request (GetObject, PutObject, etc.) | Time from initiation of an S3 request until the first byte is received |
| s3.request_total_latency | Histogram | s3_request (GetObject, PutObject, etc.) | Time from initiation of an S3 request until the response is received |
| experimental.cache.evict_latency | Histogram | cache | Time to evict data from data cache |
| experimental.cache.get_latency | Histogram | cache | Time to retrieve from data cache |
| experimental.cache.put_latency | Histogram | cache | Time to store in data cache |
| experimental.fuse.cache_hit | Counter | | Number of FUSE requests fully served from data cache (prefetched data served from memory and partial cache hits are not included in this metric) |
| experimental.fuse.idle_threads | Histogram | | FUSE worker threads waiting for new requests |
| experimental.fuse.total_threads | Gauge | | Total number of FUSE worker threads spawned |
| experimental.prefetch.reset_state | Counter | | Times Mountpoint discarded prefetched data due to access patterns |

> [!NOTE]
> Metrics prefixed with experimental. may change or be removed in future versions.

Sample dashboard

To visualize these metrics, here is a sample CloudWatch dashboard template: cloudwatch.json. Update the region in the template and create a dashboard using the AWS CLI or CloudWatch console:

```bash
aws cloudwatch put-dashboard --region <region> --dashboard-name <dashboard-name> --dashboard-body file://examples/dashboards/cloudwatch.json
```
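
Updating the region by hand is error-prone when a template contains many widgets. The following is a minimal Python sketch for doing it programmatically. It assumes the template follows the standard CloudWatch dashboard body layout, with a top-level widgets list whose entries carry a region under properties. The function name is hypothetical, and the actual contents of cloudwatch.json are not shown in this document.

```python
import copy


def set_dashboard_region(body: dict, region: str) -> dict:
    """Return a copy of a dashboard body with every widget region replaced.

    Assumes the layout {"widgets": [{"properties": {"region": ...}}, ...]}.
    Widgets without a region property (for example text widgets) are left as-is.
    """
    updated = copy.deepcopy(body)  # leave the caller's template untouched
    for widget in updated.get("widgets", []):
        properties = widget.get("properties")
        if isinstance(properties, dict) and "region" in properties:
            properties["region"] = region
    return updated
```

The result can be written back to a file with json.dump and passed to the put-dashboard command through the file:// prefix.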

For comprehensive monitoring, the dashboard can be extended to include S3 server-side metrics, EC2 instance metrics, and CloudWatch procstat metrics. To help with troubleshooting, additional metric dimensions such as the EC2 instance ID can also be added using the CloudWatch Agent or an OpenTelemetry Collector.