docs/root/configuration/upstream/cluster_manager/cluster_stats.rst
.. _config_cluster_manager_cluster_stats:
.. contents:: :local:
The cluster manager has a statistics tree rooted at cluster_manager. with the following
statistics. Any : character in the stats name is replaced with _. Stats include
all clusters managed by the cluster manager, including both clusters used for data plane
upstreams and control plane xDS clusters.
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
cluster_added, Counter, Total clusters added (either via static config or CDS) cluster_modified, Counter, Total clusters modified (via CDS) cluster_removed, Counter, Total clusters removed (via CDS) cluster_updated, Counter, Total cluster updates cluster_updated_via_merge, Counter, Total cluster updates applied as merged updates update_merge_cancelled, Counter, Total merged updates that got cancelled and delivered early update_out_of_merge_window, Counter, Total updates which arrived out of a merge window active_clusters, Gauge, Number of currently active (warmed) clusters warming_clusters, Gauge, Number of currently warming (not active) clusters
In addition to the cluster manager stats, there are per worker thread local cluster manager statistics tree rooted at thread_local_cluster_manager.<worker_id>. with the following statistics.
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
clusters_inflated, Gauge, Number of clusters the worker has initialized. If using cluster deferral this number should be <= (cluster_added - clusters_removed).
.. _config_cluster_stats:
Every cluster has a statistics tree rooted at cluster.<name>. with the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_cx_total, Counter, Total connections
upstream_cx_active, Gauge, Total active connections
upstream_cx_http1_total, Counter, Total HTTP/1.1 connections
upstream_cx_http2_total, Counter, Total HTTP/2 connections
upstream_cx_http3_total, Counter, Total HTTP/3 connections
upstream_cx_connect_fail, Counter, Total connection failures
upstream_cx_connect_timeout, Counter, Total connection connect timeouts
upstream_cx_connect_with_0_rtt, Counter, Total connections able to send 0-rtt requests (early data).
upstream_cx_idle_timeout, Counter, Total connection idle timeouts
upstream_cx_max_duration_reached, Counter, Total connections closed due to max duration reached
upstream_cx_connect_attempts_exceeded, Counter, Total consecutive connection failures exceeding configured connection attempts
upstream_cx_overflow, Counter, Total times that the cluster's connection circuit breaker overflowed
upstream_cx_connect_ms, Histogram, Connection establishment milliseconds
upstream_cx_length_ms, Histogram, Connection length milliseconds
upstream_cx_destroy, Counter, Total destroyed connections
upstream_cx_destroy_local, Counter, Total connections destroyed locally
upstream_cx_destroy_remote, Counter, Total connections destroyed remotely
upstream_cx_destroy_with_active_rq, Counter, Total connections destroyed with 1+ active request
upstream_cx_destroy_local_with_active_rq, Counter, Total connections destroyed locally with 1+ active request
upstream_cx_destroy_remote_with_active_rq, Counter, Total connections destroyed remotely with 1+ active request
upstream_cx_close_notify, Counter, Total connections closed via HTTP/1.1 connection close header or HTTP/2 or HTTP/3 GOAWAY
upstream_cx_rx_bytes_total, Counter, Total received connection bytes
upstream_cx_rx_bytes_buffered, Gauge, Received connection bytes currently buffered
upstream_cx_tx_bytes_total, Counter, Total sent connection bytes
upstream_cx_tx_bytes_buffered, Gauge, Send connection bytes currently buffered
upstream_cx_pool_overflow, Counter, Total times that the cluster's connection pool circuit breaker overflowed
upstream_cx_protocol_error, Counter, Total connection protocol errors
upstream_cx_max_requests, Counter, Total connections closed due to maximum requests
upstream_cx_none_healthy, Counter, Total times connection not established due to no healthy hosts
upstream_rq_total, Counter, Total requests
upstream_rq_active, Gauge, Total active requests
upstream_rq_pending_total, Counter, Total requests pending a connection pool connection
upstream_rq_active_overflow, Counter, Total requests rejected because the max_requests circuit breaker was exhausted while attaching to a ready upstream connection (see envoy.reloadable_features.skip_pending_overflow_count_on_active_rq)
upstream_rq_pending_overflow, Counter, Total requests that overflowed connection pool or requests (mainly for HTTP/2 and above) circuit breaking and were failed
upstream_rq_pending_failure_eject, Counter, Total requests that were failed due to a connection pool connection failure or remote connection termination
upstream_rq_pending_active, Gauge, Total active requests pending a connection pool connection
upstream_rq_per_cx, Histogram, Number of requests handled per upstream connection for all HTTP protocols
upstream_rq_cancelled, Counter, Total requests cancelled before obtaining a connection pool connection
upstream_rq_maintenance_mode, Counter, Total requests that resulted in an immediate 503 due to :ref:maintenance mode<config_http_filters_router_runtime_maintenance_mode>
upstream_rq_timeout, Counter, Total requests that timed out waiting for a response
upstream_rq_max_duration_reached, Counter, Total requests closed due to max duration reached
upstream_rq_per_try_timeout, Counter, Total requests that hit the per try timeout (except when request hedging is enabled)
upstream_rq_rx_reset, Counter, Total requests that were reset remotely
upstream_rq_tx_reset, Counter, Total requests that were reset locally
upstream_rq_retry, Counter, Total request retries
upstream_rq_retry_backoff_exponential, Counter, Total retries using the exponential backoff strategy
upstream_rq_retry_backoff_ratelimited, Counter, Total retries using the ratelimited backoff strategy
upstream_rq_retry_limit_exceeded, Counter, Total requests not retried due to exceeding :ref:the configured number of maximum retries <config_http_filters_router_x-envoy-max-retries>
upstream_rq_retry_success, Counter, Total request retry successes
upstream_rq_retry_overflow, Counter, Total requests not retried due to circuit breaking or exceeding the :ref:retry budget <envoy_v3_api_field_config.cluster.v3.CircuitBreakers.Thresholds.retry_budget>
upstream_flow_control_paused_reading_total, Counter, Total number of times flow control paused reading from upstream
upstream_flow_control_resumed_reading_total, Counter, Total number of times flow control resumed reading from upstream
upstream_flow_control_backed_up_total, Counter, Total number of times the upstream connection backed up and paused reads from downstream
upstream_flow_control_drained_total, Counter, Total number of times the upstream connection drained and resumed reads from downstream
upstream_internal_redirect_failed_total, Counter, Total number of times failed internal redirects resulted in redirects being passed downstream.
upstream_internal_redirect_succeeded_total, Counter, Total number of times internal redirects resulted in a second upstream request.
membership_change, Counter, Total cluster membership changes
membership_healthy, Gauge, Current cluster healthy total (inclusive of both health checking and outlier detection)
membership_degraded, Gauge, Current cluster :ref:degraded <arch_overview_load_balancing_degraded> total
membership_excluded, Gauge, Current cluster :ref:excluded <arch_overview_load_balancing_excluded> total
membership_total, Gauge, Current cluster membership total
retry_or_shadow_abandoned, Counter, Total number of times shadowing or retry buffering was canceled due to buffer limits
config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total attempted cluster membership updates by service discovery
update_success, Counter, Total successful cluster membership updates by service discovery
update_failure, Counter, Total failed cluster membership updates by service discovery
update_duration, Histogram, Amount of time in milliseconds spent updating configs
update_empty, Counter, Total cluster membership updates ending with empty cluster load assignment and continuing with previous config
update_no_rebuild, Counter, Total successful cluster membership updates that didn't result in any cluster load balancing structure rebuilds
version, Gauge, Hash of the contents from the last successful API fetch
warming_state, Gauge, Current cluster warming state
max_host_weight, Gauge, Maximum weight of any host in the cluster
bind_errors, Counter, Total errors binding the socket to the configured source address
assignment_timeout_received, Counter, Total assignments received with endpoint lease information.
assignment_stale, Counter, Number of times the received assignments went stale before new assignments arrived.
HTTP/3 protocol stats are global with the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream.<tx/rx>.quic_connection_close_error_code_<error_code>, Counter, A collection of counters that are lazily initialized to record each QUIC connection close's error code. upstream.<tx/rx>.quic_reset_stream_error_code_<error_code>, Counter, A collection of counters that are lazily initialized to record each QUIC stream reset error code.
If health check is configured, the cluster has an additional statistics tree rooted at cluster.<name>.health_check. with the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
attempt, Counter, Number of health checks success, Counter, Number of successful health checks failure, Counter, Number of immediately failed health checks (e.g. HTTP 503) as well as network failures passive_failure, Counter, Number of health check failures due to passive events (e.g. x-envoy-immediate-health-check-fail) network_failure, Counter, Number of health check failures due to network error verify_cluster, Counter, Number of health checks that attempted cluster name verification healthy, Gauge, Number of healthy members
.. _config_cluster_manager_cluster_stats_outlier_detection:
If :ref:outlier detection <arch_overview_outlier_detection> is configured for a cluster,
statistics will be rooted at cluster.<name>.outlier_detection. and contain the following:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
ejections_enforced_total, Counter, Number of enforced ejections due to any outlier type
ejections_active, Gauge, Number of currently ejected hosts
ejections_overflow, Counter, Number of ejections aborted due to the max ejection %
ejections_enforced_consecutive_5xx, Counter, Number of enforced consecutive 5xx ejections
ejections_detected_consecutive_5xx, Counter, Number of detected consecutive 5xx ejections (even if unenforced)
ejections_enforced_success_rate, Counter, Number of enforced success rate outlier ejections. Exact meaning of this counter depends on :ref:outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors> config item. Refer to :ref:Outlier Detection documentation<arch_overview_outlier_detection> for details.
ejections_detected_success_rate, Counter, Number of detected success rate outlier ejections (even if unenforced). Exact meaning of this counter depends on :ref:outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors> config item. Refer to :ref:Outlier Detection documentation<arch_overview_outlier_detection> for details.
ejections_enforced_consecutive_gateway_failure, Counter, Number of enforced consecutive gateway failure ejections
ejections_detected_consecutive_gateway_failure, Counter, Number of detected consecutive gateway failure ejections (even if unenforced)
ejections_enforced_consecutive_local_origin_failure, Counter, Number of enforced consecutive local origin failure ejections
ejections_detected_consecutive_local_origin_failure, Counter, Number of detected consecutive local origin failure ejections (even if unenforced)
ejections_enforced_local_origin_success_rate, Counter, Number of enforced success rate outlier ejections for locally originated failures
ejections_detected_local_origin_success_rate, Counter, Number of detected success rate outlier ejections for locally originated failures (even if unenforced)
ejections_enforced_failure_percentage, Counter, Number of enforced failure percentage outlier ejections. Exact meaning of this counter depends on :ref:outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors> config item. Refer to :ref:Outlier Detection documentation<arch_overview_outlier_detection> for details.
ejections_detected_failure_percentage, Counter, Number of detected failure percentage outlier ejections (even if unenforced). Exact meaning of this counter depends on :ref:outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors> config item. Refer to :ref:Outlier Detection documentation<arch_overview_outlier_detection> for details.
ejections_enforced_failure_percentage_local_origin, Counter, Number of enforced failure percentage outlier ejections for locally originated failures
ejections_detected_failure_percentage_local_origin, Counter, Number of detected failure percentage outlier ejections for locally originated failures (even if unenforced)
ejections_total, Counter, Deprecated. Number of ejections due to any outlier type (even if unenforced)
ejections_consecutive_5xx, Counter, Deprecated. Number of consecutive 5xx ejections (even if unenforced)
.. _config_cluster_manager_cluster_stats_circuit_breakers:
Circuit breakers statistics will be rooted at cluster.<name>.circuit_breakers.<priority>. and contain the following:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
cx_open, Gauge, Whether the connection circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1) cx_pool_open, Gauge, Whether the connection pool circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1) rq_pending_open, Gauge, Whether the pending requests circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1) rq_open, Gauge, Whether the requests circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1) rq_retry_open, Gauge, Whether the retry circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1) remaining_cx, Gauge, Number of remaining connections until the circuit breaker reaches its concurrency limit remaining_pending, Gauge, Number of remaining pending requests until the circuit breaker reaches its concurrency limit remaining_rq, Gauge, Number of remaining requests until the circuit breaker reaches its concurrency limit remaining_retries, Gauge, Number of remaining retries until the circuit breaker reaches its concurrency limit
.. note::
Metrics starting with prefix remaining_ are not generated by default.
To track the number of resources remaining until a circuit breaker opens, set the parameter
:ref:track_remaining <envoy_v3_api_field_config.cluster.v3.CircuitBreakers.Thresholds.track_remaining>
to true in circuit breaker configuration.
.. _config_cluster_manager_cluster_stats_timeout_budgets:
If :ref:timeout budget statistic tracking <envoy_v3_api_field_config.cluster.v3.Cluster.track_timeout_budgets> is
turned on, statistics will be added to cluster.<name> and contain the following:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_rq_timeout_budget_percent_used, Histogram, What percentage of the global timeout was used waiting for a response upstream_rq_timeout_budget_per_try_percent_used, Histogram, What percentage of the per try timeout was used waiting for a response
.. _config_cluster_manager_cluster_stats_dynamic_http:
If HTTP is used, dynamic HTTP response code statistics are also available. These are emitted by
various internal systems as well as some filters such as the :ref:router filter <config_http_filters_router> and :ref:rate limit filter <config_http_filters_rate_limit>. They
are rooted at cluster.<name>. and contain the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_rq_completed, Counter, "Total upstream requests completed" upstream_rq_<*xx>, Counter, "Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)" upstream_rq_<*>, Counter, "Specific HTTP response codes (e.g., 201, 302, etc.)" upstream_rq_time, Histogram, Request time milliseconds canary.upstream_rq_completed, Counter, "Total upstream canary requests completed" canary.upstream_rq_<*xx>, Counter, Upstream canary aggregate HTTP response codes canary.upstream_rq_<*>, Counter, Upstream canary specific HTTP response codes canary.upstream_rq_time, Histogram, Upstream canary request time milliseconds internal.upstream_rq_completed, Counter, "Total internal origin requests completed" internal.upstream_rq_<*xx>, Counter, Internal origin aggregate HTTP response codes internal.upstream_rq_<*>, Counter, Internal origin specific HTTP response codes internal.upstream_rq_time, Histogram, Internal origin request time milliseconds external.upstream_rq_completed, Counter, "Total external origin requests completed" external.upstream_rq_<*xx>, Counter, External origin aggregate HTTP response codes external.upstream_rq_<*>, Counter, External origin specific HTTP response codes external.upstream_rq_time, Histogram, External origin request time milliseconds
.. note::
The upstream_rq_<*xx> and upstream_rq_<*> counters only count final responses
sent to the downstream client. Responses that trigger a retry are counted in
retry.upstream_rq_<*xx> and retry.upstream_rq_<*> instead (see
:ref:retry statistics <config_cluster_manager_cluster_stats_retry> below).
For example, if a request receives 503 → 503 → 200 (two retries before success):
retry.upstream_rq_503 = 2 (the two 503 responses that were retried)upstream_rq_503 = 0 (no 503 was sent downstream)upstream_rq_200 = 1 (the final successful response).. _config_cluster_manager_cluster_stats_retry:
When retries are enabled and a response triggers a retry, the following dynamic HTTP statistics
are emitted. These are rooted at cluster.<name>.retry. and track responses that were not
sent to the downstream client because they triggered a retry:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_rq_<\*xx>, Counter, "Aggregate HTTP response codes that triggered retry (e.g., 5xx)"
upstream_rq_<\*>, Counter, "Specific HTTP response codes that triggered retry (e.g., 503)"
.. note::
These counters are incremented when a response triggers a retry and is not forwarded
downstream. The corresponding upstream_rq_<*> counters (without the retry. prefix)
only count final responses that were actually sent to the client.
.. _config_cluster_manager_cluster_stats_tls:
If TLS is used by the cluster the following statistics are rooted at cluster.<name>.ssl.:
.. include:: ../../../_include/ssl_stats.rst
.. _config_cluster_manager_cluster_stats_certs:
TLS and CA certificate statistics are rooted in the cluster.<name>.ssl.certificate.<cert_name>.:
.. include:: ../../../_include/cert_stats.rst
.. _config_cluster_manager_cluster_stats_tcp:
The following TCP statistics, which are available when using the :ref:TCP stats transport socket <envoy_v3_api_msg_extensions.transport_sockets.tcp_stats.v3.Config>,
are rooted at cluster.<name>.tcp_stats.:
.. include:: ../../../_include/tcp_stats.rst
.. _config_cluster_manager_cluster_stats_alt_tree:
If alternate tree statistics are configured, they will be present in the
cluster.<name>.<alt name>. namespace. The statistics produced are the same as documented in
the dynamic HTTP statistics section :ref:above <config_cluster_manager_cluster_stats_dynamic_http>.
.. _config_cluster_manager_cluster_per_az_stats:
If the service zone is available for the local service (via :option:--service-zone)
and the :ref:upstream cluster <arch_overview_service_discovery_types_eds>,
Envoy will track the following statistics in cluster.<name>.zone.<from_zone>.<to_zone>. namespace.
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_rq_<*xx>, Counter, "Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)" upstream_rq_<*>, Counter, "Specific HTTP response codes (e.g., 201, 302, etc.)" upstream_rq_time, Histogram, Request time milliseconds
Statistics for monitoring load balancer decisions. Stats are rooted at cluster.<name>. and contain the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
lb_recalculate_zone_structures, Counter, The number of times locality aware routing structures are regenerated for fast decisions on upstream locality selection lb_healthy_panic, Counter, Total requests load balanced with the load balancer in panic mode lb_zone_cluster_too_small, Counter, No zone aware routing because of small upstream cluster size lb_zone_routing_all_directly, Counter, Sending all requests directly to the same zone lb_zone_routing_sampled, Counter, Sending some requests to the same zone lb_zone_routing_cross_zone, Counter, Zone aware routing mode but have to send cross zone lb_local_cluster_not_ok, Counter, Local host set is not set or it is panic mode for local cluster lb_zone_no_capacity_left, Counter, Total number of times ended with random zone selection due to rounding error original_dst_host_invalid, Counter, Total number of invalid hosts passed to original destination load balancer
.. _config_cluster_manager_cluster_stats_subset_lb:
Statistics for monitoring :ref:load balancer subset <arch_overview_load_balancer_subsets>
decisions. Stats are rooted at cluster.<name>. and contain the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
lb_subsets_active, Gauge, Number of currently available subsets
lb_subsets_created, Counter, Number of subsets created
lb_subsets_removed, Counter, Number of subsets removed due to no hosts
lb_subsets_selected, Counter, Number of times any subset was selected for load balancing
lb_subsets_fallback, Counter, Number of times the fallback policy was invoked
lb_subsets_fallback_panic, Counter, Number of times the subset panic mode triggered
lb_subsets_single_host_per_subset_duplicate, Gauge, Number of duplicate (unused) hosts when using :ref:single_host_per_subset <envoy_v3_api_field_config.cluster.v3.Cluster.LbSubsetConfig.LbSubsetSelector.single_host_per_subset>
.. _config_cluster_manager_cluster_stats_ring_hash_lb:
Statistics for monitoring the size and effective distribution of hashes when using the
:ref:ring hash load balancer <arch_overview_load_balancing_types_ring_hash>. Stats are rooted at
cluster.<name>.ring_hash_lb. and contain the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
size, Gauge, Total number of host hashes on the ring min_hashes_per_host, Gauge, Minimum number of hashes for a single host max_hashes_per_host, Gauge, Maximum number of hashes for a single host
.. _config_cluster_manager_cluster_stats_maglev_lb:
Statistics for monitoring effective host weights when using the
:ref:Maglev load balancer <arch_overview_load_balancing_types_maglev>. Stats are rooted at
cluster.<name>.maglev_lb. and contain the following statistics:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
min_entries_per_host, Gauge, Minimum number of entries for a single host max_entries_per_host, Gauge, Maximum number of entries for a single host
.. _config_cluster_manager_cluster_stats_request_response_sizes:
If :ref:request response size statistics <envoy_v3_api_field_config.cluster.v3.Cluster.track_cluster_stats> are tracked,
statistics will be added to cluster.<name> and contain the following:
.. csv-table:: :header: Name, Type, Description :widths: 1, 1, 2
upstream_rq_headers_size, Histogram, Request headers size in bytes per upstream upstream_rq_headers_count, Histogram, Request header count per upstream upstream_rq_body_size, Histogram, Request body size in bytes per upstream upstream_rs_headers_size, Histogram, Response headers size in bytes per upstream upstream_rs_headers_count, Histogram, Response header count per upstream upstream_rs_body_size, Histogram, Response body size in bytes per upstream