.. _config_cluster_manager_cluster_stats:

Statistics
==========

.. contents::
  :local:

General
-------

The cluster manager has a statistics tree rooted at ``cluster_manager.`` with the following
statistics. Any ``:`` character in the stats name is replaced with ``_``. Stats include all clusters
managed by the cluster manager, including both clusters used for data plane upstreams and control
plane xDS clusters.
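
The following minimal sketch applies the substitution rule above to a complete stat name. The
helper ``sanitize_stat_name`` and the cluster name ``xds:grpc`` are hypothetical illustrations, not
part of Envoy; the sketch only reproduces the documented ``:`` to ``_`` replacement.

.. code-block:: python

  def sanitize_stat_name(name: str) -> str:
      """Replace every ':' in a stat name with '_' (hypothetical helper)."""
      return name.replace(":", "_")


  # A cluster whose configured name contains ':' (hypothetical).
  print(sanitize_stat_name("cluster.xds:grpc.upstream_cx_total"))
  # prints: cluster.xds_grpc.upstream_cx_total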

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  cluster_added, Counter, Total clusters added (either via static config or CDS)
  cluster_modified, Counter, Total clusters modified (via CDS)
  cluster_removed, Counter, Total clusters removed (via CDS)
  cluster_updated, Counter, Total cluster updates
  cluster_updated_via_merge, Counter, Total cluster updates applied as merged updates
  update_merge_cancelled, Counter, Total merged updates that got cancelled and delivered early
  update_out_of_merge_window, Counter, Total updates which arrived out of a merge window
  active_clusters, Gauge, Number of currently active (warmed) clusters
  warming_clusters, Gauge, Number of currently warming (not active) clusters

In addition to the cluster manager stats, each worker thread has a thread-local cluster manager
statistics tree rooted at ``thread_local_cluster_manager.<worker_id>.`` with the following
statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  clusters_inflated, Gauge, Number of clusters the worker has initialized. If using cluster deferral this number should be <= (cluster_added - cluster_removed).
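
The inequality above can be checked against a stats snapshot. This is a sketch under stated
assumptions: the snapshot is a plain dict mapping full stat names to integer values, the function
``deferral_violations`` is hypothetical, and the worker ids ``worker_0`` and ``worker_1`` are
placeholders for ``<worker_id>``.

.. code-block:: python

  from typing import Dict, List


  def deferral_violations(stats: Dict[str, int]) -> List[str]:
      """Return ids of workers whose clusters_inflated exceeds cluster_added - cluster_removed."""
      limit = stats["cluster_manager.cluster_added"] - stats["cluster_manager.cluster_removed"]
      prefix, suffix = "thread_local_cluster_manager.", ".clusters_inflated"
      violations = []
      for name, value in stats.items():
          if name.startswith(prefix) and name.endswith(suffix) and value > limit:
              # Strip the fixed prefix and suffix to recover the worker id segment.
              violations.append(name[len(prefix):-len(suffix)])
      return violations


  snapshot = {  # hypothetical values
      "cluster_manager.cluster_added": 10,
      "cluster_manager.cluster_removed": 2,
      "thread_local_cluster_manager.worker_0.clusters_inflated": 3,
      "thread_local_cluster_manager.worker_1.clusters_inflated": 9,
  }
  print(deferral_violations(snapshot))  # ['worker_1']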

.. _config_cluster_stats:

Every cluster has a statistics tree rooted at ``cluster.<name>.`` with the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_cx_total, Counter, Total connections
  upstream_cx_active, Gauge, Total active connections
  upstream_cx_http1_total, Counter, Total HTTP/1.1 connections
  upstream_cx_http2_total, Counter, Total HTTP/2 connections
  upstream_cx_http3_total, Counter, Total HTTP/3 connections
  upstream_cx_connect_fail, Counter, Total connection failures
  upstream_cx_connect_timeout, Counter, Total connection connect timeouts
  upstream_cx_connect_with_0_rtt, Counter, Total connections able to send 0-rtt requests (early data).
  upstream_cx_idle_timeout, Counter, Total connection idle timeouts
  upstream_cx_max_duration_reached, Counter, Total connections closed due to max duration reached
  upstream_cx_connect_attempts_exceeded, Counter, Total consecutive connection failures exceeding configured connection attempts
  upstream_cx_overflow, Counter, Total times that the cluster's connection circuit breaker overflowed
  upstream_cx_connect_ms, Histogram, Connection establishment milliseconds
  upstream_cx_length_ms, Histogram, Connection length milliseconds
  upstream_cx_destroy, Counter, Total destroyed connections
  upstream_cx_destroy_local, Counter, Total connections destroyed locally
  upstream_cx_destroy_remote, Counter, Total connections destroyed remotely
  upstream_cx_destroy_with_active_rq, Counter, Total connections destroyed with 1+ active request
  upstream_cx_destroy_local_with_active_rq, Counter, Total connections destroyed locally with 1+ active request
  upstream_cx_destroy_remote_with_active_rq, Counter, Total connections destroyed remotely with 1+ active request
  upstream_cx_close_notify, Counter, Total connections closed via HTTP/1.1 connection close header or HTTP/2 or HTTP/3 GOAWAY
  upstream_cx_rx_bytes_total, Counter, Total received connection bytes
  upstream_cx_rx_bytes_buffered, Gauge, Received connection bytes currently buffered
  upstream_cx_tx_bytes_total, Counter, Total sent connection bytes
  upstream_cx_tx_bytes_buffered, Gauge, Send connection bytes currently buffered
  upstream_cx_pool_overflow, Counter, Total times that the cluster's connection pool circuit breaker overflowed
  upstream_cx_protocol_error, Counter, Total connection protocol errors
  upstream_cx_max_requests, Counter, Total connections closed due to maximum requests
  upstream_cx_none_healthy, Counter, Total times connection not established due to no healthy hosts
  upstream_rq_total, Counter, Total requests
  upstream_rq_active, Gauge, Total active requests
  upstream_rq_pending_total, Counter, Total requests pending a connection pool connection
  upstream_rq_active_overflow, Counter, Total requests rejected because the ``max_requests`` circuit breaker was exhausted while attaching to a ready upstream connection (see ``envoy.reloadable_features.skip_pending_overflow_count_on_active_rq``)
  upstream_rq_pending_overflow, Counter, Total requests that overflowed connection pool or requests (mainly for HTTP/2 and above) circuit breaking and were failed
  upstream_rq_pending_failure_eject, Counter, Total requests that were failed due to a connection pool connection failure or remote connection termination
  upstream_rq_pending_active, Gauge, Total active requests pending a connection pool connection
  upstream_rq_per_cx, Histogram, Number of requests handled per upstream connection for all HTTP protocols
  upstream_rq_cancelled, Counter, Total requests cancelled before obtaining a connection pool connection
  upstream_rq_maintenance_mode, Counter, Total requests that resulted in an immediate 503 due to :ref:`maintenance mode<config_http_filters_router_runtime_maintenance_mode>`
  upstream_rq_timeout, Counter, Total requests that timed out waiting for a response
  upstream_rq_max_duration_reached, Counter, Total requests closed due to max duration reached
  upstream_rq_per_try_timeout, Counter, Total requests that hit the per try timeout (except when request hedging is enabled)
  upstream_rq_rx_reset, Counter, Total requests that were reset remotely
  upstream_rq_tx_reset, Counter, Total requests that were reset locally
  upstream_rq_retry, Counter, Total request retries
  upstream_rq_retry_backoff_exponential, Counter, Total retries using the exponential backoff strategy
  upstream_rq_retry_backoff_ratelimited, Counter, Total retries using the ratelimited backoff strategy
  upstream_rq_retry_limit_exceeded, Counter, Total requests not retried due to exceeding :ref:`the configured number of maximum retries <config_http_filters_router_x-envoy-max-retries>`
  upstream_rq_retry_success, Counter, Total request retry successes
  upstream_rq_retry_overflow, Counter, Total requests not retried due to circuit breaking or exceeding the :ref:`retry budget <envoy_v3_api_field_config.cluster.v3.CircuitBreakers.Thresholds.retry_budget>`
  upstream_flow_control_paused_reading_total, Counter, Total number of times flow control paused reading from upstream
  upstream_flow_control_resumed_reading_total, Counter, Total number of times flow control resumed reading from upstream
  upstream_flow_control_backed_up_total, Counter, Total number of times the upstream connection backed up and paused reads from downstream
  upstream_flow_control_drained_total, Counter, Total number of times the upstream connection drained and resumed reads from downstream
  upstream_internal_redirect_failed_total, Counter, Total number of times failed internal redirects resulted in redirects being passed downstream.
  upstream_internal_redirect_succeeded_total, Counter, Total number of times internal redirects resulted in a second upstream request.
  membership_change, Counter, Total cluster membership changes
  membership_healthy, Gauge, Current cluster healthy total (inclusive of both health checking and outlier detection)
  membership_degraded, Gauge, Current cluster :ref:`degraded <arch_overview_load_balancing_degraded>` total
  membership_excluded, Gauge, Current cluster :ref:`excluded <arch_overview_load_balancing_excluded>` total
  membership_total, Gauge, Current cluster membership total
  retry_or_shadow_abandoned, Counter, Total number of times shadowing or retry buffering was canceled due to buffer limits
  config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
  update_attempt, Counter, Total attempted cluster membership updates by service discovery
  update_success, Counter, Total successful cluster membership updates by service discovery
  update_failure, Counter, Total failed cluster membership updates by service discovery
  update_duration, Histogram, Amount of time in milliseconds spent updating configs
  update_empty, Counter, Total cluster membership updates ending with empty cluster load assignment and continuing with previous config
  update_no_rebuild, Counter, Total successful cluster membership updates that didn't result in any cluster load balancing structure rebuilds
  version, Gauge, Hash of the contents from the last successful API fetch
  warming_state, Gauge, Current cluster warming state
  max_host_weight, Gauge, Maximum weight of any host in the cluster
  bind_errors, Counter, Total errors binding the socket to the configured source address
  assignment_timeout_received, Counter, Total assignments received with endpoint lease information.
  assignment_stale, Counter, Number of times the received assignments went stale before new assignments arrived.
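
To read these values programmatically, one option is the admin ``/stats`` endpoint. The sketch
below assumes its plain-text output of one ``name: value`` line per stat and keeps only
integer-valued counters and gauges; histogram summary lines are skipped. The function
``cluster_stats`` and the sample text are hypothetical.

.. code-block:: python

  from typing import Dict


  def cluster_stats(text: str, cluster: str) -> Dict[str, int]:
      """Collect integer stats under cluster.<cluster>. from `name: value` lines."""
      # Note: a cluster named e.g. "backend.v2" would also match the prefix for "backend".
      prefix = f"cluster.{cluster}."
      result: Dict[str, int] = {}
      for line in text.splitlines():
          name, sep, value = line.partition(": ")
          if not sep or not name.startswith(prefix):
              continue
          value = value.strip()
          if value.isdigit():  # histogram summaries such as "P0(...)" are skipped
              result[name[len(prefix):]] = int(value)
      return result


  sample = """\
  cluster.backend.upstream_cx_total: 42
  cluster.backend.upstream_cx_connect_fail: 3
  cluster.backend.upstream_cx_connect_ms: P0(nan,1) P25(nan,2)
  cluster.other.upstream_cx_total: 7
  """
  print(cluster_stats(sample, "backend"))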

HTTP/3 protocol statistics
--------------------------

HTTP/3 protocol stats are global with the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream.<tx/rx>.quic_connection_close_error_code_<error_code>, Counter, A collection of counters that are lazily initialized to record each QUIC connection close's error code.
  upstream.<tx/rx>.quic_reset_stream_error_code_<error_code>, Counter, A collection of counters that are lazily initialized to record each QUIC stream reset error code.
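
Because these counters are created lazily, the set of names changes over time and a consumer has to
discover them by pattern. The following sketch groups them by direction, kind, and error code. The
regular expression tolerates an optional leading prefix because the exact global name is an
assumption here; the function name and the error-code strings in the sample are hypothetical.

.. code-block:: python

  import re
  from typing import Dict, Tuple

  # Optional leading prefix, then upstream.<tx|rx>.quic_<kind>_error_code_<error_code>.
  QUIC_CODE_RE = re.compile(
      r"(?:^|\.)upstream\.(tx|rx)\.quic_(connection_close|reset_stream)_error_code_(.+)$"
  )


  def quic_error_codes(stats: Dict[str, int]) -> Dict[Tuple[str, str, str], int]:
      """Map (direction, kind, error_code) to the counter value for QUIC error-code stats."""
      result: Dict[Tuple[str, str, str], int] = {}
      for name, value in stats.items():
          match = QUIC_CODE_RE.search(name)
          if match:
              direction, kind, code = match.groups()
              result[(direction, kind, code)] = value
      return result


  sample = {  # hypothetical error-code names and values
      "upstream.tx.quic_connection_close_error_code_QUIC_NO_ERROR": 12,
      "upstream.rx.quic_reset_stream_error_code_QUIC_STREAM_CANCELLED": 4,
      "cluster.backend.upstream_cx_total": 42,
  }
  print(quic_error_codes(sample))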

Health check statistics
-----------------------

If health checking is configured, the cluster has an additional statistics tree rooted at
``cluster.<name>.health_check.`` with the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  attempt, Counter, Number of health checks
  success, Counter, Number of successful health checks
  failure, Counter, Number of immediately failed health checks (e.g. HTTP 503) as well as network failures
  passive_failure, Counter, Number of health check failures due to passive events (e.g. ``x-envoy-immediate-health-check-fail``)
  network_failure, Counter, Number of health check failures due to network error
  verify_cluster, Counter, Number of health checks that attempted cluster name verification
  healthy, Gauge, Number of healthy members

.. _config_cluster_manager_cluster_stats_outlier_detection:

Outlier detection statistics
----------------------------

If :ref:`outlier detection <arch_overview_outlier_detection>` is configured for a cluster, statistics
will be rooted at ``cluster.<name>.outlier_detection.`` and contain the following:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  ejections_enforced_total, Counter, Number of enforced ejections due to any outlier type
  ejections_active, Gauge, Number of currently ejected hosts
  ejections_overflow, Counter, Number of ejections aborted due to the max ejection %
  ejections_enforced_consecutive_5xx, Counter, Number of enforced consecutive 5xx ejections
  ejections_detected_consecutive_5xx, Counter, Number of detected consecutive 5xx ejections (even if unenforced)
  ejections_enforced_success_rate, Counter, Number of enforced success rate outlier ejections. Exact meaning of this counter depends on :ref:`outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors>` config item. Refer to :ref:`Outlier Detection documentation<arch_overview_outlier_detection>` for details.
  ejections_detected_success_rate, Counter, Number of detected success rate outlier ejections (even if unenforced). Exact meaning of this counter depends on :ref:`outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors>` config item. Refer to :ref:`Outlier Detection documentation<arch_overview_outlier_detection>` for details.
  ejections_enforced_consecutive_gateway_failure, Counter, Number of enforced consecutive gateway failure ejections
  ejections_detected_consecutive_gateway_failure, Counter, Number of detected consecutive gateway failure ejections (even if unenforced)
  ejections_enforced_consecutive_local_origin_failure, Counter, Number of enforced consecutive local origin failure ejections
  ejections_detected_consecutive_local_origin_failure, Counter, Number of detected consecutive local origin failure ejections (even if unenforced)
  ejections_enforced_local_origin_success_rate, Counter, Number of enforced success rate outlier ejections for locally originated failures
  ejections_detected_local_origin_success_rate, Counter, Number of detected success rate outlier ejections for locally originated failures (even if unenforced)
  ejections_enforced_failure_percentage, Counter, Number of enforced failure percentage outlier ejections. Exact meaning of this counter depends on :ref:`outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors>` config item. Refer to :ref:`Outlier Detection documentation<arch_overview_outlier_detection>` for details.
  ejections_detected_failure_percentage, Counter, Number of detected failure percentage outlier ejections (even if unenforced). Exact meaning of this counter depends on :ref:`outlier_detection.split_external_local_origin_errors<envoy_v3_api_field_config.cluster.v3.OutlierDetection.split_external_local_origin_errors>` config item. Refer to :ref:`Outlier Detection documentation<arch_overview_outlier_detection>` for details.
  ejections_enforced_failure_percentage_local_origin, Counter, Number of enforced failure percentage outlier ejections for locally originated failures
  ejections_detected_failure_percentage_local_origin, Counter, Number of detected failure percentage outlier ejections for locally originated failures (even if unenforced)
  ejections_total, Counter, Deprecated. Number of ejections due to any outlier type (even if unenforced)
  ejections_consecutive_5xx, Counter, Deprecated. Number of consecutive 5xx ejections (even if unenforced)
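
Comparing each ``ejections_detected_*`` counter with its ``ejections_enforced_*`` counterpart shows
how often an outlier was detected without the host being ejected. This sketch assumes that every
enforced ejection is also counted as detected, so the difference is non-negative; the function
``unenforced_ejections`` and the values are hypothetical.

.. code-block:: python

  from typing import Dict


  def unenforced_ejections(stats: Dict[str, int], cluster: str, kind: str) -> int:
      """Detected minus enforced ejections for one outlier type, e.g. "consecutive_5xx"."""
      base = f"cluster.{cluster}.outlier_detection.ejections_"
      detected = stats.get(f"{base}detected_{kind}", 0)
      enforced = stats.get(f"{base}enforced_{kind}", 0)
      return detected - enforced


  snapshot = {  # hypothetical values
      "cluster.backend.outlier_detection.ejections_detected_consecutive_5xx": 5,
      "cluster.backend.outlier_detection.ejections_enforced_consecutive_5xx": 2,
  }
  print(unenforced_ejections(snapshot, "backend", "consecutive_5xx"))  # 3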

.. _config_cluster_manager_cluster_stats_circuit_breakers:

Circuit breakers statistics
---------------------------

Circuit breakers statistics will be rooted at ``cluster.<name>.circuit_breakers.<priority>.`` and
contain the following:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  cx_open, Gauge, Whether the connection circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1)
  cx_pool_open, Gauge, Whether the connection pool circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1)
  rq_pending_open, Gauge, Whether the pending requests circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1)
  rq_open, Gauge, Whether the requests circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1)
  rq_retry_open, Gauge, Whether the retry circuit breaker is under its concurrency limit (0) or is at capacity and no longer admitting (1)
  remaining_cx, Gauge, Number of remaining connections until the circuit breaker reaches its concurrency limit
  remaining_pending, Gauge, Number of remaining pending requests until the circuit breaker reaches its concurrency limit
  remaining_rq, Gauge, Number of remaining requests until the circuit breaker reaches its concurrency limit
  remaining_retries, Gauge, Number of remaining retries until the circuit breaker reaches its concurrency limit

.. note::

  Metrics starting with the prefix ``remaining_`` are not generated by default. To track the number
  of resources remaining until a circuit breaker opens, set the parameter :ref:`track_remaining <envoy_v3_api_field_config.cluster.v3.CircuitBreakers.Thresholds.track_remaining>`
  to ``true`` in the circuit breaker configuration.
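
The effect of ``track_remaining`` on which gauges exist can be sketched as follows. The helper
``breaker_gauges`` is hypothetical, and it assumes the ``<priority>`` path segment is the routing
priority name, such as ``default`` or ``high``.

.. code-block:: python

  from typing import List

  # Gauges documented in the table above; the remaining_* ones depend on track_remaining.
  OPEN_GAUGES = ["cx_open", "cx_pool_open", "rq_pending_open", "rq_open", "rq_retry_open"]
  REMAINING_GAUGES = ["remaining_cx", "remaining_pending", "remaining_rq", "remaining_retries"]


  def breaker_gauges(cluster: str, priority: str, track_remaining: bool) -> List[str]:
      """List the circuit breaker gauge names expected for one cluster and priority."""
      names = list(OPEN_GAUGES)
      if track_remaining:
          names += REMAINING_GAUGES
      return [f"cluster.{cluster}.circuit_breakers.{priority}.{n}" for n in names]


  print(len(breaker_gauges("backend", "default", track_remaining=False)))  # 5
  print(breaker_gauges("backend", "high", track_remaining=True)[-1])
  # prints: cluster.backend.circuit_breakers.high.remaining_retries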

.. _config_cluster_manager_cluster_stats_timeout_budgets:

Timeout budget statistics
-------------------------

If :ref:`timeout budget statistic tracking <envoy_v3_api_field_config.cluster.v3.Cluster.track_timeout_budgets>`
is turned on, statistics will be added to ``cluster.<name>`` and contain the following:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_rq_timeout_budget_percent_used, Histogram, What percentage of the global timeout was used waiting for a response
  upstream_rq_timeout_budget_per_try_percent_used, Histogram, What percentage of the per try timeout was used waiting for a response
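
As a worked example of what these histograms record, the sketch below computes the percentage for
one hypothetical request. It assumes the value is simply elapsed time divided by the configured
timeout; the helper name and the timings are illustrative, not taken from Envoy.

.. code-block:: python

  def timeout_budget_percent_used(elapsed_ms: float, timeout_ms: float) -> float:
      """Percentage of a timeout consumed before the response arrived."""
      if timeout_ms <= 0:
          raise ValueError("timeout must be positive")
      return 100.0 * elapsed_ms / timeout_ms


  # Hypothetical request: 15 s global timeout, 3 s per-try timeout, and a response that
  # arrived 750 ms into a try which itself started 1.5 s into the request.
  print(timeout_budget_percent_used(2250, 15000))  # 15.0
  print(timeout_budget_percent_used(750, 3000))    # 25.0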

.. _config_cluster_manager_cluster_stats_dynamic_http:

Dynamic HTTP statistics
-----------------------

If HTTP is used, dynamic HTTP response code statistics are also available. These are emitted by
various internal systems as well as some filters such as the :ref:`router filter <config_http_filters_router>`
and :ref:`rate limit filter <config_http_filters_rate_limit>`. They are rooted at ``cluster.<name>.``
and contain the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_rq_completed, Counter, "Total upstream requests completed"
  upstream_rq_<\*xx>, Counter, "Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)"
  upstream_rq_<\*>, Counter, "Specific HTTP response codes (e.g., 201, 302, etc.)"
  upstream_rq_time, Histogram, Request time milliseconds
  canary.upstream_rq_completed, Counter, "Total upstream canary requests completed"
  canary.upstream_rq_<\*xx>, Counter, Upstream canary aggregate HTTP response codes
  canary.upstream_rq_<\*>, Counter, Upstream canary specific HTTP response codes
  canary.upstream_rq_time, Histogram, Upstream canary request time milliseconds
  internal.upstream_rq_completed, Counter, "Total internal origin requests completed"
  internal.upstream_rq_<\*xx>, Counter, Internal origin aggregate HTTP response codes
  internal.upstream_rq_<\*>, Counter, Internal origin specific HTTP response codes
  internal.upstream_rq_time, Histogram, Internal origin request time milliseconds
  external.upstream_rq_completed, Counter, "Total external origin requests completed"
  external.upstream_rq_<\*xx>, Counter, External origin aggregate HTTP response codes
  external.upstream_rq_<\*>, Counter, External origin specific HTTP response codes
  external.upstream_rq_time, Histogram, External origin request time milliseconds

.. note::

  The ``upstream_rq_<*xx>`` and ``upstream_rq_<*>`` counters only count final responses sent to the
  downstream client. Responses that trigger a retry are counted in ``retry.upstream_rq_<*xx>`` and
  ``retry.upstream_rq_<*>`` instead (see :ref:`retry statistics <config_cluster_manager_cluster_stats_retry>`
  below).

  For example, if a request receives ``503``, then ``503``, then ``200`` (two retries before success):

  * ``retry.upstream_rq_503`` = 2 (the two 503 responses that were retried)
  * ``upstream_rq_503`` = 0 (no 503 was sent downstream)
  * ``upstream_rq_200`` = 1 (the final successful response)
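
The counting rule in the example above can be modeled in a few lines. This is a simplified sketch,
not Envoy's code: it assumes every attempt except the last triggered a retry and that the last
response was forwarded downstream, and it only models the specific-code and aggregate-class
counters. The function ``count_responses`` is hypothetical.

.. code-block:: python

  from collections import Counter
  from typing import List


  def count_responses(attempts: List[int]) -> Counter:
      """Count dynamic HTTP stats for one request whose last attempt goes downstream."""
      stats: Counter = Counter()
      *retried, final = attempts
      for code in retried:  # responses that triggered a retry
          stats[f"retry.upstream_rq_{code}"] += 1
          stats[f"retry.upstream_rq_{code // 100}xx"] += 1
      stats[f"upstream_rq_{final}"] += 1  # the response sent to the client
      stats[f"upstream_rq_{final // 100}xx"] += 1
      return stats


  stats = count_responses([503, 503, 200])
  print(stats["retry.upstream_rq_503"], stats["upstream_rq_503"], stats["upstream_rq_200"])  # 2 0 1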

.. _config_cluster_manager_cluster_stats_retry:

Retry statistics
----------------

When retries are enabled and a response triggers a retry, the following dynamic HTTP statistics are
emitted. These are rooted at ``cluster.<name>.retry.`` and track responses that were not sent to the
downstream client because they triggered a retry:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_rq_<\*xx>, Counter, "Aggregate HTTP response codes that triggered a retry (e.g., 5xx)"
  upstream_rq_<\*>, Counter, "Specific HTTP response codes that triggered a retry (e.g., 503)"

.. note::

  These counters are incremented when a response triggers a retry and is not forwarded downstream.
  The corresponding ``upstream_rq_<*>`` counters (without the ``retry.`` prefix) only count final
  responses that were actually sent to the client.

.. _config_cluster_manager_cluster_stats_tls:

TLS statistics
--------------

If TLS is used by the cluster, the following statistics are rooted at ``cluster.<name>.ssl.``:

.. include:: ../../../_include/ssl_stats.rst

.. _config_cluster_manager_cluster_stats_certs:

TLS and CA certificates
-----------------------

TLS and CA certificate statistics are rooted at ``cluster.<name>.ssl.certificate.<cert_name>.``:

.. include:: ../../../_include/cert_stats.rst

.. _config_cluster_manager_cluster_stats_tcp:

TCP statistics
--------------

The following TCP statistics, which are available when using the
:ref:`TCP stats transport socket <envoy_v3_api_msg_extensions.transport_sockets.tcp_stats.v3.Config>`,
are rooted at ``cluster.<name>.tcp_stats.``:

.. include:: ../../../_include/tcp_stats.rst

.. _config_cluster_manager_cluster_stats_alt_tree:

Alternate tree dynamic HTTP statistics
--------------------------------------

If alternate tree statistics are configured, they will be present in the
``cluster.<name>.<alt name>.`` namespace. The statistics produced are the same as documented in
the dynamic HTTP statistics section :ref:`above <config_cluster_manager_cluster_stats_dynamic_http>`.

.. _config_cluster_manager_cluster_per_az_stats:

Per service zone dynamic HTTP statistics
----------------------------------------

If the service zone is available for both the local service (via :option:`--service-zone`) and the
:ref:`upstream cluster <arch_overview_service_discovery_types_eds>`, Envoy will track the following
statistics in the ``cluster.<name>.zone.<from_zone>.<to_zone>.`` namespace.

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_rq_<\*xx>, Counter, "Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)"
  upstream_rq_<\*>, Counter, "Specific HTTP response codes (e.g., 201, 302, etc.)"
  upstream_rq_time, Histogram, Request time milliseconds
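
As an illustration of how the ``<from_zone>`` and ``<to_zone>`` segments might be used, the sketch
below computes the share of ``upstream_rq_2xx`` responses that stayed within the same zone. The
helper ``same_zone_fraction``, the zone names, and the values are hypothetical, and it assumes zone
names contain no ``.`` characters.

.. code-block:: python

  from typing import Dict


  def same_zone_fraction(
      stats: Dict[str, int], cluster: str, suffix: str = "upstream_rq_2xx"
  ) -> float:
      """Share of `suffix` counts where <from_zone> equals <to_zone>; 0.0 if there are none."""
      prefix = f"cluster.{cluster}.zone."
      same = total = 0
      for name, value in stats.items():
          if not name.startswith(prefix):
              continue
          parts = name[len(prefix):].split(".")  # assumes zone names contain no '.'
          if len(parts) != 3 or parts[2] != suffix:
              continue
          from_zone, to_zone, _ = parts
          total += value
          if from_zone == to_zone:
              same += value
      return same / total if total else 0.0


  snapshot = {  # hypothetical zones and values
      "cluster.backend.zone.us-east-1a.us-east-1a.upstream_rq_2xx": 75,
      "cluster.backend.zone.us-east-1a.us-east-1b.upstream_rq_2xx": 25,
      "cluster.backend.zone.us-east-1a.us-east-1b.upstream_rq_5xx": 3,
  }
  print(same_zone_fraction(snapshot, "backend"))  # 0.75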

Load balancer statistics
------------------------

Statistics for monitoring load balancer decisions. Stats are rooted at ``cluster.<name>.`` and
contain the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  lb_recalculate_zone_structures, Counter, The number of times locality aware routing structures are regenerated for fast decisions on upstream locality selection
  lb_healthy_panic, Counter, Total requests load balanced with the load balancer in panic mode
  lb_zone_cluster_too_small, Counter, No zone aware routing because of small upstream cluster size
  lb_zone_routing_all_directly, Counter, Sending all requests directly to the same zone
  lb_zone_routing_sampled, Counter, Sending some requests to the same zone
  lb_zone_routing_cross_zone, Counter, Zone aware routing mode but have to send cross zone
  lb_local_cluster_not_ok, Counter, Local host set is not set or it is panic mode for local cluster
  lb_zone_no_capacity_left, Counter, Total number of times ended with random zone selection due to rounding error
  original_dst_host_invalid, Counter, Total number of invalid hosts passed to original destination load balancer

.. _config_cluster_manager_cluster_stats_subset_lb:

Load balancer subset statistics
-------------------------------

Statistics for monitoring :ref:`load balancer subset <arch_overview_load_balancer_subsets>`
decisions. Stats are rooted at ``cluster.<name>.`` and contain the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  lb_subsets_active, Gauge, Number of currently available subsets
  lb_subsets_created, Counter, Number of subsets created
  lb_subsets_removed, Counter, Number of subsets removed due to no hosts
  lb_subsets_selected, Counter, Number of times any subset was selected for load balancing
  lb_subsets_fallback, Counter, Number of times the fallback policy was invoked
  lb_subsets_fallback_panic, Counter, Number of times the subset panic mode triggered
  lb_subsets_single_host_per_subset_duplicate, Gauge, Number of duplicate (unused) hosts when using :ref:`single_host_per_subset <envoy_v3_api_field_config.cluster.v3.Cluster.LbSubsetConfig.LbSubsetSelector.single_host_per_subset>`

.. _config_cluster_manager_cluster_stats_ring_hash_lb:

Ring hash load balancer statistics
----------------------------------

Statistics for monitoring the size and effective distribution of hashes when using the
:ref:`ring hash load balancer <arch_overview_load_balancing_types_ring_hash>`. Stats are rooted at
``cluster.<name>.ring_hash_lb.`` and contain the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  size, Gauge, Total number of host hashes on the ring
  min_hashes_per_host, Gauge, Minimum number of hashes for a single host
  max_hashes_per_host, Gauge, Maximum number of hashes for a single host
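
The relationship between the three gauges can be shown with a hypothetical per-host hash count:
``size`` is the total, and the minimum and maximum summarize how evenly hashes are spread. The
helper ``ring_hash_gauges`` and the numbers below are illustrative only; how many hashes Envoy
actually assigns to each host depends on the ring hash configuration and host weights.

.. code-block:: python

  from typing import Dict, Tuple


  def ring_hash_gauges(hashes_per_host: Dict[str, int]) -> Tuple[int, int, int]:
      """Return (size, min_hashes_per_host, max_hashes_per_host) for a non-empty ring."""
      counts = list(hashes_per_host.values())
      return sum(counts), min(counts), max(counts)


  size, lo, hi = ring_hash_gauges({"host_a": 512, "host_b": 1024})  # hypothetical ring
  print(size, lo, hi)  # 1536 512 1024
  print(hi / lo)       # 2.0, a simple max/min spread ratio

A max/min ratio well above the ratio of configured host weights can hint that the ring is too small
to represent the weights accurately; this reading is a heuristic, not a documented threshold.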

.. _config_cluster_manager_cluster_stats_maglev_lb:

Maglev load balancer statistics
-------------------------------

Statistics for monitoring effective host weights when using the
:ref:`Maglev load balancer <arch_overview_load_balancing_types_maglev>`. Stats are rooted at
``cluster.<name>.maglev_lb.`` and contain the following statistics:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  min_entries_per_host, Gauge, Minimum number of entries for a single host
  max_entries_per_host, Gauge, Maximum number of entries for a single host

.. _config_cluster_manager_cluster_stats_request_response_sizes:

Request Response Size statistics
--------------------------------

If :ref:`request response size statistics <envoy_v3_api_field_config.cluster.v3.Cluster.track_cluster_stats>`
are tracked, statistics will be added to ``cluster.<name>`` and contain the following:

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  upstream_rq_headers_size, Histogram, Request headers size in bytes per upstream
  upstream_rq_headers_count, Histogram, Request header count per upstream
  upstream_rq_body_size, Histogram, Request body size in bytes per upstream
  upstream_rs_headers_size, Histogram, Response headers size in bytes per upstream
  upstream_rs_headers_count, Histogram, Response header count per upstream
  upstream_rs_body_size, Histogram, Response body size in bytes per upstream