.. contents::
    :local:
    :backlinks: none
    :depth: 1

Presto C++ workers expose runtime metrics that can be collected and monitored when
``runtime-metrics-collection-enabled`` is set to ``true``. These metrics are available through the
``GET /v1/info/metrics`` endpoint in Prometheus data format.

For information on enabling metrics collection, see :doc:`features`.

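
These metrics track the performance and queue sizes of various executors in the Presto C++ worker.

``presto_cpp.driver_cpu_executor_queue_size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.driver_cpu_executor_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.spiller_executor_queue_size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.spiller_executor_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http_executor_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track HTTP requests and responses in the Presto C++ worker.

``presto_cpp.num_http_request``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_http_request_error``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http_request_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http_request_size_bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track HTTP client connection behavior for outbound requests.

``presto_cpp.http.client.num_connections_created``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http.client.connection_first_use``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http.client.connection_reuse``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http.client.transaction_create_delay_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track data exchange operations between workers.

``presto_cpp.exchange_source_peak_queued_bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.exchange.request.duration``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.exchange.request.num_tries``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.exchange.request.page_size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.exchange.get_data_size.duration``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.exchange.get_data_size.num_tries``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track query execution contexts and memory usage.

``presto_cpp.num_query_contexts``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.memory_manager_total_bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track task lifecycle and execution states.

``presto_cpp.num_tasks``
^^^^^^^^^^^^^^^^^^^^^^^^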

``presto_cpp.num_tasks_bytes_processed``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_running``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_finished``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_cancelled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_aborted``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_failed``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_planned``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_queued``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_zombie_velox_tasks``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_zombie_presto_tasks``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_with_stuck_operator``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_cancelled_tasks_by_stuck_driver``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_deadlock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_tasks_manager_lock_timeout``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track the state and execution of drivers within tasks.

``presto_cpp.num_queued_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_on_thread_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_suspended_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_consumer_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_split_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_producer_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_join_build_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_join_probe_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_merge_join_right_side_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_memory_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_wait_for_connector_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_blocked_yield_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.num_stuck_drivers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics indicate when the worker is overloaded and may reject new work.

``presto_cpp.overloaded_mem``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.overloaded_cpu``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.overloaded``
^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.task_planned_time_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.overloaded_duration_sec``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track the partitioned output buffers used for shuffling data.

``presto_cpp.num_partitioned_output_buffer``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.partitioned_output_buffer_get_data_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.worker_runtime_uptime_secs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics provide insight into OS-level resource usage by the worker process.

``presto_cpp.os_user_cpu_time_micros``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.os_system_cpu_time_micros``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.os_num_soft_page_faults``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.os_num_hard_page_faults``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.os_num_voluntary_context_switches``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.os_num_forced_context_switches``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
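
The ``presto_cpp.os_user_cpu_time_micros`` and ``presto_cpp.os_system_cpu_time_micros`` metrics can be
combined into an average CPU usage figure by sampling them twice. The following is a minimal sketch,
assuming both metrics are cumulative counters in microseconds (as the ``_micros`` suffix suggests).
The function ``cpu_cores_used`` and its dictionary keys are hypothetical and not part of Presto.

.. code-block:: python

    def cpu_cores_used(prev, curr, elapsed_secs):
        """Average number of CPU cores used between two samples.

        ``prev`` and ``curr`` are dicts with "user" and "system" keys holding
        the values of os_user_cpu_time_micros and os_system_cpu_time_micros.
        Assumption: both values are cumulative microsecond counters.
        """
        if elapsed_secs <= 0:
            raise ValueError("elapsed_secs must be positive")
        user_delta = curr["user"] - prev["user"]
        system_delta = curr["system"] - prev["system"]
        # Convert the wall-clock interval to microseconds before dividing.
        return (user_delta + system_delta) / (elapsed_secs * 1_000_000)


    # Illustrative values only, not real measurements.
    earlier = {"user": 1_000_000, "system": 500_000}
    later = {"user": 4_000_000, "system": 1_500_000}
    print(cpu_cores_used(earlier, later, elapsed_secs=2))

A result of ``1.0`` would correspond to one fully busy core over the interval, under the assumptions above.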

These metrics track the performance of the Hive connector's file handle cache. Each metric name
includes a placeholder, shown here as ``{connector}``, which is replaced with the connector name at runtime.

``presto_cpp.{connector}.hive_file_handle_cache_num_elements``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_pinned_size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_cur_size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_num_accumulative_hits``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_num_accumulative_lookups``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_num_hits``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{connector}.hive_file_handle_cache_num_lookups``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
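
The file handle cache metrics above pair a hit counter with a lookup counter, so a hit ratio can be
derived from them. The following is a minimal sketch, assuming the ratio of hits to lookups is the
quantity of interest. The helper names and the connector name ``hive`` are hypothetical and used only
for illustration.

.. code-block:: python

    def hive_cache_metric(connector, suffix):
        """Fill the connector placeholder in a file handle cache metric name."""
        return "presto_cpp.{}.hive_file_handle_cache_{}".format(connector, suffix)


    def hit_ratio(hits, lookups):
        """Return hits / lookups, or None when no lookups have been recorded."""
        if lookups <= 0:
            return None
        return hits / lookups


    # "hive" is a hypothetical connector name; the counts are made-up values.
    print(hive_cache_metric("hive", "num_accumulative_hits"))
    print(hit_ratio(hits=90, lookups=120))

Returning ``None`` for zero lookups avoids a division by zero on a freshly started worker.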

These metrics track the state of various thread pools. Each metric name includes a placeholder,
shown here as ``{pool}``, which is replaced with the thread pool name at runtime.

``presto_cpp.{pool}.num_threads``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{pool}.num_active_threads``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{pool}.num_pending_tasks``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{pool}.num_total_tasks``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.{pool}.max_idle_time_ns``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track violations of the EventBase (event loop) threading model.

``presto_cpp.exchange_io_evb_violation_count``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.http_server_io_evb_violation_count``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These metrics track the memory pushback mechanism that helps prevent out-of-memory conditions.

``presto_cpp.memory_pushback_count``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.memory_pushback_latency_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.memory_pushback_reduction_bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``presto_cpp.memory_pushback_expected_reduction_bytes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For additional runtime metrics related to specific subsystems:

* **S3 FileSystem Metrics**: When Presto C++ workers interact with S3, additional runtime metrics are collected. See the `Velox S3 FileSystem documentation <https://facebookincubator.github.io/velox/monitoring/metrics.html#s3-filesystem>`_.
* **Velox Metrics**: Metrics from the underlying Velox execution engine are also available. These are prefixed with ``velox.`` instead of ``presto_cpp.``. See the `Velox metrics documentation <https://facebookincubator.github.io/velox/monitoring/metrics.html>`_.


To access these metrics:

1. Enable metrics collection by setting ``runtime-metrics-collection-enabled=true`` in your worker configuration.

2. Query the metrics endpoint:

   .. code-block:: bash

       # Replace <worker-host> and <port> with the HTTP address of your worker.
       curl http://<worker-host>:<port>/v1/info/metrics

The response is in Prometheus text format, suitable for scraping by Prometheus or other monitoring systems. For example:

.. code-block:: text

    presto_cpp_worker_runtime_uptime_secs{cluster="production",worker="worker-01"} 3600
    presto_cpp_num_tasks_running{cluster="production",worker="worker-01"} 42
    presto_cpp_memory_manager_total_bytes{cluster="production",worker="worker-01"} 8589934592

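
Output like the sample above can be read with a few lines of Python. The following is a simplified
sketch, not a complete Prometheus parser: it handles plain ``name{labels} value`` lines and skips
anything else. The helper names are hypothetical. The dot-to-underscore mapping is an assumption
inferred from the sample, where ``presto_cpp.num_tasks_running`` appears as ``presto_cpp_num_tasks_running``.

.. code-block:: python

    import re

    # Matches lines such as: name{label="value",...} 42 [optional timestamp]
    SAMPLE_RE = re.compile(
        r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
        r'(?:\{(?P<labels>.*)\})?'
        r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
    )
    LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


    def parse_samples(text):
        """Parse Prometheus text-format lines into (name, labels, value) tuples."""
        samples = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            match = SAMPLE_RE.match(line)
            if match is None:
                continue  # this simplified parser ignores lines it cannot read
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
        return samples


    def prometheus_name(metric):
        """Map a dotted metric name to the underscore form (assumed mapping)."""
        return metric.replace(".", "_")


    sample = 'presto_cpp_num_tasks_running{cluster="production",worker="worker-01"} 42'
    for name, labels, value in parse_samples(sample):
        print(name, labels, value)

For production use, an established Prometheus client library is likely a better choice than a hand-written parser.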

* :doc:`features` - For information on enabling metrics collection
* :doc:`properties` - For worker configuration properties
* `Velox Metrics Documentation <https://facebookincubator.github.io/velox/monitoring/metrics.html>`_ - For metrics from the Velox execution engine