This document lists and describes the metrics supported by FHEVM services. It is intended to help operators monitor these services, configure alarms based on the metrics, and act on those alarms when issues arise.
We also recommend alarm thresholds for each metric, where applicable. The suggested thresholds are conservative and can be adjusted to suit the operator's environment and requirements.
Note that the recommendations assume a smoke test that runs transactions/requests at a rate of approximately one per 30 seconds. These include proof verifications, FHE computations, ACL updates, and decryptions.
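At one smoke-test request per 30 seconds, a healthy success counter should grow by roughly two per minute, so a one-minute window with no growth points to a stall. The following is a minimal Python sketch of that reasoning, not the way Prometheus itself evaluates `increase()` (which also extrapolates to the window edges). The helper names, the sample windows, and the 15-second scrape interval are all assumptions made for illustration.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += cur - prev if cur >= prev else cur
    return total


def flat_line_alarm(samples):
    """Simplified stand-in for `increase(counter[1m]) == 0` on a success counter."""
    return counter_increase(samples) == 0


def failure_alarm(samples, threshold=60):
    """Simplified stand-in for `increase(counter[1m]) > 60` on a failure counter."""
    return counter_increase(samples) > threshold


# Hypothetical one-minute windows, scraped every 15 seconds.
healthy = [100, 100, 101, 101, 102]  # two smoke-test requests succeeded
stalled = [102, 102, 102, 102, 102]  # nothing succeeded in the window
failing = [0, 20, 45, 70, 90]        # 90 failures within the window

print(flat_line_alarm(healthy))  # False
print(flat_line_alarm(stalled))  # True
print(failure_alarm(failing))    # True
```

The failure threshold is passed as a parameter because the suggested values differ between services and are meant to be tuned per environment.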
- `coprocessor_txn_sender_verify_proof_success_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_txn_sender_verify_proof_fail_counter`
  - Recommendation: `increase(counter[1m]) > 60`.
- `coprocessor_txn_sender_add_ciphertext_material_success_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_txn_sender_add_ciphertext_material_fail_counter`
  - Recommendation: `increase(counter[1m]) > 60`.
- `coprocessor_allow_handle_unsent_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_add_ciphertext_material_unsent_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_verify_proof_resp_unsent_txn_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_verify_proof_pending_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_gw_listener_verify_proof_success_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_gw_listener_verify_proof_fail_counter`
  - Recommendation: `increase(counter[1m]) > 60`.
- `coprocessor_gw_listener_get_block_num_fail_counter`
  - Recommendation: `increase(counter[1m]) > 60`.
- `coprocessor_gw_listener_get_logs_success_counter`
- `coprocessor_gw_listener_get_logs_fail_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_gw_listener_activate_crs_success_counter`
- `coprocessor_gw_listener_activate_crs_fail_counter`
  - Recommendation: `increase(counter[1m]) > 0`.
- `coprocessor_gw_listener_crs_digest_mismatch_counter`
  - Recommendation: `increase(counter[1m]) > 0`.
- `coprocessor_gw_listener_activate_key_success_counter`
- `coprocessor_gw_listener_activate_key_fail_counter`
  - Recommendation: `increase(counter[1m]) > 0`.
- `coprocessor_gw_listener_key_digest_mismatch_counter`
  - Recommendation: `increase(counter[1m]) > 0`.
- `coprocessor_gw_listener_drift_detected_counter`
- `coprocessor_gw_listener_consensus_timeout_counter`
- `coprocessor_gw_listener_missing_submission_counter`
- `coprocessor_gw_listener_consensus_latency_blocks`
  - `--drift-no-consensus-timeout`. Bucket boundaries: 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.
- `coprocessor_gw_listener_post_consensus_completion_blocks`
  - `--drift-post-consensus-grace`. Bucket boundaries: 0, 1, 2, 3, 5, 8, 13, 21, 34.

Metrics for zkproof-worker will be added in future releases, if and when needed. Currently, the transaction-sender handles ZK proof related metrics; please see its section.
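The gauge recommendations of the form `min_over_time(gauge[2m]) > 100` fire only when every sample in the window is above the threshold, so a brief spike that drains again does not raise an alarm. Here is a minimal Python sketch of that behaviour; the function name and the sample windows are hypothetical.

```python
def backlog_alarm(samples, threshold=100):
    """Simplified stand-in for `min_over_time(gauge[2m]) > 100`."""
    # The alarm fires only if the smallest sample in the window exceeds the threshold.
    return min(samples) > threshold


# Hypothetical two-minute windows of a backlog gauge.
short_spike = [20, 450, 30, 25]           # queue spiked but drained again
sustained_backlog = [150, 180, 210, 260]  # queue stayed above the threshold

print(backlog_alarm(short_spike))        # False
print(backlog_alarm(sustained_backlog))  # True
```

Using the window minimum rather than the latest or maximum value trades some detection speed for fewer false alarms on bursty workloads.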
- `coprocessor_sns_worker_task_execute_success_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_sns_worker_task_execute_failure_counter`
  - Recommendation: `increase(counter[1m]) > 240`.
- `coprocessor_sns_worker_aws_upload_success_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_sns_worker_aws_upload_failure_counter`
  - Recommendation: `increase(counter[1m]) > 240`.
- `coprocessor_sns_worker_uncomplete_tasks_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_sns_worker_uncomplete_aws_uploads_gauge`
  - Recommendation: `min_over_time(gauge[2m]) > 100`.
- `coprocessor_worker_errors`
  - Recommendation: `increase(counter[1m]) > 240`.
- `coprocessor_work_items_polls`
- `coprocessor_work_items_notifications`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_work_items_found`
  - Recommendation: `increase(counter[1m]) == 0`.
- `coprocessor_work_items_processed`
  - Recommendation: `increase(counter[1m]) == 0`.
- `kms_connector_gw_listener_event_received_counter`
  - Labels: `event_type`: can be used to filter by event type (`public_decryption_request`, `user_decryption_request`, `crsgen_request`, ...).
  - Recommendation, for `event_type` `public_decryption_request` and `user_decryption_request`: `increase(counter{event_type="..."}[1m]) == 0`.
- `kms_connector_gw_listener_event_listening_errors`
  - Labels: `contract`: can be used to filter by contract (`decryption`, `kmsgeneration`).
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_worker_event_received_counter`
  - Labels: `event_type`: see description.
  - Recommendation, for `event_type` `public_decryption_request` and `user_decryption_request`: `increase(counter{event_type="..."}[1m]) == 0`.
- `kms_connector_worker_event_received_errors`
  - Labels: `event_type`: see description.
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_worker_grpc_request_sent_counter`
  - Labels: `event_type`: see description.
  - Recommendation, for `event_type` `public_decryption_request` and `user_decryption_request`: `increase(counter{event_type="..."}[1m]) == 0`.
- `kms_connector_worker_grpc_request_sent_errors`
  - Labels: `event_type`: see description.
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_worker_grpc_response_polled_counter`
  - Labels: `event_type`: see description.
  - Recommendation, for `event_type` `public_decryption_request` and `user_decryption_request`: `increase(counter{event_type="..."}[1m]) == 0`.
- `kms_connector_worker_grpc_response_polled_errors`
  - Labels: `event_type`: see description.
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_worker_s3_ciphertext_retrieval_counter`
  - Recommendation: `increase(counter[1m]) == 0`.
- `kms_connector_worker_s3_ciphertext_retrieval_errors`
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_worker_decryption_latency_seconds`
  - Labels: `event_type`: see description.
  - `public_decryption_request` and `user_decryption_request` event types. Bucket boundaries (in seconds): 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0.
- `kms_connector_tx_sender_response_received_counter`
  - Labels: `response_type`: can be used to filter by response type (`public_decryption_response`, `user_decryption_response`, `crsgen_response`, ...).
  - Recommendation, for `response_type` `public_decryption_response` and `user_decryption_response`: `increase(counter{response_type = "..."}[1m]) == 0`.
- `kms_connector_tx_sender_response_received_errors`
  - Labels: `response_type`: see description.
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_tx_sender_gateway_tx_sent_counter`
  - Labels: `response_type`: see description.
  - Recommendation, for `response_type` `public_decryption_response` and `user_decryption_response`: `increase(counter{response_type = "..."}[1m]) == 0`.
- `kms_connector_tx_sender_gateway_tx_sent_errors`
  - Labels: `response_type`: see description.
  - Recommendation: `sum(increase(counter[1m])) > 60`.
- `kms_connector_pending_events`
  - Labels: `event_type`: see description (only available for decryption right now!).
- `kms_connector_pending_responses`
  - Labels: `response_type`: see description (only available for decryption right now!).
- `kms_connector_tx_sender_response_forwarding_latency_seconds`
  - Labels: `response_type`: see description.
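To make the bucket boundaries listed for `kms_connector_worker_decryption_latency_seconds` more concrete, here is a minimal Python sketch of how observations would fall into those buckets. It assumes the metric is exposed as a Prometheus-style cumulative histogram, where each bucket counts every observation less than or equal to its upper bound. The function name and the sample latencies are hypothetical.

```python
from bisect import bisect_left

# Bucket upper bounds in seconds, as listed for the decryption latency metric.
BOUNDS = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0]


def cumulative_buckets(observations, bounds=BOUNDS):
    """Return cumulative counts per upper bound, plus a final +Inf bucket."""
    counts = [0] * (len(bounds) + 1)  # the last slot is the +Inf bucket
    for value in observations:
        # bisect_left finds the first bound that is >= value.
        counts[bisect_left(bounds, value)] += 1
    labels = [str(b) for b in bounds] + ["+Inf"]
    cumulative, running = {}, 0
    for label, count in zip(labels, counts):
        running += count
        cumulative[label] = running
    return cumulative


# Hypothetical decryption latencies in seconds.
buckets = cumulative_buckets([0.03, 0.2, 0.25, 1.7, 45.0])
print(buckets["0.25"])  # 3
print(buckets["30.0"])  # 4
print(buckets["+Inf"])  # 5
```

An observation above the largest bound (45 seconds here) lands only in `+Inf`, which is one way to spot latencies that exceed the configured range.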