doc/administration/gitaly/praefect/monitoring.md
To monitor Gitaly Cluster (Praefect), you can use Prometheus metrics. Two separate metrics endpoints are available from which metrics can be scraped:
The /metrics endpoint.
/db_metrics, which contains metrics that require database queries.

/metrics endpoint

The following metrics are available from the /metrics endpoint:
gitaly_praefect_read_distribution, a counter to track distribution of reads.
It has two labels:
  virtual_storage.
  storage.
They reflect configuration defined for this instance of Praefect.
gitaly_praefect_replication_latency_bucket, a histogram measuring the amount of time it takes
for replication to complete after the replication job starts.
gitaly_praefect_replication_delay_bucket, a histogram measuring how much time passes between
when the replication job is created and when it starts.
gitaly_praefect_connections_total, the total number of connections to Praefect.
gitaly_praefect_method_types, a count of accessor and mutator RPCs per node.
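
As an illustration of how the read distribution counter can be interpreted, the following sketch parses Prometheus text-format output and computes each storage's share of reads within its virtual storage. It is a minimal example, not part of Praefect: the helper names (`parse_samples`, `read_share`) and the label values in `SAMPLE` are hypothetical, and the regular expressions cover only simple sample lines.

```python
import re
from collections import defaultdict

# Matches one sample line such as: name{label="value",...} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; the storage names and values are made up.
SAMPLE = """\
# TYPE gitaly_praefect_read_distribution counter
gitaly_praefect_read_distribution{storage="gitaly-1",virtual_storage="default"} 30
gitaly_praefect_read_distribution{storage="gitaly-2",virtual_storage="default"} 10
"""


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in the text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def read_share(text):
    """Return each storage's fraction of reads within its virtual storage."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name != "gitaly_praefect_read_distribution":
            continue
        key = (labels.get("virtual_storage", ""), labels.get("storage", ""))
        counts[key] += value
        totals[key[0]] += value
    return {
        key: value / totals[key[0]]
        for key, value in counts.items()
        if totals[key[0]] > 0
    }
```

With the hypothetical sample above, `read_share(SAMPLE)` maps `("default", "gitaly-1")` to 0.75 and `("default", "gitaly-2")` to 0.25. Because the metric is a counter, these shares cover the whole lifetime of the Praefect process rather than a recent time window.
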
To monitor strong consistency, you can use the following Prometheus metrics:
gitaly_praefect_transactions_total, the number of transactions created and voted on.
gitaly_praefect_subtransactions_per_transaction_total, the number of times nodes cast a vote for
a single transaction. This can happen multiple times if multiple references are getting updated in
a single transaction.
gitaly_praefect_voters_per_transaction_total, the number of Gitaly nodes taking part in a
transaction.
gitaly_praefect_transactions_delay_seconds, the server-side delay introduced by waiting for the
transaction to be committed.
gitaly_hook_transaction_voting_delay_seconds, the client-side delay introduced by waiting for
the transaction to be committed.

To monitor repository verification, use the following Prometheus metrics:
gitaly_praefect_verification_jobs_dequeued_total, the number of verification jobs picked up by the
worker.
gitaly_praefect_verification_jobs_completed_total, the number of verification jobs completed by the
worker. The result label indicates the end result of the jobs:
  valid indicates the expected replica existed on the storage.
  invalid indicates the replica expected to exist did not exist on the storage.
  error indicates the job failed and has to be retried.
gitaly_praefect_stale_verification_leases_released_total, the number of stale verification leases
released.

You can also monitor the Praefect logs.
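
The verification counters listed above are cumulative, so a single scrape says little about recent activity. One way to read them is to compare two scrapes, as in the sketch below. The function names are hypothetical, the input dictionaries are assumed to map each result label value to the counter value seen in one scrape of gitaly_praefect_verification_jobs_completed_total, and a drop in value is treated as a counter reset after a process restart.

```python
def counter_increase(previous, current):
    """Return the increase of a counter between two scrapes.

    Counters restart from zero when the process restarts, so a lower
    current value is treated as a reset and counted from zero.
    """
    if current < previous:
        return current
    return current - previous


def verification_results(previous, current):
    """Compute per-result increases of completed verification jobs.

    Both arguments map a result label value ("valid", "invalid", "error")
    to the counter value observed in one scrape.
    """
    return {
        result: counter_increase(previous.get(result, 0.0), value)
        for result, value in current.items()
    }
```

A steadily growing increase for the error result suggests that jobs are being retried repeatedly, while any increase for invalid means an expected replica was missing from a storage.
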

/db_metrics endpoint

The following metrics are available from the /db_metrics endpoint:
gitaly_praefect_unavailable_repositories, the number of repositories that have no healthy, up to date replicas.
gitaly_praefect_replication_queue_depth, the number of jobs in the replication queue.
gitaly_praefect_verification_queue_depth, the total number of replicas pending verification.
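
The three /db_metrics values above lend themselves to a simple health check. The following sketch fetches the endpoint, sums each metric across its label sets, and reports warnings. It rests on several assumptions: the listener address `http://localhost:9652` must be replaced with the Prometheus listen address configured for your Praefect node, the function names are hypothetical, and both queue thresholds are arbitrary placeholders rather than recommended values.

```python
import urllib.request

# Assumption: address of Praefect's Prometheus listener; adjust to your setup.
PRAEFECT_METRICS_BASE = "http://localhost:9652"

DB_METRICS = (
    "gitaly_praefect_unavailable_repositories",
    "gitaly_praefect_replication_queue_depth",
    "gitaly_praefect_verification_queue_depth",
)


def fetch_db_metrics(base_url=PRAEFECT_METRICS_BASE, timeout=10):
    """Return the raw text served by the /db_metrics endpoint."""
    with urllib.request.urlopen(f"{base_url}/db_metrics", timeout=timeout) as response:
        return response.read().decode("utf-8")


def sum_db_metrics(text):
    """Sum each /db_metrics metric across all of its label sets."""
    totals = {name: 0.0 for name in DB_METRICS}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = line.split("{", 1)[0].split()[0]
        if name not in totals:
            continue
        # The value is the first field after the closing brace, if there is one.
        rest = line[line.rindex("}") + 1:] if "}" in line else line[len(name):]
        totals[name] += float(rest.split()[0])
    return totals


def db_metric_warnings(totals, max_replication_queue=1000, max_verification_queue=10000):
    """Return human-readable warnings; both thresholds are hypothetical."""
    warnings = []
    unavailable = totals["gitaly_praefect_unavailable_repositories"]
    if unavailable > 0:
        warnings.append(f"{unavailable:g} repositories have no healthy, up to date replica")
    if totals["gitaly_praefect_replication_queue_depth"] > max_replication_queue:
        warnings.append("replication queue depth exceeds threshold")
    if totals["gitaly_praefect_verification_queue_depth"] > max_verification_queue:
        warnings.append("verification queue depth exceeds threshold")
    return warnings
```

A typical call would be `db_metric_warnings(sum_db_metrics(fetch_db_metrics()))`. Any non-zero count of unavailable repositories is reported regardless of thresholds, because those repositories have no replica that can serve requests.
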