YugabyteDB Grafana Dashboard

To import the YugabyteDB Grafana dashboard, follow the Grafana documentation on importing dashboards.

This dashboard was tested with Grafana v6.0.0 and v7.0.3.

Best practices:

  • When deploying YugabyteDB manually, set a unique value for the --metric_node_name flag on each server so that every server gets its own distinct graphs.
  • This dashboard uses the node_prefix label to tell YugabyteDB clusters apart. Create a separate scrape job for each cluster so that their graphs are separated cleanly.

Prometheus configuration

The following sample Prometheus configuration includes the relabel configurations that the dashboard requires.

  • Replace the IP addresses with those of the machines where the YB-Master and YB-TServer services are running.

  • Replace cluster-1 with an identifier of your choice for the cluster.

    global:
      scrape_interval:     5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
      evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # YugabyteDB configuration for scraping Prometheus time-series metrics
    scrape_configs:
      - job_name: "yugabytedb-cluster-1"
        metrics_path: /prometheus-metrics
        relabel_configs:
          - target_label: "node_prefix"
            replacement: "cluster-1"
        metric_relabel_configs:
          # Save the metric name in the saved_name label so that queries can group by it,
          # since grouping by __name__ directly is not possible.
          - source_labels: ["__name__"]
            regex: "(.*)"
            target_label: "saved_name"
            replacement: "$1"
          # The following rules retrofit the handler_latency_* metrics into label format.
          - source_labels: ["__name__"]
            regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
            target_label: "server_type"
            replacement: "$1"
          - source_labels: ["__name__"]
            regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
            target_label: "service_type"
            replacement: "$2"
          - source_labels: ["__name__"]
            regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
            target_label: "service_method"
            replacement: "$3"
          - source_labels: ["__name__"]
            regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
            target_label: "__name__"
            replacement: "rpc_latency$4"
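          # Worked example (a sketch, assuming a scraped metric named
          # handler_latency_yb_tserver_TabletServerService_Write_sum):
          #   saved_name     = "handler_latency_yb_tserver_TabletServerService_Write_sum"
          #   server_type    = "yb_tserver"            ($1)
          #   service_type   = "TabletServerService"   ($2)
          #   service_method = "Write"                 ($3)
          #   __name__       = "rpc_latency_sum"       ("rpc_latency" + $4)
          # Prometheus anchors each regex at both ends, so names that do not start
          # with handler_latency_ are left unchanged by these four rules.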
    
        static_configs:
          - targets: ["10.0.0.1:7000", "10.0.0.2:7000", "10.0.0.3:7000"]
            labels:
              group: "yb-master"
              export_type: "master_export"
    
          - targets: ["10.0.0.101:9000", "10.0.0.102:9000", "10.0.0.103:9000"]
            labels:
              group: "yb-tserver"
              export_type: "tserver_export"
    
          - targets: ["10.0.0.101:12000", "10.0.0.102:12000", "10.0.0.103:12000"]
            labels:
              group: "ycql"
              export_type: "cql_export"
    
          - targets: ["10.0.0.101:13000", "10.0.0.102:13000", "10.0.0.103:13000"]
            labels:
              group: "ysql"
              export_type: "ysql_export"
    
          - targets: ["10.0.0.101:11000", "10.0.0.102:11000", "10.0.0.103:11000"]
            labels:
              group: "yedis"
              export_type: "redis_export"
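
To keep several clusters apart, as recommended in the best practices above, one scrape job per cluster can be defined, each with its own node_prefix value. The fragment below is a minimal sketch of that layout, not a complete configuration: the job name yugabytedb-cluster-2, the identifier cluster-2, and the 10.0.1.x addresses are assumptions made for illustration, and only the YB-Master targets are shown.

```yaml
scrape_configs:
  - job_name: "yugabytedb-cluster-1"
    metrics_path: /prometheus-metrics
    relabel_configs:
      - target_label: "node_prefix"
        replacement: "cluster-1"
    # metric_relabel_configs: repeat the rules from the sample above.
    static_configs:
      - targets: ["10.0.0.1:7000", "10.0.0.2:7000", "10.0.0.3:7000"]
        labels:
          group: "yb-master"
          export_type: "master_export"
      # ...remaining target groups as in the sample above.

  # Second job for a second cluster (name and addresses are assumptions).
  - job_name: "yugabytedb-cluster-2"
    metrics_path: /prometheus-metrics
    relabel_configs:
      - target_label: "node_prefix"
        replacement: "cluster-2"
    # metric_relabel_configs: repeat the same rules here as well.
    static_configs:
      - targets: ["10.0.1.1:7000", "10.0.1.2:7000", "10.0.1.3:7000"]
        labels:
          group: "yb-master"
          export_type: "master_export"
      # ...remaining target groups for this cluster.
```

With both jobs in place, every scraped series carries either node_prefix="cluster-1" or node_prefix="cluster-2", which is the label the dashboard uses to separate clusters.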