Kafka monitoring

SkyWalking leverages the Prometheus JMX Exporter to collect metrics data from Kafka, and the OpenTelemetry Collector to transfer those metrics to the OpenTelemetry receiver and into the Meter System. A Kafka cluster is represented as a Service in OAP, on the Layer: KAFKA.

Data flow

  1. The prometheus_JMX_Exporter collects metrics data from Kafka. Note: run the exporter as a Java agent.
  2. The OpenTelemetry Collector fetches metrics from prometheus_JMX_Exporter via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
  3. The SkyWalking OAP Server parses the expressions with MAL to filter, calculate, and aggregate the metrics, then stores the results.
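
As a rough illustration of step 2, an OpenTelemetry Collector configuration could look like the sketch below. The host names (`broker1`, `oap`), the ports (`7071`, `11800`), the job name `kafka-monitoring`, and the `cluster` label value are all assumptions made for this example; adjust them to your deployment and to whatever job name and labels your MAL rules expect.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "kafka-monitoring"      # assumed job name
          scrape_interval: 30s
          static_configs:
            - targets: ["broker1:7071"]     # assumed JMX Exporter host:port
              labels:
                cluster: kafka-cluster      # assumed cluster label value

processors:
  batch:

exporters:
  otlp:
    endpoint: oap:11800                     # assumed OAP gRPC address
    tls:
      insecure: true                        # plaintext gRPC; enable TLS in production

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
```

The Prometheus receiver scrapes each broker's exporter endpoint, and the `otlp` exporter forwards the scraped metrics to OAP over gRPC.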

Setup

  1. Set up prometheus_JMX_Exporter. An example JMX Exporter configuration is kafka-2_0_0.yml.
  2. Set up the OpenTelemetry Collector. For an example OpenTelemetry Collector configuration, refer to here.
  3. Configure the SkyWalking OpenTelemetry receiver.
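
For step 3, the receiver is normally enabled in the OAP `application.yml`. The fragment below is a minimal sketch, assuming the `receiver-otel` module layout and environment variable names used by recent SkyWalking releases; check the `application.yml` shipped with your version, since key names and default rule lists may differ.

```yaml
receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    # Accept OTLP metrics pushed by the OpenTelemetry Collector.
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics"}
    # Activate the Kafka rule files; other rule sets are omitted here for brevity.
    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"kafka/*"}
```

The `kafka/*` entry is meant to match the rule files under the `otel-rules/kafka` directory.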

Kafka Monitoring

Kafka monitoring provides multidimensional metrics monitoring of a Kafka cluster as a Layer: KAFKA Service in the OAP. In each cluster, the Kafka brokers are represented as Instances.

Kafka Cluster Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|---|---|---|---|
| Under-Replicated Partitions | meter_kafka_under_replicated_partitions | Number of under-replicated partitions in the broker. A higher number is a sign of potential issues. | Prometheus JMX Exporter |
| Offline Partitions Count | meter_kafka_offline_partitions_count | Number of partitions that are offline. Non-zero values indicate a problem. | Prometheus JMX Exporter |
| Partition Count | meter_kafka_partition_count | Total number of partitions on the broker. | Prometheus JMX Exporter |
| Leader Count | meter_kafka_leader_count | Number of leader partitions on this broker. | Prometheus JMX Exporter |
| Active Controller Count | meter_kafka_active_controller_count | The number of active controllers in the cluster. Typically should be 1. | Prometheus JMX Exporter |
| Leader Election Rate | meter_kafka_leader_election_rate | The rate of leader elections per minute. High rate could be a sign of instability. | Prometheus JMX Exporter |
| Unclean Leader Elections Per Second | meter_kafka_unclean_leader_elections_per_second | The rate of unclean leader elections per second. Non-zero values indicate a serious problem. | Prometheus JMX Exporter |
| Max Lag | meter_kafka_max_lag | The maximum lag between the leader and followers in terms of messages still needed to be sent. Higher lag indicates delays. | Prometheus JMX Exporter |

Kafka Broker Supported Metrics

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|---|---|---|---|---|
| CPU Usage | % | meter_kafka_broker_cpu_time_total | CPU usage in percentage | Prometheus JMX Exporter |
| Memory Usage | % | meter_kafka_broker_memory_usage_percentage | JVM heap memory usage in percentage | Prometheus JMX Exporter |
| Incoming Messages | Msg/sec | meter_kafka_broker_messages_per_second | Rate of incoming messages | Prometheus JMX Exporter |
| Bytes In | Bytes/sec | meter_kafka_broker_bytes_in_per_second | Rate of incoming bytes | Prometheus JMX Exporter |
| Bytes Out | Bytes/sec | meter_kafka_broker_bytes_out_per_second | Rate of outgoing bytes | Prometheus JMX Exporter |
| Replication Bytes In | Bytes/sec | meter_kafka_broker_replication_bytes_in_per_second | Rate of incoming bytes for replication | Prometheus JMX Exporter |
| Replication Bytes Out | Bytes/sec | meter_kafka_broker_replication_bytes_out_per_second | Rate of outgoing bytes for replication | Prometheus JMX Exporter |
| Under-Replicated Partitions | Count | meter_kafka_broker_under_replicated_partitions | Number of under-replicated partitions | Prometheus JMX Exporter |
| Under Min ISR Partition Count | Count | meter_kafka_broker_under_min_isr_partition_count | Number of partitions below the minimum ISR (In-Sync Replicas) | Prometheus JMX Exporter |
| Partition Count | Count | meter_kafka_broker_partition_count | Total number of partitions | Prometheus JMX Exporter |
| Leader Count | Count | meter_kafka_broker_leader_count | Number of partitions for which this broker is the leader | Prometheus JMX Exporter |
| ISR Shrinks | Count/sec | meter_kafka_broker_isr_shrinks_per_second | Rate of ISR (In-Sync Replicas) shrinking | Prometheus JMX Exporter |
| ISR Expands | Count/sec | meter_kafka_broker_isr_expands_per_second | Rate of ISR (In-Sync Replicas) expanding | Prometheus JMX Exporter |
| Max Lag | Count | meter_kafka_broker_max_lag | Maximum lag between the leader and follower for a partition | Prometheus JMX Exporter |
| Purgatory Size | Count | meter_kafka_broker_purgatory_size | Size of purgatory for Produce and Fetch operations | Prometheus JMX Exporter |
| Garbage Collector Count | Count/sec | meter_kafka_broker_garbage_collector_count | Rate of garbage collection cycles | Prometheus JMX Exporter |
| Requests Per Second | Req/sec | meter_kafka_broker_requests_per_second | Rate of requests to the broker | Prometheus JMX Exporter |
| Request Queue Time | ms | meter_kafka_broker_request_queue_time_ms | Average time a request spends in the request queue | Prometheus JMX Exporter |
| Remote Time | ms | meter_kafka_broker_remote_time_ms | Average time taken for a remote operation | Prometheus JMX Exporter |
| Response Queue Time | ms | meter_kafka_broker_response_queue_time_ms | Average time a response spends in the response queue | Prometheus JMX Exporter |
| Response Send Time | ms | meter_kafka_broker_response_send_time_ms | Average time taken to send a response | Prometheus JMX Exporter |
| Network Processor Avg Idle | % | meter_kafka_broker_network_processor_avg_idle_percent | Percentage of idle time for the network processor | Prometheus JMX Exporter |
| Topic Messages In Total | Count | meter_kafka_broker_topic_messages_in_total | Total number of messages per topic | Prometheus JMX Exporter |
| Topic Bytes Out Per Second | Bytes/sec | meter_kafka_broker_topic_bytesout_per_second | Rate of outgoing bytes per topic | Prometheus JMX Exporter |
| Topic Bytes In Per Second | Bytes/sec | meter_kafka_broker_topic_bytesin_per_second | Rate of incoming bytes per topic | Prometheus JMX Exporter |
| Topic Fetch Requests Per Second | Req/sec | meter_kafka_broker_topic_fetch_requests_per_second | Rate of fetch requests per topic | Prometheus JMX Exporter |
| Topic Produce Requests Per Second | Req/sec | meter_kafka_broker_topic_produce_requests_per_second | Rate of produce requests per topic | Prometheus JMX Exporter |

Customizations

You can customize your own metrics, expressions, and dashboard panels. The metric definitions and expression rules are found in /config/otel-rules/kafka/kafka-cluster.yaml and /config/otel-rules/kafka/kafka-node.yaml. The Kafka dashboard panel configurations are found in /config/ui-initialized-templates/kafka.
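
To give a feel for what such a rule looks like, here is a hypothetical excerpt written in the general shape of an otel-rules file. It is not copied from the shipped rules: the rule name `my_custom_partition_count`, the job name in the filter, and the source metric `kafka_server_replicamanager_partitioncount` (which depends on your JMX Exporter rules) are assumptions made for illustration.

```yaml
# Hypothetical rule sketch; compare against the shipped kafka rule files before use.
filter: "{ tags -> tags.job_name == 'kafka-monitoring' }"   # assumed scrape job name
expSuffix: tag({tags -> tags.cluster = 'kafka::' + tags.cluster}).service(['cluster'], Layer.KAFKA)
metricPrefix: meter_kafka
metricsRules:
  # Exposed as meter_kafka_my_custom_partition_count (prefix + name).
  - name: my_custom_partition_count
    # Assumed exporter metric name; sums partition counts per cluster.
    exp: kafka_server_replicamanager_partitioncount.sum(['cluster'])
```

A new metric defined this way still needs a matching dashboard panel before it appears in the UI.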

Reference

For more details on monitoring Kafka and the metrics to focus on, see the following articles: