InfiniBand Input Plugin

plugins/inputs/infiniband/README.md

This plugin gathers statistics for all InfiniBand devices and ports on the system. The standard counters are read from /sys/class/infiniband/<dev>/ports/<port>/counters/, and the RDMA counters are read from /sys/class/infiniband/<dev>/ports/<port>/hw_counters/.

⭐ Telegraf v1.14.0 🏷️ network 💻 linux

Global configuration options

Plugins support additional global and plugin configuration settings for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.

Configuration

```toml
# Gets counters from all InfiniBand cards and ports installed
# This plugin ONLY supports Linux
[[inputs.infiniband]]
  # no configuration

  ## Collect RDMA counters
  # gather_rdma = false
```

Metrics

The actual metrics depend on the InfiniBand devices; the plugin uses a simple mapping from counter name to counter value.

Information about the counters collected is provided by Nvidia.
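The counter-to-value mapping can be illustrated with a short Python sketch. This is not the plugin's implementation; `read_counters` and its parameters are hypothetical names chosen for illustration. It assumes each file under `counters/` (and, optionally, `hw_counters/`) holds a single integer, and it merges both sets into one dictionary per port for brevity, whereas the example output shows them as separate lines.

```python
from pathlib import Path


def read_counters(sysfs_root="/sys/class/infiniband", gather_rdma=False):
    """Return {(device, port): {counter_name: int_value}} read from sysfs.

    Hypothetical helper: one file per counter, file name -> field name.
    """
    subdirs = ["counters"] + (["hw_counters"] if gather_rdma else [])
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no InfiniBand devices present (or not Linux)
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            fields = {}
            for sub in subdirs:
                counter_dir = port / sub
                if not counter_dir.is_dir():
                    continue
                for counter_file in sorted(counter_dir.iterdir()):
                    try:
                        fields[counter_file.name] = int(counter_file.read_text().strip())
                    except (OSError, ValueError):
                        continue  # skip unreadable or non-numeric entries
            if fields:
                result[(dev.name, port.name)] = fields
    return result
```

On a host without InfiniBand hardware the function simply returns an empty dictionary, which mirrors the fact that the set of fields depends entirely on what the devices expose.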

The following fields are emitted by the plugin when selecting counters:

  • infiniband
    • tags:

      • device
      • port
    • fields:

      InfiniBand counters

      • excessive_buffer_overrun_errors (integer)
      • link_downed (integer)
      • link_error_recovery (integer)
      • local_link_integrity_errors (integer)
      • multicast_rcv_packets (integer)
      • multicast_xmit_packets (integer)
      • port_rcv_constraint_errors (integer)
      • port_rcv_data (integer)
      • port_rcv_errors (integer)
      • port_rcv_packets (integer)
      • port_rcv_remote_physical_errors (integer)
      • port_rcv_switch_relay_errors (integer)
      • port_xmit_constraint_errors (integer)
      • port_xmit_data (integer)
      • port_xmit_discards (integer)
      • port_xmit_packets (integer)
      • port_xmit_wait (integer)
      • symbol_error (integer)
      • unicast_rcv_packets (integer)
      • unicast_xmit_packets (integer)
      • VL15_dropped (integer)

      InfiniBand RDMA counters

      • duplicate_request (integer)
      • implied_nak_seq_err (integer)
      • lifespan (integer)
      • local_ack_timeout_err (integer)
      • np_cnp_sent (integer)
      • np_ecn_marked_roce_packets (integer)
      • out_of_buffer (integer)
      • out_of_sequence (integer)
      • packet_seq_err (integer)
      • req_cqe_error (integer)
      • req_cqe_flush_error (integer)
      • req_remote_access_errors (integer)
      • req_remote_invalid_request (integer)
      • resp_cqe_error (integer)
      • resp_cqe_flush_error (integer)
      • resp_local_length_error (integer)
      • resp_remote_access_errors (integer)
      • rnr_nak_retry_err (integer)
      • roce_adp_retrans (integer)
      • roce_adp_retrans_to (integer)
      • roce_slow_restart (integer)
      • roce_slow_restart_cnps (integer)
      • roce_slow_restart_trans (integer)
      • rp_cnp_handled (integer)
      • rp_cnp_ignored (integer)
      • rx_atomic_requests (integer)
      • rx_icrc_encapsulated (integer)
      • rx_read_requests (integer)
      • rx_write_requests (integer)

Example Output

```text
infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 port_xmit_data=85378896588i,VL15_dropped=0i,port_rcv_packets=34914071i,port_rcv_data=34600185253i,port_xmit_discards=0i,link_downed=0i,local_link_integrity_errors=0i,symbol_error=0i,link_error_recovery=0i,multicast_rcv_packets=0i,multicast_xmit_packets=0i,unicast_xmit_packets=82002535i,excessive_buffer_overrun_errors=0i,port_rcv_switch_relay_errors=0i,unicast_rcv_packets=34914071i,port_xmit_constraint_errors=0i,port_rcv_errors=0i,port_xmit_wait=0i,port_rcv_remote_physical_errors=0i,port_rcv_constraint_errors=0i,port_xmit_packets=82002535i 1737652060000000000
infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 local_ack_timeout_err=0i,lifespan=10i,out_of_buffer=0i,resp_remote_access_errors=0i,resp_local_length_error=0i,np_cnp_sent=0i,roce_slow_restart=0i,rx_read_requests=6000i,duplicate_request=0i,resp_cqe_error=0i,rx_write_requests=19000i,roce_slow_restart_cnps=0i,rx_icrc_encapsulated=0i,rnr_nak_retry_err=0i,roce_adp_retrans=0i,out_of_sequence=0i,req_remote_access_errors=0i,roce_slow_restart_trans=0i,req_remote_invalid_request=0i,req_cqe_error=0i,resp_cqe_flush_error=0i,packet_seq_err=0i,roce_adp_retrans_to=0i,np_ecn_marked_roce_packets=0i,rp_cnp_handled=0i,implied_nak_seq_err=0i,rp_cnp_ignored=0i,req_cqe_flush_error=0i,rx_atomic_requests=0i 1737652060000000000
```
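
To show how the output above breaks down into a measurement, tags, fields, and a timestamp, here is a minimal Python sketch. `parse_line` is a hypothetical helper, not part of Telegraf, and it only handles lines shaped like the ones in this example: integer fields with a trailing `i`, and no escaped spaces, commas, or string values. The sample is a shortened copy of the first output line.

```python
def parse_line(line):
    """Split one record into (measurement, tags, fields, timestamp).

    Assumes exactly three space-separated parts and integer-only fields.
    """
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        fields[key] = int(value.rstrip("i"))  # trailing "i" marks an integer field
    return measurement, tags, fields, int(timestamp)


# Shortened copy of the first example line (fewer fields, same format).
sample = (
    "infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 "
    "port_xmit_data=85378896588i,port_rcv_data=34600185253i,symbol_error=0i "
    "1737652060000000000"
)

measurement, tags, fields, timestamp = parse_line(sample)
print(tags["device"], tags["port"], fields["port_xmit_data"])
# prints: mlx5_bond_0 1 85378896588
```

For anything beyond a quick inspection, a full line-protocol parser is the safer choice, since this sketch ignores escaping and non-integer field types.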