IBM MQ Monitoring Solutions: DevOps Evaluation Guide

src/go/plugin/ibm.d/modules/mq/MQ-MONITORING.md

This guide evaluates IBM MQ monitoring capabilities across major monitoring platforms from a DevOps perspective. The analysis focuses on operational requirements, technology choices, and practical deployment considerations.

Monitoring Solutions Evaluated

  1. Netdata - Real-time performance monitoring
  2. Datadog - Cloud-based infrastructure monitoring
  3. Dynatrace - Application performance management
  4. Splunk - Data platform with monitoring capabilities
  5. Grafana - Visualization platform (using Prometheus exporter)
  6. Zabbix - Open source enterprise monitoring
  7. Nagios - Infrastructure monitoring and alerting system
  8. CheckMK - Infrastructure monitoring and alerting platform

Solution Abbreviations

For table readability, the following abbreviations are used:

  • ND = Netdata
  • DD = Datadog
  • DT = Dynatrace
  • SP = Splunk
  • GR = Grafana
  • ZB = Zabbix
  • NG = Nagios
  • CM = CheckMK

Queue Manager Level Monitoring

Basic Operational Status

  • Cardinality: 1 metric set per queue manager
  • Technology: PCF MQCMD_INQUIRE_Q_MGR or process monitoring
  • Authority Required: +connect +inq on queue manager
  • Configuration: None
  • Value: Fundamental availability monitoring

| Metric | PCF Constant |
| --- | --- |
| Status | MQIA_Q_MGR_STATUS |
| Connection count | MQIA_CONNECTION_COUNT |
| Command level | MQIA_COMMAND_LEVEL |
| Platform | MQIA_PLATFORM |
| Uptime | Calculated |
| Active channels ✅¹ | From channel count |
| Active listeners | MQIA_ACTIVE_LISTENERS |
| Publish to sub msgs | $SYS topics |
| Expired msg count | $SYS topics |

¹Via channel.channels metric

Resource Utilization

  • Cardinality: 1 metric set per queue manager
  • Technology: $SYS topic subscriptions (Grafana/Prometheus) or REST API
  • Authority Required: +sub on $SYS topics or REST API access
  • Configuration: Monitor topics must be enabled (MONINT > 0)
  • Value: Capacity planning, performance troubleshooting

| Metric | Collection Method |
| --- | --- |
| CPU usage ✅* | $SYS topics |
| Memory usage ✅* | $SYS topics |
| Log write latency | $SYS topics |
| File system usage | $SYS topics |
| Log bytes used ✅* | See log utilization |
| Transaction counts | $SYS topics |

*Framework implemented; requires MQ 9.0+ with MONINT configured and attribute ID mapping
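
The $SYS resource metrics above are published on administrative topics with the layout `$SYS/MQ/INFO/QMGR/<qmgr>/Monitor/<class>/<type>` (MQ 9.0+). A minimal sketch of composing those topic strings; the class/type names used in the example (`CPU/SystemSummary`, `DISK/QMgrSummary`, `DISK/Log`) follow IBM's documented convention but should be verified against your MQ version:

```python
# Sketch: build $SYS administrative topic strings for resource metrics.
# Assumes the documented $SYS/MQ/INFO/QMGR/<qmgr>/Monitor/<class>/<type>
# layout; exact class/type names vary by MQ release.

def sys_monitor_topic(qmgr: str, metric_class: str, metric_type: str) -> str:
    """Compose a $SYS monitoring topic for one metric class/type."""
    return f"$SYS/MQ/INFO/QMGR/{qmgr}/Monitor/{metric_class}/{metric_type}"

# Subscriptions a collector might make for the table above (illustrative):
topics = [
    sys_monitor_topic("QM1", "CPU", "SystemSummary"),  # CPU usage
    sys_monitor_topic("QM1", "DISK", "QMgrSummary"),   # file system usage
    sys_monitor_topic("QM1", "DISK", "Log"),           # log metrics
]
print(topics[0])  # $SYS/MQ/INFO/QMGR/QM1/Monitor/CPU/SystemSummary
```

Remember that nothing is published on these topics unless MONINT > 0 on the queue manager.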

Queue Manager Log Utilization

  • Cardinality: 1 metric set per queue manager
  • Technology: PCF commands
  • Authority Required: +inq on queue manager
  • Configuration: None
  • Value: Log space monitoring and capacity planning

| Metric | Description |
| --- | --- |
| Log utilization % | 6 metrics for log usage |

Advanced Queue Manager Attributes

  • Cardinality: 1 metric set per queue manager
  • Technology: PCF MQCMD_INQUIRE_Q_MGR
  • Authority Required: +connect +inq on queue manager
  • Configuration: None
  • Value: Configuration tracking and compliance

| Metric | PCF Constant |
| --- | --- |
| Distribution lists | MQIA_DIST_LISTS |
| Max message length | MQIA_MAX_MSG_LENGTH |
| Max channels | MQIA_MAX_CHANNELS |
| Active channels | MQIA_ACTIVE_CHANNELS |

Queue Level Monitoring

Queue Configuration Attributes

  • Cardinality: 1 metric set per queue (potentially thousands)
  • Technology: PCF MQCMD_INQUIRE_Q
  • Authority Required: +inq on queues
  • Configuration: None
  • Value: Configuration compliance, capacity planning

| Metric | PCF Constant |
| --- | --- |
| Max depth | MQIA_MAX_Q_DEPTH |
| Inhibit get/put | MQIA_INHIBIT_GET/PUT |
| Backout threshold | MQIA_BACKOUT_THRESHOLD |
| Trigger settings | MQIA_TRIGGER_* |
| Service interval | MQIA_Q_SERVICE_INTERVAL |
| Input open option | MQIA_DEF_INPUT_OPEN_OPTION |
| Depth event config | MQIA_Q_DEPTH_*_EVENT |

Queue Runtime Status

  • Cardinality: 1 metric set per queue
  • Technology: PCF MQCMD_INQUIRE_Q_STATUS
  • Authority Required: +inq on queues
  • Configuration: None
  • Value: Real-time operational monitoring

| Metric | PCF Constant |
| --- | --- |
| Current depth | MQIA_CURRENT_Q_DEPTH |
| Depth percentage | Calculated |
| Open input count | MQIA_OPEN_INPUT_COUNT |
| Open output count | MQIA_OPEN_OUTPUT_COUNT |
| Oldest message age | MQIACF_OLDEST_MSG_AGE |
| Queue file size | MQIACF_CUR_Q_FILE_SIZE² |
| Uncommitted msgs | MQIACF_UNCOMMITTED_MSGS |
| Last GET time | Time since last GET |
| Last PUT time | Time since last PUT |

²Requires MQ 9.1.5+
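
"Depth percentage" is not a PCF attribute; collectors derive it from MQIA_CURRENT_Q_DEPTH and the queue's MQIA_MAX_Q_DEPTH. A minimal sketch of that calculation:

```python
# Sketch: derive the "Depth percentage" metric from the two PCF attributes.
# The 5000 in the example is only MQ's default MAXDEPTH, not a fixed value.

def depth_percentage(current_depth: int, max_depth: int) -> float:
    """Return queue fill level as a percentage of MAXDEPTH."""
    if max_depth <= 0:  # defensive: MAXDEPTH should always be positive
        return 0.0
    return 100.0 * current_depth / max_depth

print(depth_percentage(2500, 5000))  # 50.0
```

Alerting on the percentage rather than the raw depth keeps thresholds meaningful across queues with different MAXDEPTH settings.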

Queue Activity Rates (Destructive Operation)

  • Cardinality: 1 metric set per queue
  • Technology: PCF MQCMD_RESET_Q_STATS
  • Authority Required: +chg on queues
  • Configuration: MONQ(MEDIUM) or MONQ(HIGH)
  • Value: Throughput monitoring
  • Warning: Resets counters after reading

| Metric | PCF Constant |
| --- | --- |
| Enqueue rate ✅³ ✅³ | MQIA_MSG_ENQ_COUNT |
| Dequeue rate ✅³ ✅³ | MQIA_MSG_DEQ_COUNT |
| High depth ✅³ ✅³ | MQIA_HIGH_Q_DEPTH |

³Disabled by default due to destructive nature
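
Because MQCMD_RESET_Q_STATS returns the counts accumulated since the *previous* reset and then zeroes them, a rate is simply the returned count divided by the time since the last reset, and only one collector may safely issue the command per queue. A sketch of that bookkeeping (class and field names are illustrative, not from any particular product):

```python
import time

# Sketch: turn MQCMD_RESET_Q_STATS counters into per-second rates.
# The counters are since-last-reset, so we track the reset timestamps
# ourselves. All names here are hypothetical.

class ResetStatsRater:
    def __init__(self):
        self._last_reset = None

    def rates(self, enq_count, deq_count, now=None):
        """Convert since-last-reset counts into per-second rates."""
        now = time.monotonic() if now is None else now
        if self._last_reset is None:
            self._last_reset = now
            return {}  # first sample has no interval to divide by
        interval = now - self._last_reset
        self._last_reset = now
        if interval <= 0:
            return {}
        return {"enqueue_rate": enq_count / interval,
                "dequeue_rate": deq_count / interval}
```

For example, 600 enqueues reported 60 seconds after the previous reset yields an enqueue rate of 10 msg/s.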

Queue MQI Statistics (Non-Intrusive)

  • Cardinality: 1 metric set per queue
  • Technology: SYSTEM.ADMIN.STATISTICS.QUEUE subscription
  • Authority Required: +get on SYSTEM.ADMIN.STATISTICS.QUEUE
  • Configuration: STATQ(ON) on queue manager
  • Value: Detailed application behavior analysis

| Metric | Description |
| --- | --- |
| Get operations ✅⁴ | MQGET count/bytes |
| Put operations ✅⁴ | MQPUT count/bytes |
| Put1 operations ✅⁴ | MQPUT1 count |
| Get/Put failures ✅⁴ | Failed operations |
| Browse operations ✅⁴ | Browse count/bytes |
| Open/Close ops | MQOPEN/MQCLOSE |
| Expired messages ✅⁴ | Expiration count |
| Purged messages ✅⁴ | Purge count |
| Non-queued msgs ✅⁴ | Direct transfers |
| Min/Max depth ✅⁴ | Queue depth range |
| Avg queue time ✅⁴ | Message latency |

⁴Optional feature, disabled by default (collect_statistics_metrics)

Dead Letter Queue Handling

  • Cardinality: Backout attributes per queue
  • Technology: MQCMD_INQUIRE_Q (configuration attributes)
  • Authority Required: +inq on queues
  • Configuration: Backout requeue name must be configured
  • Value: Message poison prevention and failure handling
  • Note: No solution provides special DLQ monitoring beyond standard queue metrics

| Metric | PCF Constant |
| --- | --- |
| Backout threshold | MQIA_BACKOUT_THRESHOLD |
| Harden get backout | MQIA_HARDEN_GET_BACKOUT |
| MQDLH parsing | Reason code extraction |
| DLQ auto-detection | Special queue handling |
| Source queue tracking | From MQDLH header |

Channel Level Monitoring

Channel Configuration

  • Cardinality: 1 metric set per channel (tens to hundreds)
  • Technology: PCF MQCMD_INQUIRE_CHANNEL
  • Authority Required: +inq on channels
  • Configuration: None
  • Value: Configuration compliance and tuning

| Metric | PCF Constant |
| --- | --- |
| Batch size | MQIACH_BATCH_SIZE |
| Batch interval | MQIACH_BATCH_INTERVAL |
| Heartbeat interval | MQIACH_HB_INTERVAL |
| Max message length | MQIACH_MAX_MSG_LENGTH |
| Retry settings | MQIACH_*_RETRY |
| NPM speed | MQIACH_NPM_SPEED |
| Sharing conversations | MQIACH_SHARING_CONVERSATIONS |

Channel Runtime Status

  • Cardinality: 1 metric set per running channel
  • Technology: PCF MQCMD_INQUIRE_CHANNEL_STATUS
  • Authority Required: +inq on channels
  • Configuration: None for basic; MONCHL(MEDIUM/HIGH) for detailed
  • Value: Connection health and performance monitoring

| Metric | PCF Constant |
| --- | --- |
| Status | MQIACH_CHANNEL_STATUS |
| Status summary | Aggregated by status |
| Messages | MQIACH_MSGS |
| Bytes sent/rcvd | MQIACH_BYTES_* |
| Batches | MQIACH_BATCHES |
| Current convs | Sharing conversations |
| SSL key resets | MQIACH_SSL_KEY_RESETS |
| Connection status | MQIACH_CONNS |
| Active connections | Channel instances |
| Channel instances | 3 instance metrics |
| In-doubt status | MQIACH_INDOUBT_STATUS |
| XMITQ time short | MQIACH_XMITQ_TIME_SHORT |
| XMITQ time long | MQIACH_XMITQ_TIME_LONG |
| Network time | Channel timing metrics |
| Exit time | Channel timing metrics |
| Total time | Channel timing metrics |
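
MQIACH_CHANNEL_STATUS comes back as an integer matching the MQCHS_* constants in cmqcfc.h. A sketch of decoding it into an alert severity; the code-to-name table follows the cmqcfc.h values as I understand them (verify against your headers), and the severity mapping is one common convention (STOPPED critical, transient states warning), not a vendor-defined standard:

```python
# Sketch: decode MQIACH_CHANNEL_STATUS codes and map them to alert
# severities. Values mirror the MQCHS_* constants in cmqcfc.h; the
# severity policy is an illustrative convention, not an MQ standard.

MQCHS_NAMES = {
    0: "INACTIVE", 1: "BINDING", 2: "STARTING", 3: "RUNNING",
    4: "STOPPING", 5: "RETRYING", 6: "STOPPED", 7: "REQUESTING",
    8: "PAUSED", 13: "INITIALIZING",
}

def channel_alert(status_code):
    """Map a channel status code to (state name, alert severity)."""
    name = MQCHS_NAMES.get(status_code, "UNKNOWN")
    if name == "STOPPED":
        return name, "CRITICAL"
    if name in ("RETRYING", "PAUSED", "UNKNOWN"):
        return name, "WARNING"
    return name, "OK"

print(channel_alert(3))  # ('RUNNING', 'OK')
```

This is essentially what the footnoted Datadog service check and CheckMK's "CRITICAL for STOPPED channels" behavior do.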

Channel Statistics (Non-Intrusive)

  • Cardinality: 1 metric set per channel
  • Technology: SYSTEM.ADMIN.STATISTICS.QUEUE subscription
  • Authority Required: +get on SYSTEM.ADMIN.STATISTICS.QUEUE
  • Configuration: STATQ(ON) and STATCHL(MEDIUM/HIGH)
  • Value: Detailed performance analysis

| Metric | Description |
| --- | --- |
| Messages/Bytes ✅⁴ | Transfer metrics |
| Put retries ✅⁴ | Retry counts |
| Batch metrics ✅⁴ | Full/partial batches |

⁴Optional feature, disabled by default

Topic and Subscription Monitoring

Topic Status

  • Cardinality: 1 metric set per topic
  • Technology: PCF MQCMD_INQUIRE_TOPIC_STATUS
  • Authority Required: +inq on topics
  • Configuration: Pub/Sub enabled
  • Value: Pub/Sub health monitoring

| Metric | PCF Constant |
| --- | --- |
| Publisher count ⚠️ | MQIA_PUB_COUNT |
| Subscriber count ⚠️ | MQIA_SUB_COUNT |
| Published msgs ⚠️ | Via status |

⚠️ Documentation claims support but no implementation found

Subscription Status

  • Cardinality: 1 metric set per subscription (can be very high)
  • Technology: PCF MQCMD_INQUIRE_SUB_STATUS
  • Authority Required: +inq on subscriptions
  • Configuration: Pub/Sub enabled
  • Value: Subscription health and message backlog monitoring

| Metric | PCF Constant |
| --- | --- |
| Message count ⚠️ | MQIACF_MESSAGE_COUNT |
| Last message time ⚠️ | MQCACF_LAST_MSG_TIME |

⚠️ Documentation claims support but no implementation found

Event Monitoring

Event Queue Subscriptions

  • Cardinality: Events as they occur
  • Technology: Subscribe to SYSTEM.ADMIN.*.EVENT queues
  • Authority Required: +get on event queues
  • Configuration: Enable specific event types on queue manager
  • Value: Real-time alerting for security and operational events

| Event Type | Event Queue |
| --- | --- |
| Authority failures | QMGR.EVENT |
| Queue depth events | PERFM.EVENT |
| Channel events | CHANNEL.EVENT |
| Performance events | PERFM.EVENT |

Listener Monitoring

Listener Status

  • Cardinality: 1 metric set per listener (typically 1-5)
  • Technology: PCF MQCMD_INQUIRE_LISTENER_STATUS
  • Authority Required: +inq on listeners
  • Configuration: None
  • Value: Network endpoint availability

Metrics collected:

  • Status
  • Port/IP
  • Backlog

z/OS Specific Monitoring

z/OS Usage Metrics

  • Cardinality: 1 metric set per queue manager
  • Technology: z/OS specific PCF commands
  • Authority Required: +inq on queue manager
  • Configuration: z/OS platform only
  • Value: z/OS resource usage and performance

| Metric | Description |
| --- | --- |
| z/OS CPU usage | 8 z/OS metrics |
| z/OS memory usage | z/OS specific |
| z/OS paging | z/OS specific |

Cluster Monitoring

Cluster Queue Manager Status

  • Cardinality: 1 metric set per cluster queue manager
  • Technology: PCF MQCMD_INQUIRE_CLUSTER_Q_MGR
  • Authority Required: +inq on cluster
  • Configuration: Cluster must be configured
  • Value: Cluster health and topology monitoring

Metrics collected:

  • Cluster suspend state
  • Cluster QM status
  • Cluster QM availability

Auto-Discovery and Configuration Management

MQ Server Discovery and Instance Management

  • Technology: Various (network discovery, configuration files, API calls)
  • Value: Automated setup and dynamic environment adaptation

| Capability | Description |
| --- | --- |
| Auto-discover MQ servers | Network discovery of MQ installations |
| Monitor all auto-discovered Queue Managers | Monitor all QMs found via discovery |
| Monitor specific Queue Managers | Target specific QM instances |
| Monitor all auto-discovered Queues ✅⁶ ✅⁷ | Auto-discover and monitor all queues |
| Monitor specific Queues | Target specific queue instances |
| Monitor all auto-discovered Channels ✅⁶ ✅⁷ | Auto-discover and monitor all channels |
| Monitor specific Channels | Target specific channel instances |
| Monitor all auto-discovered Listeners | Auto-discover and monitor all listeners |
| Monitor specific Listeners | Target specific listener instances |
| Monitor all auto-discovered Topics | Auto-discover and monitor all topics |
| Monitor specific Topics ⚠️ | Target specific topic instances |
| Monitor all auto-discovered Subscriptions | Auto-discover and monitor all subscriptions |
| Monitor specific Subscriptions ⚠️ | Target specific subscription instances |

⁶Datadog: auto_discover_queues with queue_patterns/queue_regex filtering
⁷Zabbix: Custom auto-discovery templates (e.g., spectroman/ibm-mq-monitoring)
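
The pattern-based filtering mentioned in both footnotes (Datadog's `queue_patterns`/`queue_regex`, Netdata's regex filtering) works the same way everywhere: discovered object names are matched against include/exclude expressions before collection, which keeps cardinality in check. A generic sketch; the pattern syntax and defaults here are illustrative, not any product's actual configuration schema:

```python
import re

# Sketch: generic include/exclude regex filtering over discovered queue
# names, as used to limit per-queue metric cardinality. The default
# exclude of SYSTEM.* queues is an illustrative convention.

def select_queues(discovered, include=r".*", exclude=r"^SYSTEM\."):
    """Keep discovered queues matching `include` but not `exclude`."""
    inc, exc = re.compile(include), re.compile(exclude)
    return [q for q in discovered if inc.search(q) and not exc.search(q)]

queues = ["APP.ORDERS.IN", "APP.ORDERS.OUT", "SYSTEM.ADMIN.COMMAND.QUEUE"]
print(select_queues(queues, include=r"^APP\."))  # ['APP.ORDERS.IN', 'APP.ORDERS.OUT']
```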

Configuration Management Approaches

Netdata: Network discovery of MQ servers, but requires explicit configuration per queue manager. Supports regex-based filtering for all object types within configured queue managers.

Datadog: Manual server configuration, automatic queue/channel discovery with auto_discover_queues, queue_patterns, and queue_regex for performance optimization. No topic/subscription auto-discovery.

Dynatrace: Automatic IBM MQ process detection via OneAgent, bulk inquiry options, regex-based queue manager filtering. Manual configuration for queue/channel specifics.

Splunk: Manual configuration with IBM WebSphere MQ Modular Input. Supports multiple queues/channels per data input with regex filtering.

Grafana/Prometheus: Manual server configuration, regex-based filtering for <queues>, <channels>, <topics>, <subscriptions> in mq_prometheus exporter configuration.

Zabbix: Manual setup with third-party auto-discovery templates (spectroman/ibm-mq-monitoring) that dynamically create templates for discovered queue managers and objects.

Nagios: Manual configuration per check, supports multiple queues per check with configuration files. Advanced plugins support regex filtering for queue discovery.

CheckMK: Uses dspmq command output for queue manager discovery. Manual configuration for queue-specific monitoring. Limited auto-discovery capabilities.
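
The `dspmq`-based discovery CheckMK relies on is straightforward to reproduce: each output line carries `QMNAME(<name>)` and `STATUS(<state>)` fields. A sketch of parsing that output (the sample text is illustrative):

```python
import re

# Sketch: parse `dspmq` output into {queue_manager: status}, the basis
# of CheckMK-style queue manager discovery. Sample output is illustrative.

DSPMQ_LINE = re.compile(r"QMNAME\((?P<name>[^)]+)\)\s+STATUS\((?P<status>[^)]+)\)")

def parse_dspmq(output):
    """Return {queue_manager_name: status} from dspmq output text."""
    return {m["name"]: m["status"] for m in DSPMQ_LINE.finditer(output)}

sample = """QMNAME(QM1)    STATUS(Running)
QMNAME(QM2)    STATUS(Ended normally)"""
print(parse_dspmq(sample))  # {'QM1': 'Running', 'QM2': 'Ended normally'}
```

Anything beyond availability (depths, channel states) then requires `runmqsc` queries per queue manager, which is why this approach stays limited to command-line-visible metrics.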

Collection Characteristics

Update Frequency

| Solution | Minimum | Typical | Maximum | Architecture |
| --- | --- | --- | --- | --- |
| ND | 1s | 1s | 60s | Edge (per-node) |
| DD | 15s | 15-30s | 300s | Centralized agent |
| DT | 60s | 60s | 300s | Centralized agent |
| SP | 10s | 60s | 300s | Centralized |
| GR | 15s | 30-60s | 300s | Pull-based |
| ZB | 30s | 60-300s | 3600s | Centralized |
| NG | 60s | 300s | 3600s | Centralized |
| CM | 60s | 60-300s | 3600s | Centralized |
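
The storage cost of these intervals is simple arithmetic: samples stored per metric per day is 86400 divided by the collection interval, so a 1 s collector retains 60x the points of a 60 s one. A sketch (figures are illustrative):

```python
# Rough arithmetic behind the resolution/volume trade-off: points stored
# per metric per day at a given collection interval.

def samples_per_day(interval_seconds):
    return 86400 // interval_seconds

print(samples_per_day(1))   # 86400 (1 s edge collection)
print(samples_per_day(60))  # 1440  (typical 60 s centralized polling)
```

Multiply by the per-queue metric counts in the tables above to size retention for a given estate.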

Platform Support

| Solution | Collector Platform | Remote Support | Architecture |
| --- | --- | --- | --- |
| ND | Linux, Windows, macOS | ✅ Client mode | CGO-based |
| DD | Linux, Windows, macOS | ✅ Client mode | Python pymqi |
| DT | Cross-platform | ✅ OneAgent | Proprietary |
| SP | Cross-platform | ✅ Client mode | Java-based |
| GR | Linux, Windows, AIX | ✅ Client mode | Go client |
| ZB | Linux only | ❌ Local only | Shell scripts |
| NG | Linux, Unix, Windows | ✅ Client mode | Shell/Perl scripts |
| CM | Linux, Solaris, AIX, HPUX | ✅ Client mode | Password-less root SSH |

Key Technical Differentiators

Netdata: Highest resolution (1s), complete MQI statistics implementation, Linux-only collector

Datadog: Python-based flexibility, endianness handling for AIX/IBM i, significant gap between documentation and implementation

Dynatrace: Commercial APM integration, z/OS specific metrics, channel timing analysis, topology mapping for transaction tracing

Splunk: Event queue monitoring capability, Java-based cross-platform support

Grafana/Prometheus: $SYS topic subscriptions for resource metrics, multiple export formats, comprehensive z/OS support, cluster monitoring

Zabbix: Local monitoring only, requires sudo access, minimal metric coverage

Nagios: Basic health checks focused on availability monitoring, with a simple command-based alerting model. Supported checks include queue manager status (qmmon), queue depth (qdmon), channel status (chlmon), message age (mamon), port monitoring (portmon/portcon), process monitoring (brkmon/cmdmon), and log monitoring (fdcmon/refmon).

CheckMK: Command-line tool based monitoring using native MQ utilities (dspmq, runmqsc). Supports queue manager availability monitoring, basic queue metrics (depth, message age, last GET/PUT times, open counts, queue time), and channel status monitoring with CRITICAL alerts for STOPPED channels. Limited to metrics available through standard MQ command-line tools.

Service Health Checks

  • Technology: Various (connection tests, PCF queries, process checks)
  • Value: Proactive alerting and availability monitoring

| Check Type | Description |
| --- | --- |
| Connection check | Can connect to QM |
| Queue manager check | QM responding |
| Queue availability | Queue accessible |
| Channel health | Channel status check |
| Channel status alerts ✅⁵ | Status-based alerts |
⁵Datadog provides specific service checks with WARNING/CRITICAL based on channel state

Operational Considerations

  1. Authority Requirements: All solutions require similar MQ authorities. Event monitoring requires additional permissions. Datadog requires +chg for reset statistics.

  2. Configuration Impact: Statistics collection (STATQ) has minimal performance impact. Reset operations are destructive.

  3. Cardinality Management: Queue and subscription metrics can create high cardinality. Plan retention accordingly.

  4. Platform Limitations: Only Dynatrace, Splunk, and Grafana offer true cross-platform collectors.

  5. Resolution vs Volume: Higher resolution (Netdata) provides better anomaly detection but increases storage requirements.

  6. Endianness: Datadog provides convert_endianness option for AIX/IBM i platforms.

  7. $SYS Topics: Grafana/Prometheus requires MONINT > 0 on queue manager to enable $SYS topic publications for resource metrics.
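
A back-of-envelope calculation for the cardinality consideration above: per-queue metric sets multiply across queue managers and metric counts quickly. Numbers in the example are purely illustrative:

```python
# Sketch: estimate time-series cardinality from per-queue metric sets.
# 9 is roughly the runtime-status metric count per queue from the tables
# above; all figures are illustrative.

def series_estimate(qmgrs, queues_per_qmgr, metrics_per_queue):
    return qmgrs * queues_per_qmgr * metrics_per_queue

# 5 queue managers x 2000 queues x 9 runtime-status metrics:
print(series_estimate(5, 2000, 9))  # 90000 series, before channels/topics
```

Subscriptions add on top of this, which is why the subscription tables warn that their cardinality "can be very high".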
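
What an option like convert_endianness addresses, in miniature: numeric fields written by a big-endian queue manager host (AIX, IBM i) are misread if interpreted little-endian on the collector. A pure illustration using `int.from_bytes`:

```python
# Sketch: why endianness conversion matters for binary fields read from
# big-endian MQ hosts. The 4-byte value is illustrative.

raw = b"\x00\x00\x00\x2a"  # 32-bit value 42 as written by a big-endian host

big = int.from_bytes(raw, "big")        # correct interpretation: 42
little = int.from_bytes(raw, "little")  # misread without conversion

print(big, little)  # 42 704643072
```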