home/versioned_docs/version-1.8.0/help/alert_threshold.md
:::tip
Alarm thresholds are a core function of HertzBeat. Users configure the trigger conditions for alarms through threshold rules.
Threshold rules come in two kinds, real-time and scheduled, and both can be applied to monitoring metrics and log data. Real-time thresholds trigger alerts directly when monitoring data is collected; scheduled thresholds evaluate PromQL, SQL, or other expressions over a specified time period to decide whether to trigger an alert.
Rules can be configured through the visual page or, more flexibly, as expressions. You can also configure trigger times, alarm levels, notification templates, the monitors a rule is associated with, and so on. Notification templates support nested object access, which allows alarm information to be displayed more flexibly.
:::
A real-time threshold triggers the alarm directly when the monitoring data is collected, which suits scenarios with high real-time requirements. Real-time thresholds support both the monitoring metrics and log data types.
System Page -> Alerting -> Alert Threshold -> New Threshold -> Select Real-time Threshold -> Select Data Type (Monitoring Metrics/Log Data)
HertzBeat Page -> Alerting -> Threshold -> New Threshold -> RealTime Threshold Rule
Configure the threshold. For example: select the SSL certificate metric object and configure the alarm expression to trigger when the metric `expired` is true, that is, `equals(expired,"true")`; then set the alarm level, the notification template, and the other rule information.
Configuration item details:
`__instancename__`, metric value is `responseTime`, which is greater than 50 triggering the alert; also supports accessing object properties like `${log.attributes.hostname}`
Configure real-time alert rules for log data, supporting condition judgment on log content, attributes, resource information, etc.
For example, trigger an alert when 60 error logs are received within 300 seconds.
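The "60 error logs within 300 seconds" condition can be pictured as a sliding-window counter. The following is a minimal sketch of that idea only, not HertzBeat's implementation; the class name `WindowCounter` and its parameters are hypothetical.

```python
from collections import deque


class WindowCounter:
    """Counts matching events inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one matching log at `timestamp` (seconds).

        Returns True when the number of logs seen within the last
        `window_seconds` has reached `threshold`.
        """
        self._timestamps.append(timestamp)
        # Drop entries that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

With `WindowCounter(window_seconds=300, threshold=60)`, the 60th error log that arrives inside any 300-second span makes `record` return `True`; logs older than the window no longer count.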
Configuration item details:
- `log.level`, `log.message`, `log.attributes.*`, `log.resource.*` and other fields
- `equals(log.level,"ERROR")` or `contains(log.attributes.hostname,"server-01")`
- `__instancename__`, `__alertname__`, etc.
- `${log.level}`, `${log.message}`, `${log.timestamp}`, etc.
- `${log.attributes.hostname}`, `${log.resource.service.name}`, etc.

The threshold alert configuration is complete, and alerts that have been successfully triggered can be viewed in the [Alarm Center]. If you need to send alert notifications via email, WeChat, DingTalk, or Feishu, you can configure it in [Notification].
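To make the expression functions and the nested `${...}` template access above more concrete, here is a rough Python sketch of how such lookups could behave. It is an illustration under assumptions, not HertzBeat's evaluator: `resolve` and `render` are hypothetical helpers, `equals` and `contains` only mimic the functions named above, and keys that themselves contain dots are not handled.

```python
import re


def resolve(path, context):
    """Walk a dotted path through nested dicts; return None if any part is missing."""
    current = context
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def equals(value, expected):
    """Mimics equals(field, "text"): compare the value's string form."""
    return str(value) == expected


def contains(value, fragment):
    """Mimics contains(field, "text"): substring test, False for missing values."""
    return value is not None and fragment in str(value)


def render(template, context):
    """Replace each ${dotted.path} placeholder with its value from `context`.

    Placeholders that cannot be resolved are left unchanged.
    """
    def substitute(match):
        value = resolve(match.group(1), context)
        return match.group(0) if value is None else str(value)

    return re.sub(r"\$\{([^}]+)\}", substitute, template)
```

For a log such as `{"log": {"level": "ERROR", "message": "disk full", "attributes": {"hostname": "server-01"}}}`, the condition `equals(resolve("log.level", ctx), "ERROR")` holds, and rendering `"${log.level} on ${log.attributes.hostname}: ${log.message}"` gives `ERROR on server-01: disk full`.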
Scheduled threshold rules are rules in which the system evaluates an expression (such as PromQL or SQL) at a specified periodic interval to determine whether the monitoring data or log data within a given time range meets the alert conditions. These rules suit scenarios that require trend analysis or evaluation of aggregated data, rather than an immediate reaction to a single real-time data point. Scheduled thresholds support both the monitoring metrics and log data types.
Scheduled threshold rules use a dedicated ANTLR-based expression language, and the supported query syntax depends on the data type:
Supports PromQL-style queries. For specific syntax, please refer to the official documentation of your configured time-series database regarding PromQL. The syntax includes:
Query Expressions: Used to reference monitoring data
cpu_usage
memory{__field__="field1"}
Comparison Expressions: Used to compare values against thresholds
cpu_usage > 80
memory_usage >= 90.5
response_time < 1000
Logical Expressions: Used to combine multiple conditions
cpu_usage > 80 and memory_usage > 70
disk_usage > 90 or inode_usage > 85
cpu_usage > 80 unless maintenance_mode == 1
Parenthesis Expressions: Used to control the order of evaluation
(cpu_usage > 80 or memory_usage > 90) and service_status == 1
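The comparison and logical operators above can be read with PromQL-like set semantics: a comparison keeps only the series that satisfy it, and `and`, `or`, `unless` combine the surviving series by their labels. The sketch below illustrates that reading in Python. It assumes Prometheus-style behaviour, which your configured time-series database or HertzBeat's own evaluator may not follow exactly, and it simplifies each series to a single hypothetical `instance` key.

```python
import operator

# Comparison operators mapped to Python functions.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare(vector, op, threshold):
    """Keep only the series whose value satisfies `value op threshold`."""
    check = COMPARATORS[op]
    return {key: value for key, value in vector.items() if check(value, threshold)}


def op_and(left, right):
    """Series from `left` that also appear in `right`."""
    return {key: value for key, value in left.items() if key in right}


def op_or(left, right):
    """All series from `left`, plus series from `right` not already present."""
    merged = dict(left)
    for key, value in right.items():
        merged.setdefault(key, value)
    return merged


def op_unless(left, right):
    """Series from `left` that do not appear in `right`."""
    return {key: value for key, value in left.items() if key not in right}
```

For example, with `cpu = {"server1": 92.0, "server2": 85.0, "server3": 40.0}` and `maint = {"server1": 0.0, "server2": 1.0}`, the expression `cpu_usage > 80 unless maintenance_mode == 1` corresponds to `op_unless(compare(cpu, ">", 80), compare(maint, "==", 1))`, which leaves only `server1`.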
Supports standard SQL syntax to query log data and filter data, allowing aggregated queries on log tables:
-- Query error log count
SELECT COUNT(*) as error_count
FROM hertzbeat_logs
WHERE level = 'ERROR'
AND timestamp >= NOW() - INTERVAL 5 MINUTE
-- Group by service to count errors
SELECT service_name, COUNT(*) as error_count
FROM hertzbeat_logs
WHERE level = 'ERROR'
GROUP BY service_name
HAVING COUNT(*) > 10
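To try the grouped query without a running log store, the sketch below runs it against an in-memory SQLite table from Python. SQLite is only a stand-in here: the actual log database may use a different SQL dialect (for instance, the `NOW() - INTERVAL 5 MINUTE` filter is not SQLite syntax and is left out), and the two-column schema and the function name `services_over_threshold` are assumptions made for illustration.

```python
import sqlite3


def services_over_threshold(rows, threshold=10):
    """Return (service_name, error_count) for services with more than
    `threshold` ERROR rows.

    `rows` is an iterable of (service_name, level) tuples loaded into a
    throwaway in-memory table named like the one in the queries above.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE hertzbeat_logs (service_name TEXT, level TEXT)")
        conn.executemany("INSERT INTO hertzbeat_logs VALUES (?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT service_name, COUNT(*) AS error_count
            FROM hertzbeat_logs
            WHERE level = 'ERROR'
            GROUP BY service_name
            HAVING COUNT(*) > ?
            ORDER BY service_name
            """,
            (threshold,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Each returned row represents one service that crossed the threshold, which matches the idea that a scheduled SQL rule raises an alert when the query returns matching data.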
System Page -> Alerting -> Alert Threshold -> New Threshold -> Select Scheduled Threshold -> Select Data Type (Monitoring Metrics/Log Data)
Configure scheduled thresholds for monitoring metrics. For example: define the expression `cpu_usage{instance="server1"} > 80` for a group of CPU metrics, and trigger an alert when the expression is satisfied.
Configuration Items Explained:
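- Query expressions (e.g. `cpu_usage`, `memory{instance="server1"}`)
- Comparison operators: `>`, `>=`, `<`, `<=`, `==`, `!=`
- Logical operators: `and`, `or`, `unless`
- Threshold values (e.g. `80`, `90.5`)
- `300` means the rule is checked every 5 minutes
- `warning`, `critical`, `emergency`

Configure scheduled thresholds for log data, performing aggregated analysis and alert judgment on logs through SQL queries.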
Configuration Items Explained:
SELECT COUNT(*) as error_count FROM hertzbeat_logs
WHERE level = 'ERROR' AND timestamp >= NOW() - INTERVAL 5 MINUTE
- `300` means the query is executed every 5 minutes
- `warning`, `critical`, `emergency`
- `error_count`, `service_name`, etc.
- `__alertname__`, `__severity__`, etc.

Once the threshold rules are configured, successfully triggered alerts will be displayed in the [Alert Center]. To send alert notifications via Email, WeChat, DingTalk, or Feishu, please go to [Notification Configuration] to set up the appropriate channels.