src/health/notifications/README.md
Netdata's Agent can send alert notifications directly from each node. It supports a wide range of services, multiple recipients, and role-based routing.
The Agent uses a notification script defined in netdata.conf under the [health] section:
script to execute on alarm = /usr/libexec/netdata/plugins.d/alarm-notify.sh
The default script is alarm-notify.sh.
This script handles:
- Role-based routing (for example, sysadmin, webmaster, dba)

flowchart TD
Alert("High CPU Usage Alert") --> Check("Severity Level")
Check -->|"WARNING"| WarningRouting("Role: SysAdmin")
Check -->|"CRITICAL"| CriticalRouting("Multiple Roles")
WarningRouting --> SlackChannel("Slack")
WarningRouting --> EmailOps("Email")
CriticalRouting --> PagerDuty("PagerDuty")
CriticalRouting --> EmailManagers("Email")
CriticalRouting --> SlackUrgent("Slack")
CriticalRouting --> SMS("SMS")
%% Style definitions
classDef alert fill:#ffeb3b,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef neutral fill:#f9f9f9,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef complete fill:#4caf50,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef database fill:#2196F3,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
%% Apply styles
class Alert alert
class Check database
class WarningRouting,CriticalRouting complete
class SlackChannel,EmailOps,PagerDuty,EmailManagers,SlackUrgent,SMS neutral
flowchart TD
Start("Normal Operation") --> Maintenance("Maintenance Window?")
Maintenance -->|"No"| NormalOps("Continue Normal Alerting")
Maintenance -->|"Yes"| ApiAction("Choose Action")
ApiAction --> SilenceAll("SILENCE ALL")
ApiAction --> DisableAll("DISABLE ALL")
ApiAction --> SilenceSelect("SILENCE Specific")
SilenceAll --> Reset("RESET when done")
DisableAll --> Reset
SilenceSelect --> Reset
Reset --> Restored("Normal Operations Restored")
%% Style definitions
classDef alert fill:#ffeb3b,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef neutral fill:#f9f9f9,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef complete fill:#4caf50,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
classDef database fill:#2196F3,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
%% Apply styles
class Start,Restored complete
class Maintenance database
class ApiAction alert
class NormalOps,SilenceAll,DisableAll,SilenceSelect,Reset neutral
:::tip
Use the edit-config script to safely edit configuration files. It automatically creates the necessary files in the right place and opens them in your editor.
Learn how to use edit-config
:::
Open the Agent's health notification config:
sudo ./edit-config health_alarm_notify.conf
Set up the required API keys or credentials for the service you want to use.
Define recipients per role (see below).
Restart the Agent for changes to take effect:
sudo systemctl restart netdata
Here's an example alert assigned to the sysadmin role from the ram.conf file:
alarm: ram_in_use
on: system.ram
class: Utilization
type: System
component: Memory
os: linux
hosts: *
calc: $used * 100 / ($used + $cached + $free + $buffers)
units: %
every: 10s
warn: $this > (($status >= $WARNING) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
delay: down 15m multiplier 1.5 max 1h
info: system memory utilization
to: sysadmin
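To make the `calc` and `warn` lines above concrete, here is a minimal shell sketch of the same arithmetic. The function names `ram_used_pct` and `warn_threshold` are hypothetical and exist only for illustration; Netdata evaluates these expressions internally, not through shell. Integer arithmetic is used, so results are truncated.

```shell
# Hypothetical helpers mirroring the ram_in_use expressions above.

# calc: $used * 100 / ($used + $cached + $free + $buffers)
ram_used_pct() {
    # args: used cached free buffers (any consistent unit, e.g. MiB)
    echo $(( $1 * 100 / ($1 + $2 + $3 + $4) ))
}

# warn: $this > (($status >= $WARNING) ? (80) : (90))
# An alert that is already WARNING or CRITICAL keeps the lower threshold,
# so it raises above 90% but only clears at or below 80% (hysteresis).
warn_threshold() {
    case "$1" in
        WARNING|CRITICAL) echo 80 ;;
        *)                echo 90 ;;
    esac
}

pct=$(ram_used_pct 6800 1000 1200 1000)   # 6800 of 10000 in use
echo "usage=${pct}% raise_if_above=$(warn_threshold CLEAR) stay_if_above=$(warn_threshold WARNING)"
```

The two thresholds keep an alert from flapping when usage hovers around a single limit.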
Then, in health_alarm_notify.conf, you assign recipients per notification method:
role_recipients_email[sysadmin]="[email protected] [email protected]"
role_recipients_slack[sysadmin]="#alerts #infra"
# Backend team receives database and application server alerts
role_recipients_slack[backend]="#backend-team"
role_recipients_pagerduty[backend]="PDK3Y5EXAMPLE"
# Frontend team receives web server and CDN alerts
role_recipients_slack[frontend]="#frontend-team"
role_recipients_opsgenie[frontend]="key1example"
# Security team receives all security-related alerts
role_recipients_email[security]="[email protected]"
role_recipients_slack[security]="#security-alerts"
# SRE team receives critical infrastructure alerts 24/7
role_recipients_slack[sre]="#sre-alerts"
role_recipients_pagerduty[sre]="PDK3Y5SREXAMPLE"
role_recipients_telegram[sre]="123456789"
You can use external scripts to dynamically change recipients based on work hours, on-call schedules, etc.:
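# Use scripts to determine the current on-call engineer
# (get_oncall_email.sh and get_oncall_phone.sh are placeholders for your own tooling)
ONCALL_EMAIL=$(get_oncall_email.sh)
ONCALL_PHONE=$(get_oncall_phone.sh)
role_recipients_email[oncall]="${ONCALL_EMAIL}"
role_recipients_sms[oncall]="${ONCALL_PHONE}"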
# Standard business hours team gets non-critical alerts during work hours
role_recipients_slack[business_hours]="#daytime-monitoring"
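Because the configuration file accepts shell expressions such as `$(...)`, a small function can make a schedule explicit. The sketch below is one possible way to structure it: `pick_slack_channel` and the `#oncall` channel are hypothetical names, and the 09:00 to 17:00 window is an arbitrary assumption.

```shell
# Hypothetical helper: choose a Slack channel from the hour of day (00-23).
pick_slack_channel() (
    hour=${1#0}        # strip one leading zero ("08" -> "8") to get a plain decimal
    hour=${hour:-0}    # "0" becomes empty after stripping; restore it
    if [ "$hour" -ge 9 ] && [ "$hour" -lt 17 ]; then
        echo "#daytime-monitoring"
    else
        echo "#oncall"   # assumed after-hours channel
    fi
)

pick_slack_channel "$(date +%H)"

# In health_alarm_notify.conf you could then write:
#   role_recipients_slack[business_hours]="$(pick_slack_channel "$(date +%H)")"
```

Keeping the decision in a function makes it easy to check the boundary hours by hand before relying on it.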
Netdata's Health Management API lets you disable health checks or silence notifications during maintenance windows, testing, or other planned activities.
The API is protected by an authorization token stored in /var/lib/netdata/netdata.api.key:
# Get your token
TOKEN=$(cat /var/lib/netdata/netdata.api.key)
# Use the token in API calls
curl "http://localhost:19999/api/v1/manage/health?cmd=RESET" -H "X-Auth-Token: ${TOKEN}"
DISABLE ALL completely stops evaluation of health checks during maintenance (the space in the command is URL-encoded as %20):
curl "http://localhost:19999/api/v1/manage/health?cmd=DISABLE%20ALL" -H "X-Auth-Token: ${TOKEN}"
SILENCE ALL continues to evaluate health checks but prevents notifications (the space in the command is URL-encoded as %20):
curl "http://localhost:19999/api/v1/manage/health?cmd=SILENCE%20ALL" -H "X-Auth-Token: ${TOKEN}"
Target only certain alerts by name, chart, context, host, or family:
# Silence all disk space alerts
curl "http://localhost:19999/api/v1/manage/health?cmd=SILENCE&context=disk_space" -H "X-Auth-Token: ${TOKEN}"
# Disable CPU alerts for specific hosts
curl "http://localhost:19999/api/v1/manage/health?cmd=DISABLE&context=cpu&hosts=prod-db-*" -H "X-Auth-Token: ${TOKEN}"
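Selective commands combine `cmd` with one or more selectors joined by `&`. The sketch below builds such a URL; `health_query` and the `NETDATA_URL` variable are hypothetical, and only the `context` and `hosts` selectors shown above are used. It prints the URL rather than calling the API, so you can inspect it first.

```shell
# Hypothetical helper: build a Health Management API URL.
# usage: health_query COMMAND [selector=value ...]
health_query() (
    base="${NETDATA_URL:-http://localhost:19999}/api/v1/manage/health"
    # Percent-encode spaces so commands such as "SILENCE ALL" form a valid URL
    query="cmd=$(printf '%s' "$1" | sed 's/ /%20/g')"
    shift
    for selector in "$@"; do
        query="${query}&${selector}"
    done
    printf '%s?%s\n' "$base" "$query"
)

health_query SILENCE context=disk_space
health_query DISABLE context=cpu 'hosts=prod-db-*'
```

The result can be passed straight to curl, for example `curl "$(health_query SILENCE context=disk_space)" -H "X-Auth-Token: ${TOKEN}"`.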
Check what's currently silenced or disabled:
curl "http://localhost:19999/api/v1/manage/health?cmd=LIST" -H "X-Auth-Token: ${TOKEN}"
Re-enable all health checks and notifications:
curl "http://localhost:19999/api/v1/manage/health?cmd=RESET" -H "X-Auth-Token: ${TOKEN}"
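The silence, work, reset sequence can be wrapped so that the reset is not forgotten. This is a minimal sketch under stated assumptions: `health_cmd`, `with_silenced_alerts`, and the `CURL` and `NETDATA_URL` variables are hypothetical names, and `TOKEN` is expected to hold the API key read from the file mentioned earlier.

```shell
# Sketch: silence notifications around a maintenance command, then reset.
CURL="${CURL:-curl}"                                  # set CURL=echo for a dry run
NETDATA_URL="${NETDATA_URL:-http://localhost:19999}"  # assumed local Agent

health_cmd() {
    # $1: command such as "SILENCE ALL" or "RESET"; spaces are percent-encoded
    "$CURL" -s "${NETDATA_URL}/api/v1/manage/health?cmd=$(printf '%s' "$1" | sed 's/ /%20/g')" \
        -H "X-Auth-Token: ${TOKEN}"
}

with_silenced_alerts() {
    health_cmd "SILENCE ALL" || return 1
    "$@"                       # run the maintenance command
    rc=$?
    health_cmd "RESET"         # restore alerting even if the command failed
    return "$rc"
}

# Example, with a placeholder maintenance script:
#   with_silenced_alerts ./run_maintenance.sh
```

The wrapper returns the exit status of the maintenance command, so callers can still detect a failed run after alerting has been restored.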
Define who receives alerts and how:
role_recipients_email[sysadmin]="[email protected]"
role_recipients_telegram[webmaster]="123456789"
role_recipients_slack[dba]="#database-alerts"
Use spaces to separate multiple recipients.
To disable a notification method for a role, use:
role_recipients_email[sysadmin]="disabled"
If left empty, the default recipient for that method is used.
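The three cases above (an explicit list, `disabled`, and an empty value) can be summarized in a short sketch. `resolve_recipients` is a hypothetical function written for illustration and is not taken from alarm-notify.sh.

```shell
# Hypothetical illustration of the rules above.
# usage: resolve_recipients ROLE_VALUE DEFAULT_VALUE
resolve_recipients() {
    case "$1" in
        disabled) echo "" ;;      # method turned off for this role
        "")       echo "$2" ;;    # nothing set: fall back to the default recipient
        *)        echo "$1" ;;    # space-separated list, used as-is
    esac
}

resolve_recipients "" "root"
resolve_recipients "#alerts #infra" "#general"
resolve_recipients "disabled" "root"
```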
</details>

<details>
<summary><strong>Alert Severity Filtering</strong></summary>

You can limit certain recipients to only receive critical alerts:
role_recipients_email[sysadmin]="[email protected] [email protected]|critical"
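This setup:
- Sends all alerts to the first recipient.
- Sends only critical alerts to the second recipient, which carries the |critical suffix.

This works for all supported methods: email, Slack, Telegram, Twilio, Discord, etc.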
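As a rough illustration of the `|critical` modifier, the sketch below filters a recipient list by severity. `filter_recipients` is a hypothetical function, the addresses are placeholders, and the real script may handle further cases (such as recovery notifications) that this sketch ignores.

```shell
# Hypothetical sketch of the "|critical" modifier; not the actual alarm-notify.sh code.
# usage: filter_recipients SEVERITY "recipient list"
filter_recipients() (
    severity=$1
    kept=""
    set -f                              # no filename expansion while splitting
    for r in $2; do                     # unquoted on purpose: split on spaces
        case "$r" in
            *"|critical")
                [ "$severity" = "CRITICAL" ] || continue
                r=${r%"|critical"}      # drop the modifier before use
                ;;
        esac
        kept="${kept:+$kept }$r"
    done
    printf '%s\n' "$kept"
)

filter_recipients WARNING  "ops@example.com manager@example.com|critical"
filter_recipients CRITICAL "ops@example.com manager@example.com|critical"
```

With a WARNING alert only the first placeholder address is kept; with a CRITICAL alert both are.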
</details>

<details>
<summary><strong>Proxy Settings</strong></summary>

To send notifications via a proxy, set these environment variables:
export http_proxy="http://10.0.0.1:3128/"
export https_proxy="http://10.0.0.1:3128/"
By default, Netdata includes public image URLs in notifications (hosted by the global Registry).
To use custom image paths:
images_base_url="http://my.public.netdata.server:19999"
Change the timestamp format in notifications:
date_format="+%F %T%:z" # Example: RFC 3339
Common formats:
| Format | String |
|---|---|
| ISO 8601 | +%FT%T%z |
| RFC 5322 | +%a, %d %b %Y %H:%M:%S %z |
| RFC 3339 | +%F %T%:z |
| Local time | +%x %X |
| ANSI C / asctime() | (leave empty) |
See man date for more formatting options.
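To preview a format string before putting it in the configuration, you can render one fixed instant with each candidate. The sketch below assumes GNU `date`, since both `-d @0` and `%:z` are GNU extensions; `show_format` is a hypothetical helper name.

```shell
# Print one fixed instant (the Unix epoch, in UTC) in several formats from the table.
# Assumes GNU date: both "-d @0" and "%:z" are GNU extensions.
show_format() {
    LC_ALL=C date -u -d @0 "$1"
}

show_format "+%FT%T%z"                    # ISO 8601
show_format "+%a, %d %b %Y %H:%M:%S %z"   # RFC 5322
show_format "+%F %T%:z"                   # RFC 3339
```

Using a fixed instant makes the differences between formats easy to compare line by line.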
By default, Netdata uses the short hostname in notifications.
To use the fully qualified domain name (FQDN), set:
use_fqdn=YES
If you've set a custom hostname in netdata.conf, that value takes priority.
You can test alert notifications manually.
# Switch to the Netdata user
sudo su -s /bin/bash netdata
# Enable debugging
export NETDATA_ALARM_NOTIFY_DEBUG=1
# Test default role (sysadmin), using the script path configured in netdata.conf
/usr/libexec/netdata/plugins.d/alarm-notify.sh test
# Test specific role
/usr/libexec/netdata/plugins.d/alarm-notify.sh test "webmaster"
:::important
If you're running your own Netdata Registry, set:
export NETDATA_REGISTRY_URL="https://your.registry.url"
before testing.
:::
To see the full execution output:
bash -x /usr/libexec/netdata/plugins.d/alarm-notify.sh test
Then look for the internal calls and re-run the one you want to trace in more detail.
Here are solutions for common alert notification issues:
Verify your email configuration:
grep -E "SEND_EMAIL|DEFAULT_RECIPIENT_EMAIL" /etc/netdata/health_alarm_notify.conf
Check if the system can send mail:
echo "Test" | mail -s "Test Email" [email protected]
Look for errors in the Netdata log:
tail -f /var/log/netdata/error.log | grep "alarm notify"
Test with debugging enabled:
sudo su -s /bin/bash netdata
export NETDATA_ALARM_NOTIFY_DEBUG=1
/usr/libexec/netdata/plugins.d/alarm-notify.sh test
Verify your webhook URL is correct:
grep -E "SLACK_WEBHOOK_URL" /etc/netdata/health_alarm_notify.conf
Check for network connectivity to Slack:
curl -X POST -H "Content-type: application/json" --data '{"text":"Test"}' YOUR_WEBHOOK_URL
Confirm channel names start with # in your configuration.
Verify your service key:
grep -E "PAGERDUTY_SERVICE_KEY" /etc/netdata/health_alarm_notify.conf
Test the PagerDuty API directly:
curl -H "Content-Type: application/json" -X POST -d '{"service_key":"YOUR_SERVICE_KEY","event_type":"trigger","description":"Test"}' https://events.pagerduty.com/generic/2010-04-15/create_event.json
If notifications seem delayed:
- Check the delay parameter in your alarm configuration
- Check the health.d/*.conf files for delay settings
- Check the ALARM_NOTIFY_DELAY setting in health_alarm_notify.conf
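To get a feel for how much delay a line such as `delay: down 15m multiplier 1.5 max 1h` can introduce, the sketch below computes the delay after a number of repeated state changes. It rests on an assumption: that each further state change while a notification is pending multiplies the delay, capped at `max`. `delay_after_flaps` is a hypothetical helper; check the health configuration reference for the exact rules.

```shell
# Sketch of how a delay could grow under "delay: down 15m multiplier 1.5 max 1h".
# Assumption: every further state change multiplies the delay, capped at max.
# usage: delay_after_flaps BASE_SECONDS MULTIPLIER MAX_SECONDS FLAPS
delay_after_flaps() {
    awk -v d="$1" -v m="$2" -v max="$3" -v n="$4" 'BEGIN {
        for (i = 0; i < n; i++) { d *= m; if (d > max) d = max }
        printf "%d\n", d    # whole seconds, truncated
    }'
}

for flaps in 0 1 2 3 4; do
    echo "flaps=$flaps delay=$(delay_after_flaps 900 1.5 3600 "$flaps")s"
done
```

Under this assumption a flapping alert reaches the one-hour cap after four state changes, which may explain notifications that arrive much later than the base delay suggests.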