Agent Alert Notifications

Netdata's Agent can send alert notifications directly from each node. It supports a wide range of services, multiple recipients, and role-based routing.

How It Works

The Agent uses a notification script defined in netdata.conf under the [health] section:

ini

script to execute on alarm = /usr/libexec/netdata/plugins.d/alarm-notify.sh

The default script is alarm-notify.sh.

This script handles:

Multiple recipients
Multiple notification methods
Role-based routing (e.g., sysadmin, webmaster, dba)

Role-Based Routing Visualization

mermaid

flowchart TD
    Alert("High CPU Usage Alert") --> Check("Severity Level")
    
    Check -->|"WARNING"| WarningRouting("Role: SysAdmin")
    Check -->|"CRITICAL"| CriticalRouting("Multiple Roles")
    
    WarningRouting --> SlackChannel("Slack")
    WarningRouting --> EmailOps("Email")
    
    CriticalRouting --> PagerDuty("PagerDuty")
    CriticalRouting --> EmailManagers("Email")
    CriticalRouting --> SlackUrgent("Slack")
    CriticalRouting --> SMS("SMS")
    
    %% Style definitions
    classDef alert fill:#ffeb3b,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef neutral fill:#f9f9f9,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef complete fill:#4caf50,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef database fill:#2196F3,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px

    %% Apply styles
    class Alert alert
    class Check database
    class WarningRouting,CriticalRouting complete
    class SlackChannel,EmailOps,PagerDuty,EmailManagers,SlackUrgent,SMS neutral

Health Management API Workflow

mermaid

flowchart TD
    Start("Normal Operation") --> Maintenance("Maintenance Window?")
    
    Maintenance -->|"No"| NormalOps("Continue Normal Alerting")
    Maintenance -->|"Yes"| ApiAction("Choose Action")
    
    ApiAction --> SilenceAll("SILENCE ALL")
    ApiAction --> DisableAll("DISABLE ALL")
    ApiAction --> SilenceSelect("SILENCE Specific")
    
    SilenceAll --> Reset("RESET when done")
    DisableAll --> Reset
    SilenceSelect --> Reset
    
    Reset --> Restored("Normal Operations Restored")
    
    %% Style definitions
    classDef alert fill:#ffeb3b,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef neutral fill:#f9f9f9,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef complete fill:#4caf50,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px
    classDef database fill:#2196F3,stroke:#000000,stroke-width:3px,color:#000000,font-size:18px

    %% Apply styles
    class Start,Restored complete
    class Maintenance database
    class ApiAction alert
    class NormalOps,SilenceAll,DisableAll,SilenceSelect,Reset neutral

Quick Setup

:::tip

Use the edit-config script to safely edit configuration files. It automatically creates the necessary files in the right place and opens them in your editor. Learn how to use edit-config

:::

Open the Agent's health notification config:
bash
```
sudo ./edit-config health_alarm_notify.conf
```
Set up the required API keys or credentials for the service you want to use.
Define recipients per role (see below).
Restart the Agent for changes to take effect:
bash
```
sudo systemctl restart netdata
```

Example: Alert with Role-Based Routing

Here's an example alert assigned to the sysadmin role from the ram.conf file:

ini

alarm: ram_in_use
   on: system.ram
class: Utilization
 type: System
component: Memory
     os: linux
  hosts: *
   calc: $used * 100 / ($used + $cached + $free + $buffers)
  units: %
  every: 10s
   warn: $this > (($status >= $WARNING)  ? (80) : (90))
   crit: $this > (($status == $CRITICAL) ? (90) : (98))
  delay: down 15m multiplier 1.5 max 1h
   info: system memory utilization
     to: sysadmin

Then, in health_alarm_notify.conf, you assign recipients per notification method:

ini

role_recipients_email[sysadmin]="[email protected] [email protected]"
role_recipients_slack[sysadmin]="#alerts #infra"

Advanced Role-Based Routing Examples

<details> <summary>DevOps Team Example</summary>

ini

# Backend team receives database and application server alerts
role_recipients_slack[backend]="#backend-team"
role_recipients_pagerduty[backend]="PDK3Y5EXAMPLE"

# Frontend team receives web server and CDN alerts
role_recipients_slack[frontend]="#frontend-team"
role_recipients_opsgenie[frontend]="key1example"

# Security team receives all security-related alerts
role_recipients_email[security]="[email protected]"
role_recipients_slack[security]="#security-alerts"

# SRE team receives critical infrastructure alerts 24/7
role_recipients_slack[sre]="#sre-alerts"
role_recipients_pagerduty[sre]="PDK3Y5SREXAMPLE"
role_recipients_telegram[sre]="123456789"

</details> <details> <summary>Time-Based Routing Example</summary>

You can use external scripts to dynamically change recipients based on work hours, on-call schedules, etc.:

ini

# Use a script to determine the current on-call engineer
ONCALL_EMAIL=$(get_oncall_email.sh)
role_recipients_email[oncall]="${ONCALL_EMAIL}"
role_recipients_sms[oncall]="${ONCALL_PHONE}"

# Standard business hours team gets non-critical alerts during work hours
role_recipients_slack[business_hours]="#daytime-monitoring"

</details>

Health Management API

Netdata provides a powerful Health Management API that lets you control alert behavior during maintenance windows, testing, or other planned activities.

API Authorization

The API is protected by an authorization token stored in /var/lib/netdata/netdata.api.key:

bash

# Get your token
TOKEN=$(cat /var/lib/netdata/netdata.api.key)

# Use the token in API calls
curl "http://localhost:19999/api/v1/manage/health?cmd=RESET" -H "X-Auth-Token: ${TOKEN}"

Common API Commands

<details> <summary>Disable All Health Checks</summary>

Completely stops evaluation of health checks during maintenance:

bash

curl "http://localhost:19999/api/v1/manage/health?cmd=DISABLE ALL" -H "X-Auth-Token: ${TOKEN}"

</details> <details> <summary>Silence All Notifications</summary>

Continues to evaluate health checks but prevents notifications:

bash

curl "http://localhost:19999/api/v1/manage/health?cmd=SILENCE ALL" -H "X-Auth-Token: ${TOKEN}"

</details> <details> <summary>Disable Specific Alerts</summary>

Target only certain alerts by name, chart, context, host, or family:

bash

# Silence all disk space alerts
curl "http://localhost:19999/api/v1/manage/health?cmd=SILENCE&context=disk_space" -H "X-Auth-Token: ${TOKEN}"

# Disable CPU alerts for specific hosts
curl "http://localhost:19999/api/v1/manage/health?cmd=DISABLE&context=cpu&hosts=prod-db-*" -H "X-Auth-Token: ${TOKEN}"

</details> <details> <summary>View Current Silenced/Disabled Alerts</summary>

Check what's currently silenced or disabled:

bash

curl "http://localhost:19999/api/v1/manage/health?cmd=LIST" -H "X-Auth-Token: ${TOKEN}"

</details> <details> <summary>Reset to Normal Operation</summary>

Re-enable all health checks and notifications:

bash

curl "http://localhost:19999/api/v1/manage/health?cmd=RESET" -H "X-Auth-Token: ${TOKEN}"

</details>

Configuration Options

<details> <summary>Recipients Per Role</summary>

Define who receives alerts and how:

ini

role_recipients_email[sysadmin]="[email protected]"
role_recipients_telegram[webmaster]="123456789"
role_recipients_slack[dba]="#database-alerts"

Use spaces to separate multiple recipients.

To disable a notification method for a role, use:

ini

role_recipients_email[sysadmin]="disabled"

If left empty, the default recipient for that method is used.

</details> <details> <summary>Alert Severity Filtering</summary>

You can limit certain recipients to only receive critical alerts:

ini

role_recipients_email[sysadmin]="[email protected] [email protected]|critical"

This setup:

Sends all alerts to [email protected]
Sends only critical-related alerts to [email protected]

Works for all supported methods: email, Slack, Telegram, Twilio, Discord, etc.

</details> <details> <summary>Proxy Settings</summary>

To send notifications via a proxy, set these environment variables:

bash

export http_proxy="http://10.0.0.1:3128/"
export https_proxy="http://10.0.0.1:3128/"

</details> <details> <summary>Notification Images</summary>

By default, Netdata includes public image URLs in notifications (hosted by the global Registry).

To use custom image paths:

ini

images_base_url="http://my.public.netdata.server:19999"

</details> <details> <summary>Custom Date Format</summary>

Change the timestamp format in notifications:

ini

date_format="+%F %T%:z"   # Example: RFC 3339

Common formats:

Format	String
ISO 8601	`+%FT%T%z`
RFC 5322	`+%a, %d %b %Y %H:%M:%S %z`
RFC 3339	`+%F %T%:z`
Local time	`+%x %X`
ANSI C / asctime()	(leave empty)

See man date for more formatting options.

</details> <details> <summary>Hostname Format</summary>

By default, Netdata uses the short hostname in notifications.

To use the fully qualified domain name (FQDN), set:

ini

use_fqdn=YES

If you've set a custom hostname in netdata.conf, that value takes priority.

</details>

Testing Your Notification Setup

You can test alert notifications manually.

bash

# Switch to the Netdata user
sudo su -s /bin/bash netdata

# Enable debugging
export NETDATA_ALARM_NOTIFY_DEBUG=1

# Test default role (sysadmin)
./plugins.d/alarm-notify.sh test

# Test specific role
./plugins.d/alarm-notify.sh test "webmaster"

:::important

If you're running your own Netdata Registry, set:

bash

export NETDATA_REGISTRY_URL="https://your.registry.url"

before testing.

:::

Debugging with Trace

To see the full execution output:

bash

bash -x ./plugins.d/alarm-notify.sh test

Then look for the internal calls and re-run the one you want to trace in more detail.

Troubleshooting Alert Notifications

Here are solutions for common alert notification issues:

Email Notifications Not Working

Verify your email configuration:

bash

grep -E "SEND_EMAIL|DEFAULT_RECIPIENT_EMAIL" /etc/netdata/health_alarm_notify.conf

Check if the system can send mail:

bash

echo "Test" | mail -s "Test Email" [email protected]

Look for errors in the Netdata log:

bash

tail -f /var/log/netdata/error.log | grep "alarm notify"

Test with debugging enabled:

bash

sudo su -s /bin/bash netdata
export NETDATA_ALARM_NOTIFY_DEBUG=1
./plugins.d/alarm-notify.sh test

Slack Notifications Failing

Verify your webhook URL is correct:

bash

grep -E "SLACK_WEBHOOK_URL" /etc/netdata/health_alarm_notify.conf

Check for network connectivity to Slack:

bash

curl -X POST -H "Content-type: application/json" --data '{"text":"Test"}' YOUR_WEBHOOK_URL

Confirm channel names start with # in your configuration.

PagerDuty Integration Issues

Verify your service key:

bash

grep -E "PAGERDUTY_SERVICE_KEY" /etc/netdata/health_alarm_notify.conf

Test the PagerDuty API directly:

bash

curl -H "Content-Type: application/json" -X POST -d '{"service_key":"YOUR_SERVICE_KEY","event_type":"trigger","description":"Test"}' https://events.pagerduty.com/generic/2010-04-15/create_event.json

Notification Delays

If notifications seem delayed:

Check the delay parameter in your alarm configuration
Verify your health.d/*.conf files for delay settings
Check the ALARM_NOTIFY_DELAY setting in health_alarm_notify.conf

Agent Alert Notifications

Agent Alert Notifications

How It Works

Role-Based Routing Visualization

Health Management API Workflow

Quick Setup

Example: Alert with Role-Based Routing

Advanced Role-Based Routing Examples

Health Management API

API Authorization

Common API Commands

Configuration Options

Testing Your Notification Setup

Debugging with Trace

Troubleshooting Alert Notifications

Email Notifications Not Working

Slack Notifications Failing

PagerDuty Integration Issues

Notification Delays

Related Docs