docs/sources/tutorials/alerting-get-started-pt5/index.md
The Get started with Grafana Alerting - Dynamic routing tutorial is a continuation of the Get started with Grafana Alerting - Templating tutorial.
{{< youtube id="hSejnv1cdYY" >}}
Imagine you are managing a web application or a fleet of servers, tracking critical metrics such as CPU, memory, and disk usage. While monitoring is essential, managing alerts well lets your team act on issues without being overwhelmed by noise.
In this tutorial you will learn how to:

- Create notification policies that match on an environment label and route alerts to different contact points.
- Use a label template to dynamically assign values such as production or staging based on the labels returned by your query.
- Create alert rules for CPU and memory metrics that use the templated label for dynamic routing.
Interactive learning environment
Grafana OSS
To observe data using the Grafana stack, download and run the following files.
Clone the tutorial environment repository.
git clone https://github.com/tonypowa/grafana-prometheus-alerting-demo.git
Change to the directory where you cloned the repository:
cd grafana-prometheus-alerting-demo
Build the Grafana stack:
<!-- INTERACTIVE ignore START -->
docker compose build
{{< docs/ignore >}}
<!-- INTERACTIVE exec START -->
docker-compose build
{{< /docs/ignore >}}
Bring up the containers:
<!-- INTERACTIVE ignore START -->
docker compose up -d
{{< docs/ignore >}}
<!-- INTERACTIVE exec START -->
docker-compose up -d
{{< /docs/ignore >}}
The first time you run docker compose up -d, Docker downloads all the necessary resources for the tutorial. This might take a few minutes, depending on your internet connection.
{{< admonition type="note" >}} If you already have Grafana, Loki, or Prometheus running on your system, you might see errors, because the Docker image is trying to use ports that your local installations are already using. If this is the case, stop the services, then run the command again. {{< /admonition >}}
<!-- INTERACTIVE ignore END -->
{{< docs/ignore >}}
NOTE:
If you already have Grafana, Loki, or Prometheus running on your system, you might see errors, because the Docker image is trying to use ports that your local installations are already using. If this is the case, stop the services, then run the command again.
{{< /docs/ignore >}}
<!-- INTERACTIVE page step1.md END -->
<!-- INTERACTIVE page step2.md START -->
In this use case, we focus on monitoring the system's CPU, memory, and disk usage. This example is based on the Grafana Prometheus Alerting Demo, which collects and visualizes system metrics via Prometheus and Grafana.
Your team is responsible for ensuring the health of your servers, and you want to leverage advanced alerting features in Grafana to:

- Route alerts dynamically, so that alert instances from production and staging environments reach the teams responsible for them.
- Reduce unnecessary noise by delivering notifications only to the relevant contact points.
In the provided demo setup, you're monitoring:

- CPU usage, exposed as flask_app_cpu_usage
- Memory usage, exposed as flask_app_memory_usage
You have a mixture of critical alerts (e.g., CPU usage over 75%) and warning alerts (e.g., memory usage over 60%).
This Flask-based Python script simulates a service that exposes CPU and memory usage metrics for Prometheus to scrape. The resulting alert instances carry labels such as:

- instance="flask-prod:5000"
- instance="flask-staging:5000"
- deployment="prod-us-cs30"
- deployment="staging-us-cs20"

Use templates to dynamically populate a custom label that matches a notification policy and therefore routes alerts to the correct contact point.
We'll automatically determine the environment associated with each firing alert by inspecting the labels of its system metrics (e.g., CPU, memory) and extracting keywords using regular expressions with the Go templating language.
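Before wiring this up in Grafana, it can help to see the core extraction in isolation. The following is a minimal sketch of the reReplaceAll function used later in this tutorial, applied to a hardcoded sample value (flask-prod:5000 stands in for whatever $labels.instance holds at evaluation time):

```
{{- /* Extract the environment keyword from a sample value.
      "flask-prod:5000" is hardcoded here for illustration only;
      the real template reads $labels.instance instead. */ -}}
{{- $env := reReplaceAll ".*([pP]rod|[sS]taging|[dD]ev).*" "${1}" "flask-prod:5000" -}}
{{- $env -}}
```

Evaluated against this sample, the regular expression captures prod, so the sketch prints prod. A later step maps that keyword to a friendlier value such as production.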
<!-- INTERACTIVE page step2.md END -->
<!-- INTERACTIVE page step3.md START -->
Notification policies route alert instances to contact points via label matchers. Since we know what labels our application returns (e.g., job, instance, deployment), we can use them to match alert rules and define appropriate notification routing.
Although our application doesn't explicitly include an environment label, we can rely on other labels like instance or deployment, which may contain keywords (like prod or staging) that indicate the environment.
Sign in to Grafana:
Navigate to Alerts & IRM > Alerting > Notification Policies.
Add a child policy:

Enter environment = production as the label/value pair, so that this policy matches alert instances coming from prod environments.

Choose a contact point:
For a quick test, you can use a public webhook from webhook.site to capture and inspect alert notifications. If you choose this method, select Webhook from the drop-down menu in contact points.
Enable continue matching:
Save and repeat:

Create another child policy, this time using environment = staging as the label/value pair.

Now that the labels are defined, we can create alert rules for CPU and memory metrics. These alert rules will use the labels from the metrics collected and stored in Prometheus.
<!-- INTERACTIVE page step3.md END -->
<!-- INTERACTIVE page step4.md START -->
Follow these steps to manually create alert rules and link them to a visualization.
Enter an alert rule name. Make it short and descriptive, as this will appear in your alert notification. For instance: cpu-usage.
Select the Prometheus data source from the drop-down menu.
In the query section, enter the following query:
**Switch to Code mode if not already selected.**
flask_app_cpu_usage{}
In the alert condition section:

Enter 75 as the value for WHEN QUERY IS ABOVE to set the threshold for the alert.

{{< figure src="/media/docs/alerting/flask-app-metrics.png" max-width="1200px" caption="Preview of a query returning alert instances in Grafana." >}}
Among the labels returned for flask_app_cpu_usage, the instance and deployment labels contain values that include the terms prod or staging. We will create a template later to detect these keywords, so that any firing alert instances are routed to the relevant contact points (e.g., alerts-prod, alerts-staging).
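If you haven't used Grafana's templating language before, each of these query labels is available inside a template under $labels. Here is a small sketch; the values in the comments are assumptions based on the demo's label scheme:

```
{{- /* Each label on the alert instance is accessible as $labels.<name>.
      Assumed example values: instance="flask-prod:5000", deployment="prod-us-cs30" */ -}}
{{ $labels.instance }}
{{ $labels.deployment }}
```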
In this section, we add a templated label based on the query value to map alerts to the notification policies.
<!-- INTERACTIVE page step4.md END -->
<!-- INTERACTIVE page step5.md START -->
In Folder, click + New folder and enter a name. For example: app-metrics. This folder contains our alerts.
Click + Add labels.
In the Key field, enter environment.

In the Value field, copy in the following template:
{{- $env := reReplaceAll ".*([pP]rod|[sS]taging|[dD]ev).*" "${1}" $labels.instance -}}
{{- if eq $env "prod" -}}
production
{{- else if eq $env "staging" -}}
staging
{{- else -}}
development
{{- end -}}
This template uses a regular expression to extract prod, staging, or dev from the instance label ($labels.instance) and maps it to a more readable value (like production for prod).

As a result, when an alert exceeds its threshold, the template checks labels such as instance="flask-prod:5000" or instance="flask-staging:5000" (or, if you adapt it, custom labels like deployment="prod-us-cs30") and assigns a value of production, staging, or development to the custom environment label.

This label is then used by the notification policies to route alerts to the appropriate team, so that notifications are delivered efficiently and unnecessary noise is reduced.
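It's also worth noting what happens when none of the keywords are present. As a sketch, assume a hypothetical alert instance labeled instance="flask-qa:5000" (not part of the demo):

```
{{- /* "flask-qa:5000" is a hypothetical value with no prod/staging/dev keyword */ -}}
{{- $env := reReplaceAll ".*([pP]rod|[sS]taging|[dD]ev).*" "${1}" "flask-qa:5000" -}}
{{- /* With no match, reReplaceAll returns the input unchanged, so $env stays
      "flask-qa:5000"; neither branch matches and the else prints "development" */ -}}
{{- if eq $env "prod" -}}production{{- else if eq $env "staging" -}}staging{{- else -}}development{{- end -}}
```

In other words, any alert instance whose instance label contains none of the watched keywords falls through to the development value.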
In the Evaluation group and interval, enter a name. For example: system-usage.

Set the evaluation interval to 1m.

Set the pending period to 0s (zero seconds), so the alert rule fires the moment the condition is met (this minimizes the waiting time for the demonstration).

Set Keep firing for to 0s, so the alert stops firing immediately after the condition is no longer true.

Select who should receive a notification when an alert rule fires.
Toggle the Advanced options button.
Click Preview routing.
The preview should display which firing alerts are routed to contact points based on notification policies that match the environment label.
{{< figure src="/media/docs/alerting/dynamic-routing-preview-prod-staging.png" max-width="1200px" caption="Notification policies matched by the environment label matcher." >}}
The environment label matcher should map to the notification policies created earlier. This makes sure that firing alert instances are routed to the appropriate contact points associated with each policy.
Repeat the steps to create a second alert rule for memory usage (e.g., memory usage over 60%):

Name: memory-usage

Query: flask_app_memory_usage{}

Now that the CPU and memory alert rules are set up, they are linked to the notification policies through the custom label matcher we added. The value of the label dynamically changes based on the environment template, using $labels.instance. This ensures that the label value will be set to production, staging, or development, depending on the environment.
Based on your query's instance label values (which contain keywords like prod or staging ), Grafana dynamically assigns the value production, staging or development to the custom environment label using the template. This dynamic label then matches the label matchers in your notification policies, which route alerts to the correct contact points.
To see this in action, go to Alerts & IRM > Alerting > Active notifications.
This page shows grouped alerts that are currently triggering notifications. Click on any alert group to view its label set, contact point, and number of alert instances. Notice that the environment label has been dynamically populated with values like production.
{{< figure src="/media/docs/alerting/routing-active-notification-detail.png" max-width="1200px" caption="Expanded alert in Active notifications section" >}}
Finally, you should receive notifications at the contact point associated with either prod or staging.
Feel free to experiment by changing the template to match other labels that contain any of the watched keywords. For example, you could reference:
$labels.deployment
The template should be flexible enough to capture the target keywords (e.g., prod, staging) by adjusting which label $labels references, as shown in the sketch below.
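For example, a variant that reads the deployment label instead of instance could look like the following sketch. It is untested and assumes deployment values follow the pattern shown earlier (prod-us-cs30, staging-us-cs20):

```
{{- /* Same keyword extraction as before, but reading from $labels.deployment */ -}}
{{- $env := reReplaceAll ".*([pP]rod|[sS]taging|[dD]ev).*" "${1}" $labels.deployment -}}
{{- if eq $env "prod" -}}
production
{{- else if eq $env "staging" -}}
staging
{{- else -}}
development
{{- end -}}
```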
By using notification policies, you can route alerts based on query values, directing them to the appropriate teams.
{{< admonition type="tip" >}}
In Grafana Alerting - Link alerts to visualizations you will create alerts using Prometheus data and link them to your graphs.
{{< /admonition >}}
<!-- INTERACTIVE ignore END -->
{{< docs/ignore >}}
In Grafana Alerting - Link alerts to visualizations you will create alerts using Prometheus data and link them to your graphs.
{{< /docs/ignore >}}
Explore related topics covered in this tutorial: