docs/sources/alerting/guides/missing-data.md
Missing data, which occurs when a target stops reporting metric data, is one of the most common issues when troubleshooting alerts. In cloud-native environments, this happens all the time. Pods or nodes scale down to match demand, or an entire job quietly disappears.
When this happens, alerts won’t fire, and you might not notice the system has stopped reporting.
Sometimes it's just a lack of data from a few instances. Other times, it's a connectivity issue where the entire target is unreachable.
This guide covers different scenarios where the underlying data is missing and shows how to design your alerts to act on those cases. If you're troubleshooting an unreachable host or a network failure, see the Handle connectivity errors documentation as well.
When an instance stops reporting data, the common causes are similar to those behind connectivity errors.
The first thing to understand is the difference between a query failure (or connectivity error), No Data, and a Missing Series.
Alert queries often return multiple time series — one per instance, pod, region, or label combination. This is known as a multi-dimensional alert, meaning a single alert rule can trigger multiple alert instances (alerts).
For example, imagine a recorded metric, `http_request_latency_seconds`, that reports request latency in seconds for each region where the application is deployed. The query returns one series per region (for instance, region1 and region2) and generates only two alert instances. In this scenario, you may experience:

- No Data: the query returns no data for any series.
- Missing Series: only a specific series (such as region2) disappears, while the others keep reporting.
In both No Data and Missing Series cases, the query still technically "works", but the alert won’t fire unless you explicitly configure it to handle these situations.
The following tables illustrate both scenarios using the previous example, with an alert that triggers if the latency exceeds 2 seconds in any region: `avg_over_time(http_request_latency_seconds[5m]) > 2`.
No Data Scenario: The query returns no data for any series:
| Time | region1 | region2 | Alert triggered |
|---|---|---|---|
| 00:00 | 1.5s 🟢 | 1s 🟢 | ✅ No Alert |
| 01:00 | No Data ⚠️ | No Data ⚠️ | ⚠️ No Alert (Silent Failure) |
| 02:00 | 1.4s 🟢 | 1s 🟢 | ✅ No Alert |
MissingSeries Scenario: Only a specific series (region2) disappears:
| Time | region1 | region2 | Alert triggered |
|---|---|---|---|
| 00:00 | 1.5s 🟢 | 1s 🟢 | ✅ No Alert |
| 01:00 | 1.6s 🟢 | Missing Series ⚠️ | ⚠️ No Alert (Silent Failure) |
| 02:00 | 1.4s 🟢 | 1s 🟢 | ✅ No Alert |
In both cases, something broke silently.
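To make the multi-dimensional behavior behind these tables concrete, here's the example query without its threshold; the per-region values in the comments are the illustrative ones from the tables above:

```promql
# One series per region; the alert rule creates one alert instance per series.
# With the sample values from the tables at 00:00, this returns roughly:
#   {region="region1"}  1.5
#   {region="region2"}  1.0
avg_over_time(http_request_latency_seconds[5m])
```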
Prometheus doesn't fire alerts when the query returns no data. It simply assumes there was nothing to report, like with query errors. Missing data won’t trigger existing alerts unless you explicitly check for it.
In Prometheus, a common way to catch missing data is to use the `absent_over_time` function.
```promql
absent_over_time(http_request_latency_seconds[5m]) == 1
```
This triggers when all series for http_request_latency_seconds are absent for 5 minutes — catching the No Data case when the entire metric disappears.
However, absent_over_time() can’t detect which specific series are missing since it doesn’t preserve labels. The alert won’t tell you which series stopped reporting, only that the query returns no data.
If you want to check for missing data per region or per label value, you can specify the label in the alert query as follows:
```promql
# Detect missing data in region1
absent_over_time(http_request_latency_seconds{region="region1"}[5m]) == 1

# Detect missing data in region2
absent_over_time(http_request_latency_seconds{region="region2"}[5m]) == 1
```
But this approach doesn't scale well. Hard-coding a query for each label set is unreliable, especially in dynamic cloud environments where instances can appear or disappear at any time.
To detect when a specific target has disappeared, see Evict alert instances for missing series below for details on how Grafana handles this case and how to set up detection.
While Prometheus provides functions like `absent_over_time()` to detect missing data, not all data sources available to Grafana alerts (such as Graphite, InfluxDB, or PostgreSQL) support a similar function.
To handle this, Grafana Alerting implements built-in No Data state logic, so you don't need to detect missing data with `absent_*` queries. Instead, you can configure, in the alert rule settings, how alerts behave when no data is returned.
Similar to error handling, Grafana triggers a special No data alert by default and lets you control this behavior. In Configure no data and error handling, click Alert state if no data or all values are null, and choose one of the following options:
- No Data (default): Triggers a new DatasourceNoData alert, treating No Data as a specific problem.
- Alerting: Transitions each existing alert instance into the Alerting state when data disappears.
- Normal: Ignores missing data and transitions all instances to the Normal state. Useful when receiving intermittent data, such as from experimental services, sporadic actions, or periodic reports.
- Keep Last State: Leaves the alert in its previous state until the data returns. This is common in environments where brief metric gaps happen regularly, like with flaky exporters or noisy environments.
{{< figure src="/media/docs/alerting/alert-rule-configure-no-data.png" alt="A screenshot of the Configure no data handling option in Grafana Alerting." max-width="500px" >}}
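As an alternative to relying on the No Data state, you can design the query itself to fall back to a value when nothing is returned. The following is a minimal sketch using the example metric from this guide; whether a zero fallback is meaningful depends entirely on your metric.

```promql
# Sketch: fall back to 0 when the query returns no series at all,
# so the rule evaluates against 0 instead of entering the No Data state.
avg_over_time(http_request_latency_seconds[5m]) or on() vector(0)
```

Note that `vector(0)` carries no labels, so the fallback sample doesn't preserve the per-region dimensions of the original series.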
When Grafana triggers a NoData alert, it creates a distinct alert instance, separate from the original alert instance. These alerts behave differently: for example, they use a dedicated alert name, `alertname: DatasourceNoData`.

Because of this, DatasourceNoData alerts might require a dedicated setup to handle their notifications. For general recommendations, see Reduce redundant DatasourceError alerts; similar practices can apply to NoData alerts.
MissingSeries occurs when only some series disappear but not all. This case is subtle, but important.
Grafana marks missing series as stale after two evaluation intervals and triggers the alert instance eviction process. Here’s what happens under the hood:
- The alert instance transitions to the Normal state.
- The `grafana_state_reason` annotation is set to `MissingSeries`.
- The instance is evicted and, if it was firing, a resolved notification is sent.

If an alert instance becomes stale, you'll find it in the alert history as Normal (MissingSeries) before it disappears. This table shows the eviction process from the previous example:
| Time | region1 | region2 | Alert triggered |
|---|---|---|---|
| 00:00 | 1.5s 🟢 | 1s 🟢 | 🟢🟢 No Alerts |
| 01:00 | 3s 🔴 Alerting | 3s 🔴 Alerting | 🔴🔴 Alert instances triggered for both regions |
| 02:00 | 1.6s 🟢 | (MissingSeries) ⚠️ Alerting | 🟢🔴 region2 missing, state maintained. |
| 03:00 | 1.4s 🟢 | (MissingSeries) Normal | 🟢🟢 region2 was resolved, 📩 notification sent, and instance evicted. |
| 04:00 | 1.4s 🟢 | — | 🟢 No Alerts. region2 was evicted. |
In dynamic environments, such as autoscaling groups, ephemeral pods, or spot instances, series naturally come and go. MissingSeries normally signals infrastructure or deployment changes.
By default, No Data triggers an alert to indicate a potential problem.
The eviction process for MissingSeries is designed to prevent alert flapping when a pod or instance disappears, reducing alert noise.
In environments with frequent scale events, prioritize symptom-based alerts over individual infrastructure signals and use aggregate alerts unless you explicitly need to track individual instances.
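For example, here's a minimal sketch of an aggregate, symptom-based alert using the example metric from this guide (the threshold is illustrative). Because the aggregation always produces a single series, individual regions appearing or disappearing don't create or evict per-region alert instances:

```promql
# Sketch: aggregate alert on the average latency across all regions.
# Series for individual regions coming or going doesn't add or evict alert instances,
# because the aggregation collapses them into a single series.
avg(avg_over_time(http_request_latency_seconds[5m])) > 2
```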
A stale alert instance triggers a resolved notification if it transitions from a firing state (such as Alerting, No Data, or Error) to Normal. In that notification, the `grafana_state_reason` annotation is set to `MissingSeries` to indicate that the alert wasn't resolved by recovery but was evicted because the series data went missing.
Recognizing these notifications helps you handle them appropriately. For example:

- Check the `grafana_state_reason` annotation to clearly identify MissingSeries alerts.
- Use the `grafana_state_reason` annotation to process these alerts differently.

Also, review these notifications to confirm whether something broke or whether the alert was unnecessary, and adjust your alert rules or notification setup to reduce noise.
Previously, an example showed how to detect missing data for a specific label, such as region:
```promql
# Detect missing data in region1
absent_over_time(http_request_latency_seconds{region="region1"}[5m]) == 1

# Detect missing data in region2
absent_over_time(http_request_latency_seconds{region="region2"}[5m]) == 1
```
However, this approach doesn’t scale well because it requires hardcoding all possible region values.
As an alternative, you can create an alert rule that detects missing series dynamically using the `present_over_time` function:

```promql
present_over_time(http_request_latency_seconds{}[24h])
unless
present_over_time(http_request_latency_seconds{}[10m])
```
Or, if you want to group by a label such as region:
```promql
group(present_over_time(http_request_latency_seconds{}[24h])) by (region)
unless
group(present_over_time(http_request_latency_seconds{}[10m])) by (region)
```
This query finds regions (or other targets) that were present at any time in the past 24 hours but have not been present in the past 10 minutes. The alert rule then triggers an alert instance for each missing region. You can apply the same technique to any label or target dimension.
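As a further sketch of the same pattern applied to a target dimension, the following assumes the standard Prometheus `up` metric and an illustrative job name (`myapp`). It detects scrape targets that reported at some point in the past 24 hours but not in the past 10 minutes:

```promql
# Sketch: detect scrape targets that have stopped reporting.
# "up" is the standard Prometheus scrape-health metric; the job name is illustrative.
group(present_over_time(up{job="myapp"}[24h])) by (instance)
unless
group(present_over_time(up{job="myapp"}[10m])) by (instance)
```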
Missing data isn’t always a failure. It’s a common scenario in dynamic environments when certain targets stop reporting.
Grafana Alerting handles these distinct scenarios automatically. Here's how to think about it:

- Handle DatasourceNoData and MissingSeries notifications separately, since they don't behave like regular alerts.
- Use `your_metric_query OR on() vector(0)` to return 0 when `your_metric_query` returns nothing.
- Use `absent_over_time()` or `present_over_time()` in Prometheus to detect when a metric or target disappears.
- Adjust the evaluation time range (for example, `now-1m`) to account for late data points.
- Use `last_over_time(metric_name[10m])` to pick the most recent sample within a given window, as shown in the sketch after this list.
- When gaps are expected, consider Keep Last State, or routing those alerts differently.
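As a minimal sketch of the `last_over_time` approach, using the example metric from this guide: the rule evaluates the most recent sample seen within the window, so short gaps between samples don't immediately make the series disappear from the query result:

```promql
# Sketch: evaluate the most recent sample seen in the last 10 minutes,
# so brief reporting gaps don't immediately surface as missing data.
last_over_time(http_request_latency_seconds[10m]) > 2
```

Pick the window to match how long a gap you're willing to tolerate before the alert treats the series as missing.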