architecture/design/platform-alerting-and-notification.md
Tracking GitHub Issue: 8212
Yugabyte Platform will have default, preconfigured alerts at both the platform and universe level. Universe alerts can be configured globally for all universes or for a specific universe. In addition to these default alerts, users can configure their own alerts based on a specific condition on any available metric.
Every alert has the following information:
A duration of M minutes means that, after the alert condition first evaluates to true, the condition must remain true for M more minutes before the alert is raised.
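The duration rule behaves like a small pending/firing state machine, similar in spirit to the `for` clause of Prometheus alerting rules. The sketch below is a minimal illustration, assuming one evaluation per check and timestamps in minutes. `AlertState` and its fields are hypothetical names, not Yugabyte Platform code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Pending/firing state for one alert rule (hypothetical helper)."""

    duration_min: float                     # configured duration M, in minutes
    first_true_at: Optional[float] = None   # minute when the condition first held
    firing: bool = False

    def evaluate(self, now_min: float, condition_true: bool) -> bool:
        """Record one evaluation; return True only at the evaluation that raises the alert."""
        if not condition_true:
            # Condition cleared: reset so the next breach starts a fresh wait.
            self.first_true_at = None
            self.firing = False
            return False
        if self.first_true_at is None:
            # The first successful evaluation starts the M-minute wait.
            self.first_true_at = now_min
        if not self.firing and now_min - self.first_true_at >= self.duration_min:
            self.firing = True
            return True
        return False
```

With M = 3 and one evaluation per minute, a condition that holds from minute 0 onward raises the alert at minute 3; any false evaluation in between restarts the wait.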
The check interval should be 1 minute for Prometheus-based alerts (the current default for the health check interval). The check interval is the time from the start of one probe to the start of the next probe.
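Because the interval is measured start-to-start, a slow probe should not push later probes back. A minimal sketch of that fixed-rate scheduling, assuming times in seconds; `next_probe_start` is a hypothetical helper, and skipping slots missed by an overrunning probe (rather than running them back-to-back) is an assumption the design does not specify.

```python
import math


def next_probe_start(last_start: float, now: float, interval: float = 60.0) -> float:
    """Return when the next probe should start, in seconds.

    The schedule is anchored to probe start times, so a probe's own running
    time does not shift later probes. If a probe overran one or more slots,
    those slots are skipped (an assumption of this sketch).
    """
    elapsed = now - last_start
    # At least one full interval after the last start; more if slots were missed.
    slots = max(1, math.ceil(elapsed / interval))
    return last_start + slots * interval
```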
Alert notifications should be sent in real time, rather than grouping alerts into batches over X minutes before notifying subscribed channels such as email.
Alerts should be snoozed while universe/node creation or removal is in progress, to avoid generating unnecessary alerts.
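One way to implement snoozing is to track in-flight operations per universe and suppress notifications while any are running. The sketch below assumes universes are identified by a string UUID; `SnoozeRegistry` and its methods are hypothetical. A counter is used instead of a set so that overlapping operations on the same universe keep alerts snoozed until the last one finishes.

```python
from collections import Counter


class SnoozeRegistry:
    """Hypothetical tracker of universe/node create or remove operations in progress."""

    def __init__(self) -> None:
        self._in_progress: Counter = Counter()

    def begin_operation(self, universe_uuid: str) -> None:
        self._in_progress[universe_uuid] += 1

    def end_operation(self, universe_uuid: str) -> None:
        remaining = self._in_progress[universe_uuid] - 1
        if remaining > 0:
            self._in_progress[universe_uuid] = remaining
        else:
            # Last operation finished (or an unmatched end): stop snoozing.
            self._in_progress.pop(universe_uuid, None)

    def should_notify(self, universe_uuid: str) -> bool:
        # Alerts for this universe are snoozed while any operation is in flight.
        return self._in_progress[universe_uuid] == 0
```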
When a universe is deleted, its corresponding alerts should also be deleted.
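The deletion rule amounts to a cascade from a universe to its alerts. A minimal in-memory sketch, assuming each alert records the UUID of the universe it belongs to (or `None` for platform-level alerts); `Alert`, `AlertStore`, and `on_universe_deleted` are hypothetical names. In a relational store the same effect could come from a cascading foreign key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alert:
    alert_id: int
    universe_uuid: Optional[str]  # None for platform-level alerts
    message: str


class AlertStore:
    """Hypothetical in-memory alert store."""

    def __init__(self) -> None:
        self._alerts: Dict[int, Alert] = {}

    def add(self, alert: Alert) -> None:
        self._alerts[alert.alert_id] = alert

    def all_alerts(self) -> List[Alert]:
        return list(self._alerts.values())

    def on_universe_deleted(self, universe_uuid: str) -> int:
        """Delete every alert belonging to the universe; return how many were removed."""
        doomed = [a.alert_id for a in self._alerts.values() if a.universe_uuid == universe_uuid]
        for alert_id in doomed:
            del self._alerts[alert_id]
        return len(doomed)
```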
Users should be able to send test alerts to verify that the right alerts are raised for the defined condition and threshold.
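A test alert can be thought of as a dry run of a rule's condition against a sample value, producing the notification that would be sent without recording a real alert. This is a sketch under assumptions: the `AlertRule` shape (metric, comparison operator, threshold), the metric name `cpu_usage_percent`, and the `[TEST]` message prefix are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Supported comparison operators for the hypothetical rule format.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "==": operator.eq}


@dataclass
class AlertRule:
    name: str
    metric: str
    op: str
    threshold: float

    def is_breached(self, value: float) -> bool:
        return _OPS[self.op](value, self.threshold)


def run_test_alert(rule: AlertRule, sample_value: float) -> Optional[str]:
    """Return the test notification the rule would send, or None if it would not fire."""
    if rule.is_breached(sample_value):
        return f"[TEST] {rule.name}: {rule.metric}={sample_value} {rule.op} {rule.threshold}"
    return None
```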
A playbook should be provided for resolving each alert. For now, a playbook is just documentation with alert resolution information, such as:
To see a list of alerts, click the Alerts tab on the left. By default, alerts are sorted in reverse chronological order by the time they were raised, but users should be able to reorder the list by clicking the column headings.
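The list ordering can be modeled as a sort key plus a direction. A minimal sketch, assuming each row has a name, a severity, and a raised-at timestamp; `AlertRow`, `sort_alerts`, and `on_header_click` are hypothetical, and the header-click convention (same column flips direction, new column starts ascending) is an assumption rather than something the design specifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class AlertRow:
    name: str
    severity: str
    raised_at: datetime


def sort_alerts(rows: List[AlertRow], column: str = "raised_at",
                descending: bool = True) -> List[AlertRow]:
    """Sort rows by a column; the default is newest-raised first."""
    return sorted(rows, key=lambda row: getattr(row, column), reverse=descending)


def on_header_click(current: Tuple[str, bool], clicked: str) -> Tuple[str, bool]:
    """Return the new (column, descending) sort state after a header click.

    Clicking the active column flips the direction; clicking another column
    sorts by it in ascending order.
    """
    column, descending = current
    if clicked == column:
        return column, not descending
    return clicked, False
```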
Whenever an alert triggers, it sends alert data to its designated alert destinations. Destinations use this alert data to send emails, Slack messages, and so on.
The default destination for any alert is the email address of the user who created it. If you create an alert and want to be notified by email, you don’t need to set up a new alert destination.
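The destination model can be sketched as a list of destinations per alert definition, with the creator's email as a fallback. This assumes the default email destination is used only when no other destination is configured (the text does not say whether it also applies alongside explicit destinations). `Destination`, `AlertDefinition`, `effective_destinations`, and `fan_out` are hypothetical names, and the webhook URL in any usage is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Destination:
    kind: str    # e.g. "email" or "slack" (hypothetical kinds)
    target: str  # email address, webhook URL, ...


@dataclass
class AlertDefinition:
    name: str
    creator_email: str
    destinations: List[Destination] = field(default_factory=list)

    def effective_destinations(self) -> List[Destination]:
        # Assumption: with no destination configured, fall back to emailing the creator.
        if self.destinations:
            return list(self.destinations)
        return [Destination("email", self.creator_email)]


def fan_out(alert_def: AlertDefinition, alert_data: Any) -> List[Tuple[Destination, Any]]:
    """Pair the triggered alert's data with every destination that should receive it."""
    return [(dest, alert_data) for dest in alert_def.effective_destinations()]
```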