Alerts

Rule Groups

thanos-bucket-replicate

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosBucketReplicateErrorRate | Thanos Replicate is failing to run. | Thanos Replicate is failing to run, {{$value \| humanize}}% of attempts failed. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate |
| ThanosBucketReplicateRunLatency | Thanos Replicate has a high latency for replicate operations. | Thanos Replicate {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for the replicate operations. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicaterunlatency |
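
Several descriptions in this runbook report a percentage of failed attempts or requests. As a rough illustration of what that value represents, the Python sketch below computes a failure percentage from two counts. The function name and its inputs are hypothetical; the actual rules derive the value in PromQL from metrics that this page does not list.

```python
def failure_percent(failed: float, total: float) -> float:
    """Return failed attempts as a percentage of all attempts.

    Hypothetical helper for illustration only. Returns 0.0 when there
    were no attempts, so the division never hits zero.
    """
    if total <= 0:
        return 0.0
    return 100.0 * failed / total


if __name__ == "__main__":
    # Example: 3 failed replicate runs out of 40 attempts.
    print(f"{failure_percent(3, 40):.1f}% of attempts failed")
```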

thanos-compact

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosCompactMultipleRunning | Thanos Compact has multiple instances running. | No more than one Thanos Compact instance should be running at once. There are {{$value}} instances running. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompactmultiplerunning |
| ThanosCompactHalted | Thanos Compact has failed to run and is now halted. | Thanos Compact {{$labels.job}} has failed to run and is now halted. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompacthalted |
| ThanosCompactHighCompactionFailures | Thanos Compact is failing to execute compactions. | Thanos Compact {{$labels.job}} is failing to execute {{$value \| humanize}}% of compactions. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompacthighcompactionfailures |
| ThanosCompactBucketHighOperationFailures | Thanos Compact Bucket is having a high number of operation failures. | Thanos Compact {{$labels.job}} Bucket is failing to execute {{$value \| humanize}}% of operations. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompactbuckethighoperationfailures |
| ThanosCompactHasNotRun | Thanos Compact has not uploaded anything for the last 24 hours. | Thanos Compact {{$labels.job}} has not uploaded anything for 24 hours. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompacthasnotrun |

thanos-component-absent

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosCompactIsDown | Thanos component has disappeared. | ThanosCompact has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompactisdown |
| ThanosQueryIsDown | Thanos component has disappeared. | ThanosQuery has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryisdown |
| ThanosReceiveIsDown | Thanos component has disappeared. | ThanosReceive has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceiveisdown |
| ThanosRuleIsDown | Thanos component has disappeared. | ThanosRule has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosruleisdown |
| ThanosSidecarIsDown | Thanos component has disappeared. | ThanosSidecar has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanossidecarisdown |
| ThanosStoreIsDown | Thanos component has disappeared. | ThanosStore has disappeared. Prometheus target for the component cannot be discovered. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstoreisdown |

thanos-query

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosQueryHttpRequestQueryErrorRateHigh | Thanos Query is failing to handle requests. | Thanos Query {{$labels.job}} is failing to handle {{$value \| humanize}}% of "query" requests. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryhttprequestqueryerrorratehigh |
| ThanosQueryHttpRequestQueryRangeErrorRateHigh | Thanos Query is failing to handle requests. | Thanos Query {{$labels.job}} is failing to handle {{$value \| humanize}}% of "query_range" requests. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryhttprequestqueryrangeerrorratehigh |
| ThanosQueryGrpcServerErrorRate | Thanos Query is failing to handle requests. | Thanos Query {{$labels.job}} is failing to handle {{$value \| humanize}}% of requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosquerygrpcservererrorrate |
| ThanosQueryGrpcClientErrorRate | Thanos Query is failing to send requests. | Thanos Query {{$labels.job}} is failing to send {{$value \| humanize}}% of requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosquerygrpcclienterrorrate |
| ThanosQueryHighDNSFailures | Thanos Query is having a high number of DNS failures. | Thanos Query {{$labels.job}} has {{$value \| humanize}}% of failing DNS queries for store endpoints. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryhighdnsfailures |
| ThanosQueryInstantLatencyHigh | Thanos Query has high latency for queries. | Thanos Query {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for instant queries. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryinstantlatencyhigh |
| ThanosQueryRangeLatencyHigh | Thanos Query has high latency for queries. | Thanos Query {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for range queries. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryrangelatencyhigh |
| ThanosQueryOverload | Thanos Query reaches its maximum capacity serving concurrent requests. | Thanos Query {{$labels.job}} has been overloaded for more than 15 minutes. This may be a symptom of excessive simultaneous complex requests, low performance of the Prometheus API, or failures within these components. Assess the health of the Thanos Query instances and the connected Prometheus instances, look for potential senders of these requests, and then contact support. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryoverload |

thanos-receive

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosReceiveHttpRequestErrorRateHigh | Thanos Receive is failing to handle requests. | Thanos Receive {{$labels.job}} is failing to handle {{$value \| humanize}}% of requests. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivehttprequesterrorratehigh |
| ThanosReceiveHttpRequestLatencyHigh | Thanos Receive has high HTTP request latency. | Thanos Receive {{$labels.job}} has a 99th percentile latency of {{ $value }} seconds for requests. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivehttprequestlatencyhigh |
| ThanosReceiveHighReplicationFailures | Thanos Receive is having a high number of replication failures. | Thanos Receive {{$labels.job}} is failing to replicate {{$value \| humanize}}% of requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivehighreplicationfailures |
| ThanosReceiveHighForwardRequestFailures | Thanos Receive is failing to forward requests. | Thanos Receive {{$labels.job}} is failing to forward {{$value \| humanize}}% of requests. | info | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivehighforwardrequestfailures |
| ThanosReceiveHighHashringFileRefreshFailures | Thanos Receive is failing to refresh the hashring file. | Thanos Receive {{$labels.job}} is failing to refresh the hashring file, {{$value \| humanize}} of attempts failed. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivehighhashringfilerefreshfailures |
| ThanosReceiveConfigReloadFailure | Thanos Receive has not been able to reload configuration. | Thanos Receive {{$labels.job}} has not been able to reload hashring configurations. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceiveconfigreloadfailure |
| ThanosReceiveNoUpload | Thanos Receive has not uploaded latest data to object storage. | Thanos Receive {{$labels.instance}} has not uploaded latest data to object storage. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivenoupload |
| ThanosReceiveLimitsConfigReloadFailure | Thanos Receive has not been able to reload the limits configuration. | Thanos Receive {{$labels.job}} has not been able to reload the limits configuration. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitsconfigreloadfailure |
| ThanosReceiveLimitsHighMetaMonitoringQueriesFailureRate | Thanos Receive has not been able to update the number of head series. | Thanos Receive {{$labels.job}} is failing for {{$value \| humanize}}% of meta monitoring queries. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitshighmetamonitoringqueriesfailurerate |
| ThanosReceiveTenantLimitedByHeadSeries | A Thanos Receive tenant is limited by head series. | Thanos Receive tenant {{$labels.tenant}} is limited by head series. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries |

thanos-rule

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosRuleQueueIsDroppingAlerts | Thanos Rule is failing to queue alerts. | Thanos Rule {{$labels.instance}} is failing to queue alerts. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulequeueisdroppingalerts |
| ThanosRuleSenderIsFailingAlerts | Thanos Rule is failing to send alerts to Alertmanager. | Thanos Rule {{$labels.instance}} is failing to send alerts to Alertmanager. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulesenderisfailingalerts |
| ThanosRuleHighRuleEvaluationFailures | Thanos Rule is failing to evaluate rules. | Thanos Rule {{$labels.instance}} is failing to evaluate rules. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulehighruleevaluationfailures |
| ThanosRuleHighRuleEvaluationWarnings | Thanos Rule has a high number of evaluation warnings. | Thanos Rule {{$labels.instance}} has a high number of evaluation warnings. | info | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulehighruleevaluationwarnings |
| ThanosRuleRuleEvaluationLatencyHigh | Thanos Rule has high rule evaluation latency. | Thanos Rule {{$labels.instance}} has higher evaluation latency than interval for {{$labels.rule_group}}. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosruleruleevaluationlatencyhigh |
| ThanosRuleGrpcErrorRate | Thanos Rule is failing to handle gRPC requests. | Thanos Rule {{$labels.job}} is failing to handle {{$value \| humanize}}% of requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulegrpcerrorrate |
| ThanosRuleConfigReloadFailure | Thanos Rule has not been able to reload configuration. | Thanos Rule {{$labels.job}} has not been able to reload its configuration. | info | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosruleconfigreloadfailure |
| ThanosRuleQueryHighDNSFailures | Thanos Rule is having a high number of DNS failures. | Thanos Rule {{$labels.job}} has {{$value \| humanize}}% of failing DNS queries for query endpoints. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulequeryhighdnsfailures |
| ThanosRuleAlertmanagerHighDNSFailures | Thanos Rule is having a high number of DNS failures. | Thanos Rule {{$labels.instance}} has {{$value \| humanize}}% of failing DNS queries for Alertmanager endpoints. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulealertmanagerhighdnsfailures |
| ThanosRuleNoEvaluationFor10Intervals | Thanos Rule has rule groups that did not evaluate for 10 intervals. | Thanos Rule {{$labels.job}} has rule groups that did not evaluate for at least 10x of their expected interval. | info | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosrulenoevaluationfor10intervals |
| ThanosNoRuleEvaluations | Thanos Rule did not perform any rule evaluations. | Thanos Rule {{$labels.instance}} did not perform any rule evaluations in the past 10 minutes. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosnoruleevaluations |
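
The ThanosRuleNoEvaluationFor10Intervals description compares the time since a rule group last evaluated against ten times its expected interval. A minimal Python sketch of that comparison follows. The function and parameter names are hypothetical, and the real alert expresses the condition in PromQL, which this page does not show; the `>=` mirrors the "at least 10x" wording and is an assumption about the exact boundary.

```python
def missed_ten_intervals(now: float, last_evaluation: float, interval: float) -> bool:
    """Return True if a rule group has not evaluated for at least 10x its interval.

    All arguments are in seconds; `now` and `last_evaluation` are Unix
    timestamps. Hypothetical helper for illustration only.
    """
    return (now - last_evaluation) >= 10 * interval


if __name__ == "__main__":
    # A group with a 60 s interval that last evaluated 700 s ago is overdue.
    print(missed_ten_intervals(now=1000.0, last_evaluation=300.0, interval=60.0))
```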

thanos-sidecar

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosSidecarBucketOperationsFailed | Thanos Sidecar bucket operations are failing. | Thanos Sidecar {{$labels.instance}} bucket operations are failing. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanossidecarbucketoperationsfailed |
| ThanosSidecarNoConnectionToStartedPrometheus | Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. | Thanos Sidecar {{$labels.instance}} is unhealthy. | critical | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanossidecarnoconnectiontostartedprometheus |

thanos-store

| Name | Summary | Description | Severity | Runbook |
|---|---|---|---|---|
| ThanosStoreGrpcErrorRate | Thanos Store is failing to handle gRPC requests. | Thanos Store {{$labels.job}} is failing to handle {{$value \| humanize}}% of requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstoregrpcerrorrate |
| ThanosStoreSeriesGateLatencyHigh | Thanos Store has high latency for store series gate requests. | Thanos Store {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for store series gate requests. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstoreseriesgatelatencyhigh |
| ThanosStoreBucketHighOperationFailures | Thanos Store Bucket is failing to execute operations. | Thanos Store {{$labels.job}} Bucket is failing to execute {{$value \| humanize}}% of operations. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstorebuckethighoperationfailures |
| ThanosStoreObjstoreOperationLatencyHigh | Thanos Store is having high latency for bucket operations. | Thanos Store {{$labels.job}} Bucket has a 99th percentile latency of {{$value}} seconds for the bucket operations. | warning | https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstoreobjstoreoperationlatencyhigh |
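
Across all rule groups listed here, each runbook link appears to follow one pattern: the runbook URL plus an `#alert-name-` anchor made from the lowercased alert name. The Python sketch below reproduces that pattern as observed in the rows of this document. The helper name `runbook_url` is hypothetical, and the pattern is inferred from the listed links rather than taken from the mixin's generator.

```python
RUNBOOK_BASE = "https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md"


def runbook_url(alert_name: str) -> str:
    """Build the runbook link for an alert, following the pattern seen in the tables.

    Hypothetical helper: the anchor is "alert-name-" followed by the
    alert name in lowercase.
    """
    return f"{RUNBOOK_BASE}#alert-name-{alert_name.lower()}"


if __name__ == "__main__":
    print(runbook_url("ThanosStoreGrpcErrorRate"))
```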