service/zombiedetector/README.md
A Kubernetes node monitoring tool that detects "zombie" nodes (nodes that report Ready status but for which resource metrics cannot be retrieved) and sends alerts through a Feishu bot.
| Variable | Description | Default | Example |
|---|---|---|---|
| CLUSTER_NAME | Cluster name | Optional | default |
| FEISHU_WEBHOOK_URL | Feishu bot Webhook URL | Required | https://open.feishu.cn/open-apis/bot/v2/hook/xxx |
| CHECK_INTERVAL | Interval for checking node status | 30s | 1m, 30s |
| ALERT_THRESHOLD | Time before triggering alert when no metrics | 1m | 3m, 180s |
| ALERT_INTERVAL | Minimum interval for repeat alerts on same node | 5m | 10m, 600s |
| POD_NAME | Pod name for leader election | Optional | Set by Kubernetes |
| POD_NAMESPACE | Pod namespace for leader election | Optional | Set by Kubernetes |
When a zombie node is detected, you will receive a Feishu message like:
⚠️ Node Monitor Alert
Cluster Name: production
Node Name: worker-node-1
Description: Node is Ready, but no metrics data for 5m30s
Alert Time: 2024-01-15 10:30:45
kubectl apply -f deployment.yaml
Make sure to set the following environment variables:
env:
  - name: FEISHU_WEBHOOK_URL
    value: "https://open.feishu.cn/open-apis/bot/v2/hook/your-webhook-url"
  - name: CLUSTER_NAME
    value: "production"
  - name: CHECK_INTERVAL
    value: "30s"
  - name: ALERT_THRESHOLD
    value: "3m"
  - name: ALERT_INTERVAL
    value: "10m"
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
The service exposes health check endpoints on port 8080:
- /healthz - Liveness probe
- /readyz - Readiness probe

When POD_NAME and POD_NAMESPACE are set, the service will run with leader election enabled. Only one pod will actively monitor nodes at a time, ensuring high availability without duplicate alerts.
The codebase is organized into multiple files:
- main.go - Entry point and initialization
- monitor.go - Core monitoring logic
- alert.go - Feishu alert functionality
- leader_election.go - Leader election logic
- health.go - Health check server
- types.go - Type definitions and constants
- utils.go - Utility functions

This project is licensed under the Apache License 2.0.