service/imagemonitor/README.md
A real-time Kubernetes image pull monitoring service for Sealos, designed to track and analyze container image pulling issues with Prometheus metrics integration.
Image Monitor Service acts as a monitoring layer for image pull operations in a Kubernetes cluster.

    ┌─────────────────┐
    │   Kubernetes    │
    │   API Server    │
    └────────┬────────┘
             │ Watch Pods
             ▼
    ┌──────────────────┐     ┌──────────────────┐
    │  Image Monitor   │────▶│   Prometheus     │
    │     Service      │     │     Metrics      │
    └──────────────────┘     └──────────────────┘
             │
             ▼
      ┌──────────────┐
      │   Failure    │
      │   Analyzer   │
      │  - Network   │
      │  - Auth      │
      │  - Not Found │
      │  - Slow Pull │
      └──────────────┘

The service is split across the following source files:

- `analyzer.go`: Analyzes container status and identifies image pull errors
- `classifier.go`: Categorizes failures into specific types using regex pattern matching
- `slow_pull.go`: Monitors pulling duration and triggers alerts for slow operations
- `metrics.go`: Exposes Prometheus metrics at `:8080/metrics`

Key features:

- Comprehensive Failure Classification: Automatically identifies and categorizes image pull failures, such as network, authentication, and image-not-found errors
- Slow Pull Detection: Flags image pulls that run longer than 3 minutes
- Smart State Management
- Public Registry Focus: Monitors only public registry images to avoid exposing private infrastructure
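
To illustrate the regex-based classification described for `classifier.go`, here is a minimal sketch. It is not the service's actual code: the `rule` type, the `classify` function, the patterns, and every reason label other than `image_not_found` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a reason label with the pattern that selects it (hypothetical type).
type rule struct {
	reason  string
	pattern *regexp.Regexp
}

// rules are checked in order and the first match wins. The patterns are
// illustrative guesses at common runtime error messages, not the service's
// real list.
var rules = []rule{
	{"image_not_found", regexp.MustCompile(`(?i)manifest unknown|not found`)},
	{"auth_error", regexp.MustCompile(`(?i)unauthorized|authentication required|denied`)},
	{"network_error", regexp.MustCompile(`(?i)i/o timeout|connection refused|no such host`)},
}

// classify returns the reason of the first matching rule, or "unknown"
// when no pattern matches the message.
func classify(message string) string {
	for _, r := range rules {
		if r.pattern.MatchString(message) {
			return r.reason
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(classify("manifest unknown: manifest unknown"))
	fmt.Println(classify("dial tcp: lookup registry.example.com: no such host"))
}
```

Because the first match wins, rule order matters: a message that matches several patterns is reported under whichever rule comes first in the slice.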
The service exposes two main Prometheus metrics:
`image_pull_failure`

Tracks active image pull failures with detailed labels.

Type: Gauge

Labels:

- `namespace`: Pod namespace
- `pod`: Pod name
- `node`: Node where pull is failing
- `registry`: Container registry (e.g., docker.io, ghcr.io)
- `image`: Full image reference
- `reason`: Classified failure reason

Example:

    image_pull_failure{
      namespace="default",
      pod="my-app-xyz",
      node="node-1",
      registry="docker.io",
      image="nginx:latest",
      reason="image_not_found"
    } 1

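
The example pairs `image="nginx:latest"` with `registry="docker.io"`, so the registry label has to be derived even when the image reference names no host. The sketch below shows one way that could work, assuming Docker's reference convention: the first path component counts as a registry only if it contains `.` or `:` or equals `localhost`. The names `registryOf`, `isPublic`, and `publicRegistries` are hypothetical, and the allowlist is a placeholder rather than the service's real list.

```go
package main

import (
	"fmt"
	"strings"
)

// publicRegistries is a hypothetical allowlist used only for this sketch.
var publicRegistries = map[string]bool{
	"docker.io": true,
	"ghcr.io":   true,
}

// registryOf returns the registry host of an image reference, falling back
// to docker.io when the reference carries no explicit host.
func registryOf(image string) string {
	first, _, found := strings.Cut(image, "/")
	if !found {
		return "docker.io" // bare name such as "nginx:latest"
	}
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return first // explicit host such as "ghcr.io" or "localhost:5000"
	}
	return "docker.io" // namespace/name such as "library/nginx"
}

// isPublic reports whether the image's registry is on the allowlist.
func isPublic(image string) bool {
	return publicRegistries[registryOf(image)]
}

func main() {
	for _, img := range []string{"nginx:latest", "ghcr.io/org/app:v1", "localhost:5000/app"} {
		fmt.Printf("%s -> registry=%s public=%v\n", img, registryOf(img), isPublic(img))
	}
}
```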
`image_pull_slow_alert`

Tracks slow image pull operations (>3 minutes).

Type: Gauge

Labels:

- `namespace`: Pod namespace
- `pod`: Pod name
- `container`: Container name
- `image`: Full image reference

Example:

    image_pull_slow_alert{
      namespace="default",
      pod="my-app-xyz",
      container="app",
      image="docker.io/myimage:v1.0"
    } 1
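
The 3-minute threshold behind this metric can be pictured with a small sketch. The constant `slowPullThreshold` and the function `isSlowPull` are hypothetical names chosen for illustration. How the service actually measures elapsed pull time is not described here, so the sketch simply takes a duration as input.

```go
package main

import (
	"fmt"
	"time"
)

// slowPullThreshold mirrors the 3-minute limit stated for image_pull_slow_alert.
const slowPullThreshold = 3 * time.Minute

// isSlowPull reports whether a pull that has been running for the given
// duration has exceeded the threshold.
func isSlowPull(elapsed time.Duration) bool {
	return elapsed > slowPullThreshold
}

func main() {
	fmt.Println(isSlowPull(2 * time.Minute)) // false
	fmt.Println(isSlowPull(4 * time.Minute)) // true
}
```

A monitor built along these lines would presumably set the gauge to 1 while `isSlowPull` holds for a container and clear it once the pull completes or fails.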