docs/fleet-configuration-management.md
As infrastructures grow from a handful of servers to thousands of nodes across mixed environments (Linux, Kubernetes, Windows, macOS, FreeBSD), managing observability agents becomes one of the most painful operational tasks.
Without a coherent strategy, teams face:
Netdata's Fleet Management Philosophy:
| Capability | Linux | Kubernetes | FreeBSD | macOS | Windows |
|---|---|---|---|---|---|
| Auto-deploy | ✅ kickstart.sh¹ | ✅ Helm | ✅ kickstart.sh | ✅ kickstart.sh | ✅ MSI² (silent) |
| Auto-update | ✅ Built-in | ✅ Built-in | ✅ Built-in | ✅ Built-in | ⚠️ Manual³ |
| Auto-discover system metrics | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Auto-discover all processes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Auto-discover containers & VMs | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ✅ Hyper-V |
| Auto-discover Docker apps | ✅ Yes | ✅ Via k8s⁴ | ✅ If Docker | ✅ If Docker | ✅ If Docker |
| Auto-discover system services | ✅ systemd | ✅ Yes | ⚠️ Limited | ⚠️ launchd | ✅ Windows Services |
| Auto-discover enterprise apps | ✅ netlistensd⁵ | ✅ Via k8s | ❌ Manual | ❌ Manual | ✅ perflib⁶ |
| Infrastructure as Code | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Dynamic Configuration | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
Legend: ✅ Full support | ⚠️ Partial support | ❌ Not available
Footnotes:
The observability industry uses two primary approaches for configuration management:
IaC treats configurations as code artifacts that can be versioned, reviewed, and deployed through automated processes.
Common tools:
Typical workflow:
Dynamic configuration uses a central control plane to manage configurations without requiring code deployments.
Common implementations:
Typical workflow:
| Approach | When to Use | How |
|---|---|---|
| IaC Only | Compliance requirements, audit trails | Ansible, Terraform, Puppet, Chef |
| Dynamic Only | Small teams, rapid iteration | Netdata Cloud UI or REST API |
| Hybrid (Best) | Most organizations | Base config in Git, credentials/thresholds via UI |
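
As a rough illustration of the hybrid approach, the sketch below layers dynamically supplied values over a base configuration kept in version control. The key names (`update_every`, `alerts`, `dsn`) and the merge function are hypothetical and only show the idea of layering. They do not describe Netdata's actual merge logic or file format.

```python
from copy import deepcopy


def merge_config(base: dict, overrides: dict) -> dict:
    """Return a new dict where overrides win over base; nested dicts are merged."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base configuration, as it might be kept in Git.
base_config = {
    "collector": "postgres",
    "update_every": 5,
    "alerts": {"connections_warning": 80, "connections_critical": 95},
}

# Hypothetical values supplied later through a UI (credentials, thresholds).
dynamic_overrides = {
    "dsn": "postgres://monitor:example@127.0.0.1:5432/postgres",
    "alerts": {"connections_warning": 70},
}

effective = merge_config(base_config, dynamic_overrides)
print(effective)
```

The base stays untouched, so the Git copy remains the auditable source while credentials and thresholds live outside the repository.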
Netdata addresses fleet deployment through a comprehensive strategy that minimizes operational overhead while maximizing flexibility:
Netdata provides a unified installation experience across all platforms (except Windows) through kickstart.sh, which implements an intelligent cascade:
Installation priority order on Linux:
Platform-specific behavior:
kickstart.sh
All Netdata installations (except Windows) auto-update to the latest version:
Strong backwards compatibility: Netdata maintains compatibility across versions, so upgrades do not break existing configurations or data (see Netdata Infrastructure for architectural details).
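
The installation cascade mentioned earlier follows a general pattern: try each method in priority order and stop at the first one that succeeds. The sketch below shows only that pattern. The method names (`native_package`, `static_build`, `source_build`) and their simulated outcomes are illustrative assumptions, not a transcription of what kickstart.sh does internally.

```python
from typing import Callable, List, Optional, Tuple


def install_with_cascade(
    methods: List[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Try each method in priority order; return the name of the first success."""
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception as err:
            # A failed method is not fatal; fall through to the next one.
            print(f"{name} failed: {err}")
    return None


# Hypothetical installation methods with simulated outcomes.
def native_package() -> bool:
    raise RuntimeError("no package available for this distribution")


def static_build() -> bool:
    return True


def source_build() -> bool:
    return True


chosen = install_with_cascade([
    ("native_package", native_package),
    ("static_build", static_build),
    ("source_build", source_build),
])
print(chosen)
```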
Netdata supports both IaC and Dynamic Configuration while minimizing the need for manual configuration through extensive auto-discovery capabilities.
Netdata's primary operational approach is to eliminate manual configuration through comprehensive auto-discovery:
For organizations with established DevOps practices, Netdata fully supports configuration management through traditional IaC tools like Ansible, allowing version-controlled, auditable deployments.
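
To make the IaC idea concrete, here is a minimal sketch of rendering one configuration file per host from a version-controlled inventory, which is roughly what a templating step in a tool like Ansible does. The inventory, the `render_ini` helper, and the choice of the `hostname` option under `[global]` are assumptions made for illustration. Check the option names against your own configuration before using anything like this.

```python
from typing import Dict


def render_ini(sections: Dict[str, Dict[str, str]]) -> str:
    """Render nested dicts as an INI-style file with indented key = value lines."""
    lines = []
    for section, options in sections.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            lines.append(f"    {key} = {value}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical inventory, as it might be kept in a Git repository.
inventory = ["web-01", "web-02", "db-01"]

# One rendered file body per host; the same inventory always yields the same output.
rendered = {
    host: render_ini({"global": {"hostname": host}})
    for host in inventory
}

print(rendered["web-01"])
```

Because the output is a pure function of the inventory, every change to a node's configuration shows up as a reviewable diff.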
Through Netdata Cloud, or through authenticated Netdata Agent and Parent dashboards (the user must sign in to the dashboard), the Dynamic Configuration Manager lets users manage collector configurations and alert rules across their entire fleet without editing configuration files or restarting or redeploying agents.
When multiple configuration sources exist for the same component, Netdata applies them in the following priority order (highest to lowest):
/etc/netdata or /opt/netdata/etc/netdata
Netdata auto-detects operating system metrics (compute, memory, networking stack, storage, etc.) on all platforms.
On Linux, Netdata autodetects all instrumented kernel modules and technologies, including firewalls, DDoS protection systems, storage technologies, and filesystems. These technologies usually require zero configuration.
Similarly, on Windows, Netdata autodetects everything exposed via Perflib.
The apps.plugin provides intelligent process tree aggregation and monitoring on all platforms (Linux, FreeBSD, macOS, Windows):
Intelligent Process Tree Aggregation:
Resource Monitoring (per application group):
Key Benefits:
This provides comprehensive application monitoring even for software without specific collectors, making it an essential first line of observability.
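
The idea of process tree aggregation can be sketched as follows: each process is assigned to the group of its nearest ancestor with a recognized name, and resource usage is summed per group. This is a simplified illustration, not the apps.plugin implementation. The `Process` fields, the group names, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Process:
    pid: int
    ppid: int
    name: str
    cpu_percent: float
    rss_mb: float


def group_of(proc: Process, by_pid: Dict[int, Process],
             known_groups: Dict[str, str]) -> str:
    """Walk up the parent chain until a process name matches a known group."""
    seen = set()
    current = proc
    while current is not None and current.pid not in seen:
        seen.add(current.pid)  # guards against cycles in malformed input
        if current.name in known_groups:
            return known_groups[current.name]
        current = by_pid.get(current.ppid)
    return "other"


def aggregate(processes: List[Process],
              known_groups: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Sum CPU, memory, and process count per application group."""
    by_pid = {p.pid: p for p in processes}
    totals = defaultdict(lambda: {"cpu_percent": 0.0, "rss_mb": 0.0, "processes": 0})
    for p in processes:
        group = totals[group_of(p, by_pid, known_groups)]
        group["cpu_percent"] += p.cpu_percent
        group["rss_mb"] += p.rss_mb
        group["processes"] += 1
    return dict(totals)


# Hypothetical snapshot: a helper shell spawned by postgres is counted under "sql".
snapshot = [
    Process(1, 0, "systemd", 0.25, 8.0),
    Process(100, 1, "nginx", 1.0, 10.0),
    Process(101, 100, "nginx", 2.5, 20.0),
    Process(200, 1, "postgres", 4.0, 100.0),
    Process(201, 200, "sh", 0.5, 2.0),
]
totals = aggregate(snapshot, {"nginx": "web", "postgres": "sql"})
print(totals)
```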
The go.d.plugin provides auto-discovery for 150+ applications through multiple mechanisms:
Service discovery mechanisms:
local-listeners binary to detect listening services
Platform-specific behavior:
Linux systems (non-Kubernetes):
Full auto-discovery is available through the local-listeners utility which:
Non-Linux platforms (FreeBSD, macOS, Windows):
Discovery status management: When services are discovered but cannot be monitored, the Dynamic Configuration (DynCfg) system tracks their status:
Users can view discovered services in the Netdata Cloud UI and supply missing credentials or configuration parameters through the interface, allowing the collectors to retry connection without manual file editing and without restarting Netdata.
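
The retry behavior described above can be sketched as a small registry that records a status for each discovered service and re-attempts the connection whenever its configuration changes. The status names (`running`, `failed`), the class names, and the fake connection check are assumptions for illustration. They are not the DynCfg data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DiscoveredJob:
    name: str
    config: Dict[str, str] = field(default_factory=dict)
    status: str = "discovered"


class JobRegistry:
    """Tracks discovered jobs and retries them when their config is updated."""

    def __init__(self, connect: Callable[[Dict[str, str]], bool]):
        self._connect = connect
        self.jobs: Dict[str, DiscoveredJob] = {}

    def discover(self, name: str, config: Dict[str, str]) -> DiscoveredJob:
        job = DiscoveredJob(name, dict(config))
        self.jobs[name] = job
        self._try_start(job)
        return job

    def update_config(self, name: str, changes: Dict[str, str]) -> DiscoveredJob:
        # Supplying missing parameters triggers a retry; no process restart needed.
        job = self.jobs[name]
        job.config.update(changes)
        self._try_start(job)
        return job

    def _try_start(self, job: DiscoveredJob) -> None:
        job.status = "running" if self._connect(job.config) else "failed"


def fake_connect(config: Dict[str, str]) -> bool:
    """Stand-in for a real connection attempt: succeeds only with a password."""
    return bool(config.get("password"))


registry = JobRegistry(fake_connect)
status_before = registry.discover("mysql_local", {"address": "127.0.0.1:3306"}).status
status_after = registry.update_config("mysql_local", {"password": "example"}).status
print(status_before, status_after)
```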
Supported applications include databases (MySQL, PostgreSQL, Redis, MongoDB), web servers (NGINX, Apache, HAProxy), message queues (RabbitMQ, Kafka), SNMP, and many more.
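
One way to picture listener-based discovery is as a set of rules that map each detected listening socket to a candidate collector. The sketch below matches on command name first and falls back to well-known default ports. The `Listener` fields, the rule tables, and the sample data are hypothetical and much simpler than the rules go.d.plugin actually ships.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Listener:
    protocol: str
    address: str
    port: int
    command: str


# Hypothetical rules: default ports and command-name fragments.
PORT_RULES = {3306: "mysql", 5432: "postgres", 6379: "redis", 27017: "mongodb"}
COMMAND_RULES = {"nginx": "nginx", "haproxy": "haproxy"}


def match_collector(listener: Listener) -> Optional[str]:
    """Return a candidate collector for a listener, or None if no rule applies."""
    if listener.protocol != "tcp":
        return None
    for fragment, collector in COMMAND_RULES.items():
        if fragment in listener.command:
            return collector
    return PORT_RULES.get(listener.port)


def plan_jobs(listeners: List[Listener]) -> Dict[str, str]:
    """Map 'address:port' endpoints to the collector that should probe them."""
    jobs = {}
    for listener in listeners:
        collector = match_collector(listener)
        if collector is not None:
            jobs[f"{listener.address}:{listener.port}"] = collector
    return jobs


# Hypothetical output of a listener scan.
detected = [
    Listener("tcp", "127.0.0.1", 6379, "redis-server"),
    Listener("tcp", "0.0.0.0", 8080, "nginx: master process"),
    Listener("udp", "0.0.0.0", 53, "dnsmasq"),
    Listener("tcp", "127.0.0.1", 5432, "postgres"),
]
jobs = plan_jobs(detected)
print(jobs)
```

Matching on the command before the port lets a service on a non-default port (the nginx listener on 8080 here) still be recognized.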
Installation: Windows requires the MSI installer instead of kickstart.sh:
```powershell
# Silent installation for fleet deployment
msiexec /i netdata-installer.msi /qn /norestart `
  CLAIMING_TOKEN="YOUR_TOKEN" `
  CLAIMING_ROOMS="YOUR_ROOM_ID" `
  CLAIMING_URL="https://app.netdata.cloud"

# Via Group Policy or SCCM
# Deploy MSI with TRANSFORMS for site-specific configuration
```
Update Management (Manual - auto-updates coming):
Windows-Specific Auto-Discovery: Windows monitoring is handled by the native windows.plugin which uses Windows Performance Counters (perflib) to automatically discover and monitor:
Enterprise Applications (auto-discovered via perflib):
System Monitoring (automatically enabled):
Process Monitoring:
When Netdata runs inside a Kubernetes cluster, it provides comprehensive multi-level discovery:
Cluster Monitoring:
Service Discovery (k8ssd):
Discovery Behavior in Kubernetes:
Most Kubernetes deployments require no configuration beyond the initial Helm chart installation.
Understanding how different platforms handle configuration helps in planning migrations or hybrid deployments:
| Platform | Components per Node | Configuration Method | Auto-Discovery | Updates & Restarts |
|---|---|---|---|---|
| Netdata | Single agent | Files, Ansible, or Dynamic UI | Extensive | Zero-downtime for Dynamic Config |
| Prometheus | 5-20 exporters | YAML files per exporter | Limited | Rolling restarts |
| Datadog | Agent + integrations | YAML files | Moderate | Agent restart |
| OpenTelemetry | Collectors + instrumentation | YAML files per collector | Limited | Full restart |