src/go/BEST-PRACTICES.md
This guide documents the patterns, conventions, and best practices for writing Go collectors (modules) for Netdata's go.d.plugin framework.
Each collector MUST have a directory structure like this:
```
collector/mymodule/
├── README.md           # Developer documentation about the plugin
├── collector.go        # Main collector implementation
├── collect.go          # Collection logic (optional)
├── charts.go           # Chart definitions
├── config_schema.json  # Configuration schema
├── metadata.yaml       # Netdata marketplace metadata (includes user documentation about the plugin)
├── init.go             # Init helpers (optional)
└── testdata/           # Test fixtures
```
Each collector MUST have a stock configuration file in go.d/config or ibm.d/config following a simple pattern:
```yaml
## All available configuration options, their descriptions and default values:
## https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/collector/mymodule#configuration

#jobs:
#  - name: local
#    url: http://localhost:8080
#
#  - name: remote
#    url: http://203.0.113.0
#    username: username
#    password: password
```
For more detailed documentation, create self-documenting configs in custom plugins:
```yaml
## netdata configuration for MyModule
## This collector monitors...
## Prerequisites:
## 1. First requirement
## 2. Second requirement

jobs:
  ## Example: Basic configuration
  # - name: local
  #   url: http://localhost:8080
  #   update_every: 5

  ## Example: With authentication
  # - name: prod
  #   url: https://prod.example.com
  #   username: monitor
  #   password: secret
```
Collectors must NEVER filter the monitored items based on activity or volume. Netdata expects a stable number of metrics. Charts and dimensions can of course come and go, but when there are many metrics this must NOT happen too frequently.
The best practice is to have 2 configuration parameters:

- `maxXXX`, a number
- `selectXXX`, a simple pattern (glob)

The two work together like this: "If the number of items is less than {maxXXX}, chart them all; otherwise, chart only the ones matched by {selectXXX}."
This ensures that small deployments are monitored in full, while bigger deployments require explicit configuration to enable monitoring of this kind of item.
The default maxXXX MUST be 100-500.
The default selectXXX should be unset or empty to match nothing.
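The rule above can be sketched in a few lines of Go. In this sketch, `selectItems`, `maxItems` and `selectPattern` are hypothetical names standing in for a collector's `maxXXX` and `selectXXX` options. The standard library's `path.Match` is used only as a simple glob matcher; a real collector would probably use Netdata's own pattern matching. The sketch treats `maxXXX` as an inclusive upper bound.

```go
package main

import (
	"fmt"
	"path"
)

// selectItems sketches the maxXXX/selectXXX rule. maxItems and
// selectPattern are hypothetical stand-ins for a collector's options;
// path.Match is only a stand-in glob matcher.
func selectItems(items []string, maxItems int, selectPattern string) []string {
	// Small deployment (at or below the limit): chart every item.
	if len(items) <= maxItems {
		return items
	}
	// Large deployment and no selector configured: chart nothing.
	if selectPattern == "" {
		return nil
	}
	var selected []string
	for _, name := range items {
		ok, err := path.Match(selectPattern, name)
		if err != nil {
			return nil // malformed pattern: match nothing
		}
		if ok {
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	disks := []string{"sda", "sdb", "nvme0n1", "nvme1n1"}
	fmt.Println(selectItems(disks, 10, ""))     // [sda sdb nvme0n1 nvme1n1]
	fmt.Println(selectItems(disks, 2, "nvme*")) // [nvme0n1 nvme1n1]
	fmt.Println(selectItems(disks, 2, ""))      // []
}
```

If the pattern is unset, or if it is malformed, a large deployment charts nothing. This errs on the side of a stable, small metric set rather than an accidental explosion of charts.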
Collectors MUST obsolete charts when they are no longer collected.
IMPORTANT: Obsoletion flushes Netdata memory. Unnecessary obsoletion and recreation increases the storage footprint of the metrics. The TSDB flushes them prematurely, before it has enough samples for Gorilla compression + ZSTD to compress them efficiently.
Creating charts and obsoleting them in a flip-flop fashion is a bad practice.
If charts risk flip-flopping between active and obsolete, the best practice is to obsolete them only after 1 minute of absence.
The go.d framework provides a built-in mechanism for marking charts as obsolete when they are no longer needed. This is available for ALL charts - both static (base) charts and dynamic instance charts.
Marking a Chart as Obsolete:
```go
// Method available on any chart
func (c *Chart) MarkRemove() {
	c.Obsolete = true // Adds "obsolete" to CHART command options
	c.remove = true   // Flags for removal from charts slice
}
```
Framework Behavior: the chart is sent again with `obsolete` among its CHART command options, and no dimension values are sent for it:

```
CHART 'module.metric' '' 'Title' 'units' 'family' 'context' 'line' '100' '1' 'obsolete' 'plugin' 'module'
```
For collectors that support multiple versions or editions of the monitored application, follow this critical principle:
Admin configuration MUST always take precedence over auto-detection.
Netdata's Go collectors MUST send integer values to Netdata, but many metrics are naturally floating-point (percentages, response times, load averages, etc.). Netdata uses a precision system to handle this conversion while preserving decimal places in the database.
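As an illustration, here is a minimal sketch of the precision conversion. It assumes a precision of 1000 (three decimal places). The `dim` struct is a hypothetical, trimmed-down stand-in for a chart dimension definition whose `Mul` and `Div` fields play the roles described here. The collector scales the float up before sending it; the dimension divides by the same factor, so the stored value keeps its decimals. The sketch rounds instead of truncating, because a plain `int64(v * precision)` can lose one unit on values like 0.29.

```go
package main

import (
	"fmt"
	"math"
)

// precision is the scale factor; 1000 preserves three decimal places.
const precision = 1000

// dim is a hypothetical stand-in for a chart dimension definition.
type dim struct {
	ID  string
	Mul int
	Div int
}

// toCollected converts a float reading into the integer sent to Netdata.
func toCollected(v float64) int64 {
	return int64(math.Round(v * precision))
}

func main() {
	// The dimension divides by the same precision the collector multiplied by.
	load1 := dim{ID: "load1", Mul: 1, Div: precision}

	mx := map[string]int64{}
	mx[load1.ID] = toCollected(0.42)
	fmt.Println(mx[load1.ID]) // 420

	// What Netdata effectively stores: value * Mul / Div.
	fmt.Println(float64(mx[load1.ID]) * float64(load1.Mul) / float64(load1.Div)) // 0.42
}
```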
How It Works:
- `precision = 1000`
- The multiplier (`Mul`) and the precision (`Div`) are independent.

CRITICAL: Never fill gaps. For Netdata, the absence of data collection samples is meaningful: Netdata visualizes gaps in data collection, marking missing points as empty. This happens automatically when collectors DO NOT SEND samples at the predefined collection interval. The Golden Rule: when data collection fails, DO NOT SEND ANY VALUE. Skip the metric entirely.
The ONLY exception is for metrics derived from events (logs, message queues, etc.) where sparse data is expected.
When possible, connect ONCE and keep the connection for the collector's lifetime.
When reconnection is necessary:
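A sketch of connect-once with reconnection, under stated assumptions: `conn`, `collector.newConn`, `fakeConn` and `cleanup` are all hypothetical. The collector connects lazily on the first cycle and reuses the connection afterwards. A failed query drops the connection, so the next cycle reconnects, and a cleanup hook closes it when the job stops.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical client connection to the monitored application.
type conn interface {
	Query() (int64, error)
	Close() error
}

// collector keeps a single connection across collection cycles.
type collector struct {
	newConn func() (conn, error) // hypothetical dialer
	conn    conn
}

func (c *collector) collect() (int64, error) {
	// Connect lazily, once; later cycles reuse the same connection.
	if c.conn == nil {
		cn, err := c.newConn()
		if err != nil {
			return 0, fmt.Errorf("connecting: %w", err)
		}
		c.conn = cn
	}
	v, err := c.conn.Query()
	if err != nil {
		// Drop the broken connection; the next cycle reconnects.
		_ = c.conn.Close()
		c.conn = nil
		return 0, err
	}
	return v, nil
}

// cleanup closes the connection when the job stops.
func (c *collector) cleanup() {
	if c.conn != nil {
		_ = c.conn.Close()
		c.conn = nil
	}
}

// fakeConn simulates a connection that can be made to fail.
type fakeConn struct{ fail bool }

func (f *fakeConn) Query() (int64, error) {
	if f.fail {
		return 0, errors.New("broken pipe")
	}
	return 1, nil
}

func (f *fakeConn) Close() error { return nil }

func main() {
	dials := 0
	c := &collector{newConn: func() (conn, error) { dials++; return &fakeConn{}, nil }}
	c.collect()
	c.collect()
	fmt.Println(dials) // 1: the connection is reused
	c.conn.(*fakeConn).fail = true
	c.collect() // fails and drops the connection
	c.collect() // reconnects
	fmt.Println(dials) // 2
	c.cleanup()
}
```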
CRITICAL: Reuse temporary objects between collection cycles. Don't create new temporary resources for each collection.
```markdown
# Module name collector

## Overview
What this collector does and what it monitors.

## Collected metrics
List of all metrics with descriptions.

## Configuration
Configuration examples and options.

## Requirements
Any prerequisites or dependencies.

## Troubleshooting
Common issues and solutions.
```
`metadata.yaml` drives the integrations list presented in the Netdata dashboard and on Netdata's websites:
```yaml
plugin_name: go.d.plugin
modules:
  - meta:
      module_name: mymodule
      monitored_instance:
        name: My Service
        link: https://example.com
        categories:
          - data-collection.category
        icon_filename: "icon.svg"
    overview:
      data_collection:
        metrics_description: |
          Detailed description of what metrics are collected.
        method_description: |
          How the collector gathers these metrics.
    setup:
      prerequisites:
        list:
          - title: Requirement
            description: What needs to be done
    metrics:
      folding:
        title: Metrics
        enabled: false
      description: ""
      availability: []
      scopes:
        - name: global
          description: These metrics refer to the entire instance.
          labels: []
          metrics:
            - name: module.metric_name
              description: Metric description
              unit: units
              chart_type: line
              dimensions:
                - name: dimension1
                - name: dimension2
```
Collectors MUST support vnode for configuration management:
```go
type Config struct {
	Vnode       string `yaml:"vnode,omitempty" json:"vnode"`
	UpdateEvery int    `yaml:"update_every,omitempty" json:"update_every"`
	// ... other fields
}
```
Alerts should be stored in `src/health/health.d/{module}.conf`.
Netdata uses alert templates, which are automatically applied to every instance (a chart, in go.d code) of the context they target.
Netdata follows specific patterns for alert thresholds:
Avoid fixed thresholds except for:
Preferred approaches:
Note: The dynamic threshold rule primarily applies when we cannot express the metric as a percentage. For percentage-based metrics (0-100%), fixed thresholds are appropriate and preferred for their simplicity and clarity.
Silent alerts: use `to: silent` for alerts that don't require immediate human action
Non-silent alerts: Only for issues requiring immediate action
```
template: collector_metric_condition
      on: collector.metric
   class: Utilization|Errors|Latency|Workload|Availability
    type: System|Database|Web Server|Application
component: ServiceName
  lookup: average -5m unaligned of dimension # for time-based, the result in $this
    calc: $dimension * 100 / $total # for calculated metrics, can use $this, the result in $this
   units: %|ms|requests|errors
   every: 10s
    warn: # warning condition, can use $this
    crit: # critical condition, can use $this
   delay: down 5m multiplier 1.5 max 1h
 summary: Short description
    info: Detailed description with ${value} placeholder
      to: role|silent
```
Collectors MUST, whenever possible, add labels with the version of the monitored application. This enables users to filter by version, group metrics by version, and understand which features are available.
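Below is a minimal sketch of building version labels once per job, for attaching to every chart. The `label` struct and the `version`/`edition` label keys are hypothetical stand-ins, not the go.d framework's types or an established naming convention. Unknown components are skipped rather than labeled with a placeholder, so users never group metrics under a fake "unknown" version.

```go
package main

import "fmt"

// label is a hypothetical stand-in for a chart label key/value pair.
type label struct {
	Key   string
	Value string
}

// versionLabels builds the version-related labels for a job's charts.
// Components that could not be determined are omitted.
func versionLabels(version, edition string) []label {
	var ls []label
	if version != "" {
		ls = append(ls, label{Key: "version", Value: version})
	}
	if edition != "" {
		ls = append(ls, label{Key: "edition", Value: edition})
	}
	return ls
}

func main() {
	fmt.Println(versionLabels("8.0.36", "community")) // [{version 8.0.36} {edition community}]
	fmt.Println(versionLabels("", ""))                // []
}
```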
`thread_pools.threads` clearly indicates thread counts.

Netdata's distributed architecture enables ingestion of SIGNIFICANTLY more metrics than centralized systems. Collect EVERYTHING unless there's a compelling reason not to.
DEFAULT: COLLECT EVERYTHING
Only exclude metrics when:
Review the code for any mock data, remaining TODOs, fictional logic, fictional API endpoints, fictional response formats, fictional members, etc. An independent reality check MUST be performed.
Review all logs generated by the code to ensure that users a) have enough information to understand the choices the code made, b) get descriptive and detailed logs on failures, and c) are not spammed by repeated logs for failures that are expected to be permanent (such as a feature the monitored application does not support).
The code, the stock configuration, the metadata.yaml, the config_schema.json and any stock alerts MUST match 100%. Even the slightest variation between them in config keys, possible values, enum values, contexts, dimension names, etc. LEADS TO A NON-WORKING SOLUTION. So, at the end of every change, an independent sync check MUST be performed. While doing this work, also ensure that README.md reflects the facts.
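Part of the sync check can be automated. The sketch below compares the `json` tags of the `Config` struct with the property names in `config_schema.json`. It assumes the schema's properties sit under a top-level `jsonSchema` object; adjust the decoding struct if your file is laid out differently. `jsonKeys` and `schemaDrift` are hypothetical helpers, and the sketch does not descend into embedded or nested structs, so it covers only one slice of the full check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Config mirrors the collector config shown earlier.
type Config struct {
	Vnode       string `yaml:"vnode,omitempty" json:"vnode"`
	UpdateEvery int    `yaml:"update_every,omitempty" json:"update_every"`
}

// jsonKeys returns the json tag names of a struct type's direct fields.
// Embedded/nested structs are not inspected in this sketch.
func jsonKeys(t reflect.Type) map[string]bool {
	keys := make(map[string]bool)
	for i := 0; i < t.NumField(); i++ {
		name := strings.Split(t.Field(i).Tag.Get("json"), ",")[0]
		if name != "" && name != "-" {
			keys[name] = true
		}
	}
	return keys
}

// schemaDrift reports keys present in only one of the code or the schema.
func schemaDrift(schema []byte, cfg any) ([]string, error) {
	var s struct {
		JSONSchema struct {
			Properties map[string]json.RawMessage `json:"properties"`
		} `json:"jsonSchema"` // assumed layout
	}
	if err := json.Unmarshal(schema, &s); err != nil {
		return nil, err
	}
	code := jsonKeys(reflect.TypeOf(cfg))
	var drift []string
	for k := range code {
		if _, ok := s.JSONSchema.Properties[k]; !ok {
			drift = append(drift, "missing in schema: "+k)
		}
	}
	for k := range s.JSONSchema.Properties {
		if !code[k] {
			drift = append(drift, "missing in code: "+k)
		}
	}
	sort.Strings(drift) // deterministic output for tests and CI logs
	return drift, nil
}

func main() {
	schema := []byte(`{"jsonSchema": {"properties": {"vnode": {}, "update_every": {}, "url": {}}}}`)
	drift, err := schemaDrift(schema, Config{})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(drift) // [missing in code: url]
}
```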
`metadata.yaml` is the PRIMARY documentation source shown on the Netdata integrations page. Ensure that troubleshooting information, setup instructions, and all other documentation are up to date in BOTH README.md AND metadata.yaml.
The file /docs/NIDL-Framework.md describes how Netdata metrics, charts and dashboards work. You MUST read it before working with metrics in collectors. IMPORTANT: "chart" in go.d modules is "instance" in NIDL, and "context" in go.d modules is "chart" in NIDL.
- `config_schema.json`, so that dynamic configuration of the collector works on the dashboards
- `metadata.yaml`, so that users can see them
- `README.md`, so that developers and users can see them