docs/developers/REVIEWS.md
Pull-requests require two approvals before being merged. Expect several rounds of back and forth on reviews, non-trivial changes are rarely accepted on the first pass. It might take some time until you see a first review so please be patient.
All pull requests should follow the style and best practices in the CONTRIBUTING.md document.
The review process is roughly structured as follows:
ready for final review tag.
Please constructively work with the reviewer to get your code into a
mergeable state (see also below).Please read the review comments carefully, fix the related part of the code
and/or respond in case there is anything unclear. Maintainers will add the
waiting for response tag to PRs to make it clear we are waiting on the
submitter for updates.
Once the tag is added, if there is no activity on a pull request or the
contributor does not respond, our bot will automatically close the PR after two
weeks! If you expect a longer period of inactivity or you want to abandon a
pull request, please let us know.
In case you still want to continue with the PR, feel free to reopen it.
toml:"command" (snake_case) Log telegraf.Logger `toml:"-"`
(in tests, you can do myPlugin.Log = testutil.Logger{})
Init() error
function, not in the Connect, Gather, or Start functions.Init() error should not contain connections to external services. If
anything fails in Init, Telegraf will consider it a configuration error and
refuse to start.Init() error and reused, (or better
yet, on the package if there's no client-specific configuration).
http.Client has built-in concurrency protection and reuses connections
transparently when possible.plugin.Log.Debugf() only shows up when running Telegraf with --debugoutputs.influxdb and outputs.influxdb_v2)Each pull request will have the appropriate linters checking the files for any common mistakes. The github action Super Linter is used: super-linter. If it is failing you can click on the action and read the logs to figure out the issue. You can also run the github action locally by following these instructions: run-linter-locally.md. You can find more information on each of the linters in the super linter readme.
Sufficient unit tests must be created. New plugins must always contain some unit tests. Bug fixes and enhancements should include new tests, but they can be allowed if the reviewer thinks it would not be worth the effort.
Table Driven Tests are encouraged to reduce boiler plate in unit tests.
The stretchr/testify library should be used for assertions within the tests when possible, with preference towards github.com/stretchr/testify/require.
Primarily use the require package to avoid cascading errors:
assert.Equal(t, lhs, rhs) # avoid
require.Equal(t, lhs, rhs) # good
The config file is the primary interface and should be carefully scrutinized.
Ensure the [[SampleConfig]] and README match with the current standards.
READMEs should:
# for comments# for defaults, which should always match the default value of the pluginTelegraf metrics are heavily based on InfluxDB points, but have some extensions to support other outputs and metadata.
New metrics must follow the recommended schema design. Each metric should be evaluated for series cardinality, proper use of tags vs fields, and should use existing patterns for encoding metrics.
Metrics use snake_case naming style.
Generally enumeration data should be encoded as a tag. In some cases it may be desirable to also include the data as an integer field:
net_response,result=success result_code=0i
Use tags for each range with the le tag, and +Inf for the values out of
range. This format is inspired by the Prometheus project:
cpu,le=0.0 usage_idle_bucket=0i 1486998330000000000
cpu,le=50.0 usage_idle_bucket=2i 1486998330000000000
cpu,le=100.0 usage_idle_bucket=2i 1486998330000000000
cpu,le=+Inf usage_idle_bucket=2i 1486998330000000000
Lists are tricky, but the general technique is to encode using a tag, creating one series be item in the list.
Counters retrieved from other projects often are in one of two styles, monotonically increasing without reset and reset on each interval. No attempt should be made to switch between these two styles but if given the option it is preferred to use the non-resetting variant. This style is more resilient in the face of downtime and does not contain a fixed time element.
When metrics are gathered from another host, the metric schema should have a tag named "source" that contains the other host's name. See this feature request for details.
The metric schema doesn't need to have a tag for the host running telegraf. Telegraf agent code can add a tag named "host" and by default containing the hostname reported by the kernel. This can be configured through the "hostname" and "omit_hostname" agent settings.
In general code should follow best practice describe in Code Review Comments.
All network operations should have appropriate timeouts. The ability to cancel the option, preferably using a context, is desirable but not always worth the implementation complexity.
Channels should be used in judiciously as they often complicate the design and can easily be used improperly. Only use them when they are needed.