rfd/0154-logging-guidelines.md
@zmb3Guidelines for logging within Teleport components.
At the time of writing this, Teleport uses logrus for structured logging. This
library is in maintenance mode, and while not completely dead, updates are limited to security vulnerabilities,
(backwards compatible) bug fixes, and performance (where we are limited by the interface). As of Go 1.21, the standard
library now includes its own structured logger: log/slog. While there are other logging
libraries that have tried to replace logrus none of them have ever gained consensus and widespread adoption within the
community. log/slog may not be perfect, but the fact that it's included in the standard library and was designed to be
far more flexible than the log package, which led to libraries like logrus in the first place, makes it a preferred
choice.
All logging performed in the api module is also done via logurs. Not only does the dependency inflate the size of
the api module, it also forces any consumers of the package to instantiate a default logurs logger to see any
output. This forces consumers to also import logrus even though they may prefer to use a different logging library. By
usinglog/slog instead we can allow consumers of the api module to use whatever slog.Handler they would like to
output any logs from api.
Converting logging libraries is not going to be a quick and easy process, but it also does not have to be done in a
single change. Both logrus and log/slog can coexist within Teleport until all uses of logrus can be migrated. The
most important thing is for consumers of the logs not to be able to tell which library produced the output. This means
that we cannot use the builtin slog.TextHandler and slog.JSONHandler and instead must write custom handlers that
match the output from our
logrus.TextFormatter
implementations.
Once we can produce output from log/slog that matches logrus we can begin the following migration process.
Initialize a slog and logurs logger in both the
global logging initialization
and the
servicecfg logging initialization.
Since both loggers would be writing to the same output (os.Stdout, os.Stderr) we need to ensure that messages don't
clobber each other. To prevent this a simple io.Writer wrapper that guards calls to write with a mutex and then writes
to the configured output can be used.
Once a slog.Logger may be retrieved either globally via slog.Default/slog.With or by dependency injection, the
process of converting individual loggers may begin. Since TeleportProcess is the entry point and object responsible
for creating other components it should be the first component to be converted to have and use a slog.Logger. Once it
has been converted, objects it creates can be migrated independently until they've all been converted. The apimodule
consumes the global logrus logger, so it may be converted independently after the global slog.Logger is configured
in the first step. However, the minimum supported go version of the api module will need to be bumped to >= go 1.21
before package log/slog is available to be consumed there. When the last consumers of the logrus library have been
removed from the api module go mod tidy to get rid of the logrus dependency (or at least make it an indirect
dependency). Once slog is the only library writing output from within teleport the log initialization logic can be
cleaned up to only create a slog.Logger and remove the io.Writer wrapper from the first step. At this
point go mod tidy should be run to clean up the dependency. A depguard rule should also be put in place to prevent any
accidental uses of logrus after the conversion.
There are a few subtle API differences between the two logging libraries which prevent migrating to be a find and
replace operation. The following sections demonstrate these differences and offer various recommendations to overcome
them when migrating to log/slog.
logrus provides an API which allows for fmt.Sprintf style formatting (Tracef,Debugf,Infof, etc.) of log messages
which are relied on heavily throughout the codebase.
gologrus.Infof("Connected to cluster %v at %v.", clusterName, conn.RemoteAddr())bash2023-11-02T15:18:40-04:00 [PROXY] INFO Connected to cluster foo at 192.168.1.243. pid:14968.1 regular/proxy.go:250
The log/slog API differs in that all logging functions provided take a message and a series of key value pairs,
referred to as attributes, instead of a message to format and associated arguments. Any attributes provided will be
included as fields alongside the message. While a literal translation of the above call to Infof is possible in slog
by using fmt.Sprintf directly it is not recommended. Instead, when converting to log/slog structured logging should
be embraced. The message from above can be translated to carry a bit more context, and the additional fields in the
message can be moved to the attributes.
goslog.InfoContext(ctx, "Proxied connection to cluster established", "cluster_name", clusterName, "remote_address", conn.RemoteAddr())bash2023-11-02T15:18:40-04:00 [PROXY] INFO Proxied connection to cluster established pid:14968.1 cluster_name:foo remote_address:192.168.1.243 regular/proxy.go:250
To add structured fields with logrus either the WithField or WithFields methods were used, the only difference
being the latter took a map of fields instead of a single field.
logrus.WithField("user", user).Warn("Failed to emit account recovery code used failed event")
log/slog provides similar functionality via the With method, though the API differs slightly. Just like the logging
API, With takes a series of key value pairs instead of a struct like logrus. The above example can naively be
converted to the following with log/slog.
slog.With("user", user).WarnContext(ctx, "Failed to emit account recovery code used failed event")
However, instead of creating a clone of the logger with the user field added, the desired attributes should be
passed directly to the Warn method instead.
slog.WarnContext(ctx, "Failed to emit account recovery code used failed event", "user", user)
The main benefit to using slog.With is that any attributes provided to it are formatted once instead of each time a
message is logged. slog.With should only be used when enriching a logger with additional information that is used to
log several messages, be that in a function or a logger that is a struct member.
logrusgologrusLogger := logrus.WithFields(logrus.Fields{ "device_id": ref.DeviceID, "os_type": ref.OSType, "asset_tag": assetTag, }) ... logrusLogger.Debug("some message") ... logrusLogger.Warn("some other message")
log/sloggoslogLogger := slog.With("device_id", ref.DeviceID, "os_type", ref.OSType, "asset_tag", assetTag) ... slogLogger.DebugContext(ctx, "some message") ... slogLogger.WarnContext(ctx, "some other message")
Unlike logrus, log/slog does not provide any special APIs for dealing with errors. When converting to log/slog any
usages of WithError should follow the same advice from the WithField(s) section. If the error is only included in a
single log message, then it should be added as an attribute to the log function. If the error is included in multiple
messages then a logger should be created via With("error", err).
logrusgologrus.WithError(err).Error("Error parsing response.")
log/sloggoslog.ErrorContext(ctx, "Error parsing response", "error", err)
If many log lines share a common attribute, use a slog.With to construct a slog.Logger with that attribute. This
allows it to only be formatted once instead of every time a log is emitted.
If an attribute is only emitted in a single log entry, prefer passing it as an attribute instead of
calling slog.With.
Arguments to a logging function are always evaluated, even if the log event is discarded due to configured verbosity.
If the object implements fmt.Stringer you can pass it by pointer and the String method will only be called when
the log is being written. Another option is to implement
the slog.LogValuer interface.
For high volume log messages prefer to use slog.Logger.LogAttrs. While the API is more verbose, it avoids any
allocations, making it the most efficient way to produce output. The example below illustrates how to convert a
message using InfoContext to LogAttrs.
slog.InfoContext(ctx, "Speed threshold reached", "flux_capacitors_charged", true, "operator", "doc brown")
slog.LogAttrs(ctx, slog.LevelInfo, "Speed threshold reached", slog.Bool("flux_capacitors_charged", true), slog.String("operator", "doc brown"))
Package global loggers should not use slog.With to create the logger. Doing so results in incorrect formatting
because the default slog handler is used instead of the slog handler set at runtime (see
https://github.com/gravitational/teleport/issues/40629 for more details). Instead a global logger should be
created with logutils.NewPackageLogger which allows formatting of attributes to be deferred until runtime
so that the correct formatter is used.
var logger = logutils.NewPackageLogger(teleport.ComponentKey, "llama")
There currently exists no logging standards within Teleport, which has resulted in various different patterns depending
on which file you open. If we are going to take the time to adjust all existing log messages while converting
to log/slog, we should do our best to start following a few simple rules.
fmt.Sprintf style
formatting. Enrich log messages with additional attributes to provide additional context.slog.Logger API. Providing the context.Context to all log messages allows them
to be enriched with additional attributes. For example, this allows log messages to automatically have a request or
trace id added, which makes correlating messages that were emitted as a result of one particular action or event.
If no context is readily available for the logger to consume, prefer to use a context.TODO if it is possible to get
a context to the call site in the future but refactoring is to do be done in a separate unit of work. If there is
likely to never be a context provided to the function producing the log output, then prefer to use
a context.Background to indicate as such.To achieve these, we can add enable the sloglint linter in our golangci-lint
configuration.
linters:
enable:
- sloglint
linters-settings:
sloglint:
static-msg: true
key-naming-case: snake
context-only: true
Any CVEs identified with log/slog will be addressed by the Go team in a timely manner. The same cannot be said
for logrus, at the time of this writing there has been an issue open for more than a week reporting that it is
vulnerable to CVE-2022-28948 without so much as a reply.
It is not recommended to backport the changes suggested in this RFD. Only branch/v14 and newer are built with Go 1.21,
which means it is the only branch even capable of using log/slog, we could use x/exp/slog for the other release
branches. If this causes an increased number of conflicts when backporting changes to release branches, we can
reconsider backporting the efforts required to change logging libraries.
Provide the log/slog API with a wrapped type that used logrus under the hood.
type slogrus struct {
wrapped logrus.FieldLogger
}
// InfoContext here matches slog's InfoContext signature
func (s *slogrus) InfoContext(ctx context.Context, msg string, args ...any) {
s.wrapped.WithFields(convertToFields(args)).Info(msg)
}
We would slowly migrate all logrus callsites to use this new wrapper, and then, upon completion, swap out the wrapper
for log/slog.
Positives:
Negatives:
Write a slog.Handler that uses a logrus.Logger to format and produce output.
type slogrus struct {
wrapped logrus.FieldLogger
}
func (s *slogrus) WithAttrs(attrs []Attr) Handler {
return s.wrapped.WithFields(convertToFields(args))
}
func (s *slogrus) Handle(_ context.Context, r Record) error {
s.wrapped.WithFields(convertRecordToFields(r)).Log(convertLevel(r.Level), r.Message)
return nil
}
Positives:
Negatives: