docs/style.md
Our style is derived from the
Uber Go Style Guide.
This document covers CockroachDB-specific conventions only; readers are expected
to know standard Go and the Uber guide. Use crlfmt -w -tab 2 <file>.go (not
gofmt) to format code: 100-column code lines, 80-column comments, tab width 2.
We use crlfmt's rules for wrapping long function signatures.
While it is fairly perscriptive, it does not add or remove repeated identical
argument types, e.g. start int, end int vs start, end int; the latter form
with the omitted specifier should only be used on a single line no argument
should appear by itself on a line with no type (confusing and brittle to edits).
iota + 1 unless the zero value is meaningful.if err := f(); err != nil where possible.sync.Mutex directly in private types; use a named mu
field for exported types.var declaration (not []int{}). nil is a valid
empty slice. Check emptiness with len(s) == 0, not s == nil.defer for releasing locks, closing files, and similar
cleanup.&T{} instead of
new(T).strconv over fmt for primitive conversions.
Avoid repeated []byte("...") conversions in loops.Option interface pattern at API boundaries;
use option structs for internal functions.For bool literals passed to functions, and in some cases other simple literals
like nil, 0, or 1, it can be helpful to add a C-style comment with the
parameter name, e.g. printInfo(ctx, txn, 1 /* depth */, false /* verbose */).
When doing so the comment always uses the parameter name verbatim -- do not
negate it and do not substitute a different word. Another good option in such
cases is to define a named const, e.g. const verbose, notVerbose = true, false
to pass, or even have the function take a typed arg e.g. type Verbosity bool.
From roachpb/metadata.proto -- the enum-level comment provides a birds-eye
view of joint configs; each value explains its role in replication changes.
// ReplicaType identifies which raft activities a replica participates in. In
// normal operation, VOTER_FULL and LEARNER are the only used states. However,
// atomic replication changes require a transition through a "joint config"; in
// this joint config, the VOTER_DEMOTING and VOTER_INCOMING types are used as
// well to denote voters which are being downgraded to learners and newly added
// by the change, respectively.
//
// All voter types indicate a replica that participates in all raft activities,
// including voting for leadership and committing entries. In a joint config, two
// separate majorities are required: one from the set of replicas that have
// either type VOTER or VOTER_OUTGOING or VOTER_DEMOTING, as well as that of the
// set of types VOTER and VOTER_INCOMING.
enum ReplicaType {
VOTER_FULL = 0;
VOTER_INCOMING = 2;
VOTER_OUTGOING = 3;
}
From replica_command.go -- comments explain lease semantics and range-merge
edge cases, not just control flow.
func (r *Replica) executeAdminCommandWithDescriptor(
ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error,
) *roachpb.Error {
// Retry forever as long as we see errors we know will resolve.
retryOpts := base.DefaultRetryOptions()
// Randomize quite a lot just in case someone else also interferes with us
// in a retry loop. Note that this is speculative; there wasn't an incident
// that suggested this.
retryOpts.RandomizationFactor = 0.5
var lastErr error
for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
// The replica may have been destroyed since the start of the retry loop.
// We need to explicitly check this condition. Having a valid lease, as we
// verify below, does not imply that the range still exists: even after a
// range has been merged into its left-hand neighbor, its final lease
// (i.e., the lease we have in r.mu.state.Lease) can remain valid
// indefinitely.
if _, err := r.IsDestroyed(); err != nil {
return roachpb.NewError(err)
}
// Admin commands always require the range lease to begin (see
// executeAdminBatch), but we may have lost it while in this retry loop.
// Without the lease, a replica's local descriptor can be arbitrarily
// stale, which will result in a ConditionFailedError. To avoid this, we
// make sure that we still have the lease before each attempt.
if _, pErr := r.redirectOnOrAcquireLease(ctx); pErr != nil {
return pErr
}
Order by receiver, then rough call order; constructors after type definition.
goimports may auto-import the wrong package.server/serverpb,
kv/kvserver, util/contextutil, util/testutil.xxxxpb sub-directory of the parent package
(e.g. pkg/server/serverpb, pkg/sql/sessiondatapb).Interface extraction: Extract an interface into a leaf sub-package that both sides can import.
// Package bapi defines the interface.
package bapi
type B interface { Bar() }
// Package a imports only the interface.
package a
import "b/bapi"
func Foo(b bapi.B) { b.Bar() }
// Package b imports a and implements the interface.
package b
import "a"
type bImpl struct{}
func (x bImpl) Bar() { a.Foo(x) }
Ensure package a has a mock implementation for its own unit tests.
Dependency injection: Use when the interface approach is impractical.
a: var BarFn = func() { /* placeholder */ }b: func init() { a.BarFn = Bar }The package exposing the injection slot must pass its unit tests independently, with valid default behavior when the slot is not populated.
X_test package trick: Test files in package X can declare
package X_test, which is allowed to import packages that transitively depend
on X. For example, testserver depends on pkg/server which depends on
pkg/sql, so tests needing a test server use package sql_test or
package server_test.
We use the datadriven package as an alternative to table-driven tests. One of the most used implementations is "logic tests" which execute SQL and compare results to golden files.
A well-written test should observe and assert only behavior directly relevant
to the functionality under test. Overly broad assertions (e.g. selecting all
system tables instead of only tables the test created) make tests brittle and
train engineers to blindly update them, defeating their purpose. A logic test
should choose its SELECT and WHERE clause so that the output is fully
determined by the test's own constructed conditions -- not by unrelated factors
like total system table count, OS version, table IDs, or range IDs.
This is particularly important to consider intentionally when working with tests
that have a --rewrite option as this can easily capture more output in what is
being tested than intended.
We use cockroachdb/errors, a superset
of Go's standard errors package that integrates with cockroachdb/redact.
errors.New("connection failed") // static string
errors.Newf("invalid value: %d", val) // formatted
errors.AssertionFailedf("expected non-nil pointer") // assertion failure
errors.Wrap(err, "opening file") // add context
errors.Wrapf(err, "connecting to %s", addr) // formatted context
Keep context succinct; avoid "failed to" which piles up:
// Bad: "failed to x: failed to y: failed to create store: the error"
return errors.Newf("failed to create new store: %s", err)
// Good: "x: y: new store: the error"
return errors.Wrap(err, "new store")
errors.WithHint(err, "check your network connection")
errors.WithDetail(err, "request payload was 2.5MB")
errors.WithSafeDetails(err, "node_id=%d", nodeID) // safe for Sentry reports
if errors.Is(err, ErrNotFound) { /* ... */ } // sentinel errors
var nfErr *NotFoundError
if errors.As(err, &nfErr) { /* ... */ } // typed errors
| Scenario | Approach |
|---|---|
| No additional context needed | Return original error |
| Adding context | errors.Wrap or errors.Wrapf |
| Passing through goroutine channel | errors.WithStack on both ends |
| Callers don't need to detect this error | errors.Newf |
| Hide original cause | errors.Handled or errors.Opaque |
%v for potentially-nil errorsPrefer %v over %s when formatting errors that might be nil. A nil error
formatted with %s renders as %!s(<nil>), while %v renders as <nil>.
How errors should affect process and session lifecycle:
User input / query errors (e.g. SELECT invalid, SELECT 1/0):
Return a regular error with an appropriate SQLSTATE code. Do not kill
the session or the process.
Unexpected condition scoped to one query (unreachable code,
precondition failure): Return errors.AssertionFailedf(...). Crash
reports are sent automatically. Do not kill the session or the process.
Candidate future feature (unsupported parameter combination, unexpected
but valid input hitting a default/else clause): Use
pgerror.UnimplementedWithIssue(...) or pgerror.Unimplemented(...).
See "Unimplemented features" below.
Invalid session-scoped state (unreachable code operating on session state): Propagate an assertion error to the client. Kill or make the session read-only. Do not kill the process.
Invalid state on a read-only or non-persisting path (unexpected disk
data, bad subsystem return): Propagate an assertion error, also call
log.Errorf. Ensure no data is persisted. Do not kill the process.
Invalid state on a data-persisting path (post-condition failure during
write, corruption in KV storage): Call log.Fatalf. This kills the
process and sends a crash report automatically.
SQLSTATE is the stable, documented error API -- not message text. Use the same code PostgreSQL would use in an equivalent situation. Only invent new codes when PostgreSQL has no equivalent; respect the two-character category prefix. Verify codes with SQL logic tests.
CockroachDB-specific codes:
See pkg/sql/pgwire/pgcode/codes.go for the full list.
Do not include arbitrarily large strings in error payloads -- they cause
excessive memory use and truncated crash reports. When in doubt, truncate
to a prefix and append a unicode ellipsis (…).
Mark unsupported-but-plausible functionality with
pgerror.UnimplementedWithIssue(issueNum, ...) or
pgerror.Unimplemented(...). Label the tracking issue with docs-todo,
known-limitation, and X-anchored-telemetry. Unimplemented errors get
their own non-crash telemetry automatically.
Merging incomplete implementations is acceptable as long as every
unimplemented code path uses unimplemented.New(...) (or the variants
above). Track each gap with a GitHub issue and a // TODO(#NNNNN)
comment at the call site.
Data in logs and errors is unsafe (PII-laden) by default and will be
redacted before being sent to Cockroach Labs. Mark information as safe to
preserve observability. Always use cockroachdb/errors (not fmt.Errorf) so
that error messages are automatically redactable.
errors.New/Newf/Wrap/log.Infof/etc.time.Time, time.Duration.error values when wrapped.| Scenario | Approach |
|---|---|
Type contains only IDs, counts, or other non-PII (e.g. NodeID) | Implement SafeValue and update the allowlist in pkg/testutils/lint/passes/redactcheck/redactcheck.go. |
| Type has a mix of safe and unsafe fields | Implement SafeFormatter (preferred). |
Prefer SafeFormatter over SafeValue -- it is more precise, does not
require linter allowlist changes, and is resistant to someone later adding
unsafe fields to the type.
String() to SafeFormat()// Before: entire output is considered unsafe.
func (m MetricSnap) String() string {
suffix := ""
if m.ConnsRefused > 0 {
suffix = fmt.Sprintf(", refused %d conns", m.ConnsRefused)
}
return fmt.Sprintf("infos %d/%d sent/received, bytes %dB/%dB sent/received%s",
m.InfosSent, m.InfosReceived,
m.BytesSent, m.BytesReceived,
suffix)
}
// After: output is safe (all fields are integers, format strings are literals).
func (m MetricSnap) SafeFormat(w redact.SafePrinter, _ rune) {
w.Printf("infos %d/%d sent/received, bytes %dB/%dB sent/received",
m.InfosSent, m.InfosReceived,
m.BytesSent, m.BytesReceived)
if m.ConnsRefused > 0 {
w.Printf(", refused %d conns", m.ConnsRefused)
}
}
func (m MetricSnap) String() string { return redact.StringWithoutMarkers(m) }
redact.Sprint() / redact.Sprintf() -- create a RedactableString from
mixed safe/unsafe values.redact.StringBuilder -- build a RedactableString programmatically.redact.RedactableString in struct fields that will appear in logs,
instead of plain string.redact.Safe() / errors.Safe() / log.Safe() -- these are
fragile promises that break when someone changes the argument's type.string to RedactableString manually -- use
redact.Sprintf instead.SafeValue on complex types -- someone may later add an unsafe
field without noticing.Use echotest to lock down SafeFormat output:
func TestMyType_SafeFormat(t *testing.T) {
v := MyType{ID: 1, Name: "sensitive"}
redacted := string(redact.Sprint(v))
echotest.Require(t, redacted, datapathutils.TestDataPath(t, t.Name()))
}
Run with -rewrite to generate the initial testdata file, then verify markers
appear around unsafe fields.
CockroachDB uses a gogoproto fork.
Proto definitions live in .proto files; generated .pb.go files are
regenerated by ./dev gen protobuf or ./dev build short.
required -- it breaks backward/forward compatibility between
nodes running different binaries.gogoproto.nullable so the compiled field type is the value itself
(not a pointer). Absence is represented as the zero value. Exceptions:
when you need to distinguish zero from absent, or when the message type
is very large and pointer copying is cheaper.optional -- use gogoproto.nullable instead if you need
presence semantics.types.Any and json types -- they are very hard to decode when
inspecting debug zips.Gotcha: Empty proto fields can have non-zero marshalling overhead on hot paths. Use the omitEmpty field to ensure truly empty encoding.
When defining result column labels for statements or virtual tables:
information_schema), e.g. start_key not startkey.variable and value match between SHOW ALL CLUSTER SETTINGS and
SHOW SESSION ALL.table_name not
"table", index_name not "index". Never name a column "user" -- use
user_name.id is fine,
loc is not).zone_id, job_id,
table_id instead of just id.table_name not table, so table_id can be added later.information_schema or existing SHOW statements when they
match the principles above.