styleguide/c++/checks.md
CHECK(), DCHECK() and NOTREACHED() are all used to ensure that invariants
hold. They document (and verify) programmer expectations that either some
statement always holds true at the point of (D)CHECKing or that a piece of
code is unreachable (for NOTREACHED). CHECK failures and reachable
NOTREACHEDs result in an application crash (generating a crash report).
DCHECKs are only evaluated on developer builds, test infrastructure and a
minuscule in-the-wild population that does not currently represent all
platforms. None of these should be used to validate data that is provided by
end-users or website developers. Such data is untrusted, and must be validated
by standard control flow.
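As a minimal sketch of what "standard control flow" can look like, the hypothetical helper below treats an out-of-range index supplied by a web page as an expected outcome rather than an invariant failure. `GetHistoryEntry()` and its parameters are illustrative names, not Chromium APIs.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: `index` is supplied by a web page and is therefore
// untrusted. A bad value is an expected outcome, not a bug in our code, so it
// is reported to the caller through the return value instead of a CHECK().
inline std::optional<std::string> GetHistoryEntry(
    const std::vector<std::string>& entries, long long index) {
  if (index < 0 || static_cast<std::size_t>(index) >= entries.size()) {
    return std::nullopt;  // Invalid input: let the caller surface an error.
  }
  return entries[static_cast<std::size_t>(index)];
}
```

A caller would typically turn the empty result into an error for the page. No CHECK() is involved because bad input does not indicate a bug in Chromium.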
An invariant that does not hold should be seen as Undefined Behavior, and
continuing past it puts the program into an unexpected state. This applies in
particular to DCHECK()s as they do not test anything in production and thus do
not stop the program from continuing with the invariant being violated. All
invariant failures should be seen as P1 bugs, regardless of their crash rate.
Continuing past an invariant failure can cause crashes and incorrect behaviour
for our users, but also frequently presents security vulnerabilities as
attackers may leverage the unexpected state to take control of the program. In
the future we may let the compiler assume and optimize around DCHECK()s
holding true in non-DCHECK builds using __builtin_assume(), which further
formalizes undefined behavior.
Failure can come from beyond Chromium's ability to control. These failures
should not be caught with invariants, but handled as part of regular control
flow. In the rare case where it's impossible to safely recover from failure, use
base::ImmediateCrash() to terminate the process instead of using CHECK()
etc. Doing so avoids implying that the generated crash reports should be triaged
as bugs in Chromium. Fatally aborting is a last-resort measure.
We must be resilient to a bad prior release of Chromium, which may have persisted bad data to disk, and to a bad server-side rollout, which may be sending us incorrect or unexpected configs.
Note that wherever CHECK() is inappropriate, DCHECK() is inappropriate as
well. DCHECK() should still only be used for invariants. Ideally we'd have
better test coverage for failures created from outside Chromium's control.
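A rough sketch of this kind of resilience, assuming a hypothetical zoom setting persisted as text: anything unparsable or out of range is replaced by a default through ordinary control flow. `LoadZoomPercent()`, `kDefaultZoomPercent` and the 25 to 500 range are invented for illustration.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical default for a hypothetical persisted setting.
constexpr int kDefaultZoomPercent = 100;

// Parses a value that a previous release (or a server) may have written
// incorrectly. Corrupt or out-of-range data is outside our control, so it is
// handled with regular control flow and a fallback, never with CHECK().
inline int LoadZoomPercent(std::string_view stored) {
  int value = 0;
  const char* begin = stored.data();
  const char* end = begin + stored.size();
  const auto [ptr, ec] = std::from_chars(begin, end, value);
  const bool parsed_fully = (ec == std::errc()) && (ptr == end);
  if (!parsed_fully || value < 25 || value > 500) {
    return kDefaultZoomPercent;
  }
  return value;
}
```

Falling back to a default is only one possible policy. Depending on the data, discarding the record or re-fetching it may be more appropriate.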
Non-exhaustive list of failures beyond Chromium's control:
* [...] base::ImmediateCrash().
* [...] CHECK() for bad syntax, etc.
* [...] CHECK()s as a bad server-side rollout shouldn't be able to bring down the client. Note that you may CHECK() that data is valid after the caller should've validated it.
* [...] base::ImmediateCrash() or CHECK() etc. as part of normal control flow.
* [...] CHECK() its result as we expect the rug to be pulled from under our feet. In the case of hardware failure we should not for instance CHECK() that a write succeeded.

In some cases (malware, ..., dll injection, etc.) third-party code outside of our control can cause us to trigger various CHECK() failures, due to injecting code or unexpected state into Chrome. These can create "weird machines" that are useful to attackers. We don't remove CHECKs just to support them, though we may handle these unexpected states if possible and necessary. Chromium is not designed to run against arbitrary code modification.
Prefer CHECK() and NOTREACHED() over DCHECK()s as they ensure that if an
invariant fails, the program does not continue in an unexpected state, and we
hear about the failure either through a test failure or a crash report. This
helps prevent user harm such as security bugs when our software does what we did
not expect. Historically, CHECK() was seen as expensive but great effort and
care has gone into making the crash instructions nearly free on modern CPUs. Log
messages are discarded from CHECK()s in production builds but provide
additional information in debug and DCHECK builds.
DCHECK() (and DCHECK_EQ(), DCHECK_LT(), etc.) provide a fallback mechanism
to check for invariants where the test being performed is too expensive (either
in terms of generated code size or performance) to verify in production builds.
The risk of depending on DCHECK() is that, since it disappears in production
builds, it's only verified in tests, on developer machines and a very small
subset of Canary builds. Any side effects intended to happen inside the
DCHECK() disappear from production along with it, and unexpected behaviour can
happen afterward as a result.
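To make the side-effect hazard concrete, here is a small sketch using a stand-in macro. `SKETCH_DCHECK` and `SKETCH_DCHECK_IS_ON` are hypothetical and only mimic the "compiled out in production" behaviour described above. They are not Chromium's implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical stand-in for DCHECK(). When SKETCH_DCHECK_IS_ON is not defined
// (our model of a production build) the condition is still compiled but is
// never evaluated, so any side effect inside it silently disappears.
#if defined(SKETCH_DCHECK_IS_ON)
#define SKETCH_DCHECK(cond) ((cond) ? (void)0 : std::abort())
#else
#define SKETCH_DCHECK(cond) (true ? (void)0 : (void)(cond))
#endif

// Bad: the erase() only runs in builds where the check is evaluated.
inline void RemoveBad(std::set<int>& ids, int id) {
  SKETCH_DCHECK(ids.erase(id) == 1u);
}

// Good: the side effect always runs; only the verification is optional.
inline void RemoveGood(std::set<int>& ids, int id) {
  const std::size_t erased = ids.erase(id);
  SKETCH_DCHECK(erased == 1u);
}
```

With `SKETCH_DCHECK_IS_ON` undefined, `RemoveBad()` leaves the set unchanged while `RemoveGood()` removes the element, which is the kind of divergence between test and production builds the paragraph above warns about.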
NOTREACHED() signals that a piece of code is intended to be unreachable while
also terminating if it is in fact reached, as if it were a CHECK() failure.
Below are some examples to explore the choice of CHECK() and its variants:
```cpp
// Good:
//
// Testing pointer equality is very cheap so write this as a CHECK. A security
// bug would happen afterward if the CHECK fails (in this case, on the next
// line).
auto it = container.find(key);
CHECK(it != container.end());
*it = Foo();
```
```cpp
// Good:
//
// This is an expensive operation. Consider writing a test to provide coverage
// for this as well. DCHECK() is available as a fallback to verify the condition
// in tests and on a small subset of Canary builds.
DCHECK(|invoke an O(n^2) operation|);
```
```cpp
// Good:
//
// This switch handles all cases, but enums can technically hold any integer
// value (even if all enum members are enumerated), so the compiler must try to
// handle other cases too. We can avoid dealing with values outside enums by
// using NOTREACHED() while also making sure we hear about it.
switch (my_enum) {
  case A: return 1;
  case B: return 5;
  case C: return 3;
}
NOTREACHED();
```
```cpp
// Bad:
//
// Do not handle `DCHECK()` failures. Use `CHECK()` instead and omit the
// conditional below.
DCHECK(foo);  // Use CHECK() instead and omit conditional below.
if (!foo) {
  ...
}
```
```cpp
// Bad:
//
// Use CHECK(bar); instead.
if (!bar) {
  NOTREACHED();
}
```
If you're not confident that an unexpected situation can't happen in practice,
an additional base::NotFatalUntil::M120 argument after the condition may be
used to gather non-fatal diagnostics before turning fatal in a future milestone.
CHECK() and NOTREACHED() with a base::NotFatalUntil argument will provide
non-fatal crash reports before the fatal milestone is hit. They preserve and
upload logged arguments which is useful for debugging failures during rollout as
well.
Since these variants are non-fatal and do not terminate the program, make a
best-effort attempt to handle the situation, such as an early return, and try to
reason about that being at least "probably safe" in calling contexts. Do not use
base::NotFatalUntil if there's no reasonable way to recover from the invariant
failure (i.e. if this is wrong we're about to crash or hit a memory bug).
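The sketch below illustrates the "report, then handle as best we can" pattern. Real code would write the check as CHECK() with a base::NotFatalUntil argument after the condition. Here `SKETCH_CHECK_NOT_FATAL`, `NonFatalReports()`, `Tab` and `CloseTab()` are hypothetical stand-ins so that the example is self-contained, and the stand-in only models the behaviour before the fatal milestone.

```cpp
#include <string>
#include <vector>

// Hypothetical sink standing in for non-fatal crash reports.
inline std::vector<std::string>& NonFatalReports() {
  static std::vector<std::string> reports;
  return reports;
}

// Hypothetical stand-in for a CHECK() carrying a base::NotFatalUntil argument
// before its fatal milestone: record the failed condition and keep running.
#define SKETCH_CHECK_NOT_FATAL(cond) \
  ((cond) ? (void)0 : NonFatalReports().push_back(#cond))

// Hypothetical type used only for this illustration.
struct Tab {
  bool closed = false;
};

// Returns true if the tab was closed.
inline bool CloseTab(Tab* tab) {
  SKETCH_CHECK_NOT_FATAL(tab != nullptr);
  if (!tab) {
    // Best-effort handling: an early return that callers can tolerate.
    return false;
  }
  tab->closed = true;
  return true;
}
```

The early return only makes sense because callers can reasonably cope with a tab that was not closed. If the code after the check would dereference the pointer unconditionally, a plain fatal CHECK() is the better fit.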
Any invariant failures should be resolved before turning fatal, even if they only fail for a very low number of users (above the noise floor). Once fatal, they will be invariants that we collectively trust to hold true (other code may be rewritten with these assumptions).
Using non-fatal invariant validation is especially appropriate when there's low pre-stable coverage. Specifically consider using these:
* [...] DCHECK()s.

As base::NotFatalUntil automatically turns fatal, keep an extra eye on
automatically-filed bugs for failures in the wild. Discovered failures, like
other invariant failures, are high-priority issues. Once resolved, either by
handling the unexpected situation or making sure it no longer happens, the
milestone number should be bumped to allow for validation in stable channels
before turning fatal.
Failing instances should not be downgraded to DCHECK()s, as that hides the ongoing
problem rather than resolving it. In rare exceptions you could use
DUMP_WILL_BE_CHECK() macros for similar semantics (report on failure) without
timeline expectations, though in this case you must also handle failure as best
you can as failures are known to happen.
For non-invariant situations we want to be notified about, such as an OS API
returning undocumented or unexpected values, we need to collect enough
information to diagnose what's going on. Here non-fatal crash reporting can be
done with base::debug::DumpWithoutCrashing(). Using crash keys is helpful for
gathering enough information to take action. When doing so, provide enough
context (such as a link to a bug) to explain why the information is being
collected and actions to take when it fires.
Note that this should only be used where crash dumping yields something
actionable, and it should not be left dumping indefinitely. Crash dumping causes
jank and is rate limited, which hides (throttles) other crash reporting. Once a
DumpWithoutCrashing() starts firing, it should be made to stop firing. Either
remove it if this was part of a one-off investigation (and we have enough data)
or update the code to make sure it no longer generates reports (for instance,
handle a new OS API result). In either case consider merging to release branches
to avoid generating a large number of crash reports.
As an illustrative example, here's a snippet that notifies us of unexpected OS
API results and the last reported error from WaitableEvent on Windows. When this
hits we want to update surrounding code to handle the new return code or prevent
it from happening. If we're generating a concerning number of crash reports we
should also decide whether to merge a fix to release branches or remove
base::debug::DumpWithoutCrashing(); on branch to prevent excessive flooding.
```cpp
NOINLINE void ReportInvalidWaitableEventResult(DWORD result) {
  SCOPED_CRASH_KEY_NUMBER("WaitableEvent", "result", result);
  SCOPED_CRASH_KEY_NUMBER("WaitableEvent", "last_error", ::GetLastError());
  base::debug::DumpWithoutCrashing();  // https://crbug.com/1478972.
}
```
For failures in tests, GoogleTest macros such as EXPECT_*, ASSERT_* or
ADD_FAILURE() are more appropriate than CHECKing. For production code:
* LOG(DFATAL) is fatal on bots running tests but only logs an error in production.
* DLOG(FATAL) is fatal on bots running tests and does nothing in production.

As these only cause tests to fail, they should be rarely used, and mostly exist
for pre-existing code. Prefer to write a test that covers these scenarios and
verify the code handles it, or use a fatal CHECK() to actually prevent the
case from happening.