requirements/output-json-compilation-database.md
When the user runs Bear to wrap a build command, Bear produces a JSON file
(compile_commands.json by default) that lists every compilation command
invoked during the build. Each entry contains the working directory, the
source file, and the compilation command, given either as a single string or as an argument list.
Bear's output conforms to the Clang JSON Compilation Database specification: https://clang.llvm.org/docs/JSONCompilationDatabase.html
A compilation database is a JSON file consisting of an array of "command objects", where each command object specifies one way a translation unit is compiled in the project.
```json
[
  { "directory": "/home/user/llvm/build",
    "arguments": ["/usr/bin/clang++", "-Irelative",
                  "-DSOMEDEF=With spaces, quotes and \\-es.",
                  "-c", "-o", "file.o", "file.cc"],
    "file": "file.cc" },
  { "directory": "/home/user/llvm/build",
    "command": "/usr/bin/clang++ -Irelative \"-DSOMEDEF=With spaces, quotes and \\-es.\" -c -o file.o file.cc",
    "file": "file2.cc" }
]
```
directory -- The working directory of the compilation. All paths
specified in the command or file fields must be either absolute or
relative to this directory.
file -- The main translation unit source processed by this compilation
step. This is used by tools as the key into the compilation database. There
can be multiple command objects for the same file, for example if the same
source file is compiled with different configurations.
arguments -- The compile command argv as a list of strings. This should
run the compilation step for the translation unit file. arguments[0]
should be the executable name, such as clang++. Arguments should not be
escaped, but ready to pass to execvp().
command -- The compile command as a single shell-escaped string.
Arguments may be shell quoted and escaped following platform conventions,
with " and \ being the only special characters. Shell expansion is not
supported.
Either arguments or command is required. arguments is preferred, as
shell (un)escaping is a possible source of errors.
output -- The name of the output created by this compilation step. This
field is optional. It can be used to distinguish different processing modes
of the same input file.
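Because either field may be present, a consumer that needs an argv can prefer arguments and fall back to splitting command. The following is a minimal Python sketch, not Bear code: entry_argv is a hypothetical name, and Python's shlex.split (POSIX mode) stands in for the specification's splitting rules. Note that shlex also honors single quotes, which the specification does not list as special, but it agrees with the specification on inputs like the ones below.

```python
import json
import shlex


def entry_argv(entry):
    """Return the compiler argv for one command object (hypothetical helper).

    Prefers "arguments" (already an argv, no unescaping needed) and falls
    back to splitting "command" with POSIX shell-word rules.
    """
    if "arguments" in entry:
        return list(entry["arguments"])
    if "command" in entry:
        # shlex.split raises ValueError on unbalanced quotes.
        return shlex.split(entry["command"])
    raise ValueError("entry has neither 'arguments' nor 'command'")


# Two encodings of the same compilation (illustrative data).
database = json.loads(r'''
[
  {"directory": "/home/user/llvm/build",
   "arguments": ["/usr/bin/clang++", "-Irelative",
                 "-DSOMEDEF=With spaces, quotes and \\-es.",
                 "-c", "-o", "file.o", "file.cc"],
   "file": "file.cc"},
  {"directory": "/home/user/llvm/build",
   "command": "/usr/bin/clang++ -Irelative \"-DSOMEDEF=With spaces, quotes and \\-es.\" -c -o file.o file.cc",
   "file": "file.cc"}
]
''')

for entry in database:
    print(entry["file"], entry_argv(entry))
```

Both entries decode to the same argv: inside double quotes, a backslash followed by a character other than " or \ is kept literally, so \-es survives shell splitting unchanged.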
The command field in detail
When Bear emits the command field (instead of arguments), it joins the
argument list into a single string using shell_words::join. The resulting
string is then embedded in JSON.
The shell_words crate follows POSIX shell quoting conventions and may
produce either single-quoted or double-quoted output depending on the
argument content. Both forms are valid per the specification.
This means the content has two layers of escaping:
1. Shell quoting: shell_words::join quotes or escapes arguments as needed so the string can be split back into the original argv.
2. JSON encoding: " becomes \" and \ becomes \\ at the JSON level.
For example, for a compilation whose argv contains -DNAME="hello" (typed on a shell command line as -DNAME=\"hello\"):
- arguments form: [..., "-DNAME=\"hello\"", ...] (no shell escaping, only JSON encoding of the raw argument)
- command form: "... '-DNAME=\"hello\"' ..." or "... \"-DNAME=\\\"hello\\\"\" ..." (shell-quoted, then JSON-encoded)
Consumers that read the command field must first JSON-decode the string,
then apply shell unquoting to recover the original argv. This double
encoding has historically been a source of bugs (see GitHub issues #14, #70,
#77, #81, #88, #96, #508).
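The two layers can be demonstrated with Python's standard library, assuming shlex.join is an acceptable stand-in for shell_words::join. shlex.join always quotes with single quotes, so it produces only the first of the two command forms listed above. Undoing the layers in the opposite order recovers the original argv.

```python
import json
import shlex

# The argv element the compiler actually receives; typed on a shell
# command line it would be written -DNAME=\"hello\".
argv = ["gcc", '-DNAME="hello"', "-c", "main.c"]

# Layer 1: shell quoting (shlex.join stands in for shell_words::join).
command = shlex.join(argv)

# Layer 2: JSON encoding, as when the entry is serialized.
encoded = json.dumps({"command": command})

# A consumer undoes the layers in reverse order: JSON first, then shell.
decoded = shlex.split(json.loads(encoded)["command"])
assert decoded == argv

print(command)  # gcc '-DNAME="hello"' -c main.c
print(encoded)  # {"command": "gcc '-DNAME=\"hello\"' -c main.c"}
```

Skipping either decoding step, or applying them in the wrong order, yields an argv that still contains quote or backslash characters that the compiler never saw.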
arguments[0]
The specification states that arguments[0] should be the executable name
(e.g. clang++), but does not prescribe whether it must be an absolute
path, a relative path, or a bare command name. Bear preserves the compiler
path as it was observed during interception: if the build invoked gcc,
Bear writes gcc; if it invoked /usr/bin/gcc, Bear writes /usr/bin/gcc.
This differs from Bear v3.x, which resolved compiler paths to absolute paths. The current behavior is configurable; the specification itself is intentionally silent on this point.
Related issues: #240, #678, #679, #671.
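A minimal sketch of the two policies, for illustration only. The function name and its resolve_absolute parameter are hypothetical and do not correspond to Bear's actual configuration key, which this document does not name. The PATH lookup via shutil.which only approximates the v3.x resolution: it searches the PATH of the current process, not necessarily that of the intercepted build.

```python
import shutil


def entry_executable(observed, resolve_absolute=False):
    """Choose the value written to arguments[0] (hypothetical helper).

    observed is the executable exactly as intercepted. With
    resolve_absolute=False (the current default behavior) it is kept
    verbatim; with True it is looked up on PATH, roughly like Bear v3.x.
    """
    if not resolve_absolute:
        return observed
    # shutil.which returns None when the name cannot be found; fall back
    # to the observed value rather than losing information.
    return shutil.which(observed) or observed


print(entry_executable("gcc"))           # gcc
print(entry_executable("/usr/bin/gcc"))  # /usr/bin/gcc
```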
- Each entry contains directory, file, and at least one of command or arguments.
- The command and arguments fields are mutually exclusive in each entry.
- An entry whose command field cannot be parsed by POSIX shell-word splitting is rejected during validation.
- Entries with empty file or directory fields are rejected during validation.
- A rejected entry is dropped, logged at WARN level with the reason, and counted in the pipeline summary; it never aborts processing of subsequent entries.
- When every entry is dropped, Bear emits a single ERROR-level summary line so the empty compilation database is never silent.
- Compiler invocations that do not compile a source file (e.g. --version or --help) are excluded.
- The output file is compile_commands.json by default and can be changed with the --output flag.
- Entries use arguments (array form) by default.
- When the command format is selected, arguments are shell-escaped using shell_words::join.
- The output field is omitted by default and included when format.entries.include_output_field is enabled.

Bear defaults to the arguments array format because the specification
recommends it and because shell (un)escaping is a known source of errors.
The command string format is available for consumers that require it.
The format selection is controlled via configuration:
```yaml
format:
  entries:
    use_array_format: true       # true = arguments, false = command
    include_output_field: false  # include the output field
```
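The effect of the two keys on a single entry can be sketched as follows. This is a hypothetical Python helper (make_entry is not a Bear API), with shlex.join standing in for shell_words::join; its keyword arguments mirror format.entries.use_array_format and format.entries.include_output_field.

```python
import shlex


def make_entry(directory, file, argv, output=None,
               use_array_format=True, include_output_field=False):
    """Build one command object from an intercepted compilation (sketch)."""
    entry = {"directory": directory, "file": file}
    if use_array_format:
        # Default: the raw argv, with no shell escaping.
        entry["arguments"] = list(argv)
    else:
        # Opt-in: a single shell-escaped string.
        entry["command"] = shlex.join(argv)
    if include_output_field and output is not None:
        entry["output"] = output
    return entry


argv = ["cc", "-c", "a.c", "-o", "a.o"]
print(make_entry("/src", "a.c", argv, output="a.o"))
print(make_entry("/src", "a.c", argv, output="a.o",
                 use_array_format=False, include_output_field=True))
```

Because the two branches are exclusive, an entry built this way never carries both command and arguments, which keeps it within the mutual-exclusion rule.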
Entry validation runs as a distinct stage in the output pipeline, immediately before JSON serialization. When an entry fails validation:
- The entry is omitted from the compilation database (compile_commands.json).
- A WARN-level log line names the file, directory, and the specific validation reason (e.g. empty directory, unparsable command).
- The entries_dropped_invalid counter in the pipeline summary is incremented.

This contract ensures that a single malformed entry cannot destroy the usable output produced from the rest of the build. It also replaces the prior fail-fast behavior, which aborted the whole pipeline on the first invalid entry and produced no database at all, a failure mode that both lost information and gave unclear error signals (see issue #692).
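A drop-and-continue validation stage might look like the Python sketch below. It checks the rules listed earlier: non-empty directory and file, exactly one of command or arguments, and a command that POSIX shell-word splitting can parse (approximated here with shlex.split). The logger name, the summary dictionary, and the function names are assumptions made for illustration, not Bear's implementation.

```python
import logging
import shlex
from typing import Optional

log = logging.getLogger("bear.output")  # hypothetical logger name


def validation_error(entry) -> Optional[str]:
    """Return the reason an entry is invalid, or None if it is valid."""
    if not entry.get("directory"):
        return "empty directory"
    if not entry.get("file"):
        return "empty file"
    has_command = "command" in entry
    has_arguments = "arguments" in entry
    if has_command == has_arguments:
        # Covers both "neither field" and "both fields".
        return "exactly one of command or arguments is required"
    if has_command:
        try:
            shlex.split(entry["command"])
        except ValueError:
            return "unparsable command"
    return None


def validate_entries(entries, summary):
    """Keep valid entries; log and count invalid ones, never abort."""
    kept = []
    for entry in entries:
        reason = validation_error(entry)
        if reason is None:
            kept.append(entry)
            continue
        log.warning("dropping entry file=%r directory=%r: %s",
                    entry.get("file"), entry.get("directory"), reason)
        summary["entries_dropped_invalid"] += 1
    return kept


summary = {"entries_dropped_invalid": 0}
entries = [
    {"directory": "/src", "file": "a.c", "arguments": ["cc", "-c", "a.c"]},
    {"directory": "", "file": "b.c", "arguments": ["cc", "-c", "b.c"]},
    {"directory": "/src", "file": "c.c", "command": "cc -c 'c.c"},
]
valid = validate_entries(entries, summary)
```

Here the second entry is dropped for its empty directory and the third for its unbalanced quote, while the first passes through to serialization.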
When entries_dropped_invalid > 0 && entries_written == 0, Bear emits a
single ERROR-level summary line stating that every entry was dropped.
The compilation database is still written (as an empty array) so downstream
tooling sees a valid file, but the log makes the empty result impossible to
miss.
The pipeline exit code is not affected by validation drops alone. Exit codes reflect the build command's own status and genuine I/O failures, not data-quality issues with individual entries.
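The end-of-run behavior can be sketched as a small Python function that logs the summary, emits the ERROR line only when every entry was dropped, and passes the build's exit code through unchanged. The function and logger names are hypothetical, and the I/O-failure case is not modeled.

```python
import logging

log = logging.getLogger("bear.output")  # hypothetical logger name


def finish_pipeline(build_exit_code, entries_written, entries_dropped_invalid):
    """Emit the end-of-run summary and pick the exit code (sketch)."""
    log.info("entries_written=%d entries_dropped_invalid=%d",
             entries_written, entries_dropped_invalid)
    if entries_dropped_invalid > 0 and entries_written == 0:
        # The database is still written as [], but this must not be silent.
        log.error("all %d candidate entries were dropped by validation; "
                  "the compilation database is empty",
                  entries_dropped_invalid)
    # Validation drops never change the exit code; only the build's own
    # status (and, not modeled here, genuine I/O failures) do.
    return build_exit_code
```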
Given a project with a single C source file:
When the user runs bear -- <compiler> -c source.c, then compile_commands.json is created and contains valid JSON with exactly one entry; the entry has directory equal to the working directory, file equal to "source.c", and arguments starting with the compiler path.
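A test for this scenario could parse the written file and compare it against the expectations. The helpers below are a hypothetical Python sketch; the caller supplies the working directory and the compiler string exactly as passed to bear.

```python
import json


def single_source_ok(entries, workdir, compiler):
    """Return True if a parsed database matches the single-file scenario."""
    if not isinstance(entries, list) or len(entries) != 1:
        return False
    entry = entries[0]
    args = entry.get("arguments") or []
    return (entry.get("directory") == workdir
            and entry.get("file") == "source.c"
            and len(args) > 0
            and args[0] == compiler)


def check_database_file(path, workdir, compiler):
    # json.load raises on malformed JSON, which also fails the scenario.
    with open(path, encoding="utf-8") as f:
        return single_source_ok(json.load(f), workdir, compiler)
```

For example, after running bear -- gcc -c source.c in the project directory, check_database_file("compile_commands.json", os.getcwd(), "gcc") would be expected to return True.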
Given a project with multiple C and C++ source files:
When the user runs bear -- sh build.sh, where build.sh compiles all files, then compile_commands.json contains one entry per source file, and each entry has the correct compiler (C or C++) in arguments[0]. Note: the exact entry count may vary when a caching compiler wrapper (ccache) is in the path.
Given a build command that produces no compiler invocations:
When the user runs bear -- true, then compile_commands.json contains an empty JSON array [].
Given a build that partially fails (some files compile, some do not):
When the user runs bear -- sh build.sh, then compile_commands.json still contains entries for all attempted compilations, and Bear's exit code reflects the build failure.
Given a compiler invocation with -DNAME=\"hello\" on the shell command line (so the compiler receives -DNAME="hello"):
When Bear writes the command field, then the value is shell-escaped (the crate may use single or double quotes), the JSON encoding adds another layer, and JSON-decoding followed by shell-word splitting recovers the original argv.
Given a compiler invoked as a bare name (e.g. gcc):
When Bear writes the entry, then arguments[0] is gcc (not resolved to an absolute path).
Given a compiler invoked with a full path (e.g. /usr/bin/gcc):
When Bear writes the entry, then arguments[0] is /usr/bin/gcc.
Given a build that produces a mix of valid entries and one entry that
fails validation (for example, an empty directory field):
When Bear writes the output, then the valid entries appear in compile_commands.json unchanged, a WARN log line names the dropped entry and the validation reason, the pipeline summary reports entries_dropped_invalid = 1, and the process exit code is not affected by the drop.
Given a build where every candidate entry fails validation:
When Bear writes the output, then compile_commands.json is written as an empty JSON array [], a WARN log line is emitted for each dropped entry, and a single ERROR-level summary line reports that every entry was dropped.
- Duplicate entry handling is specified separately (see output-duplicate-detection).
- The path format of the file and directory fields is configurable; see output-path-format for details.
- Some builds produce entries with empty directory fields (see issue #692). Under
  the old fail-fast behavior, the first such entry aborted the pipeline
  and no database was written for the rest of the build. The new contract
  keeps the rest of the output usable while making the failure visible in
  the logs.