
Compilation entries from an intercepted invocation

docs/requirements/output-compilation-entries.md

4.1.510.0 KB

Intent

Build systems often compile several sources in one compiler invocation, or combine compiling and linking in a single command. Tools that consume compile_commands.json (clangd, clang-tidy, and similar) expect one entry per translation unit -- each entry describing how to compile exactly one source file on its own, without the noise of linking or of siblings compiled in the same invocation.

Bear's job is to turn each intercepted compiler invocation into zero, one, or many entries, so that downstream tools see a clean, per-source compile command regardless of how the build system phrased the invocation.

The rules below describe the user-visible transformation. The JSON shape of an individual entry is covered separately by output-json-compilation-database. Per-compiler details (which flags each compiler recognizes, which extensions identify its sources, how MSVC-style flags differ from GCC/Clang-style) are defined by Bear's per-compiler interpreters.

Acceptance criteria

One entry per compilable source (separable sources)

For compilers whose sources are separable translation units -- each source compiles on its own and contributes an independent object (GCC, Clang, the Fortran and CUDA families, MSVC, and the like):

  • An invocation that names N source files produces exactly N entries
  • Recognized source extensions include .c, .cc, .cpp, .cxx, .m, .mm, .S, .s, .f, .f90, .cu, and other language-specific extensions defined by the per-compiler interpreters
  • Each entry's file field is one of those sources
  • In each entry, the other sources from the same invocation are removed from the argument list -- each entry looks like a command that compiles only that one source
  • If the same source appears more than once in the same invocation (cc -c foo.c foo.c), one entry is produced per positional occurrence; deduplication is then the responsibility of output-duplicate-detection
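
The separable-source rule can be sketched in a few lines, assuming a GCC/Clang-style argument list. The names split_per_source, source_positions, SOURCE_EXTENSIONS, and FLAGS_WITH_VALUE are hypothetical and used only for illustration; they are not Bear's API, and both sets are deliberately abbreviated (the real lists belong to Bear's per-compiler interpreters). The sketch works on argument indices rather than file names, so a source named twice yields one entry per occurrence.

```python
import os

# Hypothetical, abbreviated lists; Bear's per-compiler interpreters define the real ones.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx", ".m", ".mm", ".S", ".s", ".f", ".f90", ".cu"}
FLAGS_WITH_VALUE = {"-o", "-I", "-D", "-U", "-x", "-isystem", "-iquote", "-include", "-MF", "-MT", "-MQ"}


def source_positions(args):
    """Indices of positional arguments that look like compilable sources."""
    positions = []
    i = 1  # args[0] is the compiler executable
    while i < len(args):
        arg = args[i]
        if arg in FLAGS_WITH_VALUE:
            i += 2  # skip the flag and its separate value (e.g. "-o src.o")
            continue
        if not arg.startswith("-") and os.path.splitext(arg)[1] in SOURCE_EXTENSIONS:
            positions.append(i)
        i += 1
    return positions


def split_per_source(args):
    """Return one (file, arguments) pair per source occurrence, siblings removed."""
    positions = source_positions(args)
    entries = []
    for own in positions:
        siblings = set(positions) - {own}
        # Keep every argument except the other sources, in original order.
        arguments = [a for i, a in enumerate(args) if i not in siblings]
        entries.append((args[own], arguments))
    return entries
```

For cc -c src1.c src2.c src3.c this yields three pairs, the second being ("src2.c", ["cc", "-c", "src2.c"]).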

One entry per invocation (single-translation-unit compilers)

Some compilers treat all of an invocation's sources as a single translation unit: they parse the sources together and produce one combined output rather than one object per source. The Vala compiler is the motivating example -- it compiles every .vala/.gs source of a target together and produces one library/binding. For such a compiler:

  • An invocation that names N source files produces exactly ONE entry, regardless of N
  • That entry's file field is the first source in argument order
  • All N sources are retained in the entry's argument list (they are not stripped as siblings, because they are not separate units)
  • Link-only flag stripping (below) still applies
  • If the invocation compiles no source at all, it produces no entry, exactly as for the separable case

Because the entire invocation collapses to one entry keyed on its first source, duplicate detection (output-duplicate-detection) sees one producer per invocation rather than N near-identical producers; this is the intended behaviour for a single-translation-unit compiler and avoids emitting several entries that all claim to produce the same combined output.
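
A minimal sketch of the single-translation-unit rule, assuming a valac-style command line. single_unit_entry and VALA_EXTENSIONS are hypothetical names, and link-only flag stripping is left out to keep the example short: the only points shown are that the first source keys the entry and that every source stays in the arguments.

```python
import os

# Hypothetical: source extensions accepted by a single-TU compiler such as valac.
VALA_EXTENSIONS = {".vala", ".gs"}


def single_unit_entry(args):
    """Return one (file, arguments) entry for the whole invocation, or None."""
    sources = [a for a in args[1:]
               if not a.startswith("-") and os.path.splitext(a)[1] in VALA_EXTENSIONS]
    if not sources:
        return None  # no source compiled: no entry, as in the separable case
    # The first source in argument order keys the entry; all sources are retained.
    return (sources[0], list(args))
```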

Zero entries for invocations that do not compile a source

An invocation produces no entries when any of the following holds:

  • Every positional file argument is an object file, archive, or shared library (.o, .obj, .a, .so, .lib, .dylib, ...) -- this is a pure link step
  • The invocation requests information only (--version, --help, -###, -dumpversion, ...)
  • The invocation requests preprocessing only (-E) or dependency generation only (-M or -MM without a compile step)
  • The executable is not a recognized compiler, or argument parsing does not find a source; Bear emits no entry rather than a best-effort guess

By contrast, invocations using -fsyntax-only, -MD, or -MMD do compile the source (the last two also emit dependency files as a side effect) and therefore produce an entry.
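
The zero-entry rules can be sketched as a single predicate for GCC/Clang-style invocations. produces_entries and the three sets are hypothetical and indicative only; a real interpreter parses flags properly rather than splitting on a leading dash. The sketch relies on GCC's documented behaviour that -M and -MM imply -E, while -MD and -MMD do not.

```python
import os

# Hypothetical, indicative lists; Bear's interpreters define the real ones.
RECOGNIZED_COMPILERS = {"cc", "c++", "gcc", "g++", "clang", "clang++"}
INFO_ONLY_FLAGS = {"--version", "--help", "-###", "-dumpversion"}
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx"}


def produces_entries(args):
    """Return False when a GCC/Clang-style invocation should yield no entry."""
    if os.path.basename(args[0]) not in RECOGNIZED_COMPILERS:
        return False  # not a recognized compiler: no best-effort guess
    flags = [a for a in args[1:] if a.startswith("-")]
    # Simplification: values of separate-argument flags (the a.out in
    # "-o a.out") land here too; they rarely carry a source extension.
    positional = [a for a in args[1:] if not a.startswith("-")]
    if any(f in INFO_ONLY_FLAGS for f in flags):
        return False  # information only
    if "-E" in flags or "-M" in flags or "-MM" in flags:
        return False  # preprocessing or dependency generation only
    # A pure link step (only .o/.a/.so/...) or no recognized source: no entry.
    return any(os.path.splitext(p)[1] in SOURCE_EXTENSIONS for p in positional)
```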

Link-only flags are removed

When a single invocation both compiles and links (cc -o a.out -lsomething src.c), the resulting entry describes only the compile step. Flags whose effect is limited to the link stage are removed.

GCC/Clang-style examples: -l<name>, -L<dir>, -Wl,..., -Xlinker ..., -shared, -static, -rdynamic. MSVC counterpart: /link and every argument following it.

Preprocessing, compiling, and assembling flags (-D, -I, -isystem, -iquote, -std=..., -O2, -Wall, -c, -S, -x <lang>, and their MSVC equivalents) are kept, together with driver-level options that affect compilation.
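
A minimal sketch of link-only stripping, assuming the flag spellings listed above. strip_link_only and the sets it uses are hypothetical and deliberately incomplete (for example, -static-libgcc is not covered). Because it filters a single pass over the list, every kept argument stays in its original relative position.

```python
# Hypothetical, indicative sets of link-only flags.
LINK_ONLY_EXACT = {"-shared", "-static", "-rdynamic"}
LINK_ONLY_PREFIXES = ("-l", "-L", "-Wl,")
LINK_ONLY_WITH_SEPARATE_VALUE = {"-l", "-L", "-Xlinker"}


def strip_link_only(args):
    """Drop link-only flags, keeping all other arguments in their original order."""
    kept = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "/link":
            break  # MSVC: /link and every argument after it go to the linker
        if arg in LINK_ONLY_WITH_SEPARATE_VALUE:
            i += 2  # drop the flag together with its separate value
            continue
        if arg in LINK_ONLY_EXACT or arg.startswith(LINK_ONLY_PREFIXES):
            i += 1  # attached forms such as -lm, -L/opt/lib, -Wl,--as-needed
            continue
        kept.append(arg)
        i += 1
    return kept
```

For cc -o a.out -lsomething -O2 src.c the result is ["cc", "-o", "a.out", "-O2", "src.c"].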

Argument order is preserved

Within a given entry, the remaining arguments appear in the same relative order as in the original command. Downstream consumers are order-sensitive:

  • Include search paths (-I, -isystem, -iquote) are searched in the order they appear
  • Later -D definitions override earlier ones
  • -W options can enable and then disable the same warning
  • -x <lang> language overrides apply to subsequent source files and must keep their position relative to them
  • The compiler executable stays at index 0 of arguments
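
The ordering rule can be stated as a check: an entry's arguments keep the compiler at index 0 and form an order-preserving subsequence of the original command. preserves_order is a hypothetical helper, not part of Bear. It uses the Python idiom in which "x in iterator" consumes the iterator up to the first match, so each argument must be found after the previous one. It cannot tell which of two identical flags was dropped, which is acceptable for a sketch.

```python
def preserves_order(original, entry_args):
    """True if entry_args keeps original[0] first and is a subsequence of original."""
    if not entry_args or entry_args[0] != original[0]:
        return False  # the compiler executable must stay at index 0
    remaining = iter(original)
    # Each "arg in remaining" advances past the match, enforcing relative order.
    return all(arg in remaining for arg in entry_args)
```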

The output field

The per-entry output field (see output-json-compilation-database) is optional and off by default. When enabled via configuration, Bear records the value of the invocation's output flag (-o, or MSVC's /Fo and /Fe) and emits it in every entry produced from that invocation.

  • For a single-source invocation (cc -c src.c -o src.o), the entry's output is src.o.
  • For a multi-source invocation with a single output flag (cc -o a.out src1.c src2.c src3.c), Bear copies the output value verbatim into every entry. All three entries report output = a.out, even though a real build would produce src1.o, src2.o, and src3.o. Per-source inference of object names is a known gap; see Notes.
  • When the invocation has no output flag, the output field is absent.
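
The current behaviour, including the known gap, can be sketched as follows. first_output and attach_output are hypothetical names, entries are modelled as plain dicts, and the enabled parameter stands in for the configuration switch. Attached GCC forms such as -oa.out are deliberately not handled; MSVC values are assumed to be attached (/Fox.obj), optionally after a colon (/Fo:x.obj).

```python
def first_output(args):
    """Return the first output-flag value (-o, /Fo, /Fe), or None if absent."""
    for i in range(1, len(args)):
        arg = args[i]
        if arg == "-o":
            return args[i + 1] if i + 1 < len(args) else None
        for prefix in ("/Fo", "/Fe"):
            if arg.startswith(prefix) and len(arg) > len(prefix):
                # Attached MSVC value: /Fox.obj, or the /Fo:x.obj spelling.
                return arg[len(prefix):].lstrip(":")
    return None


def attach_output(entries, args, enabled):
    """Copy the invocation's output value verbatim into every entry, if enabled."""
    out = first_output(args) if enabled else None
    result = []
    for entry in entries:
        entry = dict(entry)  # do not mutate the caller's entries
        if out is not None:
            entry["output"] = out  # same value for every source: the known gap
        result.append(entry)
    return result
```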

Non-functional constraints

  • The same rules apply to every compiler family recognized by Bear's interpreters (GCC/Clang, MSVC, Fortran, CUDA, and others); the flag names and source extensions listed above are indicative, not exhaustive
  • Source-extension recognition follows the file system's case rules: on Linux and BSD, where file names are case-sensitive, a file named foo.C is treated as a C++ source because its extension is .C (not .c); on Windows and typical macOS configurations, which are case-insensitive, the same file also matches extension lists written in lowercase
  • Response files (@argfile) are observed as written by the build. By default Bear does not expand them: an entry whose original command used @args.rsp contains @args.rsp in its arguments. Inlining the contents of these files into the entry is an opt-in behaviour governed by output-response-file-inlining
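
The case rule for extensions can be illustrated with a small predicate. has_extension is hypothetical, and the case_sensitive flag is an explicit parameter here because the sketch does not try to detect the file system's actual behaviour.

```python
import os


def has_extension(path, extensions, case_sensitive):
    """True if path's extension is in extensions.

    case_sensitive=True mirrors Linux/BSD file systems; False mirrors the
    default case-insensitive behaviour of Windows and typical macOS setups.
    """
    ext = os.path.splitext(path)[1]
    if case_sensitive:
        return ext in extensions
    return ext.lower() in {e.lower() for e in extensions}
```

With a lowercase list such as {".c", ".cpp"}, foo.C matches only when case_sensitive is False; a case-sensitive lookup needs .C listed explicitly.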

Testing

Given a build that runs cc -c src1.c src2.c src3.c in one invocation:

When the user runs Bear wrapping that build, then compile_commands.json contains three entries with file set to src1.c, src2.c, and src3.c respectively; each entry's arguments contain -c and only its own source file, so the other two source files do not appear in that entry's arguments.
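
This scenario can be checked mechanically. check_per_source_split is a hypothetical test helper, not part of Bear; it assumes entries carry an arguments array and compares base names, so it works whether file fields are written relative or absolute. It does not assume any particular entry order.

```python
import json
import os


def check_per_source_split(db_text, sources):
    """Return a list of problems found in a compile_commands.json text."""
    entries = json.loads(db_text)
    problems = []
    files = sorted(os.path.basename(e["file"]) for e in entries)
    if files != sorted(sources):
        problems.append(f"expected files {sorted(sources)}, got {files}")
    for e in entries:
        own = os.path.basename(e["file"])
        args = [os.path.basename(a) for a in e.get("arguments", [])]
        if "-c" not in args:
            problems.append(f"{own}: missing -c")
        siblings = [s for s in sources if s != own and s in args]
        if siblings:
            problems.append(f"{own}: sibling sources present: {siblings}")
    return problems
```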

Given a build that compiles multiple files via separate compiler invocations (the typical make -j case):

When the user runs Bear wrapping that build, then compile_commands.json contains one entry per source file, and each entry names the compiler used for that source.

Given a build that compiles a single source file:

When the user runs bear -- cc -c src.c, then compile_commands.json contains exactly one entry, with file set to src.c and the compiler at arguments[0].

Given a build that runs a single-translation-unit compiler over two or more sources in one invocation (for example valac src1.vala src2.vala):

When the user runs Bear wrapping that build, then compile_commands.json contains exactly one entry for that invocation, with file set to the first source (src1.vala), and all of the sources appearing in that entry's arguments.

Given a build that runs cc -o a.out src1.c src2.c src3.c:

When the user runs Bear wrapping that build, then compile_commands.json contains three entries, one per source file; each entry describes a pure compile step (no link-only flags), and no entry's file is a.out.

Given a build that runs cc -o a.out obj1.o obj2.o obj3.o:

When the user runs Bear wrapping that build, then compile_commands.json contains no entries for this invocation.

Given a build that runs cc -o a.out -lsomething -O2 src.c:

When the user runs Bear wrapping that build, then the resulting entry contains -O2, and it does not contain -lsomething.

Given a build that runs cc -I first -I second -DFOO -DBAR -c src.c:

When the user runs Bear wrapping that build, then the entry lists -I first before -I second, and -DFOO before -DBAR, matching the original order.

Given a build that runs cc --version or cc -###:

When the user runs Bear wrapping that build, then compile_commands.json contains no entries for this invocation.

Given a build that runs cc -o a.out src1.c src2.c with the output field enabled via configuration:

When the user runs Bear wrapping that build, then every entry's output is a.out (reflecting the known limitation documented above, not an ideal behaviour).

Notes

  • Per-source inference of object names (src1.o, src2.o) for multi-source invocations that share a single -o output is a plausible future improvement. It is not implemented today; the first output value is copied into every entry.
  • Related: output-json-compilation-database -- per-entry JSON shape.
  • Related: interception-compiler-env-with-flags -- environment variables that contribute flags to entries.
  • Related: output-append, output-duplicate-detection, output-source-directory-filter -- stages that run on the entries produced by this step.

Rationale