proposals/p007441-trailing-comments.md
Carbon currently requires a comment to be the only non-whitespace on its line. A
// comment that follows other content on a line, called a trailing comment,
is a lexer error. This proposal removes that restriction, allowing a comment to
follow other content on a line. Everything else about comments is unchanged: a
comment still begins with //, still requires whitespace after the //, and
still runs to the end of the line. Carbon continues to provide only line
comments; no block or intra-line comments are added.
Three observations motivate the change. First, trailing comments are well suited to short annotations attached to a specific entity or value on a line. Second, the current lexer design makes trailing comments trivial to lex; rejecting them is what requires extra logic and potentially extra cost. Third, C++ code routinely uses trailing comments, so allowing them lets Carbon carry the layout of migrated code over directly, rather than reworking each comment to read well in a different structure.
The original comments proposal provided a
single kind of comment, // to end of line, and restricted it so that "no code
is permitted prior to a comment on the same line." That proposal explicitly
labeled the restriction experimental and said the decision "should be
revisited if we find there is a need for such comments in the context of the
complete language design."
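Several considerations now motivate revisiting it: trailing comments suit short
per-line annotations, the lexer can support them at essentially no cost, and
C++ code being migrated already uses them.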
The comments proposal made several decisions that this proposal leaves untouched:

- A comment is a line comment, beginning with // and running to the end of
  the line; there is no physical line continuation.
- // must be followed by whitespace (or end of file). Sequences where
  // is not followed by whitespace are reserved for future extension such as
  documentation or fold markers.

The one decision this proposal changes is the experimental requirement
that a comment be the only non-whitespace on its line. The original proposal's
rationale for that requirement was primarily about tools: an intra-line or
trailing comment makes it harder for a formatter to know what program element
the comment "attaches to," and reflowing aligned trailing comments across edits
is genuinely complex. It grouped trailing comments together with intra-line
(C-style /* ... */) comments and declined both, pending experience.
This proposal separates the two. It allows trailing line comments, which run to the end of the line and so have an unambiguous extent, while continuing to exclude intra-line and block comments, which are where most of the original tooling complexity lives.
Carbon's lexer is a table-dispatch loop: the first byte of each lexical element
selects a handler, and / is wired to a handler that decides between a /
operator and a // comment with a max-munch rule. That dispatch does not depend
on where in the line the / occurs. A // reaching the comment handler in
the middle of a line takes the same path as one at the start of a line.
A comment, once recognized, is consumed by advancing to the end of the line,
which the lexer already tracks. The only work the lexer does today that is
specific to trailing comments is the explicit check that the // begins at the
line's first non-whitespace position, which exists solely to reject trailing
comments. Removing that rejection and recording the comment instead is less
code, not more, and adds no new scanning. In this sense, supporting trailing
comments is nearly free in the lexer, while continuing to reject them is what
now costs extra logic in the comment lexer's hot path.
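
The dispatch described above can be modeled in a few lines. The following is a minimal Python sketch, not the toolchain's C++ lexer: the names `lex_line`, `lex_slash`, `lex_other`, `HANDLERS`, and the `reject_trailing` flag are hypothetical, string literals are ignored, and the whitespace-after-`//` rule is left out. It is only meant to show that the `/` handler takes the same path wherever the `//` occurs, and that rejecting trailing comments is the one position-dependent step.

```python
def lex_slash(line, pos, tokens, comments, reject_trailing):
    """Handler for '/': max-munch prefers '//' (comment) over '/' (operator)."""
    if line.startswith("//", pos):
        # This check exists only to reject trailing comments; the rest of
        # the handler does not care where in the line the '//' occurs.
        if reject_trailing and line[:pos].strip() != "":
            raise SyntaxError("trailing comment")
        comments.append(line[pos:])
        return len(line)  # A comment runs to the end of the line.
    tokens.append("/")
    return pos + 1


def lex_other(line, pos, tokens, comments, reject_trailing):
    """Hypothetical catch-all: groups a run of non-space, non-'/' characters."""
    end = pos
    while end < len(line) and not line[end].isspace() and line[end] != "/":
        end += 1
    tokens.append(line[pos:end])
    return end


# First character of a lexical element selects its handler.
HANDLERS = {"/": lex_slash}


def lex_line(line, reject_trailing=False):
    tokens, comments = [], []
    pos = 0
    while pos < len(line):
        if line[pos].isspace():
            pos += 1
            continue
        handler = HANDLERS.get(line[pos], lex_other)
        pos = handler(line, pos, tokens, comments, reject_trailing)
    return tokens, comments
```

With `reject_trailing=False` the handler has no position-dependent logic at all; passing `reject_trailing=True` models the current restriction.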
Allow a // comment to follow other content on a line. Such a comment is called
a trailing comment. It is lexically identical to a full-line comment: it
begins with //, requires whitespace after the //, and runs to the end of the
line. The sole change is that a comment is no longer required to be the only
non-whitespace on its line.
Carbon still provides only line comments. This proposal does not add intra-line
or block comments, and does not change the reserved status of // sequences
that are not followed by whitespace.
A comment is a lexical element beginning with // and running to the end of the
line. A comment may appear either as the entire content of a line (after
optional leading whitespace) or following other content on the line. In both
cases the character after // must be whitespace (newline and end of file count
as whitespace), and the comment is removed prior to formation of tokens; it
produces no token.
The //@... tooling directives (such as //@dump-sem-ir-begin) remain
recognized only as full-line comments, since they are line-oriented markers; in
trailing position //@ is simply a // not followed by whitespace, and so is a
reserved (currently invalid) comment as before.
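
As an illustration of these rules, the sketch below classifies a single `//` sequence. It is a simplified Python model under stated assumptions: `classify_comment` is a hypothetical name, the caller is assumed to have already found a `//` outside any literal, end of line stands in for both newline and end of file, and every full-line `//@` is treated as a directive without checking its name.

```python
def classify_comment(line, pos):
    """Classify the `//` starting at line[pos].

    Assumes `line` has no trailing newline and that the `//` is not
    inside a string or character literal.
    """
    assert line.startswith("//", pos)
    after = line[pos + 2 : pos + 3]  # "" models newline / end of file
    is_full_line = line[:pos].strip() == ""
    if after == "" or after.isspace():
        return "full-line" if is_full_line else "trailing"
    if after == "@" and is_full_line:
        # Simplification: any full-line `//@...` counts as a directive here.
        return "directive"
    # `//` not followed by whitespace stays reserved in either position.
    return "reserved"
```

For example, `classify_comment("x = 1; // hi", 7)` yields `"trailing"`, while the same position in `"x = 1; //@dump"` yields `"reserved"`.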
A block string literal's introducer line, consisting of the ''' and an
optional file type indicator, may also carry a trailing comment. This is an
ordinary trailing comment, and it behaves as a comment everywhere: syntax
highlighting renders it as a comment, carbon format preserves it, tooling
that inspects comments sees it, and // not followed by whitespace remains
reserved on this line just as elsewhere, so it is not valid within a file type
indicator. The one special rule is needed because the file type indicator
would otherwise run to the end of the line: a trailing comment ends the file
type indicator, just as trailing whitespace does. Because the comment is not
part of the indicator, it may contain ', #, and " even though the
indicator may not.
Allowing a comment on the introducer line, and allowing it to contain ',
creates a lexical ambiguity. Consider:
'''foo // '
This could be a block string literal introducer with the file type foo and a
comment, or an empty character literal '' followed by the character literal
'foo // '.
This proposal resolves the ambiguity by specifying that a
character literal is
never empty: '' never begins a character literal, and so ''' unambiguously
begins (or ends) a block string literal. An empty character literal has no
meaning or use case, and the toolchain already rejects '' as an error, so
this only codifies the existing behavior as the disambiguation rule; it does
not change which programs are valid. It also lets a lexer classify ''
without further lookahead.
With this rule the file type indicator no longer needs a restricted character
set to be unambiguous, but this proposal keeps the restriction because it
improves error recovery: an introducer line whose file type indicator would
contain ', #, or " is diagnosed as an error and does not open a block
string literal, so text like let s: String = '''single-line?'''; produces a
contained error rather than treating the remainder of the file as string
content or as a sequence of character literals.
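
The two rules above can be sketched together. This is a rough Python model, not the toolchain's implementation: `classify_quote` and `parse_introducer` are hypothetical names, `rest` is assumed to be the text following the opening `'''` up to the end of the line, and the sketch assumes a comment may directly abut the file type indicator without intervening whitespace.

```python
def classify_quote(text, pos):
    """Classify a `'` at text[pos] under the rule that `''` never begins
    a character literal."""
    if text.startswith("'''", pos):
        return "block-string-delimiter"
    if text.startswith("''", pos):
        return "error: empty character literal"
    return "character-literal"


def parse_introducer(rest):
    """Split the text after an opening ''' into (file_type, comment).

    Either element may be None. Raises ValueError for an introducer line
    that should be diagnosed rather than opening a block string literal.
    """
    comment = None
    idx = rest.find("//")
    if idx != -1:
        after = rest[idx + 2 : idx + 3]  # "" models end of line
        if after != "" and not after.isspace():
            raise ValueError("reserved `//` sequence on introducer line")
        # The comment ends the indicator, just as trailing whitespace does.
        comment = rest[idx:].rstrip()
        rest = rest[:idx]
    indicator = rest.strip()
    # The restricted character set is kept for error recovery.
    if any(c in indicator for c in "'#\""):
        raise ValueError("invalid character in file type indicator")
    if len(indicator.split()) > 1:
        raise ValueError("unexpected content after file type indicator")
    return (indicator or None, comment)
```

Under these assumptions, `parse_introducer("foo // '")` returns `("foo", "// '")`, and `parse_introducer("single-line?''';")` raises, which corresponds to the contained error described above.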
A trailing comment may follow any content, and the following code is now valid rather than an error:
var count: i32 = 0; // a) A local variable,
fn Render(frame: Frame) {
  Draw(frame); // b) A function call,
  Flush();
} // c) And a closing brace.
Full-line and trailing comments coexist; a full-line comment still introduces the code below it, while a trailing comment annotates the content on its own line:
// Compute the smallest prime factor.
var factor: i32 = SmallestFactor(n); // TODO: i32 -> i64
The reserved-comment rule is unchanged, so a trailing // that is not followed
by whitespace is still an error:
var x: i32 = 0; //rejected: whitespace is required after `//`
A trailing comment may also follow the file type indicator on a block string literal's introducer line:
var query: String = '''sql // TODO: switch to a prepared statement
SELECT * FROM t
''';
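Trailing comments are an addition to full-line comments, not a replacement, and the two suit different purposes: a full-line comment introduces the code below it, while a trailing comment annotates the content on its own line.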
The lexer records each trailing comment as a comment, marked as trailing so that tools can distinguish it from a comment that introduces the following code. A trailing comment is never coalesced with the full-line comments adjacent to it, even when they line up, because it belongs to the content on its own line.
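
One way to picture these records is the Python sketch below. It is illustrative only: `CommentRecord` and `record_comments` are hypothetical names, the `//` search is naive (it ignores literals and assumes every `//` is a valid comment), and the rule that consecutive full-line comments coalesce only when they start in the same column is an assumption of this sketch rather than something the proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class CommentRecord:
    first_line: int  # 0-based index of the line where the record starts
    text: list       # one entry per line covered by the record
    trailing: bool   # True if other content precedes the comment


def record_comments(lines):
    records = []
    prev = None  # (line index, column) of the previous full-line comment
    for i, line in enumerate(lines):
        pos = line.find("//")  # naive: does not skip string literals
        if pos == -1:
            prev = None
            continue
        trailing = line[:pos].strip() != ""
        text = line[pos:].rstrip()
        if not trailing and prev == (i - 1, pos):
            # Adjacent full-line comment in the same column: coalesce.
            records[-1].text.append(text)
        else:
            records.append(CommentRecord(i, [text], trailing))
        # A trailing comment never takes part in coalescing.
        prev = None if trailing else (i, pos)
    return records
```

A full-line comment that happens to line up under a trailing comment therefore starts a new record instead of joining it.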
One exception in the toolchain today: a trailing comment on a block string
literal's introducer line is carried within the string literal token's spelling
rather than in the lexer's comment records, and carbon format preserves it
verbatim as part of the literal. Surfacing it through the comment records, and
diagnosing a reserved // sequence within a file type indicator, remain as
toolchain follow-ups.
carbon format keeps a trailing comment on the line it annotates, separated
from the preceding content by a single space, rather than relocating it to its
own line. This is the behavior an author intends, and it lets code migrated from
C++ keep the comment layout it already had.
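
The spacing rule can be sketched as a single-line transformation. This is a hypothetical Python model, not the behavior of the actual `carbon format` implementation: `format_trailing_comment` is an invented name, and the `//` search is naive, so it assumes the line contains no `//` inside a literal.

```python
def format_trailing_comment(line):
    """Keep a trailing comment on its line, one space after the content."""
    pos = line.find("//")  # naive: does not skip string literals
    if pos == -1:
        return line  # No comment on this line.
    code = line[:pos]
    if code.strip() == "":
        return line  # Full-line comment: leave its indentation alone.
    return code.rstrip() + " " + line[pos:].rstrip()
```

For instance, `format_trailing_comment("Draw(frame);     // b) A function call,")` returns `"Draw(frame); // b) A function call,"`.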
Reflowing trailing comments under more aggressive reformatting, for example maintaining a column of aligned trailing comments or moving an over-long trailing comment to its own line, is a formatting-quality concern rather than a language one, and can be improved over time without further language changes.
// to match C++.

We could leave the experimental restriction in place and continue to require every comment to be the only non-whitespace on its line.
We could go further and also allow comments that attach to something smaller
than a line, such as C-style intra-line comments like f(/*size=*/5) or block
comments that span or interrupt lines, as the
original proposal discussed.
- /*...*/ attaches to and how to wrap it, and /*...*/ with Carbon-specific
  semantics risks confusing C++ readers, as the original proposal noted. Carbon
  also intends to address inline argument annotation through the grammar (for
  example, named arguments) so that such utterances are meaningful to tools
  rather than being text.
- // to end of line) and the comment's extent unambiguous. The
  additional intra-line and block forms carry the costs the original proposal
  identified without a corresponding need, so they remain out of scope and the
  syntactic-disambiguation use case continues to be a matter for the grammar.

We could allow trailing comments only after particular constructs, for example
only after a ;, an enumerator, or a struct field, to bound the formatter's
problem to a small set of well-understood layouts.
We could permit comments that attach intra-line but require a marker indicating
direction, such as the //> / //< forms
sketched in the original proposal,
so a tool can tell what the comment attaches to.
We could decline trailing comments and instead require every per-line annotation to be promoted into the language, for example as a named argument or a future attribute or documentation syntax, so the information is structured and visible to tools.