docs/design/lexical_conventions/whitespace.md


Whitespace

<!-- Part of the Carbon Language project, under the Apache License v2.0 with LLVM Exceptions. See /LICENSE for license information. SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -->

<!-- toc -->

Table of contents

<!-- tocstop -->

Overview

The exact lexical form of Carbon whitespace has not yet been settled. However, Carbon will follow lexical conventions for whitespace based on Unicode Annex #31. TODO: Update this once the precise rules are decided; see the Unicode source files proposal.

Unicode Annex #31 suggests treating as whitespace the characters that have the Unicode property Pattern_White_Space, which currently comprises these 11 characters:

  • Horizontal whitespace:
    • U+0009 CHARACTER TABULATION (horizontal tab)
    • U+0020 SPACE
    • U+200E LEFT-TO-RIGHT MARK
    • U+200F RIGHT-TO-LEFT MARK
  • Vertical whitespace:
    • U+000A LINE FEED (traditional newline)
    • U+000B LINE TABULATION (vertical tab)
    • U+000C FORM FEED (page break)
    • U+000D CARRIAGE RETURN
    • U+0085 NEXT LINE (Unicode newline)
    • U+2028 LINE SEPARATOR
    • U+2029 PARAGRAPH SEPARATOR
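
As an illustration only, the list above can be written down as a small lookup in Python. This is a sketch, not Carbon's lexer: the names `HORIZONTAL_WHITESPACE`, `VERTICAL_WHITESPACE`, `PATTERN_WHITE_SPACE`, and `classify_whitespace` are hypothetical, and it assumes Carbon ends up adopting exactly the Pattern_White_Space set, which the text above says is not yet settled.

```python
from typing import Optional

# The 11 code points with the Unicode property Pattern_White_Space,
# grouped as in the list above. All names here are illustrative.
HORIZONTAL_WHITESPACE = frozenset({
    "\u0009",  # CHARACTER TABULATION (horizontal tab)
    "\u0020",  # SPACE
    "\u200E",  # LEFT-TO-RIGHT MARK
    "\u200F",  # RIGHT-TO-LEFT MARK
})
VERTICAL_WHITESPACE = frozenset({
    "\u000A",  # LINE FEED
    "\u000B",  # LINE TABULATION (vertical tab)
    "\u000C",  # FORM FEED
    "\u000D",  # CARRIAGE RETURN
    "\u0085",  # NEXT LINE
    "\u2028",  # LINE SEPARATOR
    "\u2029",  # PARAGRAPH SEPARATOR
})
PATTERN_WHITE_SPACE = HORIZONTAL_WHITESPACE | VERTICAL_WHITESPACE


def classify_whitespace(ch: str) -> Optional[str]:
    """Return "horizontal" or "vertical" for a whitespace character, else None."""
    if ch in HORIZONTAL_WHITESPACE:
        return "horizontal"
    if ch in VERTICAL_WHITESPACE:
        return "vertical"
    return None
```

Note that this set differs from general-purpose notions of whitespace. For example, U+00A0 NO-BREAK SPACE is not in Pattern_White_Space, so `classify_whitespace("\u00A0")` returns `None` even though Python's `str.isspace()` accepts that character.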

The quantity and kind of whitespace separating tokens are ignored except where otherwise specified.
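
To make that rule concrete, here is a rough Python sketch that splits text on runs of Pattern_White_Space characters. It is a simplified stand-in for tokenization, not Carbon's actual lexer: a real lexer also separates tokens at punctuation and handles the "otherwise specified" cases. The helper `split_on_whitespace` is hypothetical, and the character set is assumed to be exactly the 11 characters listed above.

```python
import re
from typing import List

# Assumption: whitespace is exactly the Pattern_White_Space set.
# U+0009..U+000D covers tab, line feed, vertical tab, form feed, and
# carriage return; the remaining six code points are listed individually.
_WHITESPACE_RUN = re.compile(
    "[\u0009-\u000D\u0020\u0085\u200E\u200F\u2028\u2029]+"
)


def split_on_whitespace(source: str) -> List[str]:
    """Split source on whitespace runs, dropping empty pieces at the edges."""
    return [piece for piece in _WHITESPACE_RUN.split(source) if piece]


# Different amounts and kinds of whitespace yield the same pieces.
single_space = split_on_whitespace("var x")
mixed = split_on_whitespace("\tvar\u2028  x\r\n")
```

Both calls produce the same two pieces, `var` and `x`, which is the sense in which the amount and kind of separating whitespace does not matter.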

References