
Source files

docs/design/code_and_name_organization/source_files.md

0.0.0-0.nightly.2026.05.06, 2.5 KB


<!-- Part of the Carbon Language project, under the Apache License v2.0 with LLVM Exceptions. See /LICENSE for license information. SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -->

Overview

A Carbon source file is a sequence of Unicode code points in Unicode Normalization Form C ("NFC"), and represents a portion of the complete text of a program.

Program text can come from a variety of sources, such as an interactive programming environment (a so-called "Read-Evaluate-Print Loop", or REPL), a database, an IDE's in-memory buffer, or a command-line argument.

The canonical representation of a Carbon program is a set of files, each stored as a sequence of bytes in an on-disk file system. Such files have the .carbon extension.
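The following is a minimal sketch of how a build script might collect the canonical on-disk sources under a directory. It assumes a Python helper script, and the function name find_carbon_sources is hypothetical. It is not part of any Carbon tooling. It matches only regular files whose extension is exactly .carbon.

```python
from pathlib import Path


def find_carbon_sources(root: Path) -> list[Path]:
    """Hypothetical helper: return all Carbon source files under `root`.

    A file qualifies if it is a regular file whose suffix is exactly
    ".carbon". Directories named "*.carbon" and a bare file named
    ".carbon" (which has no suffix) are skipped. Results are sorted so
    the output order is stable across runs.
    """
    return sorted(
        p for p in root.rglob("*.carbon")
        if p.is_file() and p.suffix == ".carbon"
    )
```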

Encoding

The on-disk representation of a Carbon source file is encoded in UTF-8. Such a file may begin with a UTF-8 byte order mark (BOM), that is, the byte sequence EF₁₆, BB₁₆, BF₁₆. This prefix, if present, is ignored.
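The decoding rule can be sketched as follows. This sketch assumes that malformed UTF-8 is treated as an error rather than being silently replaced, and the function name decode_carbon_source is hypothetical. At most one leading BOM is dropped. Any later U+FEFF stays in the decoded text.

```python
import codecs


def decode_carbon_source(data: bytes) -> str:
    """Hypothetical helper: decode the on-disk bytes of a Carbon source file.

    Assumption for this sketch: invalid UTF-8 raises UnicodeDecodeError
    (strict decoding) instead of being replaced with U+FFFD.
    """
    # A single optional leading UTF-8 BOM (EF BB BF) is ignored.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Decode the remaining bytes strictly as UTF-8.
    return data.decode("utf-8")
```

Python's built-in "utf-8-sig" codec behaves the same way. The explicit check is spelled out here only to mirror the rule in the text.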

No Unicode normalization is performed when reading the on-disk representation of a Carbon source file, so the file's contents must already be in Normalization Form C. The Carbon source formatting tool converts source files to NFC as necessary.
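The split between checking on read and normalizing in the formatter can be sketched with Python's unicodedata module. This text does not say how a toolchain reacts to non-NFC input, so raising ValueError here is an assumption. The function names check_nfc and format_to_nfc are hypothetical.

```python
import unicodedata


def check_nfc(text: str) -> None:
    """Hypothetical read-time check: verify NFC without modifying the text.

    Assumption for this sketch: non-NFC input is rejected with ValueError.
    """
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("source text is not in Unicode Normalization Form C")


def format_to_nfc(text: str) -> str:
    """Hypothetical stand-in for the formatter's NFC conversion step."""
    return unicodedata.normalize("NFC", text)
```

For example, "cafe\u0301" (an "e" followed by a combining acute accent) fails check_nfc. After format_to_nfc composes it to "caf\u00e9", it passes.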

Alternatives considered
