Toolchain development

Toolchain structure

Under toolchain/:
- base/: Base infrastructure and common utilities.
- check/: Semantic analysis (SemIR generation).
- lex/: Lexing (Source -> Tokens).
- lower/: Lowering to LLVM IR.
- parse/: Parsing (Token -> Parse Tree).
- sem_ir/: Semantic Intermediate Representation (SemIR) definitions.

Toolchain architecture

Documentation: Refer to toolchain/docs for detailed architecture design and patterns.
- Refer to Toolchain Idioms for a comprehensive list of patterns (for example, ValueStore, formatting .def files, struct reflection) used throughout the implementation.
Phases: Lex -> Parse -> Check -> Lower.
Definitions: Many kinds (tokens, parse nodes, SemIR instructions) are defined in .def files and expanded by way of macros.
Handlers:
- Parser: Handle<StateName> in parse/handle_*.cpp.
- Checker: HandleParseNode in check/handle_*.cpp.
- Lowering: HandleInst in lower/handle_*.cpp.
Iteration: Prefer iterative algorithms over recursive ones to prevent stack exhaustion on complex codebases.

Essential commands

Test everything: bazelisk test //...
Test specific target: bazelisk test //toolchain/testing:file_test
Test specific file: bazelisk test //toolchain/testing:file_test --test_arg=--file_tests=<path_to_carbon_file>
Build toolchain: bazelisk build //toolchain/...

Updating test data

Carbon tests often use file_test (for example, //toolchain/testing/file_test). For detailed guidelines on authoring tests, including file splits, naming conventions (fail_, todo_), and generating minimal output with SemIR dumps, please refer to the Toolchain tests skill.

If you change compiler behavior, you likely need to update expected test outputs. Do not manually edit thousands of lines of expected output. Use the script:

bash

./toolchain/autoupdate_testdata.py
# Or for a specific file:
./toolchain/autoupdate_testdata.py toolchain/check/testdata/my_test.carbon

Debugging and diagnostics

Printing to stderr: Use llvm::errs() << "debug info\n";.
- Avoid std::cout (it may interfere with tool output).
SemIR Stringification:
- SemIR objects often have a Print method or operator<<.
- inst.Print(llvm::errs())
Debugging Crashes:
- Bazel sandboxing can hide artifacts. Use --sandbox_debug if needed, but often running the binary directly from bazel-bin/ is easier for debugging.

Error handling

No exceptions: Do not use C++ exceptions.
ErrorOr<T>: Return ErrorOr<T> for fallible operations.
- Check with if (auto result = Function(); result) { Use(*result); }
llvm::Expected<T>: Similar to ErrorOr, used when interfacing with LLVM.

Casting (LLVM style)

Use llvm::cast<T>(obj) (checked, asserts on failure).
Use llvm::dyn_cast<T>(obj) (returns null on failure).
Use llvm::isa<T>(obj) (boolean check).
Avoid dynamic_cast and standard RTTI.

Data structures

Prefer APIs in common/ and toolchain/base/ over LLVM ADTs. For example, use Map instead of llvm::DenseMap.
If no Carbon API exists, prefer LLVM ADTs over standard library ones (for example llvm::SmallVector, llvm::StringRef).
StringRef is a view; be careful with lifetimes.

Common pitfalls

Legacy explorer references: The explorer prototype has been moved. Ignore references to it in proposals or old docs; focus on toolchain.
Manually updating test files: Always check if autoupdate_testdata.py can do it for you.
Using std::string unnecessarily: Prefer llvm::StringRef for arguments.
Header includes: Use specific include orders (often enforced by clang-format).
Parse node order: Semantics processes parse nodes in post-order; ensure your parser transitions support this.