baml_language/ARCHITECTURE.md
Audience: All engineers on the BAML team and coding agents operating in the
baml_language/ workspace.

Purpose: This document explains the design, pipeline stages, invariants, and decision framework of the compiler2 system. It is the authoritative reference for understanding where new features should be implemented, what each layer is responsible for, and why specific boundaries exist.
The compiler2 pipeline processes BAML source through a series of representations. There are exactly three transformations that produce new data structures, and several query layers that answer questions about those structures without transforming them:
Source Text
|
v
[Lexer] ──> Tokens
|
v
[Parser] ──> CST (Concrete Syntax Tree)
|
| ← transformation: CST → AST
v
AST (Abstract Syntax Tree)
|
| ← query layer (no transformation)
v
HIR (names, scopes)
|
| ← expansion: synthesizes stream types, feeds back into HIR
v
PPIR (stream type expansion)
|
| ← query layer (no transformation)
v
TIR (types)
|
| ← transformation: AST → MIR
v
MIR (control flow graph)
|
| ← transformation: MIR → bytecode
v
Emit (bytecode for BexVM)
Critical distinction: The stages above the AST (Parser, CST→AST lowering) are about producing the AST. The stages below it (HIR, PPIR, TIR) are about answering questions about the AST. They do not produce new syntax trees. The MIR is the second transformation — it converts human-friendly BAML into a machine-friendly control flow graph. The Emit stage is the third transformation — it compiles MIR to bytecode.
This is fundamentally different from the compiler1 architecture, which was a strict linear pipeline where each layer copied and enriched the previous layer's data. In compiler2, each layer is a query on top of the AST (at least until MIR), which gives us Salsa-powered incremental compilation for free.
When deciding where to implement a feature, always ask: what is the earliest layer at which I can do this?
| Situation | Rule |
|---|---|
| Adding a feature | Put it in the highest (earliest) layer possible. Most features belong in the AST layer. |
| Changing AST | Relatively forgiving — this is where most work happens. |
| Changing HIR | Discuss with at least one person who works in HIR. |
| Changing TIR | Discuss with at least one person who works in TIR. |
| Changing MIR or Emit | Discuss with at least two people. You are almost certainly making a mistake unless you have a very specific reason. |
| Adding a new layer | Requires explicit approval from the tech lead and a senior contributor. No new layers without significant deliberation. |
The lower you go, the more scrutiny is required. Changes to downstream layers cascade into every code path on the team's surface area. Keeping boundaries clean means fewer bugs and fewer accidental coupling problems.
Crates: baml_compiler_lexer, baml_compiler_parser, baml_compiler_syntax
Responsibility: Grammar only. The parser answers the question "is this syntactically valid BAML?" It knows about keywords, punctuation, delimiters, and the structural grammar of the language. It makes no semantic decisions.
What lives here:
What does NOT live here:
The parser produces a CST (Concrete Syntax Tree), which is a lossless, error-tolerant representation of the source text. It uses a green/red tree architecture (similar to rust-analyzer's rowan).
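The green/red split can be sketched in a few lines. This is an illustrative miniature inspired by rowan, not the actual `baml_compiler_syntax` API; all names (`GreenNode`, `RedNode`, `GreenChild`) are hypothetical:

```rust
use std::rc::Rc;

// "Green" nodes are immutable and position-independent, so identical
// subtrees can be cheaply shared. They store kind and children only.
#[derive(Debug, PartialEq)]
enum GreenChild {
    Node(Rc<GreenNode>),
    Token { kind: &'static str, text: String },
}

#[derive(Debug, PartialEq)]
struct GreenNode {
    kind: &'static str,
    children: Vec<GreenChild>,
}

impl GreenNode {
    // Text length is derived, so a green node can appear at any offset.
    fn text_len(&self) -> usize {
        self.children
            .iter()
            .map(|c| match c {
                GreenChild::Node(n) => n.text_len(),
                GreenChild::Token { text, .. } => text.len(),
            })
            .sum()
    }
}

// "Red" nodes are thin cursors: a green node plus an absolute offset,
// materialized on demand while traversing.
struct RedNode {
    green: Rc<GreenNode>,
    offset: usize,
}

impl RedNode {
    fn children(&self) -> Vec<RedNode> {
        let mut offset = self.offset;
        let mut out = Vec::new();
        for child in &self.green.children {
            match child {
                GreenChild::Node(n) => {
                    out.push(RedNode { green: Rc::clone(n), offset });
                    offset += n.text_len();
                }
                GreenChild::Token { text, .. } => offset += text.len(),
            }
        }
        out
    }
}
```

The point of the split: whitespace edits rebuild only the spine of green nodes on the path to the edit, while red nodes carry absolute positions without storing them in the tree.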
Key Salsa query: syntax_tree(db, file) -> CST
Crate: baml_compiler2_ast
Responsibility: Desugaring. The AST takes the CST and produces a well-formed, semantically-oriented syntax tree. This is where most features live.
What lives here:
- LLM function companion expansion (`render_prompt`, `build_request`, `parse`).
- `client<llm>` blocks are desugared into a top-level `Let` binding (the `Client` object) plus an optional `$new` companion function (the `PrimitiveClient` constructor).
- Type annotations are lowered into `TypeExpr` nodes.

What does NOT live here:
Key design principle: One CST node can produce multiple AST nodes. For example, a single client<llm> MyClient { ... } definition produces two AST items: a Let and a Function. Conversely, some CST constructs collapse or transform substantially. The AST is the final syntactic form of the program.
The AST is a pure structural lowering. It uses no Salsa queries. It does not validate names or detect duplicates. It simply converts CST shapes into AST shapes.
Crate: baml_compiler2_hir
Responsibility: Names and scopes. The HIR's sole job is to answer the question "what are the names of things, and where are they declared?"
What lives here:
What does NOT live here:
Key design decision — Lambda captures: Captures are determined in HIR because you only need name information, not type information, to decide what is captured. The HIR records that a lambda captures variable a from the enclosing scope. It does not determine whether a is a direct capture or a transitive capture — that distinction only matters for the MIR (which builds the control flow graph and needs to understand transitive dependencies). The concept of a "cell" (an indirection pointer for mutable captured variables) also does NOT belong in HIR or TIR. From the TIR's perspective, a captured int is still an int — the indirection is purely a MIR/VM implementation detail.
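As an analogy for the cell mechanism (assumed semantics, not compiler code): Rust's own `Rc<RefCell<...>>` shows how an indirection pointer lets a lambda and its enclosing scope mutate the same captured value, while the value's observable type stays a plain integer:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A "cell" is an indirection pointer: the enclosing scope and the lambda
// both hold a handle to the same storage, so either side can mutate it.
// From the type checker's perspective the captured value is still just an
// integer; the indirection exists only at the MIR/VM level.
fn make_counter() -> (Box<dyn FnMut() -> i64>, Rc<RefCell<i64>>) {
    let cell = Rc::new(RefCell::new(0));
    let captured = Rc::clone(&cell);
    let bump: Box<dyn FnMut() -> i64> = Box::new(move || {
        *captured.borrow_mut() += 1; // the lambda mutates through the cell
        *captured.borrow()
    });
    (bump, cell) // the outer scope keeps its own handle to the same cell
}
```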
Key Salsa queries:
- `file_semantic_index(db, file)` — Per-file scope tree with all bindings
- `namespace_items(db, namespace_id)` — Items contributed to a namespace
- `package_items(db, package_id)` — Package-level symbol table (merges all namespaces)

Crate: baml_compiler2_ppir
Responsibility: Stream type generation. This layer exists because streaming types require type-aware code generation that cannot be done in the AST layer but must happen before the TIR.
Why this must be its own layer:
What lives here:
- `*$stream` variants for classes and type aliases
- Stream expansion helpers (`stream_expand`, `expand_partial`)

How it works: The PPIR generates synthetic AST items (the stream variants) and feeds them back into the HIR. This means the flow is actually: HIR → PPIR → back to HIR (with expanded symbols) → TIR. The TIR then consumes the enriched HIR index that includes both original and stream-expanded items.
Key Salsa query: ppir_expansion_items(db, file) — Synthetic stream items per file
Crate: baml_compiler2_tir
Responsibility: Types only. The TIR answers the question "what is the type of this expression?" and validates type correctness.
What lives here:
- Control-flow narrowing — at narrowing sites (`match`, `instanceof`), the type is narrowed.

What does NOT live here:
How type resolution works for local variables: When you ask "what is the type of x on line 10?":
Key Salsa queries:
- `infer_scope_types(db, scope_id)` — Per-scope type inference. This is the main query. It returns types for a single scope, NOT a monolithic per-function result. This gives fine-grained incrementality: editing a lambda body only recomputes that lambda's types, not the enclosing function's.
- `resolve_name_at(db, file, offset, name)` — On-demand name resolution with type information.

Crate: baml_compiler2_mir
Responsibility: Control flow graph construction. The MIR is the first layer that performs a full walk of the AST and produces a fundamentally different representation. It converts human-friendly BAML code into machine-friendly control flow graphs.
What lives here:
- Loop unification — all three loop forms (`for`, iterator `for`, `while`) become a single loop construct in MIR. The three source-level forms exist in the AST only for diagnostic quality (see Loop Desugaring and Diagnostic Preservation).
- Transitive capture analysis (e.g., does the outer function need a cell for `a` because the inner lambda captures it?).
- Lambda naming (e.g., `anonymous_function_0`, `anonymous_function_0_1` for a lambda inside a lambda).

Data structures:
- `MirFunctionBody` — Basic blocks, entry block, local declarations, unwind handlers.
- `BasicBlock` — A sequence of statements plus a terminator.
- `MirFunctionKind::Bytecode(body)` — Functions with BAML code.
- `MirFunctionKind::Builtin(kind)` — Rust-bound builtins (`SysOp` for I/O, `NativeUnresolved` for VM intrinsics).

Readability: The MIR pretty-printer (pretty.rs) has been carefully designed to be readable for debugging. If you add a feature that touches MIR, you are responsible for maintaining the same level of readability. This is critical because MIR is the most bug-prone layer due to the complexity of the CFG transformation.
Key Salsa queries:
- `lower_function(db, ...)` — Lower a function to MIR.
- `lower_let_body(db, ...)` — Lower a let binding's initializer to MIR.

Crate: baml_compiler2_emit
Responsibility: Compiles MIR to bytecode for the BexVM using stackification.
You should almost never need to touch this layer. Changes here should be very small and very well justified. The emit layer is straightforward in concept — it walks the MIR CFG and emits VM instructions — and bugs here are relatively rare compared to the MIR layer.
What lives here:
- Optimization levels (`OptLevel`)

The compiler2 uses the Salsa incremental computation framework. The key idea is that each layer is defined as a set of tracked queries that depend on other queries. When a source file changes, Salsa automatically recomputes only the queries whose inputs changed.
Database hierarchy (each layer extends the previous):
salsa::Database
└─ baml_workspace::Db (project root, file list)
└─ baml_compiler_parser::Db (syntax_tree query)
└─ baml_compiler2_hir::Db (file_semantic_index, namespace_items, package_items)
└─ baml_compiler2_ppir::Db (ppir_expansion_items, canonical queries)
└─ baml_compiler2_tir::Db (infer_scope_types, resolve_name_at)
└─ baml_compiler2_mir::Db
└─ baml_compiler2_emit::Db
The design goal: before the AST, produce the AST. After the AST, answer questions about the AST. The only layers that do production (create new data structures) are:

- The parser (produces the CST)
- CST→AST lowering (produces the AST)
- MIR construction (produces the control flow graph)
- Emit (produces bytecode)
Everything else is a query.
Packages are resolved in topological order based on their dependency graph. The resolution order is inferred from declared dependencies, not hardcoded.
baml (standard library) ← resolved first, no dependencies
|
v
testing, insert, etc. ← depend on baml, resolved next
|
v
user ← depends on baml (and possibly others), resolved last
Package dependencies must form a DAG; recursive (cyclic) package dependencies are not allowed.
Why this matters for incremental compilation: The standard library, testing, and other non-user packages are resolved once and cached. Only the user's package changes during editing, so only it needs to be recomputed in the editor.
The PackageResolutionContext is the single point of entry for all name resolution from the TIR. It handles three cases:
| Syntax | Resolution Strategy |
|---|---|
| `root.SomeName` | Look in the current package's root namespace |
| `SomeName` (unqualified) | Look in the current local scope, then walk up scopes |
| `some_package.SomeName` | Look in the external package's interface |
Important invariant: If you find code that accesses type system information outside of the package resolution context, that is a bug. Fix it and route through the resolution context to maintain a single point of entry.
Every package exposes a PackageInterface — a fully resolved type interface that lists every name, every type, and full structural information. This is what other packages consume when they depend on you.
Scopes are constructed at HIR time (not AST time) because you cannot determine scope boundaries without name resolution. Consider: Foo.Bar.baz — is Foo a namespace, a class, or a variable? You need name resolution to answer that, so scopes and name resolution are co-determined in the HIR.
Project (root)
└─ Package
└─ Namespace (can be nested, can span multiple files)
└─ File
└─ Top-level items: Function, Class, Enum, TypeAlias, Item (client/test/etc.), Let
└─ Block (curly-brace blocks with let bindings)
└─ Lambda
└─ MatchArm (pattern bindings visible to arm body and guard)
└─ CatchClause → CatchArm
When resolving a name, the system walks up the scope tree:
- The final fallback is the `baml` builtin package.

Shadowing rules are scope-kind-dependent. For example, a match arm can shadow a function parameter, but two parameters in the same function cannot shadow each other. The HIR decides where shadowing is allowed.
ScopeId<'db> is a Salsa tracked struct pairing a SourceFile with a FileScopeId. It is the key for per-scope queries like infer_scope_types(db, scope_id). Scopes are allocated in DFS pre-order within each file.
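A minimal sketch of the walk-up resolution strategy, assuming a flat per-file scope table with parent links (illustrative names, not the real HIR API):

```rust
use std::collections::HashMap;

// Scopes are stored flat per file; each scope knows its parent.
// Indices here follow DFS pre-order, as in the HIR.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct FileScopeId(usize);

struct Scope {
    parent: Option<FileScopeId>,
    bindings: HashMap<String, String>, // name -> description of the declaration
}

// Unqualified resolution: look in the current scope, then walk up parents.
fn resolve(scopes: &[Scope], mut at: FileScopeId, name: &str) -> Option<String> {
    loop {
        let scope = &scopes[at.0];
        if let Some(decl) = scope.bindings.get(name) {
            return Some(decl.clone()); // innermost binding wins (shadowing)
        }
        at = scope.parent?; // ran out of scopes: unresolved
    }
}
```

Note how the innermost-wins loop gives match-arm-shadows-parameter behavior for free; the real HIR additionally rejects illegal shadowing (e.g., two parameters with the same name) at declaration time.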
When the AST layer encounters an LLM function, it expands it into the original function plus up to three companion functions:
| Companion | Name Pattern | Parameters | Return Type | Purpose |
|---|---|---|---|---|
| `render_prompt` | `FuncName$render_prompt` | Same as parent | `baml.llm.PromptAst` | Renders the prompt AST |
| `build_request` | `FuncName$build_request` | Same as parent | `baml.http.Request` | Builds the HTTP request |
| `parse` | `FuncName$parse` | `json: string` | Same as parent | Parses the JSON response |
Implementation (baml_compiler2_ast/src/companions.rs):
Companion expanders are pure functions of type fn(&FunctionDef) -> Option<FunctionDef>, stored in a const array COMPANIONS. Each expander inspects the function's declarative_meta — if it's an LLM function, it produces a companion; otherwise, it returns None.
Companion functions are complete, self-contained AST items. They flow through HIR → TIR → MIR → emit with zero special-casing. Downstream layers have no idea they were generated.
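The expander pattern can be sketched as follows. `FunctionDef` is heavily simplified here, and `is_llm` stands in for inspecting `declarative_meta`; only the pure-function shape and the `COMPANIONS` array mirror the real design:

```rust
// Simplified stand-in for the AST's FunctionDef; the real type carries
// parameters, return type, body, metadata, etc.
#[derive(Clone, Debug, PartialEq)]
struct FunctionDef {
    name: String,
    is_llm: bool, // stands in for inspecting `declarative_meta`
}

type CompanionExpander = fn(&FunctionDef) -> Option<FunctionDef>;

fn companion(f: &FunctionDef, suffix: &str) -> Option<FunctionDef> {
    if !f.is_llm {
        return None; // only LLM functions get companions
    }
    Some(FunctionDef { name: format!("{}${}", f.name, suffix), is_llm: false })
}

// Mirrors the `const COMPANIONS` array of pure expanders.
const COMPANIONS: [CompanionExpander; 3] = [
    |f| companion(f, "render_prompt"),
    |f| companion(f, "build_request"),
    |f| companion(f, "parse"),
];

// One source item can fan out into multiple AST items: the original
// function plus every companion the expanders produce.
fn expand(f: &FunctionDef) -> Vec<FunctionDef> {
    let mut items = vec![f.clone()];
    items.extend(COMPANIONS.iter().filter_map(|e| e(f)));
    items
}
```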
Implication for duplicate name detection: If you have two LLM functions Foo and Foo (a duplicate), each produces four AST items (itself + three companions). All eight items will trigger duplicate-name errors in the HIR. To prevent cascading duplicate errors, the HIR must be aware that companion-derived errors should not produce additional diagnostics beyond the root duplicate.
A client<llm> block desugars into two AST items:
1. A top-level `Let` binding — Creates a `Client` object (defined in baml_std) with:
   - `name: string` — the client's name
   - `client_type: ClientType` — Primitive, Fallback, or RoundRobin
   - `sub_clients: Client[]` — for composite clients, references to sub-clients (as `Expr::Path` references enabling TIR name validation and topological dependency ordering)
   - `retry: RetryPolicy?` — optional retry policy (also an `Expr::Path` reference)
   - `counter: int` — for round-robin clients, the starting index
2. An optional `$new` companion function (primitive clients only) — A function `ClientName$new` that constructs a `PrimitiveClient` from the provider and options. This function is called at runtime to create the actual LLM-capable client.
There is no Client type in the AST or compiler type system. The Client and PrimitiveClient types are regular structs defined in the BAML standard library (baml_std/baml/ns_llm/llm_types.baml). The compiler synthesizes constructor expressions that instantiate these standard library types. This means client type-checking happens for free through the normal TIR — no special type-checking code is needed for clients.
How Client resolves to PrimitiveClient at runtime:
1. The `Client` object has a `get_constructor()` method that returns a Rust function pointer.
2. Invoking that constructor produces a `PrimitiveClient`.
3. The `PrimitiveClient` is the actual object that can render prompts, build requests, and parse responses.
4. A `PrimitiveClient` is constructed every time an LLM function is called (no caching currently — this is a known optimization opportunity).

What about expressions in client definitions? Because clients desugar to regular AST expressions, users can use arbitrary expressions in client option values. For example, a variable reference as the model name works automatically. The config block syntax uses colon-delimited key-value pairs which are parsed as a special form in the CST and lowered to expressions in the AST.
Lambda bodies are extracted into their own scope-addressable AST units during CST→AST lowering. This is necessary because:
BAML has a special challenge: names can be referenced across files. This means global variables (like clients) have cross-file dependencies that must be resolved in a specific order.
1. All top-level `Let` bindings are collected.
2. Packages are processed in dependency order (`baml` before `user`).
3. Within each package, `Let` bindings are topologically sorted by their dependency edges (derived from `Expr::Path` references in their initializers). If a cyclic dependency is detected, the compiler emits an error.
4. A per-package `$init` function is compiled that evaluates the `Let` bindings in topological order, storing each result in a global slot.
5. The runtime maintains a `package_init_order` list and calls each package's `$init` function in order during startup.

This is exactly how Go handles global variable initialization: topological sort across the dependency graph, then evaluate in order.
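The topological-sort-with-cycle-detection step can be sketched as a standard DFS three-coloring. This is a sketch under assumed data shapes (a name-to-dependencies map), not the compiler's actual code:

```rust
use std::collections::HashMap;

// Each top-level Let binding lists the other bindings its initializer
// references (the Expr::Path edges). Returns an evaluation order for the
// synthesized $init function, or Err on a dependency cycle.
fn init_order(deps: &HashMap<&str, Vec<&str>>) -> Result<Vec<String>, String> {
    #[derive(Clone, Copy, PartialEq)]
    enum Mark { White, Grey, Black }

    fn visit<'a>(
        name: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
        order: &mut Vec<String>,
    ) -> Result<(), String> {
        match marks.get(name).copied().unwrap_or(Mark::White) {
            Mark::Black => return Ok(()), // already emitted
            Mark::Grey => return Err(format!("cyclic dependency through `{name}`")),
            Mark::White => {}
        }
        marks.insert(name, Mark::Grey);
        for &dep in deps.get(name).into_iter().flatten() {
            visit(dep, deps, marks, order)?; // dependencies evaluate first
        }
        marks.insert(name, Mark::Black);
        order.push(name.to_string());
        Ok(())
    }

    let mut marks = HashMap::new();
    let mut order = Vec::new();
    let mut names: Vec<&str> = deps.keys().copied().collect();
    names.sort(); // deterministic order for independent bindings
    for name in names {
        visit(name, deps, &mut marks, &mut order)?;
    }
    Ok(order)
}
```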
Important: Top-level let is not available in user-facing syntax (the lexer disallows it). It exists only in the AST layer for compiler-generated constructs like client desugaring.
When you write let x = 42, you don't want x to have type literal 42 — you want it to have type int. This is handled through freshness and widening, a concept borrowed from TypeScript:
- At an unannotated `let` binding, a fresh literal type widens to its base type: `literal 42` → `int`, `literal "hello"` → `string`.
- If the user explicitly annotates a literal type (`let x: 42 = 42`), the literal is already bound to a regular literal type and does not widen.

The TIR uses three distinct "failure" types internally:
| Type | Meaning |
|---|---|
| `BuiltinUnknown` | A type that genuinely represents "unknown" in user code (e.g., a function parameter typed as `unknown`). |
| `Missing` | The type checker could not determine the type — this represents a typing hole and is almost certainly a bug if encountered unexpectedly. |
| `Error` | A type error was detected and recorded. |
Known serialization issue: The snapshot printer currently renders all three of Missing, Error, and BuiltinUnknown as the string unknown. This is a serialization bug (not a representation bug). Internally they are distinct. When debugging, if you see unknown in snapshot output, investigate which variant it actually is.
Debugging heuristic: In snapshot test output, search for unknown. If the code has no compilation errors, every unknown should correspond to a genuine BuiltinUnknown (e.g., from a standard library function that intentionally accepts unknown). Any unexpected unknown is a bug that needs investigation.
BAML has three loop forms: C-style for, iterator-style for, and while. In the MIR, all three become a single loop construct — there is no difference at the CFG level.
Why they remain separate in the AST: Consider what happens if you desugar a C-style for (let i = 0; i < arr.length(); i++) into an iterator-style for (let item in arr) at the AST level. You would synthesize an imaginary iterator variable. If the iteration target is non-iterable, the type error would reference this synthesized variable that the user never wrote. The error message would be confusing and unhelpful.
By keeping three distinct AST forms, each loop variant can produce type errors that reference the actual user-written syntax. The MIR then unifies them after diagnostics have been emitted.
General principle: Before desugaring any construct, ask yourself: "What error messages does each form produce? Do those error messages still make sense after desugaring?" If desugaring would produce confusing diagnostics, keep the forms separate in the AST and unify in the MIR.
When performing CST→AST desugaring, you must preserve span information on every generated node. Every synthesized AST node must carry the source span of the CST construct it was derived from.
If you fail to do this:
If you find yourself hacking in incorrect spans (e.g., using a dummy span or the wrong source location), stop and ask another team member whether the approach is correct. Incorrect spans are a persistent source of subtle bugs.
Crate: baml_builtins2
Source: baml_builtins2/baml_std/baml/
The BAML standard library is written in BAML itself (with some Rust-backed builtins marked with $rust_type and $rust_io_function). It defines core types, container types, LLM infrastructure, HTTP types, error types, math/string/net utilities, and more.
Key files:
- `core.baml` — Core types
- `containers.baml` — Generic `Array<T>`, `Map<K,V>`, etc.
- `ns_llm/llm.baml` — LLM types and client infrastructure
- `ns_llm/llm_types.baml` — `Client`, `PrimitiveClient`, `PrimitiveClientOptions`, `RetryPolicy`, etc.
- `ns_http/http.baml` — `Request`, `Response`
- `ns_errors/errors.baml` — Error types

Adding to the standard library: If you want to make new functions or types available to users, the standard library is the primary mechanism. You add BAML source files, and they compile through the normal pipeline. The standard library package (baml) is resolved first and is available to all other packages.
Caution: Standard library additions pollute the user's namespace. Be deliberate about what you add. Prefer putting things in sub-namespaces (e.g., baml.llm, baml.http) rather than at the root.
For agents: When implementing new language features, prefer adding new types and functions to the standard library rather than introducing new compiler-internal types. The type system should not be impacted unless something is truly unrepresentable with existing types.
Crate: baml_tests
The snapshot test infrastructure is the primary debugging tool for the compiler2 pipeline. Each pipeline stage has its own snapshot format:
| Stage | What the snapshot shows |
|---|---|
| HIR | Scope tree, name bindings, declarations, capture information, lambda definitions |
| TIR | Every expression annotated with its inferred type (similar to IDE inlay hints) |
| MIR | Control flow graph with basic blocks, statements, terminators, local declarations |
| Emit | Bytecode disassembly |
How to use snapshots for debugging:
1. Write a test case with the `baml_test!` macro.
2. Run `cargo test` — the snapshot is generated/updated.
3. Read the snapshot and search for `unknown` — any unexpected `unknown` is a bug.

This debugging loop is highly effective for coding agents. Agents can write test cases, read snapshot output, identify issues, and iterate. The snapshot format was designed specifically to be readable by both humans and LLMs.
Test macro:
```rust
baml_test!("baml source code here")

// Or with options:
baml_test! {
    baml: "source",
    entry: "func_name",
    args: { "x" => val },
    opt: OptLevel::Zero,
}
```
Do not add TextRange or span fields to your data structures. There is a dedicated mechanism for associating spans with nodes. If you add TextRange directly to a data structure, you break Salsa incrementality for everything downstream — a change to whitespace (which changes spans but not semantics) will unnecessarily invalidate all dependent queries.
Use expression IDs and the span lookup infrastructure instead. If you're unsure how to associate span information with a new construct, ask before implementing.
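The intended shape can be sketched as an ID-keyed side table. The types here are illustrative, not the real source-map API:

```rust
use std::collections::HashMap;
use std::ops::Range;

// Semantic data carries only stable IDs, never a TextRange. Whitespace
// edits change spans but leave this structure (and its equality) intact,
// so Salsa's early cutoff can fire for everything downstream.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ExprId(u32);

#[derive(PartialEq, Debug)]
enum Expr {
    Literal { id: ExprId, value: i64 },
    Add { id: ExprId, lhs: Box<Expr>, rhs: Box<Expr> },
}

// Spans live in a separate source map, produced by a separate query.
// Diagnostics and IDE features read it; the type checker never does.
#[derive(Default)]
struct SourceMap {
    spans: HashMap<ExprId, Range<usize>>,
}

impl SourceMap {
    fn span_of(&self, id: ExprId) -> Option<Range<usize>> {
        self.spans.get(&id).cloned()
    }
}
```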
BAML supports mutable variables. You can reassign variables (x = newValue), use compound assignment operators (i += 1, x -= 1, etc.), and mutate data structures via methods like .push(). The MIR models this through Assign and AssignOp statements, and mutable variables captured by lambdas are wrapped in cells (indirection pointers) so that inner and outer scopes can mutate the same value.
The TIR implements bidirectional type checking, which means it switches between two modes at well-defined boundaries.
Synthesis (bottom-up): No expectation from the caller. The type is computed purely from the expression's structure. Used for: literals, variable references, field access, untyped calls. You give the type checker an expression and it tells you what type it is.
Checking (top-down): The caller knows what type it wants and passes that expectation down. For most expression forms, checking falls through to synthesis plus a subtype assertion. But for specific forms, the expected type changes the result — this is called contextual typing.
| Site | What happens |
|---|---|
| `let x: Foo = <init>` | Annotation provides expected type → check init against `Foo` (top-down) |
| `let x = <init>` (no annotation) | Synthesize the type of init, then widen fresh literals (bottom-up) |
| Function call arguments | If param type is fully concrete → check arg against it. If param has unresolved type vars → synthesize |
| `return <expr>` | If declared return type exists → check expr against it |
| Array literal where expected = `T[]` | Each element is checked against `T` |
| Map literal where expected = `map<K,V>` | Each key checked against `K`, each value against `V` |
| Object literal where expected = `SomeClass` | Expression gets `SomeClass` type directly; field values use synthesis |
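The dispatch between the two modes can be sketched with a toy checker. Types and expressions are heavily simplified; only the check-falls-through-to-synthesis shape and the array contextual-typing rule mirror the description above:

```rust
#[derive(PartialEq, Debug)]
enum Ty {
    Int,
    String,
    List(Box<Ty>),
}

enum Expr {
    IntLit(i64),
    StrLit(String),
    Array(Vec<Expr>),
}

// Synthesis: compute a type from the expression alone (bottom-up).
fn synthesize(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::IntLit(_) => Ok(Ty::Int),
        Expr::StrLit(_) => Ok(Ty::String),
        // With no expectation, infer the element type from the first element.
        Expr::Array(items) => match items.first() {
            Some(first) => Ok(Ty::List(Box::new(synthesize(first)?))),
            None => Err("cannot infer element type of empty array".into()),
        },
    }
}

// Checking: push an expected type down (top-down). Contextual forms use
// the expectation; everything else falls through to synthesize + subtype.
fn check(e: &Expr, expected: &Ty) -> Result<(), String> {
    match (e, expected) {
        (Expr::Array(items), Ty::List(elem)) => {
            for item in items {
                check(item, elem)?; // each element checked against T
            }
            Ok(())
        }
        _ => {
            let actual = synthesize(e)?;
            if &actual == expected {
                Ok(())
            } else {
                Err(format!("expected {expected:?}, found {actual:?}"))
            }
        }
    }
}
```

Note the payoff of contextual typing: an empty array literal checks fine against `T[]` (the expectation supplies the element type) but cannot be synthesized on its own.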
Concrete example: let x: Foo = { field: 42 }
1. The `let` statement sees an annotation → expected type is `Foo`.
2. The object literal is checked against `Foo` (top-down).
3. The expression gets type `Foo` directly.
4. The `42` inside the field is synthesized bottom-up → starts as `Literal(42, Fresh)`.

The TIR performs TypeScript-style control-flow narrowing. The type checker recognizes patterns like `x != null`, `x == null`, `!expr`, and truthiness on nullable types.
For if (x != null) { ... } else { ... }:
- In the then-branch, `x` is narrowed to remove `null`.
- In the else-branch, `x` is narrowed to `null`.

Guard clause pattern: After `if (x == null) { return; }`, the then-branch diverges (type `never`). The type checker permanently applies the else narrowing for the rest of the block — so `x` is non-nullable from that point forward.
Present: fresh/regular literal types, never as bottom, unknown as top, structural typing, union types, void, equirecursive recursive types, control-flow narrowing, bidirectional checking.
Absent: intersection types, conditional types (T extends U ? A : B), mapped types, infer keyword, discriminated union contextual decomposition (checking against a union doesn't pick a member to check against — it synthesizes and subtype-checks).
Unions are represented as Ty::Union(Vec<Ty>) — a plain vector with no deduplication or sorting at construction.
Ty::Optional(Box<Ty>) is a separate variant from Union. They are not auto-rewritten into each other. The relationship is defined only at the subtype level.
Both types are first normalized (all aliases expanded), then structural subtyping runs:
- `T <: Union(A, B, ...)` (the "right union" rule): A type is a subtype of a union if it's a subtype of any member.
- `Union(T1, T2) <: U` (the "left union" rule): A union is a subtype of something if all members are subtypes of it.
- `Optional(T) <: Union(types)`: Requires `null` to be in the union AND `T` to be a subtype of some member.
- Other rules: `null <: Optional(T)`, `T <: Optional(T)`, `never` is bottom, `unknown` is top, `int <: float`, enum variants are subtypes of their enum, list/map are covariant, functions are contravariant in parameters.

When combining branch types (e.g., if/else), the type checker does flat deduplication only. No simplification of `Union(T, never)`, no removal of subtypes (e.g., `Union(int, float)` stays as-is). Normalization happens on-demand at subtype-check time and does not write back.
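The union and optional rules can be sketched directly. This is a toy subset of the real checker, with normalization and most type constructors omitted:

```rust
#[derive(PartialEq, Debug)]
enum Ty {
    Int,
    Float,
    String,
    Null,
    Union(Vec<Ty>),
    Optional(Box<Ty>),
}

// Structural subtype check covering the union/optional rules described
// above; the real checker handles many more constructors.
fn is_subtype(sub: &Ty, sup: &Ty) -> bool {
    match (sub, sup) {
        // Left union rule: every member must fit the supertype.
        (Ty::Union(members), _) => members.iter().all(|m| is_subtype(m, sup)),
        // Optional(T) <: Union(..): null must be in the union AND T must fit some member.
        (Ty::Optional(inner), Ty::Union(members)) => {
            members.iter().any(|m| is_subtype(&Ty::Null, m))
                && members.iter().any(|m| is_subtype(inner, m))
        }
        // Right union rule: subtype of any member suffices.
        (_, Ty::Union(members)) => members.iter().any(|m| is_subtype(sub, m)),
        // Optionals are covariant in their inner type.
        (Ty::Optional(a), Ty::Optional(b)) => is_subtype(a, b),
        // null <: T? and T <: T?
        (Ty::Null, Ty::Optional(_)) => true,
        (_, Ty::Optional(inner)) => is_subtype(sub, inner),
        // int <: float
        (Ty::Int, Ty::Float) => true,
        _ => sub == sup,
    }
}
```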
When type-checking a match expression:
- Required cases are computed from the scrutinee type: booleans require `true`/`false`, enums require all variants, optionals require the inner type's cases plus `null`, unions require the union of all members' required cases.
- Missing coverage → a `NonExhaustiveMatch` error. Full coverage → the match is marked as exhaustive.

A user writes `type JSON = string | int | bool | null | JSON[] | map<string, JSON>`. The type's body references itself. The compiler must detect the cycle, decide if it's valid, and perform subtype checking without infinite loops.
At HIR time, type aliases store raw name references. type JSON = ... | JSON[] stores a TypeExpr with a path reference to "JSON". No attempt to resolve or detect cycles.
At TIR time, the path reference becomes an opaque Ty::TypeAlias — never automatically expanded. The alias body still references itself via this opaque handle.
The type checker runs two passes:
Pass 1 — Which aliases are recursive? A DFS walks through the alias map, following all type constructors. Any alias found in a cycle is marked recursive.
Pass 2 — Which cycles are valid? The dependency graph is analyzed where edges are classified as structural (through List or Map) or non-structural (through Optional, Union, or direct reference). For each strongly connected component, if any intra-SCC edge is structural, the cycle is valid. If no structural edges exist, the cycle is invalid.
The intuition: List and Map provide a construction base case (an empty container). Optional does not — type A = A? expands to A | null, and A still needs to be constructed.
| Definition | Valid? | Why |
|---|---|---|
| `type A = A` | Invalid | Direct self-reference, no structural edge |
| `type A = A?` | Invalid | Optional is not structural |
| `type A = A \| string` | Invalid | Union is not structural |
| `type A = A[]` | Valid | Goes through List (structural) |
| `type JSON = string \| int \| JSON[] \| map<string, JSON>` | Valid | Both back-edges go through List and Map |
| `type A = B[]`, `type B = A` | Valid | A→B goes through List (structural) |
| `type A = B?`, `type B = A` | Invalid | A→B goes through Optional (not structural) |
Class cycles use the same approach: a dependency edge is added when a field is not behind Optional/List/Map. Any SCC found is unconditionally invalid.
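A sketch of the structural-edge rule, specialized to a single self-referential alias. The full analysis classifies edges across a multi-alias SCC; this miniature only handles self-edges, and the `TypeExpr` shape is simplified:

```rust
// Simplified type-expression shape for an alias body.
enum TypeExpr {
    Named(String),                     // direct reference (non-structural)
    Optional(Box<TypeExpr>),           // non-structural
    Union(Vec<TypeExpr>),              // non-structural
    List(Box<TypeExpr>),               // structural
    Map(Box<TypeExpr>, Box<TypeExpr>), // structural
}

// Collect every self-reference edge in an alias body, recording whether
// the path to it passes through a structural constructor (List/Map).
fn self_edges(name: &str, body: &TypeExpr, structural: bool, out: &mut Vec<bool>) {
    match body {
        TypeExpr::Named(n) => {
            if n == name {
                out.push(structural);
            }
        }
        TypeExpr::Optional(inner) => self_edges(name, inner, structural, out),
        TypeExpr::Union(members) => {
            for m in members {
                self_edges(name, m, structural, out);
            }
        }
        TypeExpr::List(elem) => self_edges(name, elem, true, out),
        TypeExpr::Map(k, v) => {
            self_edges(name, k, true, out);
            self_edges(name, v, true, out);
        }
    }
}

// Single-alias specialization of the SCC rule: the alias is recursive if
// it has any self-edge, and the cycle is valid iff at least one edge is
// structural (List/Map provide an empty-container base case).
// Returns None when the alias is not recursive at all.
fn recursive_alias_is_valid(name: &str, body: &TypeExpr) -> Option<bool> {
    let mut edges = Vec::new();
    self_edges(name, body, false, &mut edges);
    if edges.is_empty() {
        None
    } else {
        Some(edges.iter().any(|&structural| structural))
    }
}
```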
When subtype checking encounters a recursive alias, the normalizer produces a mu type: Mu { var: "JSON", body: Union([String, Int, ..., List(TyVar("JSON"))]) }. This is the standard type-theory mu-binder — "the type where var in body stands for this whole type."
Subtype checking uses equirecursive co-induction: before recursing into a pair (sub, sup), the pair is inserted into an assumptions set. If the same pair is encountered again during recursive checking, it returns true immediately (the co-inductive assumption). If the overall check succeeds, the assumption is validated. Mu types are unfolded by substituting every TyVar(var) with the full Mu type, then continuing the check.
Why equirecursive (not isorecursive)? In isorecursive typing, mu X.T and its unfolding are different types requiring explicit fold/unfold coercions. Since BAML users write types naturally and expect transparent alias expansion, equirecursive is the practical choice.
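The co-inductive check can be sketched as follows. This miniature uses the always-assume variant of the algorithm (the real checker validates assumptions once the overall check succeeds), and the type constructors are a small subset:

```rust
use std::collections::HashSet;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Ty {
    String,
    Int,
    Null,
    List(Box<Ty>),
    Union(Vec<Ty>),
    Var(String),                       // TyVar bound by an enclosing Mu
    Mu { var: String, body: Box<Ty> }, // mu-binder: var in body = the whole type
}

// One unfolding step: substitute every Var(v) in `ty` with `with`.
fn unfold(ty: &Ty, v: &str, with: &Ty) -> Ty {
    match ty {
        Ty::Var(name) if name == v => with.clone(),
        Ty::List(e) => Ty::List(Box::new(unfold(e, v, with))),
        Ty::Union(ms) => Ty::Union(ms.iter().map(|m| unfold(m, v, with)).collect()),
        Ty::Mu { var, body } if var != v => Ty::Mu {
            var: var.clone(),
            body: Box::new(unfold(body, v, with)),
        },
        other => other.clone(), // atoms, shadowed Mu, and non-matching Var
    }
}

// Equirecursive subtype check with a co-inductive assumptions set:
// if the same (sub, sup) pair shows up again, assume it holds.
fn subtype(sub: &Ty, sup: &Ty, assumptions: &mut HashSet<(Ty, Ty)>) -> bool {
    let pair = (sub.clone(), sup.clone());
    if assumptions.contains(&pair) {
        return true; // co-inductive assumption
    }
    assumptions.insert(pair);
    match (sub, sup) {
        // Unfold mu-binders by substituting the whole Mu for its variable.
        (Ty::Mu { var, body }, _) => subtype(&unfold(body, var, sub), sup, assumptions),
        (_, Ty::Mu { var, body }) => subtype(sub, &unfold(body, var, sup), assumptions),
        (Ty::Union(ms), _) => ms.iter().all(|m| subtype(m, sup, assumptions)),
        (_, Ty::Union(ms)) => ms.iter().any(|m| subtype(sub, m, assumptions)),
        (Ty::List(a), Ty::List(b)) => subtype(a, b, assumptions),
        _ => sub == sup,
    }
}
```

The assumptions set is what guarantees termination: checking a JSON-like type against itself revisits the same `(sub, sup)` pair after one unfolding and stops there instead of unfolding forever.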
The Salsa query model has one critical optimization beyond basic memoization: early cutoff. When a tracked query re-runs but produces the same result as before, Salsa stops propagating invalidation to downstream dependents.
Every item is physically split into two tracked queries: one for semantic data (span-free), one for source maps (spans only). For example, function_signature returns names and TypeExprs with no TextRange, while function_signature_source_map returns only spans. The type checker reads the semantic query but never the source map query.
Items are keyed by position-independent IDs — a hash of the item's name, not its position in the file. Adding a blank line before function Greet(...) doesn't change the hash of "Greet", so the Salsa query key stays the same and cached results survive.
User adds // comment to file_a.baml. File B is untouched.
1. `file_a.text` is marked changed.
2. `file_semantic_index(file_a)` re-runs (it's marked `no_eq`, so always reports "changed").
3. `namespace_items(user_root)` re-runs — re-collects contributions from all files. But the result is identical: same names, same definition handles. Its `PartialEq` returns true. Early cutoff fires.
4. `package_items` — NOT re-run (its dependency didn't change).
5. `infer_scope_types` for any scope — NOT re-run.
6. `file_semantic_index(file_b)` — NOT re-run (its input `file_b.text` is unchanged).

Result: a comment addition re-runs the lexer and HIR for that one file, then stops. A whitespace edit shifts spans but leaves TypeExpr trees identical → `function_signature` early-cuts → type inference stays cached.
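The early-cutoff rule itself can be shown in miniature: an upstream query re-runs on every edit, but invalidation propagates downstream only when its value changes. This is toy code, not Salsa; `recompute_names` is a stand-in for a semantic query:

```rust
// Stand-in for a semantic query: extract declared function names,
// which are unaffected by comments and whitespace.
fn recompute_names(text: &str) -> Vec<String> {
    text.lines()
        .filter_map(|l| l.trim().strip_prefix("function "))
        .map(|rest| rest.split('(').next().unwrap_or(rest).trim().to_string())
        .collect()
}

struct Memo {
    names: Vec<String>,
    downstream_runs: usize, // how many times dependents were re-run
}

impl Memo {
    fn update(&mut self, text: &str) {
        let new_names = recompute_names(text); // upstream always re-runs on edit
        if new_names != self.names {
            self.names = new_names;
            self.downstream_runs += 1; // propagate only when the value changed
        } // else: early cutoff — downstream stays cached
    }
}
```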
The standard library (baml_std) uses two separate paths: one for the compiler and one for runtime. Understanding both is important because they share source files but consume them differently.
The .baml stub files in baml_builtins2/baml_std/baml/ are embedded at compile time via include_str!. They are injected into the compiler as a Salsa input (Compiler2ExtraFiles), separate from the Project input that carries user files. The HIR query compiler2_all_files unions user files with builtin files. From that point on, builtin functions are type-checked, lowered, and compiled exactly like user-written functions — no special-casing.
At Rust build time (build.rs), the same .baml stub files are lexed, parsed, and lowered to AST. Every function with a $rust_function or $rust_io_function body is collected into a record. From these records, three things are generated:
- Rust traits, one per builtin namespace/class (e.g., a `BamlClassArray` trait with a method per array builtin). These mirror the namespace structure.
- The `SysOp` enum — One variant per I/O builtin, used for async dispatch.
- A `get_native_fn(name)` lookup that maps builtin names to Rust function pointers.

A concrete struct (`PackageBamlImpl`) implements all generated traits. At program load time, the VM walks all functions in the compiled program. For each `NativeUnresolved` function, it calls `get_native_fn(name)` to look up the Rust function pointer. At call time, the VM invokes the function pointer directly.
When you add a new builtin function to the standard library, you are touching both paths. The .baml file defines the signature and body marker. The compiler path type-checks it. The build.rs codegen path generates a trait method for it. And you must implement that trait method in Rust. The two paths share the same source of truth (the .baml files) but consume it independently.
The test infrastructure generates one snapshot per pipeline phase per test project. Each phase captures a different layer's output:
| Phase | Name | What it snapshots |
|---|---|---|
| 01 | lexer | Token stream |
| 02 | parser | CST + parse errors |
| 03 | hir | Scope tree, item tree, symbol contributions |
| 04 | tir | Typed expressions, resolved names |
| 04_5 | mir | Control flow graphs |
| 05 | diagnostics | All diagnostics aggregated across phases |
| 06 | codegen | Bytecode |
| 10 | formatter | Formatter idempotency (format twice, assert identical) |
Phases 01 and 02 run per-file. Phases 03–06 run per-project (loading all files together). Snapshots are stored alongside the test projects.
To add a new test project:

1. Add `.baml` files in the test projects area.
2. Run `cargo test` — the build script picks up new directories automatically.
3. Run `cargo insta accept --all` to commit initial snapshots.

Separate from snapshot tests, there are targeted incremental tests that verify Salsa's early-cutoff behavior. These wrap the project database with an event log that records WillExecute events, then assert exact execution counts. They verify, for example, that a comment-only edit stops propagating at `namespace_items`.
When implementing a new feature, walk through these questions in order:

1. Can it be expressed as a pure syntactic desugaring? If yes, put it in the AST layer.
2. Does it need name or scope information? Then it belongs in the HIR.
3. Does it need type information? Then it belongs in the TIR.
4. Does it need control-flow or capture-shape information? Then (and only then) consider the MIR.
When in doubt: put it in the AST layer. Most features live there. The AST is the workhorse of the compiler.
When talking to coding agents: Tell the agent which layer to operate in. This dramatically improves one-shot accuracy. Agents that understand the layer boundaries produce correct code more reliably than agents given free rein to modify any layer.
| Layer | Crate | Transforms? | Salsa Queries? | Can construct new nodes? |
|---|---|---|---|---|
| Parser/CST | baml_compiler_parser | Yes (text → CST) | syntax_tree | Yes |
| AST | baml_compiler2_ast | Yes (CST → AST) | No (pure function) | Yes |
| HIR | baml_compiler2_hir | No | file_semantic_index, namespace_items, package_items | No |
| PPIR | baml_compiler2_ppir | Yes (synthesizes stream types, feeds back to HIR) | ppir_expansion_items | Yes (synthetic stream items only) |
| TIR | baml_compiler2_tir | No | infer_scope_types, resolve_name_at | No |
| MIR | baml_compiler2_mir | Yes (AST → CFG) | lower_function, lower_let_body | Yes |
| Emit | baml_compiler2_emit | Yes (MIR → bytecode) | generate_project_bytecode | Yes (bytecode) |