docs/parsing/parser-and-ast.md
Parsing is the process of taking JavaScript source code and turning it into a structured representation that the engine can understand and execute. V8 uses a recursive descent parser to create an Abstract Syntax Tree (AST).
The process involves several stages:
Scanner reads the raw character stream of the source code and breaks it down into a sequence of tokens (e.g., Token::kIf, Token::kIdentifier, Token::kSmi). Tokens are defined in src/parsing/token.h.Parser consumes the tokens and builds the Abstract Syntax Tree (AST) according to the grammar of JavaScript.The Scanner (defined in src/parsing/scanner.h) is responsible for breaking the raw character stream into tokens. It is highly optimized as it is the cornerstone of parser performance.
Key optimizations and design choices include:
ID_Start and ID_Continue properties without expensive Unicode checks.if, else, function), V8 uses a perfect hash function generated by gperf based on the length and the first two characters of the identifier.AdvanceUntil: A templatized helper that allows the scanner direct access to the underlying stream data in a tight loop, minimizing function call overhead.To improve startup time and reduce memory usage, V8 does not parse all JavaScript code eagerly.
The PreParser is a stripped-down version of the parser.
To share implementation details between the full Parser and the PreParser without virtual function overhead, V8 uses a C++ template base class, ParserBase (defined in src/parsing/parser-base.h), using the Curiously Recurring Template Pattern (CRTP). Both Parser and PreParser inherit from ParserBase passing themselves as the template argument.
The AST is a tree structure where each node represents a construct in the source code. AST nodes are defined in src/ast/ast.h. The AST is the final product of the parser and serves as the input for bytecode generation by Ignition (see Ignition).
Key Node Types:
FunctionLiteral: Represents a function definition. It is the root of the AST for a function.Block: A sequence of statements.ExpressionStatement: A statement that consists of a single expression.Literal: Represents a constant value (e.g., a number, string, or boolean).VariableProxy: Represents a reference to a variable. It is resolved to a Variable in a specific Scope during scope analysis.During or immediately after parsing, V8 performs scope analysis to resolve variable references to their declarations. This is where the relationships documented in Scopes and ScopeInfos are established.