docs/regexp/architecture.md
This document describes the architecture of V8's regular expression engine, Irregexp.
Irregexp is a highly optimized regular expression engine. It does not use a simple backtracking NFA/DFA interpreter by default (though an experimental non-backtracking engine exists). Instead, it compiles regular expressions either directly into native machine code or into a specialized bytecode that is executed by a dedicated interpreter.
Irregexp has its own complete mini-compiler pipeline:
Parser (src/regexp/regexp-parser.h) parses the regular expression pattern string into a specialized Abstract Syntax Tree (AST) defined in src/regexp/regexp-ast.h. Note that AST classes are not prefixed with RegExp (e.g., Tree, Disjunction).Node objects (src/regexp/regexp-nodes.h). This graph represents the state machine for matching. Similar to AST classes, Node classes are not prefixed with RegExp (e.g., Node, TextNode).eats_at_least (minimum characters needed to match) and setting up Boyer-Moore lookahead tables for fast scanning.Compiler (src/regexp/regexp-compiler.h) traverses the graph and generates code using RegExpMacroAssembler.
Irregexp can execute in two main modes, defined by regexp::CompilationTarget:
kNative: The engine generates machine code. This provides the fastest execution but consumes more memory for the generated code.kBytecode: The engine generates bytecode. This is interpreted by IrregexpInterpreter (src/regexp/regexp-interpreter.h). This saves memory and is useful on platforms where JIT compilation is not allowed or for less frequently used regexps.Trace objects to track the current state (character position, deferred actions, backtracking stack) to generate efficient code paths and avoid redundant work.src/regexp/: Core RegExp implementation.src/regexp/regexp-parser.h / .cc: Parser.src/regexp/regexp-ast.h: AST definitions.src/regexp/regexp-compiler.h / .cc: Compiler.src/regexp/regexp-interpreter.h / .cc: Bytecode interpreter.src/regexp/regexp-nodes.h: Graph node definitions.