Back to V8

Assembler and MacroAssembler Architecture

docs/codegen/assembler-architecture.md

15.0.103.3 KB
Original Source

Assembler and MacroAssembler Architecture

This document explains V8's low-level code generation abstractions: the Assembler and MacroAssembler. These are used by all compilers (Sparkplug, Maglev, TurboFan) and hand-written builtins to emit machine code.

The Layered Approach

V8 uses a multi-layered approach to code generation to balance performance, platform independence, and ease of use.

1. AssemblerBase and Assembler

At the lowest level is the Assembler (defined per architecture, e.g., src/codegen/x64/assembler-x64.h).

  • AssemblerBase: Provides common functionality like managing the instruction buffer, relocation information (reloc info), and code comments.
  • Assembler: Implements the actual instruction encoding for a specific architecture. For example, on x64, calling movq(rax, rbx) directly emits the corresponding bytes into the buffer.
  • Role: It is a 1:1 mapping to machine instructions. It does not perform any abstraction or simplification.

2. MacroAssembler

The MacroAssembler (also defined per architecture, e.g., src/codegen/x64/macro-assembler-x64.h) inherits from Assembler (via MacroAssemblerBase and often via a shared intermediate class like SharedMacroAssembler for related architectures).

  • MacroAssemblerBase: Contains platform-independent macro-assembler functionality (e.g., loading constants, root-relative addressing).
  • Role: It provides higher-level "macro" instructions and common idioms that are composed of multiple raw instructions.
  • Abstractions:
    • Runtime Calls: CallRuntime(Runtime::k...) handles the complexity of setting up arguments and calling a C++ runtime function.
    • Frame Setup: Methods to push/pop standard frame headers.
    • Pseudo-instructions: Instructions that might not exist directly on the hardware but are common (e.g., Push on architectures that don't have a dedicated push instruction).

How They Are Used

In Compilers

Compilers like Maglev use the MacroAssembler in their GenerateCode methods.

cpp
void Int32AddWithOverflow::GenerateCode(MaglevAssembler* masm, const ProcessingState& state) {
  Register left = ToRegister(LeftInput());
  Register right = ToRegister(RightInput());
  __ addl(left, right); // Raw assembler call via MacroAssembler
  __ EmitEagerDeoptIf(overflow, DeoptimizeReason::kOverflow, this); // MacroAssembler helper
}

(Note: MaglevAssembler inherits from MacroAssembler and adds Maglev-specific helpers).

In Builtins

Hand-written assembler builtins are written directly using the MacroAssembler.

The Instruction Buffer and Relocation

The Assembler writes bytes into a growable buffer. Because code often contains references to objects or labels that are not yet fixed in memory, V8 uses Relocation Information (RelocInfo).

  • When the assembler emits a reference to a HeapObject or another code chunk, it records the position and the type of reference.
  • When the code is finalized and moved to its permanent location in the heap, V8 "relocates" these references, patching the code with the actual memory addresses.

See Also

  • src/codegen/assembler.h: Base class definitions.
  • src/codegen/x64/assembler-x64.h: x64 specific assembler.
  • src/codegen/x64/macro-assembler-x64.h: x64 specific macro-assembler.