This document is intended to give an overview of the implementation of Wasmtime.
This will explain the purposes of the various wasmtime-* crates that the main
wasmtime crate depends on. For even more detailed information it's recommended
to review the code itself and find the comments contained within.
wasmtime crate
The main entry point for Wasmtime is the wasmtime crate itself. Wasmtime is
designed such that the wasmtime crate is nearly a 100% safe API (safe in the
Rust sense), modulo a few small functions whose reasons for being unsafe are
well documented. The wasmtime crate provides features and access to
WebAssembly primitives and functionality, such as compiling modules,
instantiating them, calling functions, etc.
At this time the wasmtime crate is the first crate that is intended to be
consumed by users. First in this sense means that everything wasmtime depends
on is thought of as an internal dependency. We publish crates to crates.io but
put very little effort into having a "nice" API for internal crates or worrying
about breakage between versions of internal crates. This primarily means that
all the other crates discussed here are considered internal dependencies of
Wasmtime and don't show up in the public API of Wasmtime at all. To use some
Cargo terminology, all the wasmtime-* crates that wasmtime depends on are
"private" dependencies.
Additionally, at this time the safe/unsafe boundary between Wasmtime's internal
crates is not well defined. There are methods that should be marked
unsafe which aren't, and unsafe methods do not have exhaustive documentation
as to why they are unsafe. This is an ongoing area of improvement, however,
where the goal is for safe methods to be actually safe in the Rust sense,
and for unsafe methods to have documentation that clearly lists why
they are unsafe.
To preface discussion of more nitty-gritty internals, it's important to have a few concepts in the back of your head. These are some of the important types and their implications in Wasmtime:
wasmtime::Engine - this is a global compilation context which is sort of the
"root context". An Engine is typically created once per program and is
expected to be shared across many threads (internally it's atomically
reference counted). Each Engine stores configuration values and other
cross-thread data such as type interning for Module instances. The main
thing to remember for Engine is that any mutation of its internals typically
involves acquiring a lock, whereas for Store below no locks are necessary.
wasmtime::Store - this is the concept of a "store" in WebAssembly. While
there's also a formal definition to go off of, it can be thought of as a bag
of related WebAssembly objects. This includes instances, globals, memories,
tables, etc. A Store does not implement any form of garbage collection of
the internal items (there is a gc function but that's just for externref
values). This means that once you create an Instance or a Table the memory
is not actually released until the Store itself is deallocated. A Store is
sort of a "context" used for almost all wasm operations. Store also contains
instance handles which recursively refer back to the Store, leading to a
good bit of aliasing of pointers within the Store. The important thing for
now, though, is to know that Store is a unit of isolation. WebAssembly
objects are always entirely contained within a Store, and at this time
nothing can cross between stores (except scalars if you manually hook it up).
In other words, wasm objects from different stores cannot interact with each
other. A Store cannot be used simultaneously from multiple threads (almost
all operations require &mut self).
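To make the "bag of objects" and isolation points concrete, here is a minimal sketch using only the Rust standard library. Store, TableHandle, new_table and table_size are hypothetical names for illustration, not Wasmtime's API. Objects live in vectors owned by the store and are only freed when the store itself is dropped. Each handle records which store created it, so it cannot be used with a different store.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical: a process-wide counter giving every store a unique id.
static NEXT_STORE_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical handle to a table owned by one particular store.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TableHandle {
    store_id: u64,
    index: usize,
}

/// A "bag" of wasm objects. Nothing is removed individually; all of it is
/// released when the Store is dropped.
pub struct Store {
    id: u64,
    tables: Vec<Vec<Option<u32>>>,
}

impl Store {
    pub fn new() -> Store {
        Store {
            id: NEXT_STORE_ID.fetch_add(1, Ordering::Relaxed),
            tables: Vec::new(),
        }
    }

    /// Creating objects requires `&mut self`, so one store can't be mutated
    /// from several threads at once.
    pub fn new_table(&mut self, size: usize) -> TableHandle {
        self.tables.push(vec![None; size]);
        TableHandle { store_id: self.id, index: self.tables.len() - 1 }
    }

    /// Returns `None` when the handle belongs to a different store.
    pub fn table_size(&self, table: TableHandle) -> Option<usize> {
        if table.store_id != self.id {
            return None;
        }
        self.tables.get(table.index).map(|slots| slots.len())
    }

    pub fn num_tables(&self) -> usize {
        self.tables.len()
    }
}
```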
wasmtime::runtime::vm::InstanceHandle - this is the low-level representation of a
WebAssembly instance. At the same time this is also used as the representation
for all host-defined objects. For example if you call wasmtime::Memory::new
it'll create an InstanceHandle under the hood. This is a very unsafe type
that should probably have all of its functions marked unsafe or otherwise
have more strict guarantees documented about it, but it's an internal type
that we don't put much thought into for public consumption at this time. An
InstanceHandle doesn't know how to deallocate itself and relies on the
caller to manage its memory. Currently this is either allocated on-demand
(with malloc) or in a pooling fashion (using the pooling allocator). The
allocate and deallocate methods differ between these two paths.
An InstanceHandle is laid out in memory with some Rust-owned values first
capturing the dynamic state of memories/tables/etc. Most of these fields are
unused for host-defined objects that serve one purpose (e.g. a
wasmtime::Table::new), but for an instantiated WebAssembly module these
fields will have more information. After an InstanceHandle in memory is a
VMContext, which will be discussed next. InstanceHandle values are the
main internal runtime representation and what the crate::runtime::vm code
works with. The wasmtime::Store holds onto all these InstanceHandle values
and deallocates them at the appropriate time. From the runtime perspective this
simplifies things: the graph of wasm modules communicating with each other is
reduced to simply InstanceHandle values all talking to each other.
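The difference between the two allocation strategies can be sketched as follows. This is a simplified model, not Wasmtime's actual allocator interface. InstanceState, InstanceAllocator, OnDemand and Pooling are hypothetical, and instances are identified by a plain index rather than a raw pointer.

```rust
/// Hypothetical stand-in for the Rust-owned state plus VMContext of one instance.
#[derive(Default)]
pub struct InstanceState {
    pub globals: Vec<u64>,
}

/// Hypothetical allocator interface: both strategies hand out an instance id.
pub trait InstanceAllocator {
    fn allocate(&mut self) -> Option<usize>;
    fn deallocate(&mut self, id: usize);
}

/// On-demand: every instance is a fresh heap allocation (the "malloc" path).
#[derive(Default)]
pub struct OnDemand {
    instances: Vec<Option<Box<InstanceState>>>,
}

impl InstanceAllocator for OnDemand {
    fn allocate(&mut self) -> Option<usize> {
        self.instances.push(Some(Box::new(InstanceState::default())));
        Some(self.instances.len() - 1)
    }

    fn deallocate(&mut self, id: usize) {
        // Dropping the Box returns the memory to the global allocator.
        self.instances[id] = None;
    }
}

/// Pooling: a fixed number of slots is created up front and recycled.
pub struct Pooling {
    slots: Vec<InstanceState>,
    free: Vec<usize>,
}

impl Pooling {
    pub fn new(max_instances: usize) -> Pooling {
        Pooling {
            slots: (0..max_instances).map(|_| InstanceState::default()).collect(),
            // Reversed so that `pop` hands out slot 0 first.
            free: (0..max_instances).rev().collect(),
        }
    }
}

impl InstanceAllocator for Pooling {
    fn allocate(&mut self) -> Option<usize> {
        // No new memory is requested; when every slot is taken, allocation fails.
        self.free.pop()
    }

    fn deallocate(&mut self, id: usize) {
        // Reset the slot and make it available again instead of freeing it.
        self.slots[id] = InstanceState::default();
        self.free.push(id);
    }
}
```

The pooling variant trades a hard upper bound on live instances for never calling into the system allocator on the instantiation path.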
crate::runtime::vm::VMContext - this is a raw pointer, within an allocation of
an InstanceHandle, that is passed around in JIT code. A VMContext does not
have a structure defined in Rust (it's a 0-sized structure) because its
contents are dynamically determined by the VMOffsets, which are computed from
the source wasm module it came from. Each InstanceHandle has a "shape" of a VMContext
corresponding with it. For example a VMContext stores all values of
WebAssembly globals, but if a wasm module has no globals then the size of this
array will be 0 and it won't be allocated. The intention of a VMContext is
to be an efficient in-memory representation of all wasm module state that JIT
code may access. The layout of VMContext is dynamically determined by a
module and JIT code is specialized for this one structure. This means that the
structure is efficiently accessed by JIT code, but less efficiently accessed
by native host code. A non-exhaustive list of purposes of the VMContext is
to:
Hold a pointer to a VMExternRefActivationsTable for fast-path insertion of
externref values into the table.
Hold a *mut dyn crate::runtime::vm::Store so store-level operations can be
performed in libcalls.
A comment about the layout of a VMContext can be found in the vmoffsets.rs
file.
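As an illustration of how a VMContext layout can be derived from a module, the sketch below computes byte offsets for a few kinds of state. The region order, the entry sizes (two pointers per imported function and per memory, 16 bytes per global) and the 64-bit pointer size are assumptions made for this example. The real layout, documented in vmoffsets.rs, has many more regions.

```rust
/// Assumed pointer size (a 64-bit target).
const PTR: u32 = 8;
/// Assumed size of each global's slot, large enough for any wasm value type.
const GLOBAL_SIZE: u32 = 16;

/// Hypothetical, heavily simplified per-module layout description.
pub struct VMOffsets {
    pub num_imported_functions: u32,
    pub num_defined_memories: u32,
    pub num_defined_globals: u32,
}

impl VMOffsets {
    /// Imported functions come first: (code pointer, callee vmctx pointer) each.
    pub fn imported_function(&self, i: u32) -> u32 {
        assert!(i < self.num_imported_functions);
        i * 2 * PTR
    }

    /// Then defined memories: (base pointer, current length) each.
    pub fn memories_begin(&self) -> u32 {
        self.num_imported_functions * 2 * PTR
    }

    pub fn memory_base(&self, i: u32) -> u32 {
        assert!(i < self.num_defined_memories);
        self.memories_begin() + i * 2 * PTR
    }

    /// Then the values of defined globals, stored inline.
    pub fn globals_begin(&self) -> u32 {
        self.memories_begin() + self.num_defined_memories * 2 * PTR
    }

    pub fn global(&self, i: u32) -> u32 {
        assert!(i < self.num_defined_globals);
        self.globals_begin() + i * GLOBAL_SIZE
    }

    /// Total size; a module with no globals spends zero bytes on them.
    pub fn size(&self) -> u32 {
        self.globals_begin() + self.num_defined_globals * GLOBAL_SIZE
    }
}
```

These offsets are constants for a given module. JIT code can therefore access global 1 of a module with one import, one memory and two globals as a single load from vmctx + 48, with no lookup at runtime.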
wasmtime::Module - this is the representation of a compiled WebAssembly
module. At this time Wasmtime always assumes that a wasm module is compiled
to native JIT code. Module holds the results of that compilation, which is
currently performed by Cranelift. It is a goal of Wasmtime to support other
ways of representing modules, but those are not implemented yet; only
Cranelift is implemented and supported today.
wasmtime_environ::Module - this is a descriptor of a wasm module's type and
structure without holding any actual JIT code. An instance of this type is
created very early on in the compilation process, and it is not modified when
functions themselves are actually compiled. This holds the internal type
representation and state about functions, globals, etc. In a sense this can be
thought of as the result of validating or typechecking a wasm module, although
it doesn't have information such as the types of each opcode or other minute
function-level details.
With a high-level overview and some background on types in place, this document
next walks through the steps taken to compile a WebAssembly module. The main
entry point for this is the wasmtime::Module::from_binary API. There are a
number of other entry points that deal with surface-level details like
translation from text to binary, loading from the filesystem, etc.
Compilation is roughly broken down into a few phases:
First, compilation walks over the WebAssembly module, validating everything
except function bodies. This synchronous pass over a wasm module creates a
wasmtime_environ::Module instance and additionally prepares for function
compilation. Note that with the module linking proposal one input module may
end up creating a number of output modules to process. Each module is
processed independently and all further steps are parallelized on a
per-module basis. Note that parsing and validation of the WebAssembly module
happens with the wasmparser crate. Validation is interleaved with parsing,
validating parsed values before using them.
Next, all functions within a module are validated and compiled in parallel. No inter-procedural analysis is done, and each function is compiled as its own little island of code at this time. This is the point where the meat of Cranelift is invoked on a per-function basis.
The compilation results at this point are all woven into a
wasmtime_jit::CompilationArtifacts structure. This holds module information
(wasmtime_environ::Module), compiled JIT code (stored as an ELF image), and
miscellaneous other information about functions such as platform-agnostic
unwinding information, per-function trap tables (indicating which JIT
instructions can trap and what the trap means), per-function address maps
(mapping from JIT addresses back to wasm offsets), and debug information
(parsed from DWARF information in the wasm module). These results are inert
and can't actually be executed, but they're appropriate at this point to
serialize to disk or begin the next phase...
The final step is to actually place all code into a form that's ready to get
executed. This starts from the CompilationArtifacts of the previous step.
Here a new memory mapping is allocated and the JIT code is copied into this
memory mapping. This memory mapping is then switched from read/write to
read/execute so it's actually executable JIT code at this point. This is
where various hooks, like loading debuginfo and informing JIT profilers of new
code, all happen. At this point a wasmtime_jit::CompiledModule is
produced and this is itself wrapped up in a wasmtime::Module. At this
point the module is ready to be instantiated.
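The per-function parallel compilation phase described above can be sketched with scoped threads from the standard library. compile_function and FunctionBody are hypothetical placeholders for invoking Cranelift on one function. A real implementation would use a thread pool rather than one thread per function. The point is that each job sees only its own function body, and the results are collected back in function-index order.

```rust
use std::thread;

/// Hypothetical stand-in for one function body's bytes.
pub struct FunctionBody {
    pub bytes: Vec<u8>,
}

/// Hypothetical stand-in for Cranelift: "compiles" one body in isolation.
/// It only sees its own input, mirroring the lack of inter-procedural analysis.
pub fn compile_function(index: usize, body: &FunctionBody) -> Vec<u8> {
    let mut code = vec![index as u8];
    code.extend(body.bytes.iter().map(|b| b.wrapping_add(1)));
    code
}

/// Compile every function concurrently and return results in index order.
pub fn compile_all(bodies: &[FunctionBody]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = bodies
            .iter()
            .enumerate()
            .map(|(i, body)| s.spawn(move || compile_function(i, body)))
            .collect();
        // Joining in spawn order keeps output aligned with function indices.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```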
A wasmtime::Module is an atomically-reference-counted object where upon
instantiation into a Store, the Store will hold a strong reference to the
internals of the module. This means that all instances of a wasmtime::Module
share the same compiled code. Additionally a wasmtime::Module is one of the
few objects that lives outside of a wasmtime::Store. This means that
wasmtime::Module's reference counting is its own form of memory management.
Note that the property of sharing a module's compiled code across all
instantiations has interesting implications on what the compiled code can
assume. For example Wasmtime implements a form of type interning, but the
interned types happen at a few different levels. Within a module we deduplicate
function types, but across modules in a Store types need to be represented
with the same value. This means that if the same module is instantiated into
many stores its same function type may take on many values, so the compiled
code can't assume a particular value for a function type. (more on type
information later). The general gist though is that compiled code leans
relatively heavily on the VMContext for contextual input because the JIT code
is intended to be so widely reusable.
An important aspect to also cover for compilation is the creation of trampolines. Trampolines in this case refer to code executed by Wasmtime to enter WebAssembly code. The host may not always have prior knowledge about the signature of the WebAssembly function that it wants to call. Wasmtime JIT code is compiled with native ABIs (e.g. params/results in registers according to System V on Unix), which means that a Wasmtime embedding doesn't have an easy way to enter JIT code.
This problem is what the trampolines compiled into a module solve, which is to
provide a function with a known ABI that will call into a function with a
specific other type signature/ABI. Wasmtime collects all the exported functions
of a module and creates a set of their type signatures. Note that exported in
this context actually means "possibly exported" which includes things like
insertion into a global/function table, conversion to a funcref, etc. A
trampoline is generated for each of these type signatures and stored along with
the JIT code for the rest of the module.
These trampolines are then used with the wasmtime::Func::call API. In that
specific case we don't know the ABI of the target function, so the trampoline
(which has a known ABI) is used instead, and all parameters/results are passed
through the stack.
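The sketch below models the idea with closures instead of generated machine code. ArrayCall, trampoline_i32_i32_to_i32, dynamic_call and wasm_add are hypothetical names. A single u64 buffer stands in for the stack-passed parameters/results. The closure plays the role of the per-signature trampoline: it unpacks the buffer into the callee's native signature and packs the result back.

```rust
/// The uniform "known ABI" the host uses: params in, results out, one buffer.
pub type ArrayCall = Box<dyn Fn(&mut [u64])>;

/// Hypothetical trampoline for the signature `(i32, i32) -> i32`.
/// One such adapter would exist per (possibly) exported signature.
pub fn trampoline_i32_i32_to_i32(callee: fn(i32, i32) -> i32) -> ArrayCall {
    Box::new(move |values: &mut [u64]| {
        // Unpack params from the buffer into the callee's native signature.
        let a = values[0] as u32 as i32;
        let b = values[1] as u32 as i32;
        // Call with the native ABI, then store the result back into slot 0.
        values[0] = callee(a, b) as u32 as u64;
    })
}

/// Host-side dynamic call: it knows only the ArrayCall ABI, not the signature.
pub fn dynamic_call(f: &ArrayCall, args: &[u64], num_results: usize) -> Vec<u64> {
    // The buffer must be large enough for either the params or the results.
    let mut values = vec![0u64; args.len().max(num_results)];
    values[..args.len()].copy_from_slice(args);
    f(&mut values);
    values.truncate(num_results);
    values
}

/// Stand-in for a compiled wasm function with a specific native signature.
pub fn wasm_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```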
Another point of note is that trampolines are not deduplicated at this time. Each compiled module contains its own set of trampolines, and if two compiled modules have the same types then each has its own copy of the same trampoline.
VMSharedSignatureIndex
One important point to talk about with compilation is the
VMSharedSignatureIndex type and how it's used. The call_indirect opcode in
wasm compares an actual function's signature against the function signature of
the instruction, trapping if the signatures mismatch. This is implemented in
Wasmtime as an integer comparison, and the comparison happens on a
VMSharedSignatureIndex value. This index is an interned representation of a
function type.
The scope of interning for VMSharedSignatureIndex happens at the
wasmtime::Engine level. Modules are compiled into an Engine. Insertion of a
Module into an Engine will assign a VMSharedSignatureIndex to all of the
types found within the module.
The VMSharedSignatureIndex values for a module are local to that one
instantiation of a Module (and they may change on each insertion of a
Module into a different Engine). These are used during the instantiation
process by the runtime to effectively assign a type ID to all functions for
imports and such.
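A simplified model of engine-level interning and the call_indirect check is shown below. Engine, FuncType, ValType, SharedSigIndex and register are hypothetical stand-ins, and Wasmtime's real type registry is more involved. The Mutex reflects the earlier point that mutating an Engine takes a lock. Once types are interned, the check itself reduces to comparing two integers.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FuncType {
    pub params: Vec<ValType>,
    pub results: Vec<ValType>,
}

/// Hypothetical analogue of VMSharedSignatureIndex.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SharedSigIndex(u32);

/// Cheap to clone and share across threads; mutation goes through a lock.
#[derive(Clone, Default)]
pub struct Engine {
    signatures: Arc<Mutex<HashMap<FuncType, SharedSigIndex>>>,
}

impl Engine {
    /// Equal types always map to the same index within one engine.
    pub fn register(&self, ty: &FuncType) -> SharedSigIndex {
        let mut sigs = self.signatures.lock().unwrap();
        let next = SharedSigIndex(sigs.len() as u32);
        *sigs.entry(ty.clone()).or_insert(next)
    }
}

/// What `call_indirect` boils down to once types are interned.
pub fn call_indirect_check(
    expected: SharedSigIndex,
    actual: SharedSigIndex,
) -> Result<(), &'static str> {
    if expected == actual {
        Ok(())
    } else {
        Err("indirect call signature mismatch")
    }
}
```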
Once a module has been compiled it's typically then instantiated to actually
get access to the exports and call wasm code. Instantiation always happens
within a wasmtime::Store, and the created instance (plus all exports) is tied
to the Store.
Instantiation itself (crates/wasmtime/src/instance.rs) may look complicated,
but this is primarily due to the implementation of the Module Linking proposal.
The rough flow of instantiation looks like:
First all imports are type-checked. The provided list of imports is
cross-referenced with the list of imports recorded in the
wasmtime_environ::Module and all types are verified to line up and match
(according to the core wasm specification's definition of type matching).
Each wasmtime_environ::Module has a list of initializers that need to be
completed before instantiation is finished. For MVP wasm this only involves
loading the import into the correct index array, but for module linking this
could involve instantiating other modules, handling alias fields, etc. In
any case the result of this step is a crate::runtime::vm::Imports array
which has the values for all imported items into the wasm module. Note that
in this case an import is typically some sort of raw pointer to the actual
state plus the VMContext of the instance that was imported from. The final
result of this step is an InstanceAllocationRequest, which is then
submitted to the configured instance allocator, either on-demand or pooling.
The InstanceHandle corresponding to this instance is allocated. How this
is allocated depends on the strategy (malloc for on-demand, slab allocation
for pooling). In addition to initialization of the fields of InstanceHandle
this also initializes all the fields of the VMContext for this handle
(which as mentioned above is adjacent to the InstanceHandle allocation
after it in memory). This does not process any data segments, element
segments, or the start function at this time.
At this point the InstanceHandle is stored within the Store. This is
the "point of no return" where the handle must be kept alive for the same
lifetime as the Store itself. If an initialization step fails then the
instance may still have had its functions, for example, inserted into an
imported table via an element segment. This means that even if we fail to
initialize this instance its state could still be visible to other
instances/objects so we need to keep it alive regardless.
The final step is performing wasm-defined instantiation. This involves
processing element segments, data segments, the start function, etc. Most
of this is just translating from Wasmtime's internal representation to the
specification's required behavior.
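The "point of no return" described above can be illustrated with a deliberately tiny model. All names are hypothetical. For brevity the table is local to the instance, whereas the interesting case in the text is an imported table that other instances can observe. The instance is pushed into the store before any element segments run. If a later segment is out of bounds, both the earlier writes and the instance itself therefore remain.

```rust
/// Hypothetical instance with a single table of function indices.
pub struct Instance {
    pub table: Vec<Option<u32>>,
}

/// Hypothetical element segment: write `funcs` into the table at `offset`.
pub struct ElemSegment {
    pub offset: usize,
    pub funcs: Vec<u32>,
}

#[derive(Default)]
pub struct Store {
    pub instances: Vec<Instance>,
}

impl Store {
    /// Assumes imports were already type-checked.
    pub fn instantiate(&mut self, table_size: usize, elems: &[ElemSegment]) -> Result<usize, String> {
        // Allocate and initialize the instance's own state.
        let instance = Instance { table: vec![None; table_size] };
        // Point of no return: the store owns the instance from here on.
        self.instances.push(instance);
        let id = self.instances.len() - 1;
        // Wasm-defined initialization, applied one segment at a time.
        let table = &mut self.instances[id].table;
        for seg in elems {
            let fits = seg
                .offset
                .checked_add(seg.funcs.len())
                .is_some_and(|end| end <= table.len());
            if !fits {
                // Earlier writes stay visible and the instance stays in the store.
                return Err(format!("element segment at offset {} is out of bounds", seg.offset));
            }
            for (i, &f) in seg.funcs.iter().enumerate() {
                table[seg.offset + i] = Some(f);
            }
        }
        Ok(id)
    }
}
```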
Another point worth noting about instantiation is that each Store maintains a
ModuleRegistry of all modules instantiated into it. The purpose of this
registry is to retain a strong reference to
items in the module needed to run instances. This includes the JIT code
primarily but also has information such as the VMSharedSignatureIndex
registration, metadata about function addresses and such, etc. Much of this
data is stored into a GLOBAL_MODULES map for later access during traps.
Once instances have been created and wasm starts running most things are fairly standard. Trampolines are used to enter wasm and JIT code generally does what it does to execute wasm. An important aspect of the implementation to cover, however, is traps.
Wasmtime today implements traps using Cranelift's support for exceptions. Notably, the entry trampoline into WebAssembly sets up a "base handler" used to catch all traps, and when a trap happens execution resumes at that handler. The exception handler itself takes care of, for example, restoring registers.
Traps can happen from a few different sources:
Explicit traps - these can happen when a host call returns a trap, for
example. These bottom out in raise_user_trap or raise_lib_trap, both of
which immediately call longjmp to go back to the wasm starting point. Note
that, as when calling wasm, callers of these functions must be very careful
not to have any destructors on the stack, since longjmp skips them.
Signals - this is the main vector for traps. Basically we use segfaults and
illegal instructions to implement traps in wasm code itself. Segfaults arise
when linear memory accesses go out of bounds, and illegal instructions are how
the wasm unreachable instruction is implemented. In both of these cases
Wasmtime installs a platform-specific signal handler to catch the signal,
inspect the state of the signal, and then handle it. Note that Wasmtime tries
to only catch signals that happen from JIT code itself so as not to accidentally
cover up other bugs. Exiting a signal handler happens via longjmp to get
back to the original wasm call-site.
The general idea is that Wasmtime has very tight control over the stack frames of wasm (naturally via Cranelift), and also over the code that runs just before we enter wasm and just after we reenter back into wasm (aka the trampolines on entry/exit).
The signal handler for Wasmtime uses the GLOBAL_MODULES map populated during
instantiation to determine whether a program counter that triggered a signal is
indeed a valid wasm trap. This should be true except for cases where the host
program has another bug that triggered the signal.
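The lookup a signal handler performs can be sketched in two steps. A range query over code addresses finds the module, and a search in that module's per-function trap table (described in the compilation section) finds the trap. GLOBAL_CODE, ModuleCode, register, trap_for_pc and the TrapCode variants are all hypothetical. A real signal handler must also avoid blocking on locks the way this std-only sketch does.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrapCode {
    MemoryOutOfBounds,
    UnreachableCodeReached,
}

/// Hypothetical registry entry for one module's JIT code.
pub struct ModuleCode {
    pub start: usize,
    pub len: usize,
    /// (offset from `start`, trap) pairs; sorted on registration.
    pub traps: Vec<(usize, TrapCode)>,
}

/// Hypothetical stand-in for GLOBAL_MODULES, keyed by code start address.
static GLOBAL_CODE: RwLock<BTreeMap<usize, ModuleCode>> = RwLock::new(BTreeMap::new());

pub fn register(mut code: ModuleCode) {
    code.traps.sort_by_key(|&(off, _)| off);
    GLOBAL_CODE.write().unwrap().insert(code.start, code);
}

/// `None` means the fault did not come from a known wasm trap site.
pub fn trap_for_pc(pc: usize) -> Option<TrapCode> {
    let map = GLOBAL_CODE.read().unwrap();
    // The last module starting at or before `pc`...
    let (_, code) = map.range(..=pc).next_back()?;
    // ...must actually contain `pc`.
    if pc >= code.start + code.len {
        return None;
    }
    // Only instructions recorded in the trap table are expected to fault.
    let offset = pc - code.start;
    let i = code.traps.binary_search_by_key(&offset, |&(off, _)| off).ok()?;
    Some(code.traps[i].1)
}
```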
A final note worth mentioning is that Wasmtime uses the Rust backtrace crate
to capture a stack trace when a wasm exception occurs. This forces Wasmtime to
generate native platform-specific unwinding information to correctly unwind the
stack and generate a stack trace for wasm code. This also has other benefits,
such as improving generic sampling profilers when used with Wasmtime.
Linear memory in Wasmtime is implemented effectively with mmap (or the
platform equivalent thereof), but there are some subtle nuances that are worth
pointing out here too. The implementation of linear memory is relatively
configurable which gives rise to a number of situations that both the runtime
and generated code need to handle.
First, there are a number of properties of linear memory that can be configured:
wasmtime::Config::memory_reservation
wasmtime::Config::memory_may_move
wasmtime::Config::memory_guard_size
wasmtime::Config::memory_reservation_for_growth
wasmtime::Config::memory_init_cow
wasmtime::Config::guard_before_linear_memory
wasmtime::Config::signals_based_traps
The methods on Config have a good bit of documentation covering the
nitty-gritty. Wasmtime also has some #[cfg] directives which are calculated by
crates/wasmtime/build.rs and which affect the defaults of various strategies. For
example has_native_signals means that segfaults are allowed to happen at
runtime and are caught in a signal handler. Additionally has_virtual_memory
means that mmap is available and will be used (otherwise a fallback to
malloc is implemented). The matrix of all of these combinations is then used
to implement a linear memory for a WebAssembly instance.
It's generally best to consult the documentation of Config for the most
up-to-date information. Additionally code comments throughout the codebase can
also be useful for understanding the impact of some of these options. Some
example scenarios though are:
(memory 1) on 64-bit platforms - by default this WebAssembly memory has
unlimited size, meaning it's only limited by its index type (i32), so it
can grow up to 4GiB if the host/embedder allows it. This is implemented with an
8GiB virtual memory reservation: 2GiB unmapped before linear memory, 4GiB
for linear memory itself (but only 1 wasm page, 64KiB, read/write at the
start), and 2GiB unmapped afterwards. The guard region before linear memory is
a defense-in-depth measure and should never be hit under any operation. The
guard region after linear memory is present to eliminate bounds checks in the
wasm module (WebAssembly addresses are effectively 33-bit addresses when the
static offset is taken into account).
(memory i64 1) on 64-bit platforms - this WebAssembly memory uses 64-bit
indexes instead of 32-bit indexes. This means that the configuration looks
similar to (memory 1) above except that growth beyond 4GiB will copy all the
contents of linear memory to a new location. Embedders might want to raise
Config::memory_reservation in this situation. This configuration mode cannot
remove any bounds checks, but guard pages are still used to deduplicate bounds
checks where possible (so segfaults may still be caught at runtime for
out-of-bounds accesses).
(memory 1) on 64-bit platforms with the pooling allocator - the pooling
allocator has a few important differences from the default settings. First,
the pooling allocator is able to "overlap" the before/after guard regions,
meaning that the virtual memory cost per linear memory is 6GiB by default
instead of 8GiB. Additionally the pooling allocator cannot resize memory, so if
Config::memory_reservation is less than 4GiB then that's a hard limit on the
size of linear memory rather than being able to copy to a new location.
(memory 1) on 64-bit platforms with a smaller reservation - if the
Config::memory_reservation option is configured to be less than the default (the
default is 4GiB) then the virtual memory allocated for all linear memories
will be less than the 8GiB default. This means that linear memories may move
over time if they grow beyond their initial limit (assuming such growth is
allowed) and additionally bounds checks will be required for memory accesses.
(memory 1) on 32-bit platforms - unlike 64-bit platforms this memory
cannot have a 4GiB virtual memory reservation. Instead the linear memory is
allocated with Config::memory_reservation_for_growth unmapped bytes after it
to amortize the reallocation overhead of copying bytes. Guard pages are still
used and signals are used where available to deduplicate bounds checks.
(memory 1 (pagesize 1)) on any platform - this WebAssembly linear
memory, with a page size of 1 byte, means that virtual memory cannot be used
to catch traps. Instead explicit bounds checks are always required on all
accesses. This is still allocated with virtual memory where possible, however.
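The arithmetic behind the guard regions in these scenarios can be written out as a small check. MemoryLayout, total_virtual_size and can_elide_bounds_check are hypothetical names. The sketch assumes a 32-bit index and a memory whose accessible length is page-aligned, so bytes past the current length but inside the reservation fault. Under that assumption, an access may skip its bounds check only if the largest possible index plus the static offset and access size still lands inside the reservation or the trailing guard, where an out-of-bounds access faults instead of touching unrelated memory.

```rust
/// One gibibyte.
pub const GIB: u64 = 1 << 30;

/// Hypothetical description of the virtual memory set aside for one 32-bit linear memory.
pub struct MemoryLayout {
    /// Unmapped bytes before the memory (defense in depth only).
    pub guard_before: u64,
    /// Bytes reserved for the memory; only the accessible prefix is read/write.
    pub reservation: u64,
    /// Unmapped bytes after the reservation.
    pub guard_after: u64,
}

impl MemoryLayout {
    /// Total virtual address space consumed by this memory.
    pub fn total_virtual_size(&self) -> u64 {
        self.guard_before + self.reservation + self.guard_after
    }

    /// Whether `base + index + offset` with a 32-bit `index` can skip an explicit bounds check.
    pub fn can_elide_bounds_check(&self, offset: u32, access_size: u32) -> bool {
        // Exclusive end of the worst-case access, i.e. with index = u32::MAX.
        let worst_case_end = u64::from(u32::MAX) + u64::from(offset) + u64::from(access_size);
        worst_case_end <= self.reservation + self.guard_after
    }
}
```

With the 64-bit defaults this gives an 8GiB footprint and lets static offsets up to roughly 2GiB go unchecked. Dropping guard_before (as when the pooling allocator overlaps guards) gives 6GiB. Shrinking the reservation below 4GiB makes even offset 0 require a bounds check.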
There are quite a few possible combinations for how all of these options interact
with each other. The high-level design goal of Wasmtime is that each option
is independent of all the others and is a knob for just its own behavior. In this
way it should be possible to tailor Wasmtime to the needs of embedders. Wasmtime
additionally has different default behavior across platforms, such as 32-bit and
64-bit platforms. Some platforms additionally don't have mmap by default and
Wasmtime will adapt to that as well. The intention, however, is that it should
be possible to mirror the default configuration on any platform into a
"full-featured" platform such as 64-bit to assist with testing, fuzzing, and
debugging.
externref
WebAssembly tables contain reference types, currently either funcref or
externref. A funcref in Wasmtime is represented as *mut VMCallerCheckedFuncRef and an externref is represented as VMExternRef
(which is internally *mut VMExternData). Tables are consequently represented
as vectors of pointers. Table storage memory management by default goes through
Rust's Vec which uses malloc and friends for memory. With the pooling
allocator this uses preallocated memory for storage.
As mentioned previously Store has no form of internal garbage
collection for wasm objects themselves so a funcref table in wasm is pretty
simple in that there's no lifetime management of any of the pointers stored
within, they're simply assumed to be valid for as long as the table is in use.
For tables of externref the story is more complicated. The VMExternRef is a
version of Arc<dyn Any> specialized in Wasmtime so that JIT code knows the
offset of the reference count field and can manipulate it directly.
Furthermore tables of externref values need to manage the reference count
field themselves, since the pointer stored in the table is required to have a
strong reference count allocated to it.
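The table-side reference counting can be sketched with std::sync::Arc standing in for VMExternRef. ExternRef and ExternRefTable are hypothetical. Wasmtime has its own VMExternRef because JIT code needs a known reference-count offset, and Arc's internals are not a public, stable interface. This sketch shows only the ownership rule: every occupied slot holds its own strong reference.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for VMExternRef.
pub type ExternRef = Arc<dyn Any + Send + Sync>;

pub struct ExternRefTable {
    slots: Vec<Option<ExternRef>>,
}

impl ExternRefTable {
    pub fn new(size: usize) -> Self {
        ExternRefTable { slots: vec![None; size] }
    }

    /// Storing takes a new strong reference; the previous occupant's is released.
    pub fn set(&mut self, index: usize, value: Option<&ExternRef>) {
        self.slots[index] = value.cloned();
    }

    /// Reading hands another strong reference to the caller.
    pub fn get(&self, index: usize) -> Option<ExternRef> {
        self.slots[index].clone()
    }
}
```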
externref
Wasmtime implements the externref type of WebAssembly with an
atomically-reference-counted pointer. Note that the atomic part is not needed
by wasm itself but rather by the Rust embedding environment, where it must be
safe to send ExternRef values to other threads. Wasmtime also does not
come with a cycle collector so cycles of host-allocated ExternRef objects
will leak.
Despite reference counting, though, a Store::gc method exists. This is an
implementation detail of how reference counts are managed while wasm code is
executing. Instead of managing the reference count of an externref value
individually as it moves around on the stack Wasmtime implements "deferred
reference counting" where there's an overly conservative list of ExternRef
values that may be in use, and periodically a GC is performed to make this
overly conservative list a precise one. This leverages the stack map support
of Cranelift plus the backtracing support of backtrace to determine live
roots on the stack. The Store::gc method forces the
possibly-overly-conservative list to become a precise list of externref
values that are actively in use on the stack.
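Deferred reference counting can be modeled as follows. ActivationsTable, insert and gc are hypothetical names. The roots slice stands in for what Cranelift's stack maps plus a backtrace would report as live. Handing a reference to wasm only appends it to a conservative list, and a later collection drops every entry not found among the roots, which turns the list into a precise one.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for an externref.
pub type ExternRef = Arc<dyn Any + Send + Sync>;

/// Hypothetical analogue of VMExternRefActivationsTable: an over-approximation
/// of the externrefs that wasm frames might still be using.
#[derive(Default)]
pub struct ActivationsTable {
    maybe_live: Vec<ExternRef>,
}

impl ActivationsTable {
    /// Fast path: just remember the reference, no per-move refcount traffic.
    pub fn insert(&mut self, r: ExternRef) {
        self.maybe_live.push(r);
    }

    /// Keep only entries that appear among `roots`; dropping the rest
    /// releases their strong references.
    pub fn gc(&mut self, roots: &[ExternRef]) {
        self.maybe_live
            .retain(|r| roots.iter().any(|root| Arc::ptr_eq(r, root)));
    }

    pub fn len(&self) -> usize {
        self.maybe_live.len()
    }
}
```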
The main Wasmtime internal crates are:
wasmtime - the safe public API of wasmtime.
wasmtime::runtime::vm - low-level runtime implementation of Wasmtime. This
is where VMContext and InstanceHandle live. This module used to be a
crate, but has since been folded into wasmtime.
wasmtime-environ - low-level compilation support. This is where translation
of the Module and its environment happens, although no compilation actually
happens in this crate (it does, however, define an interface for compilers). The
results of this crate are handed off to other crates for actual compilation.
wasmtime-cranelift - implementation of function-level compilation using
Cranelift.
Note that at this time Cranelift is a required dependency of wasmtime. Most of
the types exported from wasmtime-environ use cranelift types in their API. One
day it's a goal, though, to remove the required cranelift dependency and have
wasmtime-environ be a relatively standalone crate.
In addition to the above crates there are some other miscellaneous crates that
wasmtime depends on:
wasmtime-cache - optional dependency to manage default caches on the
filesystem. This is enabled in the CLI by default but not enabled in the
wasmtime crate by default.
wasmtime-fiber - implementation of stack-switching used by async support
in Wasmtime.
wasmtime-debug - implementation of mapping wasm dwarf debug information to
native dwarf debug information.
wasmtime-profiling - implementation of hooking up generated JIT code to
standard profiling runtimes.
wasmtime-obj - implementation of creating an ELF image from compiled
functions.