doc/WebAssembly.md
Halide supports WebAssembly (Wasm) code generation using the LLVM backend.
As WebAssembly itself is still under active development, Halide's support has some limitations. Some of the most important:
Note that for all of the above, earlier versions might work, but have not been tested.
Halide outputs a Wasm object (.o) or static library (.a) file, much like any
other architecture; to use it, of course, you must link it to suitable calling
code. Additionally, you must link to something that provides an implementation
of libc; as a practical matter, this means using the Emscripten tool to do
your linking, as it provides the most complete such implementation we're aware
of at this time.
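As a rough illustration of the linking step described above, the sketch below assembles an `emcc` command line that links calling code with a Halide-generated library. The file names (`main.cpp`, `my_filter.a`, `out.js`) and the helper `emcc_link_command` are hypothetical placeholders, not part of Halide or Emscripten. The helper only prints the command so it can be inspected; running the printed command assumes Emscripten's `emcc` is on your PATH.

```shell
# Sketch only: main.cpp, my_filter.a and out.js are hypothetical names.
# Prints an emcc command that links calling code with a Halide-generated
# Wasm static library; nothing is compiled or linked here.
emcc_link_command() {
    caller_src="$1"   # C++ file that calls the Halide-generated function
    halide_lib="$2"   # static library (.a) or object (.o) produced by Halide
    output="$3"       # output file, e.g. out.js
    printf 'emcc -O2 %s %s -o %s\n' "$caller_src" "$halide_lib" "$output"
}

emcc_link_command main.cpp my_filter.a out.js
```

Printing the command rather than running it keeps the sketch usable on machines where Emscripten is not installed yet.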
EMSDK environment variable set
properly.

It's important to reiterate that the WebAssembly JIT mode is not (and will never be) appropriate for anything other than limited self tests, for a number of reasons:
- `define_extern` calls require copying all `halide_buffer_t` data across the
  Wasm<->host boundary in both directions. This has severe implications for
  existing benchmarks, which don't currently attempt to account for this extra
  overhead. (This could possibly be improved by modeling the Wasm JIT's buffer
  support as a device model that would allow lazy copy-on-demand.)
- Functions used via `define_extern` or `HalideExtern` cannot accept or
  return values that are pointer types or 64-bit integer types; this includes
  things like `const char *` and `user_context`. Fixing this is tractable, but
  is currently omitted as the fix is nontrivial and the tests that are affected
  are mostly non-critical. (Note that `halide_buffer_t*` is explicitly supported
  as a special case, however.)
- `parallel()` schedules will be run serially.
- The `.async()` directive isn't supported at all, not even in serial-emulation
  mode.
- `Param<void *>` (or any other arbitrary pointer type) cannot be used with the
  Wasm JIT.
- `Func.debug_to_file()`, `Func.set_custom_do_par_for()`,
  `Func.set_custom_do_task()`, and `Func.set_custom_allocator()` cannot be used
  with the Wasm JIT.
- The `malloc()` used by the JIT is incredibly simpleminded
  and unsuitable for anything other than the most basic of tests.

Note that while some of these limitations may be improved in the future, some are effectively intrinsic to the nature of this problem. Realistically, this JIT implementation is intended solely for running Halide self-tests (and even then, a number of them are fundamentally impractical to support in a hosted-Wasm environment and are disabled).
In sum: don't plan on using Halide JIT mode with Wasm unless you are working on the Halide library itself.
There is experimental support for using V8 as the interpreter in JIT mode,
rather than WABT. This is enabled by the CMake command line options
-DWITH_V8=ON -DWITH_WABT=OFF (only one of them can be used at a time). You
must build V8 locally, then specify the path to the library and headers as
CMake options. This is currently only tested on x86-64-Linux and requires V8
version 9.8.177 as a minimum.
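Before pointing CMake at a local V8 build, it can help to confirm that the header directory and the static library are where you expect them. The following is a minimal sketch: `check_v8_paths` is a hypothetical helper (not part of Halide or V8), and the paths passed to it assume a V8 checkout under `$HOME/v8/v8` built into `out.gn/x64.release.sample`.

```shell
# Sketch only: check_v8_paths is a hypothetical helper, not part of Halide.
# Returns 0 if the include directory contains v8.h and the library file exists.
check_v8_paths() {
    include_dir="$1"
    library="$2"
    if [ ! -f "$include_dir/v8.h" ]; then
        echo "missing $include_dir/v8.h" >&2
        return 1
    fi
    if [ ! -f "$library" ]; then
        echo "missing $library" >&2
        return 1
    fi
    return 0
}

# Assumed locations; adjust to wherever V8 was actually checked out and built.
if check_v8_paths "$HOME/v8/v8/include" \
        "$HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a"; then
    echo "V8 paths look usable"
else
    echo "V8 paths need attention"
fi
```

A check like this only confirms that the files exist; it does not verify the V8 version or build settings.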
The canonical instructions to build V8 are at v8.dev, and there are examples for embedding V8. The process for Halide is summarized below.
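Install depot_tools (it provides gclient, fetch, and autoninja), then fetch V8 and check out the required version:
$ gclient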
$ mkdir ~/v8 && cd ~/v8
$ fetch v8
$ cd ~/v8/v8
$ git checkout origin/9.8.177
$ tools/dev/v8gen.py x64.release.sample
$ echo 'v8_enable_pointer_compression = false' >> out.gn/x64.release.sample/args.gn
$ echo 'v8_enable_gdbjit = false' >> out.gn/x64.release.sample/args.gn
$ autoninja -C out.gn/x64.release.sample v8_monolith

With V8 built, we can pass the CMake options:

- V8_INCLUDE_DIR, path to V8 includes, e.g. $HOME/v8/v8/include
- V8_LIBRARY, path to V8 static library, e.g.
  $HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a

An example to configure Halide with V8 support, build and run an example test:
$ cd /path/to/halide
$ export HL_TARGET=wasm-32-wasmrt-wasm_simd128
$ export HL_JIT_TARGET=${HL_TARGET}
$ cmake -G Ninja \
-DWITH_WABT=OFF \
-DWITH_V8=ON \
-DV8_INCLUDE_DIR=$HOME/v8/v8/include \
-DV8_LIBRARY=$HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a \
-DHalide_TARGET=${HL_TARGET} \
/* other cmake settings here as appropriate */
$ cmake --build .
$ ctest -L "correctness|generator" -j
"all") then it's already present, but otherwise, add it explicitly:
-DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;PowerPC;Hexagon;WebAssembly"
If you want to run test_correctness and other interesting parts of the Halide
test suite (and you almost certainly will), you'll need to ensure that LLVM is
built with wasm-ld:
cmake -DLLVM_ENABLE_PROJECTS="clang;lld" ...
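To confirm that an existing LLVM build includes the WebAssembly backend, one option is to inspect the output of `llvm-config --targets-built`, which prints a space-separated list of targets. The sketch below assumes that output format; `has_wasm_target` is a hypothetical helper, and the sample list passed to it is illustrative only.

```shell
# Sketch only: has_wasm_target is a hypothetical helper.
# Succeeds if the space-separated target list contains WebAssembly.
has_wasm_target() {
    case " $1 " in
        *" WebAssembly "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use, assuming llvm-config is on PATH:
#   has_wasm_target "$(llvm-config --targets-built)"
if has_wasm_target "X86 ARM WebAssembly"; then
    echo "WebAssembly backend present"
else
    echo "WebAssembly backend missing"
fi
```

Padding the list with spaces before matching avoids false positives from a target whose name merely contains the string.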
To run the JIT tests, set HL_JIT_TARGET=wasm-32-wasmrt (possibly adding
wasm_simd128) and run CMake/CTest normally. Note that wasm testing is only
supported under CMake (not via Make).

If you want to test ahead-of-time code generation (and you almost certainly will), you need to install Emscripten locally.
The simplest way to install is probably via the Emscripten emsdk (https://emscripten.org/docs/getting_started/downloads.html).
To run the AOT tests, set HL_TARGET=wasm-32-wasmrt (possibly adding
wasm_simd128) and run CMake/CTest normally. Note that wasm testing is only
supported under CMake (not via Make).
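The target strings used above follow a simple pattern: the base `wasm-32-wasmrt` followed by optional features joined with hyphens. As a small sketch of that pattern, the hypothetical helper `wasm_target` below builds the string from a list of feature names; the helper is for illustration and is not part of Halide.

```shell
# Sketch only: wasm_target is a hypothetical helper.
# Appends each requested feature to the base target wasm-32-wasmrt.
wasm_target() {
    target="wasm-32-wasmrt"
    for feature in "$@"; do
        target="$target-$feature"
    done
    printf '%s\n' "$target"
}

# Build the AOT target with SIMD enabled and export it for CMake/CTest.
HL_TARGET="$(wasm_target wasm_simd128)"
export HL_TARGET
echo "$HL_TARGET"
```

Calling `wasm_target` with no arguments yields the plain base target, which matches the setting without `wasm_simd128`.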
The test_performance benchmarks are misleading (and thus useless) for Wasm, as
they include JIT overhead as described elsewhere. Suitable benchmarks for Wasm
will be provided at a later date. (See
https://github.com/halide/Halide/issues/5119 and
https://github.com/halide/Halide/issues/5047 to track progress.)
You can use the wasm_threads feature to enable use of a normal pthread-based
thread pool in Halide code, but with some important caveats:
-pthread flag, and compiling for a Web environment). In this configuration,
Emscripten goes to great lengths to make WebWorkers available via the pthreads
API. (You can see an example of this usage in apps/HelloWasm.) Note that not
all wasm runtimes support WebWorkers; generally, you need a full browser
environment to make this work (though some versions of some shell tools may
also support this, e.g. nodejs).

wasm-ld tool into libHalide; with
some work this need could possibly be eliminated.

apps/ folder has been investigated yet. Many of them should be
supportable with some work. (Patches welcome.)

copy_to_device() would copy from host
-> wasm); this would make the performance benchmarks much more useful.