crates/wasi-nn/examples/classification-component-onnx/README.md
This example demonstrates how to use the `wasi-nn` crate to run a classification from a WebAssembly component using the ONNX Runtime backend.
It supports both CPU and GPU (NVIDIA CUDA) execution targets.

> Note: the GPU execution target currently supports only NVIDIA CUDA (the `onnx-cuda` feature) as the execution provider (EP).
In this directory, run the following commands to build the WebAssembly component:

```sh
# build component for target wasm32-wasip1
cargo component build

# build component for target wasm32-wasip2
cargo component build --target wasm32-wasip2
```
In the Wasmtime root directory, run one of the following commands to build the Wasmtime CLI with wasi-nn and the ONNX Runtime backend enabled:

```sh
# CPU-only build
cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-download

# build with the CUDA execution provider enabled
cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-cuda,wasmtime-wasi-nn/onnx-download
```
The execution target is selected by passing a single argument to the Wasm component:

- `cpu` - use CPU execution
- `gpu` or `cuda` - use GPU/CUDA execution

Run with the CPU target (passing no argument also defaults to CPU):

```sh
./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm
```
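The `cpu`/`gpu`/`cuda` argument handling described above can be sketched in plain Rust. This is an illustrative, self-contained sketch, not the example's actual source: the real component maps the argument onto the wasi-nn crate's execution-target type, whereas the `ExecutionTarget` enum and `parse_target` helper below are hypothetical stand-ins.

```rust
// Illustrative sketch only: `ExecutionTarget` and `parse_target` are local
// stand-ins, not the wasi-nn crate's types.
#[derive(Debug, PartialEq)]
enum ExecutionTarget {
    Cpu,
    Gpu,
}

fn parse_target(args: &[String]) -> ExecutionTarget {
    // args[0] is the program name; args[1] is the optional target argument.
    match args.get(1).map(|s| s.to_lowercase()).as_deref() {
        Some("gpu") | Some("cuda") => ExecutionTarget::Gpu,
        Some("cpu") => ExecutionTarget::Cpu,
        None => {
            println!("No execution target specified, defaulting to CPU");
            ExecutionTarget::Cpu
        }
        Some(other) => {
            eprintln!("Unknown execution target {other:?}, defaulting to CPU");
            ExecutionTarget::Cpu
        }
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let target = parse_target(&args);
    println!("Selected execution target: {target:?}");
}
```

This mirrors the behavior visible in the expected output: with no argument, the component reports that it is defaulting to the CPU.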
Run with the GPU target:

```sh
# path to `libonnxruntime_providers_cuda.so` downloaded by `ort-sys`
export LD_LIBRARY_PATH={wasmtime_workspace}/target/debug

./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm \
    gpu
```
You should get output similar to:

```text
No execution target specified, defaulting to CPU
Read ONNX model, size in bytes: 4956208
Loaded graph into wasi-nn with Cpu target
Created wasi-nn execution context.
Read ONNX Labels, # of labels: 1000
Executed graph inference
Retrieved output data with length: 4000
Index: n02099601 golden retriever - Probability: 0.9948673
Index: n02088094 Afghan hound, Afghan - Probability: 0.002528982
Index: n02102318 cocker spaniel, English cocker spaniel, cocker - Probability: 0.0010986356
```
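The `Retrieved output data with length: 4000` line corresponds to 1000 class scores stored as 4-byte little-endian floats. A hypothetical post-processing step (not taken from the example's source) that turns such a raw buffer into top-k `(class_index, score)` pairs could look like:

```python
import struct

def top_k(output_bytes: bytes, k: int = 3):
    """Interpret raw tensor bytes as little-endian f32 scores and
    return the k highest-scoring (class_index, score) pairs."""
    n = len(output_bytes) // 4                # e.g. 4000 bytes -> 1000 floats
    scores = struct.unpack(f"<{n}f", output_bytes)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Synthetic buffer standing in for the example's real output tensor.
scores = [0.0] * 1000
scores[207] = 0.99     # pretend class 207 has the highest score
scores[5] = 0.5
raw = struct.pack("<1000f", *scores)
print(top_k(raw))      # class 207 ranked first, class 5 second
```

Mapping each index to a human-readable label (e.g. `n02099601 golden retriever`) is then a lookup into the 1000-entry labels file mentioned in the output above.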
When using the GPU target, the first line of the output indicates the selected execution target.
You can monitor GPU usage with `watch -n 1 nvidia-smi`.
To see trace logs from `wasmtime_wasi_nn` or `ort`, run Wasmtime with `WASMTIME_LOG` enabled, e.g.:

```sh
WASMTIME_LOG=wasmtime_wasi_nn=warn ./target/debug/wasmtime run ...
WASMTIME_LOG=ort=warn ./target/debug/wasmtime run ...
```
Even when Wasmtime is built with the `wasmtime-wasi-nn/onnx-cuda` feature, if the GPU execution provider is requested (by passing `gpu`) but the machine has no GPU or the necessary CUDA drivers are missing, ONNX Runtime will silently fall back to the CPU execution provider. The application will continue to run, but inference will happen on the CPU.
To verify whether fallback is happening, enable ONNX Runtime logging:

1. Build Wasmtime with the additional `wasmtime-wasi-nn/ort-tracing` feature:

   ```sh
   cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-cuda,wasmtime-wasi-nn/ort-tracing
   ```

2. Run Wasmtime with `WASMTIME_LOG` enabled to see `ort` warnings:

   ```sh
   WASMTIME_LOG=ort=warn ./target/debug/wasmtime run ...
   ```

If fallback occurs, you should see a warning like:

```text
No execution providers from session options registered successfully; may fall back to CPU.
```