sdk/runanywhere-web/packages/core/README.md
On-device AI for the browser. Run LLMs, Speech-to-Text, Text-to-Speech, Vision, and Voice AI locally via WebAssembly -- private, offline-capable, zero server dependencies.
Beta (v0.1.0) -- This is an early release for testing and feedback. The API surface is stable but may change before v1.0. Not yet recommended for production deployments without thorough testing.
| Component | Minimum | Recommended |
|---|---|---|
| Browser | Chrome 96+ / Edge 96+ | Chrome 120+ / Edge 120+ |
| WebAssembly | Required | Required |
| SharedArrayBuffer | For multi-threaded WASM | Requires Cross-Origin Isolation headers |
| WebGPU | For GPU-accelerated diffusion | Chrome 120+ |
| OPFS | For persistent model storage | All modern browsers |
| RAM | 2GB | 4GB+ for larger models |
| Storage | Varies by model | 40 MB -- 4 GB per model |
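Before downloading a multi-gigabyte model, you can compare the origin's storage quota (from `navigator.storage.estimate()`) against the model size. The `quotaHeadroom` helper below is an illustrative sketch, not SDK API:

```typescript
// Sketch: given a StorageEstimate-like result, report how many bytes are
// still free on this origin and whether a model of `modelBytes` would fit.
export function quotaHeadroom(
  estimate: { usage?: number; quota?: number },
  modelBytes: number,
): { freeBytes: number; fits: boolean } {
  const usage = estimate.usage ?? 0;
  const quota = estimate.quota ?? 0;
  const freeBytes = Math.max(0, quota - usage);
  return { freeBytes, fits: modelBytes <= freeBytes };
}

// In the browser:
// const est = await navigator.storage.estimate();
// quotaHeadroom(est, 4 * 1024 ** 3); // will a 4 GB model fit?
```

Note that `estimate()` reports a browser-assigned quota, which may be far smaller than free disk space.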
The Web SDK is split into three npm packages so you only ship the backends you need:
| Package | Description | Includes |
|---|---|---|
| `@runanywhere/web` | Core SDK — lifecycle, logging, events, model management, storage | TypeScript only (no WASM) |
| `@runanywhere/web-llamacpp` | LLM, VLM, tool calling, structured output, embeddings, diffusion | llama.cpp WASM (~3.7 MB CPU, ~3.9 MB WebGPU) |
| `@runanywhere/web-onnx` | STT, TTS, VAD, audio capture/playback | sherpa-onnx WASM (~12 MB, lazy-loaded) |
Install only what you need — `@runanywhere/web` is always required as the core.
```bash
# Core + all backends
npm install @runanywhere/web @runanywhere/web-llamacpp @runanywhere/web-onnx

# LLM/VLM only (no speech)
npm install @runanywhere/web @runanywhere/web-llamacpp

# Speech only (no LLM)
npm install @runanywhere/web @runanywhere/web-onnx
```
WASM files are included in `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx`. Configure your bundler to serve them as static assets.
Important: Your server must set Cross-Origin Isolation headers for `SharedArrayBuffer` and multi-threaded WASM to work. Without these headers the SDK falls back to single-threaded mode, which is significantly slower. See Cross-Origin Isolation Headers for all platforms (Nginx, Vercel, Netlify, Cloudflare, AWS, Apache).
Vite:
// vite.config.ts
export default defineConfig({
assetsInclude: ['**/*.wasm'],
server: {
headers: {
'Cross-Origin-Opener-Policy': 'same-origin',
'Cross-Origin-Embedder-Policy': 'credentialless',
},
},
worker: { format: 'es' },
optimizeDeps: {
exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
},
});
Warning (Vite users): You must add `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to `optimizeDeps.exclude`. Vite's dependency pre-bundling flattens packages into `.vite/deps/`, which breaks the relative `import.meta.url` paths the SDK uses to locate its WASM files. Without this exclusion, WASM loading will fail with a "Failed to fetch dynamically imported module" error. This is a known Vite limitation with npm packages that resolve static assets via `import.meta.url`.
Webpack:
```js
// webpack.config.js
module.exports = {
  module: {
    rules: [
      { test: /\.wasm$/, type: 'asset/resource' },
    ],
  },
  devServer: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
};
```
Safari/iOS: Safari does not support the `credentialless` COEP value. Use the COI service worker pattern shown in the demo app — it intercepts responses and injects `require-corp` headers at runtime.
```ts
import { RunAnywhere } from '@runanywhere/web';
import { LlamaCPP, TextGeneration } from '@runanywhere/web-llamacpp';
import { ONNX, STT, STTModelType, TTS, VAD, SpeechActivity } from '@runanywhere/web-onnx';

await RunAnywhere.initialize({ environment: 'development', debug: true });

// Register backends
await LlamaCPP.register();
await ONNX.register();
```
```ts
import { TextGeneration } from '@runanywhere/web-llamacpp';

// Load a GGUF model
await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');

// Generate
const result = await TextGeneration.generate('Explain quantum computing briefly.');
console.log(result.text);

// Stream tokens as they arrive
for await (const token of TextGeneration.generateStream('Write a haiku about code.')) {
  process.stdout.write(token); // Node; in the browser, append the token to the DOM instead
}
```
```ts
import { STT, STTModelType } from '@runanywhere/web-onnx';

await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '/models/encoder.onnx', decoder: '/models/decoder.onnx', tokens: '/models/tokens.txt' },
  sampleRate: 16000,
});

const result = await STT.transcribe(audioFloat32Array);
console.log(result.text);
```
```ts
import { TTS } from '@runanywhere/web-onnx';

await TTS.loadVoice({
  voiceId: 'piper-en',
  modelPath: '/models/piper-en.onnx',
  tokensPath: '/models/tokens.txt',
  dataDir: '/models/espeak-ng-data',
});

const result = await TTS.synthesize('Hello from RunAnywhere!');
// result.audioData is a Float32Array; result.sampleRate is the sample rate
```
```ts
import { VAD, SpeechActivity } from '@runanywhere/web-onnx';

await VAD.initialize({ modelPath: '/models/silero_vad.onnx' });

VAD.onSpeechActivity((activity) => {
  if (activity === SpeechActivity.Ended) {
    const segment = VAD.popSpeechSegment();
    if (segment) console.log(`Speech: ${segment.samples.length} samples`);
  }
});

// Feed audio chunks from the microphone
VAD.processSamples(audioChunk);
```
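The VAD (and the Whisper STT above) expect 16 kHz mono `Float32Array` input, while browser capture typically runs at 44.1 or 48 kHz. If your capture path does not already resample, a naive linear-interpolation downsampler can bridge the gap (`resampleLinear` is an illustrative sketch, not SDK API):

```typescript
// Sketch: naive linear-interpolation resampler, e.g. 48000 -> 16000 Hz.
// Adequate for speech pipelines; a production path would low-pass filter first.
export function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Interpolate between the two nearest source samples.
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// VAD.processSamples(resampleLinear(chunk, audioContext.sampleRate, 16000));
```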
```ts
import { VLM, VLMImageFormat } from '@runanywhere/web-llamacpp';

await VLM.loadModel('/models/qwen2-vl.gguf', '/models/mmproj.gguf', 'qwen2-vl');

const result = await VLM.process(
  { format: VLMImageFormat.RGB, rgbPixels: pixelData, width: 256, height: 256 },
  'Describe this image.',
  { maxTokens: 100 },
);
console.log(result.text);
```
```
+---------------------------------------------+
|               TypeScript API                |
|  RunAnywhere / TextGeneration / STT / TTS   |
|   VAD / VLM / VoicePipeline / Embeddings    |
+---------------------------------------------+
|        WASMBridge + PlatformAdapter         |
|  (Emscripten addFunction / ccall / cwrap)   |
+---------------------------------------------+
|      RACommons C++ (compiled to WASM)       |
|  - Service Registry    - Event System       |
|  - Model Management    - Lifecycle          |
+---------------------------------------------+
|         Inference Backends (WASM)           |
|  - llama.cpp (LLM / VLM)                    |
|  - whisper.cpp (STT)                        |
|  - sherpa-onnx (TTS / VAD)                  |
+---------------------------------------------+
```
The Web SDK compiles the same C++ core (runanywhere-commons) used by the iOS and Android SDKs to WebAssembly via Emscripten. The inference engines (llama.cpp, whisper.cpp, sherpa-onnx) are the same native code running in the browser, with identical vtable dispatch, service registry, and event system.
| Layer | Component | Description |
|---|---|---|
| Public | RunAnywhere | SDK lifecycle (initialize, shutdown, device capabilities) |
| Public | TextGeneration | LLM text generation and streaming |
| Public | STT | Speech-to-text transcription |
| Public | TTS | Text-to-speech synthesis |
| Public | VAD | Voice activity detection |
| Public | VLM | Vision-language model inference |
| Public | VoicePipeline | STT -> LLM -> TTS orchestration |
| Public | ToolCalling | Function calling with typed definitions |
| Public | StructuredOutput | JSON schema-guided generation |
| Public | Embeddings | Vector embedding generation |
| Foundation | WASMBridge | Emscripten module loader and C interop |
| Foundation | SDKLogger | Structured logging with configurable levels |
| Foundation | EventBus | Typed event system for SDK lifecycle events |
| Foundation | SDKError | Typed error hierarchy with error codes |
| Infrastructure | ModelManager | Model download, storage, and loading orchestration |
| Infrastructure | OPFSStorage | Persistent storage via Origin Private File System |
| Infrastructure | AudioCapture | Microphone capture with Web Audio API |
| Infrastructure | VideoCapture | Camera capture and frame extraction |
| Infrastructure | AudioPlayback | Audio playback via Web Audio API |
| Infrastructure | VLMWorkerBridge | Web Worker bridge for off-main-thread VLM inference |
```
sdk/runanywhere-web/
+-- packages/
|   +-- core/                          # @runanywhere/web npm package
|       +-- src/
|       |   +-- Public/                # Public API
|       |   |   +-- RunAnywhere.ts
|       |   |   +-- Extensions/
|       |   |       +-- RunAnywhere+TextGeneration.ts
|       |   |       +-- RunAnywhere+STT.ts
|       |   |       +-- RunAnywhere+TTS.ts
|       |   |       +-- RunAnywhere+VAD.ts
|       |   |       +-- RunAnywhere+VLM.ts
|       |   |       +-- RunAnywhere+VoiceAgent.ts
|       |   |       +-- RunAnywhere+VoicePipeline.ts
|       |   |       +-- RunAnywhere+ToolCalling.ts
|       |   |       +-- RunAnywhere+StructuredOutput.ts
|       |   |       +-- RunAnywhere+Embeddings.ts
|       |   |       +-- RunAnywhere+Diffusion.ts
|       |   |       +-- RunAnywhere+ModelManagement.ts
|       |   +-- Foundation/            # Core infrastructure
|       |   |   +-- WASMBridge.ts
|       |   |   +-- PlatformAdapter.ts
|       |   |   +-- EventBus.ts
|       |   |   +-- SDKLogger.ts
|       |   |   +-- ErrorTypes.ts
|       |   |   +-- SherpaONNXBridge.ts
|       |   +-- Infrastructure/        # Browser services
|       |   |   +-- ModelManager.ts
|       |   |   +-- ModelDownloader.ts
|       |   |   +-- ModelRegistry.ts
|       |   |   +-- OPFSStorage.ts
|       |   |   +-- AudioCapture.ts
|       |   |   +-- AudioPlayback.ts
|       |   |   +-- VideoCapture.ts
|       |   |   +-- VLMWorkerBridge.ts
|       |   |   +-- DeviceCapabilities.ts
|       |   |   +-- ArchiveUtility.ts
|       |   +-- types/                 # Shared type definitions
|       +-- wasm/                      # WASM build output (generated)
|       +-- dist/                      # TypeScript build output (generated)
+-- wasm/                              # Emscripten build system
|   +-- CMakeLists.txt
|   +-- src/wasm_exports.cpp
|   +-- platform/wasm_platform_shims.cpp
|   +-- scripts/
|       +-- build.sh                   # Main WASM build script
|       +-- setup-emsdk.sh             # Emscripten SDK installer
|       +-- build-sherpa-onnx.sh       # Sherpa-ONNX WASM build
+-- package.json                       # Workspace root
+-- tsconfig.base.json
```
Building from source is only required if you want to modify the C++ core or build a custom WASM binary with specific backends. Pre-built WASM files are included in the npm package.
```bash
# One-time setup
./wasm/scripts/setup-emsdk.sh
source ~/emsdk/emsdk_env.sh

# All backends (LLM + STT + TTS/VAD) -- produces racommons.wasm (~3.6 MB)
./wasm/scripts/build.sh --all-backends

# Individual backends
./wasm/scripts/build.sh --llamacpp        # LLM only (llama.cpp)
./wasm/scripts/build.sh --whispercpp      # STT only (whisper.cpp)
./wasm/scripts/build.sh --onnx            # TTS/VAD only (sherpa-onnx)
./wasm/scripts/build.sh --llamacpp --vlm  # LLM + VLM (llama.cpp + mtmd)

# WebGPU-accelerated build
./wasm/scripts/build.sh --webgpu

# Debug build with pthreads
./wasm/scripts/build.sh --debug --pthreads --all-backends

# Clean rebuild
./wasm/scripts/build.sh --clean --all-backends
```

Build outputs are copied to `packages/core/wasm/`.
```bash
cd sdk/runanywhere-web
npm install
npm run build:ts
```

Output: `packages/core/dist/index.js` and `packages/core/dist/index.d.ts`.

Type-check without emitting:

```bash
cd packages/core && npx tsc --noEmit
```
| Feature | Required | Fallback |
|---|---|---|
| WebAssembly | Yes | N/A |
| SharedArrayBuffer | For pthreads (multi-threaded) | Single-threaded mode |
| Cross-Origin Isolation | For SharedArrayBuffer | Single-threaded mode |
| WebGPU | For Diffusion backend | N/A (Diffusion unavailable) |
| OPFS | For persistent model storage | MEMFS (volatile, models re-downloaded each session) |
| Web Audio API | For microphone capture / playback | N/A |
Use `detectCapabilities()` to check browser support at runtime:

```ts
import { detectCapabilities } from '@runanywhere/web';

const caps = await detectCapabilities();
console.log('Cross-Origin Isolated:', caps.isCrossOriginIsolated);
console.log('SharedArrayBuffer:', caps.hasSharedArrayBuffer);
console.log('WebGPU:', caps.hasWebGPU);
console.log('OPFS:', caps.hasOPFS);
```
For multi-threaded WASM (pthreads), your server must set two HTTP headers on every response:
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
These headers enable SharedArrayBuffer, which is required for multi-threaded WASM. Without them, crossOriginIsolated will be false and the SDK falls back to single-threaded mode.
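As a quick runtime check, the two conditions can be combined into a small guard (a hypothetical helper, not part of the SDK):

```typescript
// Sketch: decide which WASM threading mode the current page can use.
// Pass the real globals in from the browser; kept as parameters for testability.
export function reportThreadingMode(
  isCrossOriginIsolated: boolean,
  hasSharedArrayBuffer: boolean,
): 'multi-threaded' | 'single-threaded' {
  // Both conditions must hold for pthreads-enabled WASM.
  return isCrossOriginIsolated && hasSharedArrayBuffer
    ? 'multi-threaded'
    : 'single-threaded';
}

// In the browser:
// reportThreadingMode(crossOriginIsolated, typeof SharedArrayBuffer !== 'undefined');
```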
Note: `require-corp` means all sub-resources (images, scripts, fonts, iframes) must either be same-origin or include a `Cross-Origin-Resource-Policy: cross-origin` header. Plan accordingly for CDN assets.
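For example, a cross-origin asset can be made compatible with `require-corp` in either of two ways (the CDN URL below is hypothetical):

```html
<!-- Option 1: the CDN serves the asset with
     Cross-Origin-Resource-Policy: cross-origin -->
<img src="https://cdn.example.com/logo.png" alt="logo">

<!-- Option 2: request the asset with CORS so the embedder policy is satisfied
     (requires a valid Access-Control-Allow-Origin response) -->
<img src="https://cdn.example.com/logo.png" crossorigin="anonymous" alt="logo">
```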
Nginx:

```nginx
server {
    listen 443 ssl;
    server_name app.example.com;

    add_header Cross-Origin-Opener-Policy "same-origin" always;
    add_header Cross-Origin-Embedder-Policy "require-corp" always;

    types {
        application/wasm wasm;
    }

    location ~* \.wasm$ {
        add_header Cross-Origin-Opener-Policy "same-origin" always;
        add_header Cross-Origin-Embedder-Policy "require-corp" always;
        add_header Cache-Control "public, max-age=31536000, immutable";
    }
}
```
Vercel (`vercel.json`):

```json
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cross-Origin-Opener-Policy", "value": "same-origin" },
        { "key": "Cross-Origin-Embedder-Policy", "value": "require-corp" }
      ]
    }
  ]
}
```
Netlify (`netlify.toml`):

```toml
[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "require-corp"
```
Cloudflare Pages: Create a `_headers` file in the project root:

```
/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
```
AWS CloudFront: Add a Response Headers Policy with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`, or use a CloudFront Function:
```js
function handler(event) {
  var response = event.response;
  var headers = response.headers;
  headers['cross-origin-opener-policy'] = { value: 'same-origin' };
  headers['cross-origin-embedder-policy'] = { value: 'require-corp' };
  return response;
}
```
Apache:

```apache
<IfModule mod_headers.c>
    Header always set Cross-Origin-Opener-Policy "same-origin"
    Header always set Cross-Origin-Embedder-Policy "require-corp"
</IfModule>

AddType application/wasm .wasm
```
Local development (Vite dev server):

```ts
// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
});
```
```ts
await RunAnywhere.initialize({
  environment: 'development', // 'development' | 'staging' | 'production'
  debug: true,                // Enable verbose logging
});
```
The SDK uses `SDKLogger` for all internal logging. Configure the log level and enable or disable output:

```ts
import { SDKLogger, LogLevel } from '@runanywhere/web';

SDKLogger.level = LogLevel.Debug; // Trace | Debug | Info | Warning | Error | Fatal
SDKLogger.enabled = true;         // Toggle all SDK logging
```
Subscribe to SDK lifecycle events:

```ts
import { EventBus } from '@runanywhere/web';

EventBus.shared.on('model.downloadProgress', (event) => {
  console.log(`Download: ${(event.data.progress * 100).toFixed(0)}%`);
});

EventBus.shared.on('model.loadCompleted', (event) => {
  console.log(`Model loaded: ${event.data.modelId}`);
});
```
The SDK uses typed errors with error codes:

```ts
import { SDKError, SDKErrorCode } from '@runanywhere/web';

try {
  await TextGeneration.generate('Hello');
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('SDK not initialized');
        break;
      case SDKErrorCode.ModelNotLoaded:
        console.error('No model loaded');
        break;
      default:
        console.error(`SDK error [${err.code}]: ${err.message}`);
    }
  }
}
```
A full-featured example application is included at examples/web/RunAnywhereAI/. It demonstrates all SDK capabilities across seven tabs: Chat, Vision, Voice, Transcribe, Speak, Storage, and Settings.
```bash
cd examples/web/RunAnywhereAI
npm install
npm run dev
```
The demo app runs on Vite with Cross-Origin Isolation headers pre-configured.
`@runanywhere/web` (core)

| Export | Description |
|---|---|
| `RunAnywhere` | SDK lifecycle (initialize, shutdown, capabilities) |
| `ModelManager` | Model download, storage, and loading |
| `OPFSStorage` | Persistent storage via OPFS |
| `SDKLogger` | Structured logging |
| `SDKError` | Typed error hierarchy |
| `EventBus` | SDK event system |
| `detectCapabilities` | Browser feature detection |
`@runanywhere/web-llamacpp`

| Export | Description |
|---|---|
| `LlamaCPP` | Backend registration |
| `TextGeneration` | LLM text generation and streaming |
| `VLM` | Vision-language model inference |
| `ToolCalling` | Function calling with typed definitions |
| `StructuredOutput` | JSON schema-guided generation |
| `Embeddings` | Vector embedding generation |
| `Diffusion` | Image generation (WebGPU) |
| `VLMWorkerBridge` | Web Worker bridge for VLM inference |
| `VideoCapture` | Camera capture and frame extraction |
| `TelemetryService` | Telemetry and analytics |
`@runanywhere/web-onnx`

| Export | Description |
|---|---|
| `ONNX` | Backend registration |
| `STT` | Speech-to-text transcription |
| `TTS` | Text-to-speech synthesis |
| `VAD` | Voice activity detection |
| `AudioCapture` | Microphone capture via Web Audio API |
| `AudioPlayback` | Audio playback via Web Audio API |
| `AudioFileLoader` | Audio file loading and decoding |
Does the SDK work offline? Yes. Once models are downloaded and cached in OPFS, the SDK works entirely offline. No server, API key, or network connection is needed for inference.
Where are models stored? Models are stored in the browser's Origin Private File System (OPFS), a sandboxed persistent storage API. Files persist across browser sessions but are origin-scoped and not accessible via the regular file system. If the OPFS quota is exceeded, the SDK falls back to an in-memory cache for the current session.
How large are the WASM downloads? The core `racommons.wasm` is approximately 3.6 MB (all backends). The sherpa-onnx WASM (for TTS/VAD) is approximately 12 MB and is loaded separately, only when needed. Both are downloaded once and cached by the browser.
Does my data stay on the device? Yes. All inference runs entirely in the browser via WebAssembly. No data is sent to any server; audio, text, and images never leave the device.
Which browsers are supported? Chrome 96+ and Edge 96+ are fully supported. Firefox 119+ works but lacks WebGPU. Safari 17+ has basic support but limited OPFS reliability. Mobile browsers have memory constraints that limit larger models.
Can I use my own models? Yes. Any GGUF-format model compatible with llama.cpp works for LLM/VLM. STT models use ONNX format via whisper.cpp or sherpa-onnx. TTS models use the Piper ONNX format.
"Failed to fetch dynamically imported module" (Vite)

Cause: Vite pre-bundles npm dependencies into `.vite/deps/`, which breaks the relative `import.meta.url` paths used by `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to locate their WASM files.
Fix: Add both packages to `optimizeDeps.exclude` in your `vite.config.ts`:

```ts
optimizeDeps: {
  exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
},
```
`SharedArrayBuffer` unavailable / WASM runs single-threaded

Cause: Missing Cross-Origin Isolation headers.

Fix: Add the required headers to your server configuration. See Cross-Origin Isolation Headers. The SDK falls back to single-threaded mode if the headers are missing.
Model download or load fails

Cause: CORS error, wrong file path, or corrupted download.

Fix: Ensure the model URL has proper CORS headers or serve it from the same origin. Check the browser console for network errors. Try deleting the model from OPFS storage and re-downloading.
Out of memory while loading a model

Cause: The model is too large for available browser memory.

Fix: Use smaller quantized models (Q4_0 instead of Q8_0). Close other browser tabs. On mobile, models larger than 1 GB may exceed available memory.
Vision (VLM) inference is slow

Cause: CLIP image encoding is computationally expensive in WASM.

Fix: Use smaller capture dimensions (256x256 is recommended). The VLM runs in a dedicated Web Worker, so the UI remains responsive during inference.
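If you size frames yourself before handing them to `VLM.process`, a small aspect-preserving fit helper keeps the pixel count down (illustrative sketch, not SDK API):

```typescript
// Sketch: scale (width, height) down so the longer side is at most `max`,
// preserving aspect ratio. Never upscales.
export function fitWithin(
  width: number,
  height: number,
  max: number,
): { width: number; height: number } {
  const scale = Math.min(1, max / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// fitWithin(1920, 1080, 256) shrinks a full-HD frame to 256x144 for VLM input.
```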
Models disappear between sessions

Cause: The browser may evict storage under memory pressure, or you are in Incognito mode.

Fix: The SDK requests persistent storage automatically. Ensure you are not in Incognito/Private mode. Safari has known OPFS reliability issues.
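You can also request persistence explicitly via the Storage API. The wrapper below is an illustrative sketch; it takes a `StorageManager`-shaped object so the logic is easy to test:

```typescript
// Sketch: ask the browser not to evict this origin's storage.
interface StorageLike {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

export async function ensurePersistentStorage(
  storage: StorageLike,
): Promise<boolean> {
  // Already persistent? Nothing to do.
  if (await storage.persisted()) return true;
  // Otherwise ask; the browser may still refuse (e.g. in Incognito).
  return storage.persist();
}

// In the browser: await ensurePersistentStorage(navigator.storage);
```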
See the repository Contributing Guide for details.
```bash
# Clone and set up
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/sdk/runanywhere-web

# Install dependencies
npm install

# Build TypeScript
npm run build:ts

# Run the demo app
cd ../../examples/web/RunAnywhereAI
npm install
npm run dev
```
Apache 2.0 -- see LICENSE for details.