Back to Runanywhere Sdks

@runanywhere/web-onnx

sdk/runanywhere-web/packages/onnx/README.md

0.19.134.0 KB
Original Source

@runanywhere/web-onnx

Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) backend for the RunAnywhere Web SDK — powered by sherpa-onnx compiled to WebAssembly.

Note: This package uses sherpa-onnx (not generic ONNX Runtime). Sherpa-onnx is a speech-focused inference engine that runs ONNX models optimized for STT (Whisper, Zipformer, Paraformer), TTS (Piper), and VAD (Silero).

Peer dependency: Requires @runanywhere/web >=0.1.0-beta.0

Installation

bash
npm install @runanywhere/web @runanywhere/web-onnx

Quick Start

typescript
import { RunAnywhere } from '@runanywhere/web';
import { ONNX, STT, STTModelType, TTS, VAD, SpeechActivity } from '@runanywhere/web-onnx';

// 1. Initialize core SDK
await RunAnywhere.initialize({ environment: 'development' });

// 2. Register the ONNX backend
await ONNX.register();

// 3. Speech-to-Text
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '/models/encoder.onnx', decoder: '/models/decoder.onnx', tokens: '/models/tokens.txt' },
  sampleRate: 16000,
});
const result = await STT.transcribe(audioFloat32Array);
console.log(result.text);

// 4. Text-to-Speech
await TTS.loadVoice({
  voiceId: 'piper-en',
  modelPath: '/models/piper-en.onnx',
  tokensPath: '/models/tokens.txt',
  dataDir: '/models/espeak-ng-data',
});
const speech = await TTS.synthesize('Hello from RunAnywhere!');
// speech.audioData is Float32Array, speech.sampleRate is the sample rate

// 5. Voice Activity Detection
await VAD.initialize({ modelPath: '/models/silero_vad.onnx' });
VAD.onSpeechActivity((activity) => {
  if (activity === SpeechActivity.Ended) {
    const segment = VAD.popSpeechSegment();
    if (segment) console.log(`Speech: ${segment.samples.length} samples`);
  }
});

Capabilities

FeatureClassDescription
Speech-to-TextSTTOffline recognition via Whisper, Zipformer, and Paraformer architectures
Text-to-SpeechTTSNeural voice synthesis via Piper TTS with multiple voice models
Voice Activity DetectionVADReal-time speech/silence detection with Silero VAD
Audio CaptureAudioCaptureMicrophone input via Web Audio API
Audio PlaybackAudioPlaybackAudio output via Web Audio API
Audio File LoaderAudioFileLoaderLoad and decode audio files for transcription

WASM Files

This package includes pre-built sherpa-onnx WASM binaries:

FileDescription
wasm/sherpa/sherpa-onnx.wasmSherpa-ONNX runtime (~12 MB)
wasm/sherpa/sherpa-onnx-asr.jsASR (speech recognition) helper
wasm/sherpa/sherpa-onnx-tts.jsTTS (synthesis) helper
wasm/sherpa/sherpa-onnx-vad.jsVAD (voice activity) helper
wasm/sherpa/sherpa-onnx-glue.jsEmscripten glue code

The sherpa-onnx WASM is loaded lazily — only when STT, TTS, or VAD is first used.

Configure your bundler to serve these as static assets — see the main SDK README for Vite/Webpack examples.

Warning (Vite): You must add @runanywhere/web-onnx to optimizeDeps.exclude in your vite.config.ts. Vite's pre-bundling breaks the import.meta.url paths the SDK uses to locate WASM files. See the main SDK README for the full Vite config.

Cross-Origin Isolation

Multi-threaded WASM requires SharedArrayBuffer, which needs Cross-Origin Isolation headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless

See the main SDK docs for platform-specific configuration.

License

Apache 2.0