tauri-examples/non-streaming-speech-recognition-from-microphone/README.md
A Tauri v2 desktop app that captures microphone audio and transcribes it using offline ASR with Silero VAD.
Pre-built apps (macOS, Linux, Windows) are available at:
Results can be exported as .srt files.
On Linux, the ALSA development library is required (libasound2-dev on Debian/Ubuntu).
Install npm dependencies:
npm install
This example bundles the SenseVoice int8 model (model type 15), which supports Chinese, English, Japanese, Korean, and Cantonese.
cd tauri-examples/non-streaming-speech-recognition-from-microphone
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
tar xvf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.tar.bz2
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
Tauri bundles files from src-tauri/assets/. Place the model directory
(keeping its original name) and silero_vad.onnx inside it:
mkdir -p src-tauri/assets
cp -a sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17 src-tauri/assets/
cp silero_vad.onnx src-tauri/assets/
Expected layout:
src-tauri/assets/
├── silero_vad.onnx
└── sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17/
    ├── model.int8.onnx
    └── tokens.txt
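If the app later reports "Failed to create recognizer", a quick check of this layout can help. The following is a minimal sketch, not code from the example app: `missing_assets` is a hypothetical helper, and it assumes only the three files shown in the layout above are required.

```rust
use std::path::{Path, PathBuf};

/// Model directory name as given in this README; keep it in sync with MODEL_NAME.
const MODEL_NAME: &str = "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17";

/// Returns every expected asset file that is missing under `assets_dir`.
/// Hypothetical helper for illustration; the app itself may check differently.
fn missing_assets(assets_dir: &Path) -> Vec<PathBuf> {
    let model_dir = assets_dir.join(MODEL_NAME);
    let expected = vec![
        assets_dir.join("silero_vad.onnx"),
        model_dir.join("model.int8.onnx"),
        model_dir.join("tokens.txt"),
    ];
    // Keep only the paths that do not point at an existing regular file.
    expected.into_iter().filter(|p| !p.is_file()).collect()
}
```

Calling `missing_assets(Path::new("src-tauri/assets"))` from the example directory should return an empty list once both downloads are in place.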
npm run dev
This opens the app window. Click Start Recording, speak into your microphone, and recognized segments appear after you stop speaking. Click Stop Recording to finish — you can then play back the recording and export results.
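Exported results use the SubRip (.srt) format, which the backend writes in export_srt(). As a rough sketch of what such an export involves, the code below renders timed segments as SRT cues. The `Segment` struct and its field names are assumptions made for this illustration and are not taken from lib.rs.

```rust
/// One recognized segment; the struct and its fields are assumptions for this sketch.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Formats a millisecond offset as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02},{:03}", hours, minutes, seconds, millis)
}

/// Renders segments as SRT cues, numbered from 1, each followed by a blank line.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(seg.start_ms),
            srt_timestamp(seg.end_ms),
            seg.text.trim()
        ));
    }
    out
}
```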
npm run build
The output is in src-tauri/target/release/bundle/.
The app uses two constants in src-tauri/src/lib.rs to select which model
to bundle:
const MODEL_TYPE: u32 = 15;
const MODEL_NAME: &str = "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17";
To switch models:
1. Pick a model type from src-tauri/src/model_registry.rs (types 0 to 61).
2. Update MODEL_TYPE and MODEL_NAME in src-tauri/src/lib.rs.
3. Place the model files in src-tauri/assets/, keeping the original directory name.
4. Run npm run dev to test.
┌──────────────────────────────────────────────────┐
│  Frontend (HTML + JS)                            │
│  index.html / main.js / styles.css               │
│                                                  │
│  invoke("start_recording")                       │
│  invoke("stop_recording")                        │
│  setInterval → invoke("get_recording_state")     │
└──────────────┬───────────────────────────────────┘
               │ Tauri IPC (invoke)
┌──────────────▼───────────────────────────────────┐
│  Backend (Rust)                                  │
│                                                  │
│  lib.rs                                          │
│  ├── start_recording() → spawns recording thread │
│  ├── run_recording() → cpal → VAD → ASR          │
│  ├── stop_recording() → signals thread to stop   │
│  ├── get_recording_state() → poll results        │
│  ├── save_segment_as_wav()                       │
│  ├── save_all_audio()                            │
│  ├── get_recorded_audio_path()                   │
│  └── export_srt()                                │
│                                                  │
│  model_registry.rs (auto-generated, 62 models)   │
└──────────────────────────────────────────────────┘
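The start / stop / poll pattern in the diagram can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in lib.rs: `Shared` and `Recorder` are hypothetical names, the worker emits fake results instead of running cpal, VAD, and ASR, and joining the thread on stop is a choice made for this sketch.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// State shared between the command handlers and the worker thread.
/// Hypothetical stand-in for whatever lib.rs actually keeps.
#[derive(Default)]
struct Shared {
    stop: AtomicBool,
    segments: Mutex<Vec<String>>,
}

struct Recorder {
    shared: Arc<Shared>,
    worker: Option<JoinHandle<()>>,
}

impl Recorder {
    /// Analogue of start_recording(): spawn the worker and return immediately.
    fn start() -> Recorder {
        let shared = Arc::new(Shared::default());
        let for_worker = Arc::clone(&shared);
        let worker = thread::spawn(move || {
            let mut n = 0;
            // Stand-in for the capture → VAD → ASR loop: one fake result per tick.
            while !for_worker.stop.load(Ordering::Relaxed) {
                n += 1;
                for_worker.segments.lock().unwrap().push(format!("segment {}", n));
                thread::sleep(Duration::from_millis(10));
            }
        });
        Recorder { shared, worker: Some(worker) }
    }

    /// Analogue of get_recording_state(): a snapshot the frontend can poll.
    fn state(&self) -> Vec<String> {
        self.shared.segments.lock().unwrap().clone()
    }

    /// Analogue of stop_recording(): set the flag, then wait for the worker to exit.
    fn stop(&mut self) {
        self.shared.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.worker.take() {
            handle.join().unwrap();
        }
    }
}
```

Polling a snapshot keeps the frontend simple: each setInterval tick just asks for the current list of results, and the worker never blocks on the UI.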
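Processing pipeline: microphone capture (cpal) → Silero VAD splits the audio into speech segments → each segment is decoded by the offline recognizer (ASR).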
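To make the non-streaming flow concrete, here is a toy version of the segment-then-recognize idea. It deliberately swaps Silero VAD for a simple per-frame RMS energy threshold and takes the recognizer as a closure, so none of the names, frame sizes, or thresholds below come from the app; they are assumptions for illustration only.

```rust
/// Splits `samples` into speech segments using a per-frame RMS threshold
/// (a simple stand-in for Silero VAD). Returns (start, end) sample indices,
/// end exclusive. `frame_len` must be greater than zero.
fn detect_segments(
    samples: &[f32],
    frame_len: usize,
    threshold: f32,
    max_silence: usize,
) -> Vec<(usize, usize)> {
    let mut segments = Vec::new();
    let mut start: Option<usize> = None; // start of the segment being built
    let mut last_voiced_end = 0; // end of the most recent voiced frame
    let mut silent = 0; // consecutive silent frames inside a segment

    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let rms = (frame.iter().map(|x| x * x).sum::<f32>() / frame.len() as f32).sqrt();
        let frame_start = i * frame_len;
        if rms >= threshold {
            if start.is_none() {
                start = Some(frame_start);
            }
            last_voiced_end = frame_start + frame.len();
            silent = 0;
        } else if let Some(s) = start {
            silent += 1;
            // Close the segment once the pause is long enough.
            if silent > max_silence {
                segments.push((s, last_voiced_end));
                start = None;
            }
        }
    }
    // Flush a segment that was still open when the audio ended.
    if let Some(s) = start {
        segments.push((s, last_voiced_end));
    }
    segments
}

/// Runs `recognize` once per completed segment, which is why text only
/// appears after the speaker pauses.
fn transcribe<F: Fn(&[f32]) -> String>(samples: &[f32], recognize: F) -> Vec<String> {
    detect_segments(samples, 160, 0.1, 2)
        .into_iter()
        .map(|(s, e)| recognize(&samples[s..e]))
        .collect()
}
```

Because the recognizer only sees whole segments, latency is bounded by how quickly the VAD decides a pause has occurred rather than by the recognizer itself.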
| Problem | Solution |
|---|---|
| "Models are still loading" on startup | Models load in a background thread. Wait a few seconds. Large models take longer. |
| "Failed to create recognizer" | Check that src-tauri/assets/<MODEL_NAME>/ exists and contains the expected .onnx and tokens.txt files. |
| "No default input device" | Ensure a microphone is connected and recognized by your OS. |
| No audio captured | On macOS, grant microphone permission when prompted. On Linux, ensure ALSA/PulseAudio is working. |
| App crashes on startup | Run npm run dev and check stderr for [init] or [build_models] log lines. |