Introduction

This folder contains examples that use the sherpa-onnx Rust crate maintained in this repository.

Setup

Most users don't need to configure Rust linking details manually.

Just enter this directory and run one of the helper scripts below. Each script downloads the required model files automatically if needed.

For example:

bash

./run-version.sh

You can also run examples directly with Cargo:

bash

cargo run --example version

The default Rust setup uses static linking.

The first build may automatically download the matching sherpa-onnx native libraries for your platform. Normally no manual steps are required.

If you want shared libraries instead of the default static behavior, use:

bash

cargo run --no-default-features --features shared --example version

To customize which libraries are used (by setting SHERPA_ONNX_LIB_DIR, choosing shared instead of the default behavior, or configuring the crate directly in your own Cargo project), see for-advanced-users.md.

Examples

#	Example	Description
1	version	Show the sherpa-onnx version
2	pocket_tts	Text-to-speech with zero-shot voice cloning using a reference audio
3	supertonic_tts	Text-to-speech with Supertonic TTS (multi-speaker, multi-language)
4	zipvoice_tts	Text-to-speech with ZipVoice zero-shot voice cloning
5	vits_tts	Text-to-speech with a standalone VITS Piper model (English)
6	vits_tts	Text-to-speech with a standalone VITS Piper model (German)
7	matcha_tts_en	Text-to-speech with Matcha TTS (English)
8	matcha_tts_zh	Text-to-speech with Matcha TTS (Chinese)
9	kokoro_tts_en	Text-to-speech with Kokoro TTS (English)
10	kokoro_tts_zh_en	Text-to-speech with Kokoro TTS (Chinese + English)
11	kitten_tts_en	Text-to-speech with Kitten TTS (English)
12	streaming_zipformer_en	Streaming ASR with zipformer transducer (English)
13	streaming_zipformer_zh_en	Streaming ASR with zipformer transducer (Chinese + English)
14	streaming_zipformer_microphone	Real-time streaming ASR from microphone input
15	zipformer_en	Non-streaming ASR with zipformer transducer (English)
16	zipformer_zh_en	Non-streaming ASR with zipformer transducer (Chinese + English)
17	zipformer_vi	Non-streaming ASR with zipformer transducer (Vietnamese)
18	nemo_parakeet	Non-streaming ASR with NeMo Parakeet TDT transducer (English)
19	fire_red_asr_ctc	Non-streaming ASR with FireRedASR CTC model (Chinese + English)
20	moonshine_v2	Non-streaming ASR with Moonshine v2 (English)
21	sense_voice	Non-streaming ASR with SenseVoice (Chinese, English, Japanese, Korean, Cantonese)
22	qwen3_asr	Non-streaming ASR with Qwen3 ASR (multilingual)
23	cohere_transcribe	Non-streaming ASR with Cohere Transcribe (multilingual)
24	silero_vad_remove_silence	Remove silences from an audio file using Silero VAD
25	offline_speech_enhancement_gtcrn	Offline speech enhancement with GTCRN
26	offline_speech_enhancement_dpdfnet	Offline speech enhancement with DPDFNet
27	streaming_speech_enhancement_gtcrn	Streaming speech enhancement with GTCRN
28	streaming_speech_enhancement_dpdfnet	Streaming speech enhancement with DPDFNet
29	online_punctuation	Add punctuation to text using online punctuation model
30	keyword_spotter	Detect keywords from audio using a Zipformer KWS model
31	spoken_language_identification	Detect the spoken language in a wave file using Whisper
32	offline_punctuation	Add punctuation to text using an offline punctuation model
33	audio_tagging_zipformer	Audio tagging with a Zipformer model
34	audio_tagging_ced	Audio tagging with a CED model
35	speaker_embedding_extractor	Compute a speaker embedding from a wave file
36	speaker_embedding_manager	Register, search, verify, and remove speakers using embeddings
37	speaker_embedding_cosine_similarity	Compute cosine similarity from three speaker embeddings
38	offline_speaker_diarization	Offline speaker diarization with pyannote segmentation and 3D-Speaker embeddings
39	sense_voice_simulate_streaming_microphone	Simulated streaming ASR with SenseVoice and VAD from microphone
40	fire_red_asr_ctc_simulate_streaming_microphone	Simulated streaming ASR with FireRedASR CTC and VAD from microphone
41	parakeet_tdt_ctc_simulate_streaming_microphone	Simulated streaming ASR with Parakeet TDT CTC and VAD from microphone
42	parakeet_tdt_simulate_streaming_microphone	Simulated streaming ASR with Parakeet TDT transducer and VAD from microphone
43	wenet_ctc_simulate_streaming_microphone	Simulated streaming ASR with WeNet CTC and VAD from microphone
44	zipformer_ctc_simulate_streaming_microphone	Simulated streaming ASR with Zipformer CTC and VAD from microphone
45	zipformer_transducer_simulate_streaming_microphone	Simulated streaming ASR with Zipformer transducer and VAD from microphone
46	zipformer_transducer_simulate_streaming_microphone	Simulated streaming ASR with Zipformer transducer (Japanese) and VAD from microphone
47	qwen3_asr_simulate_streaming_microphone	Simulated streaming ASR with Qwen3 ASR and VAD from microphone

Run it

Each helper script downloads the required files if needed.

Example 1: Show sherpa-onnx version

bash

./run-version.sh

On macOS, you can run

otool -l target/debug/examples/version | grep -A2 LC_RPATH

to check the RPATH for shared builds.

Example 2: TTS with Pocket TTS (zero-shot voice cloning)

bash

./run-pocket-tts.sh

Example 3: TTS with Supertonic TTS

bash

./run-supertonic-tts.sh

Example 4: TTS with ZipVoice zero-shot voice cloning

bash

./run-zipvoice-tts.sh

Example 5: TTS with VITS (English Piper)

bash

./run-vits-en.sh

Example 6: TTS with VITS (German Piper)

bash

./run-vits-de.sh

Example 7: TTS with Matcha (English)

bash

./run-matcha-tts-en.sh

Example 8: TTS with Matcha (Chinese)

bash

./run-matcha-tts-zh.sh

Example 9: TTS with Kokoro (English)

bash

./run-kokoro-tts-en.sh

Example 10: TTS with Kokoro (Chinese + English)

bash

./run-kokoro-tts-zh-en.sh

Example 11: TTS with Kitten (English)

bash

./run-kitten-tts-en.sh

Example 12: ASR with streaming zipformer (English)

bash

./run-streaming-zipformer-en.sh

Example 13: ASR with streaming zipformer (Chinese + English)

bash

./run-streaming-zipformer-zh-en.sh

Example 14: ASR with streaming zipformer (with a microphone, real-time ASR)

bash

./run-streaming-zipformer-microphone-zh-en.sh

Example 15: ASR with non-streaming zipformer (English)

bash

./run-zipformer-en.sh

Example 16: ASR with non-streaming zipformer (Chinese + English)

bash

./run-zipformer-zh-en.sh

Example 17: ASR with non-streaming zipformer (Vietnamese)

bash

./run-zipformer-vi.sh

Example 18: ASR with non-streaming NeMo Parakeet (English)

bash

./run-nemo-parakeet-en.sh

Example 19: ASR with non-streaming FireRedASR CTC (Chinese + English)

bash

./run-fire-red-asr-ctc.sh

Example 20: ASR with non-streaming Moonshine v2 (English)

bash

./run-moonshine-v2.sh

Example 21: ASR with non-streaming SenseVoice

bash

./run-sense-voice.sh

Example 22: ASR with non-streaming Qwen3 ASR

bash

./run-qwen3-asr.sh

Example 23: ASR with non-streaming Cohere Transcribe

bash

./run-cohere-transcribe.sh

Example 24: Remove silences from a file using Silero VAD

bash

./run-silero-vad-remove-silence.sh

Example 25: Offline speech enhancement with GTCRN

bash

./run-offline-speech-enhancement-gtcrn.sh

Example 26: Offline speech enhancement with DPDFNet

bash

./run-offline-speech-enhancement-dpdfnet.sh

Example 27: Streaming speech enhancement with GTCRN

bash

./run-streaming-speech-enhancement-gtcrn.sh

Example 28: Streaming speech enhancement with DPDFNet

bash

./run-streaming-speech-enhancement-dpdfnet.sh

Example 29: Online punctuation

bash

./run-online-punctuation.sh

Example 30: Keyword spotter

bash

./run-keyword-spotter.sh

Example 31: Spoken language identification

bash

./run-spoken-language-identification.sh

Example 32: Offline punctuation

bash

./run-offline-punctuation.sh

Example 33: Audio tagging with a Zipformer model

bash

./run-audio-tagging-zipformer.sh

Example 34: Audio tagging with a CED model

bash

./run-audio-tagging-ced.sh

Example 35: Speaker embedding extractor

bash

./run-speaker-embedding-extractor.sh

Example 36: Speaker embedding manager

bash

./run-speaker-embedding-manager.sh

Example 37: Speaker embedding cosine similarity

bash

./run-speaker-embedding-cosine-similarity.sh
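For reference, the cosine similarity of two embedding vectors is their dot product divided by the product of their norms. The following is a minimal sketch in plain Rust that does not use the sherpa-onnx API. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions and are not taken from the example's source.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns None if the lengths differ, the input is empty,
/// or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy stand-ins for embeddings; real speaker embeddings are much longer.
    let speaker_a1 = [1.0_f32, 2.0, 3.0];
    let speaker_a2 = [2.0_f32, 4.0, 6.0]; // same direction as speaker_a1
    let speaker_b = [-3.0_f32, 0.0, 1.0]; // orthogonal to speaker_a1

    // A score close to 1 suggests the same speaker; a score close to 0 does not.
    println!("{:?}", cosine_similarity(&speaker_a1, &speaker_a2));
    println!("{:?}", cosine_similarity(&speaker_a1, &speaker_b));
}
```

In practice the score is compared against a threshold that you tune for your embedding model and data.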

Example 38: Offline speaker diarization

bash

./run-offline-speaker-diarization.sh

Example 39: Simulated streaming ASR with SenseVoice and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline SenseVoice recognizer on each detected segment, providing an experience similar to streaming ASR.

bash

./run-sense-voice-simulate-streaming-microphone.sh
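The control flow behind simulated streaming can be sketched as follows. This is only an illustration under stated assumptions: it does not use the sherpa-onnx API, a toy energy threshold stands in for Silero VAD, and `OfflineRecognizer`, `ToyVad`, `SampleCounter`, and `simulate_streaming` are hypothetical names invented for this sketch.

```rust
/// Hypothetical stand-in for an offline (non-streaming) recognizer.
trait OfflineRecognizer {
    fn decode(&self, samples: &[f32]) -> String;
}

/// Toy energy-based speech detector standing in for Silero VAD.
/// It collects consecutive loud windows into a segment and emits the
/// segment once `min_silence_windows` quiet windows have followed it.
struct ToyVad {
    threshold: f32,
    min_silence_windows: usize,
    silence_run: usize,
    current: Vec<f32>,
}

impl ToyVad {
    fn new(threshold: f32, min_silence_windows: usize) -> Self {
        ToyVad { threshold, min_silence_windows, silence_run: 0, current: Vec::new() }
    }

    /// Feed one window of samples; returns a finished speech segment, if any.
    fn accept_window(&mut self, window: &[f32]) -> Option<Vec<f32>> {
        let energy = window.iter().map(|s| s * s).sum::<f32>() / window.len().max(1) as f32;
        if energy >= self.threshold {
            self.silence_run = 0;
            self.current.extend_from_slice(window);
            return None;
        }
        if self.current.is_empty() {
            return None; // silence before any speech has started
        }
        self.silence_run += 1;
        if self.silence_run >= self.min_silence_windows {
            self.silence_run = 0;
            return Some(std::mem::take(&mut self.current));
        }
        None
    }
}

/// Runs the offline recognizer once per detected segment.
/// A segment still open when the input ends is not flushed in this sketch.
fn simulate_streaming<R: OfflineRecognizer>(
    audio: &[f32],
    window_size: usize,
    vad: &mut ToyVad,
    recognizer: &R,
) -> Vec<String> {
    let mut results = Vec::new();
    for window in audio.chunks(window_size) {
        if let Some(segment) = vad.accept_window(window) {
            results.push(recognizer.decode(&segment));
        }
    }
    results
}

/// Dummy recognizer that only reports how many samples it was given.
struct SampleCounter;

impl OfflineRecognizer for SampleCounter {
    fn decode(&self, samples: &[f32]) -> String {
        format!("segment with {} samples", samples.len())
    }
}

fn main() {
    // Synthetic input: silence, a loud burst, silence, a shorter burst, silence.
    let mut audio = vec![0.0_f32; 1024];
    audio.extend(vec![0.5_f32; 2048]);
    audio.extend(vec![0.0_f32; 1024]);
    audio.extend(vec![0.5_f32; 512]);
    audio.extend(vec![0.0_f32; 1024]);

    let mut vad = ToyVad::new(0.01, 2);
    for text in simulate_streaming(&audio, 512, &mut vad, &SampleCounter) {
        println!("{}", text);
    }
}
```

In the real example the windows come from the microphone rather than a prepared buffer. Results appear only after the end of each utterance has been detected, which is why the experience is similar to, but not the same as, true streaming ASR.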

Example 40: Simulated streaming ASR with FireRedASR CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline FireRedASR CTC recognizer on each detected segment.

bash

./run-fire-red-asr-ctc-simulate-streaming-microphone.sh

Example 41: Simulated streaming ASR with Parakeet TDT CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT CTC recognizer on each detected segment (Japanese).

bash

./run-parakeet-tdt-ctc-simulate-streaming-microphone.sh

Example 42: Simulated streaming ASR with Parakeet TDT transducer and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT transducer recognizer on each detected segment (English).

bash

./run-parakeet-tdt-simulate-streaming-microphone.sh

Example 43: Simulated streaming ASR with WeNet CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline WeNet CTC recognizer on each detected segment (Cantonese).

bash

./run-wenet-ctc-simulate-streaming-microphone.sh

Example 44: Simulated streaming ASR with Zipformer CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer CTC recognizer on each detected segment (Chinese).

bash

./run-zipformer-ctc-simulate-streaming-microphone.sh

Example 45: Simulated streaming ASR with Zipformer transducer and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Chinese).

bash

./run-zipformer-transducer-simulate-streaming-microphone.sh

Example 46: Simulated streaming ASR with Zipformer transducer (Japanese) and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Japanese, reazonspeech model).

bash

./run-zipformer-ja-reazonspeech-simulate-streaming-microphone.sh

Example 47: Simulated streaming ASR with Qwen3 ASR and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Qwen3 ASR recognizer on each detected segment.

bash

./run-qwen3-asr-simulate-streaming-microphone.sh