Introduction

This folder contains examples that use the sherpa-onnx Rust crate maintained in this repository.

Setup

Most users do not need to configure Rust linking details manually.

Just enter this directory and run one of the helper scripts below. Each script downloads the required model files automatically if needed.

For example:

```bash
./run-version.sh
```

You can also run examples directly with Cargo:

```bash
cargo run --example version
```

The default Rust setup uses static linking.

The first build may automatically download the matching sherpa-onnx native libraries for your platform. This normally requires no action on your part.

If you want shared libraries instead of the default static behavior, use:

```bash
cargo run --no-default-features --features shared --example version
```

To customize which libraries are used (for example, to set SHERPA_ONNX_LIB_DIR, to choose shared libraries instead of the default behavior, or to configure the crate directly in your own Cargo project), see for-advanced-users.md.

Examples

| # | Example | Description |
|---|---------|-------------|
| 1 | version | Show the sherpa-onnx version |
| 2 | pocket_tts | Text-to-speech with zero-shot voice cloning using a reference audio |
| 3 | supertonic_tts | Text-to-speech with Supertonic TTS (multi-speaker, multi-language) |
| 4 | zipvoice_tts | Text-to-speech with ZipVoice zero-shot voice cloning |
| 5 | vits_tts | Text-to-speech with a standalone VITS Piper model (English) |
| 6 | vits_tts | Text-to-speech with a standalone VITS Piper model (German) |
| 7 | matcha_tts_en | Text-to-speech with Matcha TTS (English) |
| 8 | matcha_tts_zh | Text-to-speech with Matcha TTS (Chinese) |
| 9 | kokoro_tts_en | Text-to-speech with Kokoro TTS (English) |
| 10 | kokoro_tts_zh_en | Text-to-speech with Kokoro TTS (Chinese + English) |
| 11 | kitten_tts_en | Text-to-speech with Kitten TTS (English) |
| 12 | streaming_zipformer_en | Streaming ASR with zipformer transducer (English) |
| 13 | streaming_zipformer_zh_en | Streaming ASR with zipformer transducer (Chinese + English) |
| 14 | streaming_zipformer_microphone | Real-time streaming ASR from microphone input |
| 15 | zipformer_en | Non-streaming ASR with zipformer transducer (English) |
| 16 | zipformer_zh_en | Non-streaming ASR with zipformer transducer (Chinese + English) |
| 17 | zipformer_vi | Non-streaming ASR with zipformer transducer (Vietnamese) |
| 18 | nemo_parakeet | Non-streaming ASR with Nemo Parakeet TDT transducer (English) |
| 19 | fire_red_asr_ctc | Non-streaming ASR with FireRedASR CTC model (Chinese + English) |
| 20 | moonshine_v2 | Non-streaming ASR with Moonshine v2 (English) |
| 21 | sense_voice | Non-streaming ASR with SenseVoice (Chinese, English, Japanese, Korean, Cantonese) |
| 22 | qwen3_asr | Non-streaming ASR with Qwen3 ASR (multilingual) |
| 23 | cohere_transcribe | Non-streaming ASR with Cohere Transcribe (multilingual) |
| 24 | silero_vad_remove_silence | Remove silences from an audio file using Silero VAD |
| 25 | offline_speech_enhancement_gtcrn | Offline speech enhancement with GTCRN |
| 26 | offline_speech_enhancement_dpdfnet | Offline speech enhancement with DPDFNet |
| 27 | streaming_speech_enhancement_gtcrn | Streaming speech enhancement with GTCRN |
| 28 | streaming_speech_enhancement_dpdfnet | Streaming speech enhancement with DPDFNet |
| 29 | online_punctuation | Add punctuation to text using an online punctuation model |
| 30 | keyword_spotter | Detect keywords from audio using a Zipformer KWS model |
| 31 | spoken_language_identification | Detect the spoken language in a wave file using Whisper |
| 32 | offline_punctuation | Add punctuation to text using an offline punctuation model |
| 33 | audio_tagging_zipformer | Audio tagging with a Zipformer model |
| 34 | audio_tagging_ced | Audio tagging with a CED model |
| 35 | speaker_embedding_extractor | Compute a speaker embedding from a wave file |
| 36 | speaker_embedding_manager | Register, search, verify, and remove speakers using embeddings |
| 37 | speaker_embedding_cosine_similarity | Compute cosine similarity from three speaker embeddings |
| 38 | offline_speaker_diarization | Offline speaker diarization with pyannote segmentation and 3D-Speaker embeddings |
| 39 | sense_voice_simulate_streaming_microphone | Simulated streaming ASR with SenseVoice and VAD from microphone |
| 40 | fire_red_asr_ctc_simulate_streaming_microphone | Simulated streaming ASR with FireRedASR CTC and VAD from microphone |
| 41 | parakeet_tdt_ctc_simulate_streaming_microphone | Simulated streaming ASR with Parakeet TDT CTC and VAD from microphone |
| 42 | parakeet_tdt_simulate_streaming_microphone | Simulated streaming ASR with Parakeet TDT transducer and VAD from microphone |
| 43 | wenet_ctc_simulate_streaming_microphone | Simulated streaming ASR with WeNet CTC and VAD from microphone |
| 44 | zipformer_ctc_simulate_streaming_microphone | Simulated streaming ASR with Zipformer CTC and VAD from microphone |
| 45 | zipformer_transducer_simulate_streaming_microphone | Simulated streaming ASR with Zipformer transducer and VAD from microphone |
| 46 | zipformer_transducer_simulate_streaming_microphone | Simulated streaming ASR with Zipformer transducer (Japanese) and VAD from microphone |
| 47 | qwen3_asr_simulate_streaming_microphone | Simulated streaming ASR with Qwen3 ASR and VAD from microphone |

Run it

Each helper script downloads the required files if needed.

Example 1: Show sherpa-onnx version

```bash
./run-version.sh
```

On macOS, you can check the RPATH for shared builds with:

```
otool -l target/debug/examples/version | grep -A2 LC_RPATH
```

Example 2: TTS with Pocket TTS (zero-shot voice cloning)

```bash
./run-pocket-tts.sh
```

Example 3: TTS with Supertonic TTS

```bash
./run-supertonic-tts.sh
```

Example 4: TTS with ZipVoice zero-shot voice cloning

```bash
./run-zipvoice-tts.sh
```

Example 5: TTS with VITS (English Piper)

```bash
./run-vits-en.sh
```

Example 6: TTS with VITS (German Piper)

```bash
./run-vits-de.sh
```

Example 7: TTS with Matcha (English)

```bash
./run-matcha-tts-en.sh
```

Example 8: TTS with Matcha (Chinese)

```bash
./run-matcha-tts-zh.sh
```

Example 9: TTS with Kokoro (English)

```bash
./run-kokoro-tts-en.sh
```

Example 10: TTS with Kokoro (Chinese + English)

```bash
./run-kokoro-tts-zh-en.sh
```

Example 11: TTS with Kitten (English)

```bash
./run-kitten-tts-en.sh
```

Example 12: ASR with streaming zipformer (English)

```bash
./run-streaming-zipformer-en.sh
```

Example 13: ASR with streaming zipformer (Chinese + English)

```bash
./run-streaming-zipformer-zh-en.sh
```

Example 14: ASR with streaming zipformer (with a microphone, real-time ASR)

```bash
./run-streaming-zipformer-microphone-zh-en.sh
```

Example 15: ASR with non-streaming zipformer (English)

```bash
./run-zipformer-en.sh
```

Example 16: ASR with non-streaming zipformer (Chinese + English)

```bash
./run-zipformer-zh-en.sh
```

Example 17: ASR with non-streaming zipformer (Vietnamese)

```bash
./run-zipformer-vi.sh
```

Example 18: ASR with non-streaming Nemo Parakeet (English)

```bash
./run-nemo-parakeet-en.sh
```

Example 19: ASR with non-streaming FireRedASR CTC (Chinese + English)

```bash
./run-fire-red-asr-ctc.sh
```

Example 20: ASR with non-streaming Moonshine v2 (English)

```bash
./run-moonshine-v2.sh
```

Example 21: ASR with non-streaming SenseVoice

```bash
./run-sense-voice.sh
```

Example 22: ASR with non-streaming Qwen3 ASR

```bash
./run-qwen3-asr.sh
```

Example 23: ASR with non-streaming Cohere Transcribe

```bash
./run-cohere-transcribe.sh
```

Example 24: Remove silences from a file using Silero VAD

```bash
./run-silero-vad-remove-silence.sh
```

Example 25: Offline speech enhancement with GTCRN

```bash
./run-offline-speech-enhancement-gtcrn.sh
```

Example 26: Offline speech enhancement with DPDFNet

```bash
./run-offline-speech-enhancement-dpdfnet.sh
```

Example 27: Streaming speech enhancement with GTCRN

```bash
./run-streaming-speech-enhancement-gtcrn.sh
```

Example 28: Streaming speech enhancement with DPDFNet

```bash
./run-streaming-speech-enhancement-dpdfnet.sh
```

Example 29: Online punctuation

```bash
./run-online-punctuation.sh
```

Example 30: Keyword spotter

```bash
./run-keyword-spotter.sh
```

Example 31: Spoken language identification

```bash
./run-spoken-language-identification.sh
```

Example 32: Offline punctuation

```bash
./run-offline-punctuation.sh
```

Example 33: Audio tagging with a Zipformer model

```bash
./run-audio-tagging-zipformer.sh
```

Example 34: Audio tagging with a CED model

```bash
./run-audio-tagging-ced.sh
```

Example 35: Speaker embedding extractor

```bash
./run-speaker-embedding-extractor.sh
```

Example 36: Speaker embedding manager

```bash
./run-speaker-embedding-manager.sh
```
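
To give a rough idea of what registering, searching, verifying, and removing speakers by embedding involves, here is a minimal in-memory sketch in plain Rust. It is not the sherpa-onnx crate's API: `SpeakerRegistry`, `normalize`, and every method name are hypothetical, and it assumes embeddings are compared by cosine similarity, which is computed here as a dot product of unit-length vectors.

```rust
use std::collections::HashMap;

/// Scale a vector to unit length. Returns None for an empty or all-zero vector.
fn normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if v.is_empty() || norm == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}

/// Hypothetical in-memory registry (illustration only, not the crate's type).
struct SpeakerRegistry {
    dim: usize,
    speakers: HashMap<String, Vec<f32>>, // name -> unit-length embedding
}

impl SpeakerRegistry {
    fn new(dim: usize) -> Self {
        Self { dim, speakers: HashMap::new() }
    }

    /// Dot product of two unit-length vectors, i.e. their cosine similarity.
    fn score(stored: &[f32], query: &[f32]) -> f32 {
        stored.iter().zip(query).map(|(a, b)| a * b).sum()
    }

    /// Store a speaker. Returns false on a dimension mismatch or a zero vector.
    fn register(&mut self, name: &str, embedding: &[f32]) -> bool {
        if embedding.len() != self.dim {
            return false;
        }
        match normalize(embedding) {
            Some(unit) => {
                self.speakers.insert(name.to_string(), unit);
                true
            }
            None => false,
        }
    }

    /// Best-matching speaker whose similarity is at least `threshold`.
    fn search(&self, embedding: &[f32], threshold: f32) -> Option<&str> {
        if embedding.len() != self.dim {
            return None;
        }
        let query = normalize(embedding)?;
        self.speakers
            .iter()
            .map(|(name, stored)| (name.as_str(), Self::score(stored, &query)))
            .filter(|(_, s)| *s >= threshold)
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(name, _)| name)
    }

    /// True if `embedding` matches the named speaker at or above `threshold`.
    fn verify(&self, name: &str, embedding: &[f32], threshold: f32) -> bool {
        if embedding.len() != self.dim {
            return false;
        }
        match (self.speakers.get(name), normalize(embedding)) {
            (Some(stored), Some(query)) => Self::score(stored, &query) >= threshold,
            _ => false,
        }
    }

    /// Remove a speaker. Returns true if the name was registered.
    fn remove(&mut self, name: &str) -> bool {
        self.speakers.remove(name).is_some()
    }
}
```

Normalizing once at registration time keeps each lookup to a single dot product per stored speaker. The example in this folder remains the reference for how the crate itself exposes these operations.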

Example 37: Speaker embedding cosine similarity

```bash
./run-speaker-embedding-cosine-similarity.sh
```
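
For readers unfamiliar with the metric, cosine similarity between two embeddings can be sketched in a few lines of standard-library Rust. The function name `cosine_similarity` is an assumption made for this illustration and is not taken from the sherpa-onnx crate. The sketch returns `None` instead of dividing by zero when the inputs are unusable.

```rust
/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns None if the lengths differ, the input is empty, or a norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Values near 1 indicate embeddings that point in nearly the same direction, which for speaker embeddings usually suggests the same speaker. Values near 0 indicate unrelated directions.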

Example 38: Offline speaker diarization

```bash
./run-offline-speaker-diarization.sh
```

Example 39: Simulated streaming ASR with SenseVoice and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline SenseVoice recognizer on each detected segment, providing an experience similar to streaming ASR.

```bash
./run-sense-voice-simulate-streaming-microphone.sh
```
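
The control flow described above (detect speech segments, then run an offline recognizer once per segment) can be sketched without any sherpa-onnx calls. The sketch rests on loud assumptions: `is_speech` is a crude energy threshold standing in for Silero VAD, `simulate_streaming` is a hypothetical name, and the recognizer is a caller-supplied closure, not the crate's recognizer type.

```rust
/// Placeholder for a VAD decision: RMS energy at or above `threshold`.
/// The real example uses Silero VAD. This stand-in only illustrates the flow.
fn is_speech(chunk: &[f32], threshold: f32) -> bool {
    if chunk.is_empty() {
        return false;
    }
    let mean_sq = chunk.iter().map(|x| x * x).sum::<f32>() / chunk.len() as f32;
    mean_sq.sqrt() >= threshold
}

/// Walk the audio in fixed-size chunks, buffer consecutive speech chunks into
/// one segment, and hand each finished segment to `recognize`.
fn simulate_streaming<F>(
    samples: &[f32],
    chunk_len: usize,
    threshold: f32,
    mut recognize: F,
) -> Vec<String>
where
    F: FnMut(&[f32]) -> String,
{
    let mut results = Vec::new();
    if chunk_len == 0 {
        return results; // slice::chunks panics on a chunk size of 0
    }
    let mut segment: Vec<f32> = Vec::new();
    for chunk in samples.chunks(chunk_len) {
        if is_speech(chunk, threshold) {
            // Still inside a speech segment: keep buffering.
            segment.extend_from_slice(chunk);
        } else if !segment.is_empty() {
            // Speech just ended: decode the whole segment at once.
            results.push(recognize(segment.as_slice()));
            segment.clear();
        }
    }
    // Flush a segment that runs to the end of the input.
    if !segment.is_empty() {
        results.push(recognize(segment.as_slice()));
    }
    results
}
```

Because the recognizer only runs when a segment ends, results appear shortly after each pause, not word by word. That is why this approach is described as simulated streaming.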

Example 40: Simulated streaming ASR with FireRedASR CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline FireRedASR CTC recognizer on each detected segment.

```bash
./run-fire-red-asr-ctc-simulate-streaming-microphone.sh
```

Example 41: Simulated streaming ASR with Parakeet TDT CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT CTC recognizer on each detected segment (Japanese).

```bash
./run-parakeet-tdt-ctc-simulate-streaming-microphone.sh
```

Example 42: Simulated streaming ASR with Parakeet TDT transducer and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT transducer recognizer on each detected segment (English).

```bash
./run-parakeet-tdt-simulate-streaming-microphone.sh
```

Example 43: Simulated streaming ASR with WeNet CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline WeNet CTC recognizer on each detected segment (Cantonese).

```bash
./run-wenet-ctc-simulate-streaming-microphone.sh
```

Example 44: Simulated streaming ASR with Zipformer CTC and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer CTC recognizer on each detected segment (Chinese).

```bash
./run-zipformer-ctc-simulate-streaming-microphone.sh
```

Example 45: Simulated streaming ASR with Zipformer transducer and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Chinese).

```bash
./run-zipformer-transducer-simulate-streaming-microphone.sh
```

Example 46: Simulated streaming ASR with Zipformer transducer (Japanese) and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Japanese, reazonspeech model).

```bash
./run-zipformer-ja-reazonspeech-simulate-streaming-microphone.sh
```

Example 47: Simulated streaming ASR with Qwen3 ASR and VAD from microphone

This example uses Silero VAD to detect speech segments and runs the offline Qwen3 ASR recognizer on each detected segment.

```bash
./run-qwen3-asr-simulate-streaming-microphone.sh
```