rust-api-examples/README.md
This folder contains examples that use the sherpa-onnx Rust crate maintained in
this repository.
Most users do not need to configure Rust linking details manually.
Just enter this directory and run one of the helper scripts below. Each script downloads the required model files automatically if needed.
For example:
./run-version.sh
You can also run examples directly with Cargo:
cargo run --example version
The default Rust setup uses static linking.
The first build may automatically download the matching sherpa-onnx native libraries for your platform, normally without any action on your part.
If you want shared libraries instead of the default static behavior, use:
cargo run --no-default-features --features shared --example version
For details on customizing which libraries are used, setting SHERPA_ONNX_LIB_DIR,
choosing shared instead of the default static linking, or configuring the crate
directly in your own Cargo project, see
for-advanced-users.md.
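
Before overriding the default setup, it can help to confirm that SHERPA_ONNX_LIB_DIR points at a real directory. The sketch below is a standalone diagnostic, not part of the sherpa-onnx crate: `LibDirStatus` and `check_lib_dir` are hypothetical names, and the code only checks that the path exists. It makes no assumption about which files the crate expects inside that directory.

```rust
use std::env;
use std::path::Path;

/// Result of inspecting a candidate SHERPA_ONNX_LIB_DIR value (hypothetical type).
#[derive(Debug, PartialEq)]
enum LibDirStatus {
    /// The variable is not set or is empty, so the default setup applies.
    Unset,
    /// The variable is set, but the path is not an existing directory.
    Missing(String),
    /// The variable is set and points to an existing directory.
    Found(String),
}

/// Classify the value without touching the process environment,
/// which keeps the function easy to test.
fn check_lib_dir(value: Option<&str>) -> LibDirStatus {
    match value {
        None => LibDirStatus::Unset,
        Some(dir) if dir.is_empty() => LibDirStatus::Unset,
        Some(dir) if Path::new(dir).is_dir() => LibDirStatus::Found(dir.to_string()),
        Some(dir) => LibDirStatus::Missing(dir.to_string()),
    }
}

fn main() {
    // Read the variable named in this README; `.ok()` maps "not set" to None.
    let value = env::var("SHERPA_ONNX_LIB_DIR").ok();
    match check_lib_dir(value.as_deref()) {
        LibDirStatus::Unset => println!("SHERPA_ONNX_LIB_DIR is not set"),
        LibDirStatus::Missing(dir) => println!("{dir} is not a directory"),
        LibDirStatus::Found(dir) => println!("using libraries from {dir}"),
    }
}
```

Taking the value as a parameter rather than reading it inside `check_lib_dir` is a design choice for testability only.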
| # | Example | Description |
|---|---|---|
| 1 | version | Show the sherpa-onnx version |
| 2 | pocket_tts | Text-to-speech with zero-shot voice cloning using reference audio |
| 3 | supertonic_tts | Text-to-speech with Supertonic TTS (multi-speaker, multi-language) |
| 4 | zipvoice_tts | Text-to-speech with ZipVoice zero-shot voice cloning |
| 5 | vits_tts | Text-to-speech with a standalone VITS Piper model (English) |
| 6 | vits_tts | Text-to-speech with a standalone VITS Piper model (German) |
| 7 | matcha_tts_en | Text-to-speech with Matcha TTS (English) |
| 8 | matcha_tts_zh | Text-to-speech with Matcha TTS (Chinese) |
| 9 | kokoro_tts_en | Text-to-speech with Kokoro TTS (English) |
| 10 | kokoro_tts_zh_en | Text-to-speech with Kokoro TTS (Chinese + English) |
| 11 | kitten_tts_en | Text-to-speech with Kitten TTS (English) |
| 12 | streaming_zipformer_en | Streaming ASR with Zipformer transducer (English) |
| 13 | streaming_zipformer_zh_en | Streaming ASR with Zipformer transducer (Chinese + English) |
| 14 | streaming_zipformer_microphone | Real-time streaming ASR from microphone input |
| 15 | zipformer_en | Non-streaming ASR with Zipformer transducer (English) |
| 16 | zipformer_zh_en | Non-streaming ASR with Zipformer transducer (Chinese + English) |
| 17 | zipformer_vi | Non-streaming ASR with Zipformer transducer (Vietnamese) |
| 18 | nemo_parakeet | Non-streaming ASR with NeMo Parakeet TDT transducer (English) |
| 19 | fire_red_asr_ctc | Non-streaming ASR with FireRedASR CTC model (Chinese + English) |
| 20 | moonshine_v2 | Non-streaming ASR with Moonshine v2 (English) |
| 21 | sense_voice | Non-streaming ASR with SenseVoice (Chinese, English, Japanese, Korean, Cantonese) |
| 22 | qwen3_asr | Non-streaming ASR with Qwen3 ASR (multilingual) |
| 23 | cohere_transcribe | Non-streaming ASR with Cohere Transcribe (multilingual) |
| 24 | silero_vad_remove_silence | Remove silences from an audio file using Silero VAD |
| 25 | offline_speech_enhancement_gtcrn | Offline speech enhancement with GTCRN |
| 26 | offline_speech_enhancement_dpdfnet | Offline speech enhancement with DPDFNet |
| 27 | streaming_speech_enhancement_gtcrn | Streaming speech enhancement with GTCRN |
| 28 | streaming_speech_enhancement_dpdfnet | Streaming speech enhancement with DPDFNet |
| 29 | online_punctuation | Add punctuation to text using an online punctuation model |
| 30 | keyword_spotter | Detect keywords from audio using a Zipformer KWS model |
| 31 | spoken_language_identification | Detect the spoken language in a wave file using Whisper |
| 32 | offline_punctuation | Add punctuation to text using an offline punctuation model |
| 33 | audio_tagging_zipformer | Audio tagging with a Zipformer model |
| 34 | audio_tagging_ced | Audio tagging with a CED model |
| 35 | speaker_embedding_extractor | Compute a speaker embedding from a wave file |
| 36 | speaker_embedding_manager | Register, search, verify, and remove speakers using embeddings |
| 37 | speaker_embedding_cosine_similarity | Compute cosine similarity between speaker embeddings |
| 38 | offline_speaker_diarization | Offline speaker diarization with pyannote segmentation and 3D-Speaker embeddings |
| 39 | sense_voice_simulate_streaming_microphone | Simulated streaming ASR with SenseVoice and VAD from microphone |
| 40 | fire_red_asr_ctc_simulate_streaming_microphone | Simulated streaming ASR with FireRedASR CTC and VAD from microphone |
| 41 | parakeet_tdt_ctc_simulate_streaming_microphone | Simulated streaming ASR with Parakeet TDT CTC and VAD from microphone |
| 42 | parakeet_tdt_simulate_streaming_microphone | Simulated streaming ASR with Parakeet TDT transducer and VAD from microphone |
| 43 | wenet_ctc_simulate_streaming_microphone | Simulated streaming ASR with WeNet CTC and VAD from microphone |
| 44 | zipformer_ctc_simulate_streaming_microphone | Simulated streaming ASR with Zipformer CTC and VAD from microphone |
| 45 | zipformer_transducer_simulate_streaming_microphone | Simulated streaming ASR with Zipformer transducer and VAD from microphone |
| 46 | zipformer_transducer_simulate_streaming_microphone | Simulated streaming ASR with Zipformer transducer (Japanese) and VAD from microphone |
| 47 | qwen3_asr_simulate_streaming_microphone | Simulated streaming ASR with Qwen3 ASR and VAD from microphone |
Each helper script downloads the required files if needed.
./run-version.sh
On macOS, you can run
otool -l target/debug/examples/version | grep -A2 LC_RPATH
to check the RPATH for shared builds.
./run-pocket-tts.sh
./run-supertonic-tts.sh
./run-zipvoice-tts.sh
./run-vits-en.sh
./run-vits-de.sh
./run-matcha-tts-en.sh
./run-matcha-tts-zh.sh
./run-kokoro-tts-en.sh
./run-kokoro-tts-zh-en.sh
./run-kitten-tts-en.sh
./run-streaming-zipformer-en.sh
./run-streaming-zipformer-zh-en.sh
./run-streaming-zipformer-microphone-zh-en.sh
./run-zipformer-en.sh
./run-zipformer-zh-en.sh
./run-zipformer-vi.sh
./run-nemo-parakeet-en.sh
./run-fire-red-asr-ctc.sh
./run-moonshine-v2.sh
./run-sense-voice.sh
./run-qwen3-asr.sh
./run-cohere-transcribe.sh
./run-silero-vad-remove-silence.sh
./run-offline-speech-enhancement-gtcrn.sh
./run-offline-speech-enhancement-dpdfnet.sh
./run-streaming-speech-enhancement-gtcrn.sh
./run-streaming-speech-enhancement-dpdfnet.sh
./run-online-punctuation.sh
./run-keyword-spotter.sh
./run-spoken-language-identification.sh
./run-offline-punctuation.sh
./run-audio-tagging-zipformer.sh
./run-audio-tagging-ced.sh
./run-speaker-embedding-extractor.sh
./run-speaker-embedding-manager.sh
./run-speaker-embedding-cosine-similarity.sh
./run-offline-speaker-diarization.sh
This example uses Silero VAD to detect speech segments and runs the offline SenseVoice recognizer on each detected segment, providing an experience similar to streaming ASR.
./run-sense-voice-simulate-streaming-microphone.sh
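
The control flow shared by the simulated streaming examples, which is to detect a speech segment and then decode it with an offline recognizer, can be sketched without any sherpa-onnx calls. In the sketch below every name is hypothetical: `is_speech` is a toy amplitude threshold standing in for Silero VAD, and the `recognize` closure stands in for the offline recognizer. It shows only the segment-then-decode loop, not the crate's API.

```rust
/// Toy frame-level speech decision: mean absolute amplitude above a threshold.
/// This is a stand-in for Silero VAD, not an implementation of it.
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    if frame.is_empty() {
        return false;
    }
    let mean_abs = frame.iter().map(|s| s.abs()).sum::<f32>() / frame.len() as f32;
    mean_abs > threshold
}

/// Scan the audio frame by frame and return speech segments
/// as half-open (start, end) sample ranges.
fn detect_segments(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    assert!(frame_len > 0, "frame_len must be positive");
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let offset = i * frame_len;
        match (is_speech(frame, threshold), start) {
            // Silence -> speech: open a new segment.
            (true, None) => start = Some(offset),
            // Speech -> silence: close the current segment.
            (false, Some(s)) => {
                segments.push((s, offset));
                start = None;
            }
            _ => {}
        }
    }
    // Audio ended while still inside a speech segment.
    if let Some(s) = start {
        segments.push((s, samples.len()));
    }
    segments
}

/// Run the (stand-in) offline recognizer once per detected segment.
fn transcribe_segments<F>(
    samples: &[f32],
    frame_len: usize,
    threshold: f32,
    mut recognize: F,
) -> Vec<String>
where
    F: FnMut(&[f32]) -> String,
{
    detect_segments(samples, frame_len, threshold)
        .into_iter()
        .map(|(start, end)| recognize(&samples[start..end]))
        .collect()
}

fn main() {
    // Synthetic audio: silence, speech, silence, speech.
    let mut audio = vec![0.0_f32; 4];
    audio.extend(vec![0.5; 4]);
    audio.extend(vec![0.0; 4]);
    audio.extend(vec![0.5; 2]);

    // The closure just reports the segment length instead of decoding it.
    let texts = transcribe_segments(&audio, 2, 0.1, |seg| format!("{} samples", seg.len()));
    for text in &texts {
        println!("{text}");
    }
}
```

Because each segment is decoded only after it ends, results appear with a delay of roughly one utterance, which is why these examples describe the experience as similar to, rather than identical to, streaming ASR.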
This example uses Silero VAD to detect speech segments and runs the offline FireRedASR CTC recognizer on each detected segment.
./run-fire-red-asr-ctc-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT CTC recognizer on each detected segment (Japanese).
./run-parakeet-tdt-ctc-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Parakeet TDT transducer recognizer on each detected segment (English).
./run-parakeet-tdt-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline WeNet CTC recognizer on each detected segment (Cantonese).
./run-wenet-ctc-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Zipformer CTC recognizer on each detected segment (Chinese).
./run-zipformer-ctc-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Chinese).
./run-zipformer-transducer-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Zipformer transducer recognizer on each detected segment (Japanese, reazonspeech model).
./run-zipformer-ja-reazonspeech-simulate-streaming-microphone.sh
This example uses Silero VAD to detect speech segments and runs the offline Qwen3 ASR recognizer on each detected segment.
./run-qwen3-asr-simulate-streaming-microphone.sh