# runanywhere_onnx

ONNX Runtime backend for the RunAnywhere Flutter SDK. Provides on-device Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) capabilities.
## Features

| Feature | Description |
|---|---|
| Speech-to-Text (STT) | Transcribe audio using Whisper models |
| Text-to-Speech (TTS) | Neural voice synthesis with Piper models |
| Voice Activity Detection | Real-time speech detection with Silero VAD |
| Streaming Support | Real-time transcription and synthesis |
| Privacy-First | All processing happens locally on device |
| Multi-Language | Support for 100+ languages (Whisper) |
## Installation

Add both the core SDK and this backend to your `pubspec.yaml`:

```yaml
dependencies:
  runanywhere: ^0.15.11
  runanywhere_onnx: ^0.15.11
```

Then run:

```sh
flutter pub get
```
**Note:** This package requires the core `runanywhere` package. It won't work standalone.
## Platform Support

| Platform | Minimum Version | Requirements |
|---|---|---|
| iOS | 14.0+ | Microphone permission |
| Android | API 24+ | RECORD_AUDIO permission |
### iOS Setup

Update `ios/Podfile`:

```ruby
platform :ios, '14.0'

target 'Runner' do
  use_frameworks! :linkage => :static # Required!
  flutter_install_all_ios_pods File.dirname(File.realpath(__FILE__))
end
```
Add to `ios/Runner/Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed for speech recognition</string>
```
### Android Setup

Add to `android/app/src/main/AndroidManifest.xml`:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
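On Android 6.0+ the `RECORD_AUDIO` permission must also be granted at runtime. A minimal sketch using the community `permission_handler` package (the package choice is an assumption; any runtime-permission plugin works):

```dart
import 'package:permission_handler/permission_handler.dart';

/// Requests microphone access before starting any STT session.
/// Returns true only if the user granted the permission.
Future<bool> ensureMicrophonePermission() async {
  final status = await Permission.microphone.request();
  return status.isGranted;
}
```

Call this before opening the microphone; if it returns `false`, surface a message explaining why speech features are unavailable.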
## Quick Start

```dart
import 'package:flutter/material.dart';
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize SDK
  await RunAnywhere.initialize();

  // Register ONNX backend
  await Onnx.register();

  runApp(MyApp());
}
```
## Registering Models

```dart
// STT model (Whisper)
Onnx.addModel(
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.speechRecognition,
  memoryRequirement: 75000000, // ~75MB
);

// TTS model (Piper)
Onnx.addModel(
  id: 'piper-amy-medium',
  name: 'Piper Amy (English)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-amy-medium.tar.gz',
  modality: ModelCategory.speechSynthesis,
  memoryRequirement: 50000000, // ~50MB
);
```
## Speech-to-Text

```dart
// Download and load the STT model
await for (final p in RunAnywhere.downloadModel('whisper-tiny-en')) {
  if (p.state.isCompleted) break;
}
await RunAnywhere.loadSTTModel('whisper-tiny-en');

// Transcribe audio (PCM16 @ 16 kHz mono)
final text = await RunAnywhere.transcribe(audioData);
print('Transcription: $text');

// With detailed result
final result = await RunAnywhere.transcribeWithResult(audioData);
print('Text: ${result.text}');
print('Confidence: ${result.confidence}');
print('Language: ${result.language}');
```
## Text-to-Speech

```dart
// Download and load the TTS voice
await for (final p in RunAnywhere.downloadModel('piper-amy-medium')) {
  if (p.state.isCompleted) break;
}
await RunAnywhere.loadTTSVoice('piper-amy-medium');

// Synthesize speech
final result = await RunAnywhere.synthesize(
  'Hello! Welcome to RunAnywhere.',
  rate: 1.0,  // Speech rate
  pitch: 1.0, // Speech pitch
);
print('Duration: ${result.durationSeconds}s');
print('Sample rate: ${result.sampleRate} Hz');
print('Samples: ${result.samples.length}');

// Play with the audioplayers package
// await audioPlayer.play(BytesSource(wavBytes));
```
## API Reference

### `register()`

Register the ONNX backend with the SDK.

```dart
static Future<void> register({int priority = 100})
```

Parameters:
- `priority` – Backend priority (higher = preferred). Default: 100.

### `addModel()`

Add an ONNX model to the registry.

```dart
static void addModel({
  required String id,
  required String name,
  required String url,
  required ModelCategory modality,
  int memoryRequirement = 0,
})
```

Parameters:
- `id` – Unique model identifier
- `name` – Human-readable model name
- `url` – Download URL (supports `.tar.gz`, `.tar.bz2`, `.zip`)
- `modality` – Model category (`speechRecognition`, `speechSynthesis`)
- `memoryRequirement` – Estimated memory usage in bytes

## Available Models

### Whisper Models (STT)

| Model | Size | Memory | Languages | Speed |
|---|---|---|---|---|
| whisper-tiny.en | ~40MB | ~75MB | English only | Fastest |
| whisper-tiny | ~75MB | ~150MB | Multilingual | Fast |
| whisper-base.en | ~75MB | ~150MB | English only | Fast |
| whisper-base | ~150MB | ~300MB | Multilingual | Medium |
| whisper-small.en | ~250MB | ~500MB | English only | Slower |
**Recommendation:** Use `whisper-tiny.en` for English-only apps. Use `whisper-tiny` for multilingual support.
### Piper Voices (TTS)

| Voice | Language | Size | Quality |
|---|---|---|---|
| amy-medium | English (US) | ~50MB | Medium |
| amy-low | English (US) | ~25MB | Lower |
| lessac-medium | English (US) | ~50MB | Medium |
| Various | 30+ languages | Varies | Medium |
**Recommendation:** Use `amy-medium` for good-quality English TTS.
## Voice Assistant Pipeline

For full voice assistant functionality, combine STT + LLM + TTS:

```dart
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';

// Initialize all backends
await RunAnywhere.initialize();
await Onnx.register();
await LlamaCpp.register();

// Load all models
await RunAnywhere.loadSTTModel('whisper-tiny-en');
await RunAnywhere.loadModel('smollm2-360m');
await RunAnywhere.loadTTSVoice('piper-amy-medium');

// Check voice agent readiness
print('Voice agent ready: ${RunAnywhere.isVoiceAgentReady}');

// Start a voice session
if (RunAnywhere.isVoiceAgentReady) {
  final session = await RunAnywhere.startVoiceSession();
  session.events.listen((event) {
    if (event is VoiceSessionTranscribed) {
      print('User: ${event.text}');
    } else if (event is VoiceSessionResponded) {
      print('AI: ${event.text}');
    }
  });
}
```
## Audio Formats

### STT Input

| Property | Requirement |
|---|---|
| Format | PCM16 (signed 16-bit) |
| Sample Rate | 16000 Hz |
| Channels | Mono (1 channel) |
| Encoding | Little-endian |
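If your recorder delivers floating-point samples, they must be converted to little-endian PCM16 before transcription. A minimal sketch (the exact input type `RunAnywhere.transcribe` accepts is not specified here; `Uint8List` is an assumption):

```dart
import 'dart:typed_data';

/// Converts float samples in [-1.0, 1.0] to little-endian PCM16 bytes,
/// the format the STT input table above requires.
Uint8List floatToPcm16(List<double> samples) {
  final bytes = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    final clamped = samples[i].clamp(-1.0, 1.0);
    bytes.setInt16(i * 2, (clamped * 32767).round(), Endian.little);
  }
  return bytes.buffer.asUint8List();
}
```

Remember the sample rate must already be 16000 Hz mono; this helper only changes the sample encoding, not the rate or channel count.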
### TTS Output

| Property | Value |
|---|---|
| Format | Float32 PCM |
| Sample Rate | 22050 Hz (Piper default) |
| Channels | Mono (1 channel) |
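The raw Float32 samples can be wrapped in a standard mono 16-bit WAV container for playback with packages such as `audioplayers`. A hedged sketch (field names `samples`/`sampleRate` follow the synthesis example above):

```dart
import 'dart:typed_data';

/// Builds a mono 16-bit PCM WAV file from float samples in [-1.0, 1.0].
Uint8List floatSamplesToWav(List<double> samples, int sampleRate) {
  final dataSize = samples.length * 2;
  final bytes = ByteData(44 + dataSize);
  void writeString(int offset, String s) {
    for (var i = 0; i < s.length; i++) {
      bytes.setUint8(offset + i, s.codeUnitAt(i));
    }
  }

  // RIFF/WAVE header
  writeString(0, 'RIFF');
  bytes.setUint32(4, 36 + dataSize, Endian.little);
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  bytes.setUint32(16, 16, Endian.little);             // fmt chunk size
  bytes.setUint16(20, 1, Endian.little);              // PCM format
  bytes.setUint16(22, 1, Endian.little);              // mono
  bytes.setUint32(24, sampleRate, Endian.little);
  bytes.setUint32(28, sampleRate * 2, Endian.little); // byte rate
  bytes.setUint16(32, 2, Endian.little);              // block align
  bytes.setUint16(34, 16, Endian.little);             // bits per sample
  writeString(36, 'data');
  bytes.setUint32(40, dataSize, Endian.little);

  // Sample data, converted to little-endian PCM16
  for (var i = 0; i < samples.length; i++) {
    final v = samples[i].clamp(-1.0, 1.0);
    bytes.setInt16(44 + i * 2, (v * 32767).round(), Endian.little);
  }
  return bytes.buffer.asUint8List();
}
```

Pass the synthesis result's sample rate (22050 Hz for Piper) rather than hard-coding it, so other voices keep working.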
## Troubleshooting

### Transcription is empty or inaccurate

Possible causes:
- Audio is not PCM16 mono at 16000 Hz
- The loaded model does not support the spoken language

Solutions:
- Verify the audio matches the STT input format above
- Use a multilingual model (e.g. `whisper-tiny`) for non-English audio

### TTS output sounds low quality

Solutions:
- Use `*-medium` quality models instead of `*-low`

### Microphone not working

Solutions:
iOS:
- Add `NSMicrophoneUsageDescription` to `Info.plist`

Android:
- Add the `RECORD_AUDIO` permission to `AndroidManifest.xml`
- Use the `permission_handler` package to request it at runtime

## Memory Management

```dart
// Unload the STT model to free memory
await RunAnywhere.unloadSTTModel();

// Unload the TTS voice
await RunAnywhere.unloadTTSVoice();

// Check which models are currently loaded
print('STT loaded: ${RunAnywhere.isSTTModelLoaded}');
print('TTS loaded: ${RunAnywhere.isTTSVoiceLoaded}');
```
## License

This software is licensed under the RunAnywhere License, which is based on Apache 2.0 with additional terms for commercial use. See LICENSE for details.
For commercial licensing inquiries, contact: [email protected]