# RunAnywhere ONNX Backend

ONNX Runtime backend for the RunAnywhere Flutter SDK. Provides on-device Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) capabilities.


## Features

| Feature | Description |
|---|---|
| Speech-to-Text (STT) | Transcribe audio using Whisper models |
| Text-to-Speech (TTS) | Neural voice synthesis with Piper models |
| Voice Activity Detection | Real-time speech detection with Silero VAD |
| Streaming Support | Real-time transcription and synthesis |
| Privacy-First | All processing happens locally on device |
| Multi-Language | Support for 100+ languages (Whisper) |

## Installation

Add both the core SDK and this backend to your `pubspec.yaml`:

```yaml
dependencies:
  runanywhere: ^0.15.11
  runanywhere_onnx: ^0.15.11
```

Then run:

```bash
flutter pub get
```

**Note:** This package requires the core `runanywhere` package. It won't work standalone.


## Platform Support

| Platform | Minimum Version | Requirements |
|---|---|---|
| iOS | 14.0+ | Microphone permission |
| Android | API 24+ | `RECORD_AUDIO` permission |

## Platform Setup

### iOS

Update `ios/Podfile`:

```ruby
platform :ios, '14.0'

target 'Runner' do
  use_frameworks! :linkage => :static  # Required!
  flutter_install_all_ios_pods File.dirname(File.realpath(__FILE__))
end
```

Add to `ios/Runner/Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed for speech recognition</string>
```

### Android

Add to `android/app/src/main/AndroidManifest.xml`:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

## Quick Start

### 1. Initialize & Register

```dart
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize SDK
  await RunAnywhere.initialize();

  // Register ONNX backend
  await Onnx.register();

  runApp(MyApp());
}
```

### 2. Add Models

```dart
// STT Model (Whisper)
Onnx.addModel(
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.speechRecognition,
  memoryRequirement: 75000000,  // ~75MB
);

// TTS Model (Piper)
Onnx.addModel(
  id: 'piper-amy-medium',
  name: 'Piper Amy (English)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-amy-medium.tar.gz',
  modality: ModelCategory.speechSynthesis,
  memoryRequirement: 50000000,  // ~50MB
);
```

### 3. Speech-to-Text

```dart
// Download and load STT model
await for (final p in RunAnywhere.downloadModel('whisper-tiny-en')) {
  if (p.state.isCompleted) break;
}
await RunAnywhere.loadSTTModel('whisper-tiny-en');

// Transcribe audio (PCM16 @ 16kHz mono)
final text = await RunAnywhere.transcribe(audioData);
print('Transcription: $text');

// With detailed result
final result = await RunAnywhere.transcribeWithResult(audioData);
print('Text: ${result.text}');
print('Confidence: ${result.confidence}');
print('Language: ${result.language}');
```

### 4. Text-to-Speech

```dart
// Download and load TTS model
await for (final p in RunAnywhere.downloadModel('piper-amy-medium')) {
  if (p.state.isCompleted) break;
}
await RunAnywhere.loadTTSVoice('piper-amy-medium');

// Synthesize speech
final result = await RunAnywhere.synthesize(
  'Hello! Welcome to RunAnywhere.',
  rate: 1.0,   // Speech rate
  pitch: 1.0,  // Speech pitch
);

print('Duration: ${result.durationSeconds}s');
print('Sample rate: ${result.sampleRate} Hz');
print('Samples: ${result.samples.length}');

// Play with audioplayers package
// await audioPlayer.play(BytesSource(wavBytes));
```
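The playback line above assumes 16-bit WAV bytes, but synthesis returns raw Float32 samples. A minimal conversion sketch (the `floatToWav` helper is illustrative, not part of the SDK, and assumes `result.samples` is a `Float32List`):

```dart
import 'dart:typed_data';

/// Wrap Float32 PCM samples in a 16-bit mono WAV container.
/// Illustrative helper -- not part of the RunAnywhere API.
Uint8List floatToWav(Float32List samples, int sampleRate) {
  final dataSize = samples.length * 2; // 16-bit = 2 bytes per sample
  final bytes = ByteData(44 + dataSize);

  void writeTag(int offset, String tag) {
    for (var i = 0; i < tag.length; i++) {
      bytes.setUint8(offset + i, tag.codeUnitAt(i));
    }
  }

  writeTag(0, 'RIFF');
  bytes.setUint32(4, 36 + dataSize, Endian.little);
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  bytes.setUint32(16, 16, Endian.little);             // fmt chunk size
  bytes.setUint16(20, 1, Endian.little);              // audio format: PCM
  bytes.setUint16(22, 1, Endian.little);              // channels: mono
  bytes.setUint32(24, sampleRate, Endian.little);
  bytes.setUint32(28, sampleRate * 2, Endian.little); // byte rate
  bytes.setUint16(32, 2, Endian.little);              // block align
  bytes.setUint16(34, 16, Endian.little);             // bits per sample
  writeTag(36, 'data');
  bytes.setUint32(40, dataSize, Endian.little);

  // Clamp and scale each float sample to a signed 16-bit integer.
  for (var i = 0; i < samples.length; i++) {
    final v = samples[i].clamp(-1.0, 1.0);
    bytes.setInt16(44 + i * 2, (v * 32767).round(), Endian.little);
  }
  return bytes.buffer.asUint8List();
}
```

With this in place, `final wavBytes = floatToWav(result.samples, result.sampleRate);` produces bytes suitable for `BytesSource`.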

## API Reference

### Onnx Class

#### register()

Register the ONNX backend with the SDK.

```dart
static Future<void> register({int priority = 100})
```

Parameters:

- `priority` – Backend priority (higher = preferred). Default: 100.

#### addModel()

Add an ONNX model to the registry.

```dart
static void addModel({
  required String id,
  required String name,
  required String url,
  required ModelCategory modality,
  int memoryRequirement = 0,
})
```

Parameters:

- `id` – Unique model identifier
- `name` – Human-readable model name
- `url` – Download URL (supports `.tar.gz`, `.tar.bz2`, `.zip`)
- `modality` – Model category (`speechRecognition`, `speechSynthesis`)
- `memoryRequirement` – Estimated memory usage in bytes

## Supported Models

### Speech-to-Text (Whisper)

| Model | Size | Memory | Languages | Speed |
|---|---|---|---|---|
| whisper-tiny.en | ~40MB | ~75MB | English only | Fastest |
| whisper-tiny | ~75MB | ~150MB | Multilingual | Fast |
| whisper-base.en | ~75MB | ~150MB | English only | Fast |
| whisper-base | ~150MB | ~300MB | Multilingual | Medium |
| whisper-small.en | ~250MB | ~500MB | English only | Slower |

**Recommendation:** Use `whisper-tiny.en` for English-only apps. Use `whisper-tiny` for multilingual support.

### Text-to-Speech (Piper)

| Voice | Language | Size | Quality |
|---|---|---|---|
| amy-medium | English (US) | ~50MB | Medium |
| amy-low | English (US) | ~25MB | Lower |
| lessac-medium | English (US) | ~50MB | Medium |
| Various | 30+ languages | Varies | Medium |

**Recommendation:** Use `amy-medium` for good-quality English TTS.


## Voice Agent Integration

For full voice assistant functionality, combine STT + LLM + TTS:

```dart
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';

// Initialize all backends
await RunAnywhere.initialize();
await Onnx.register();
await LlamaCpp.register();

// Load all models
await RunAnywhere.loadSTTModel('whisper-tiny-en');
await RunAnywhere.loadModel('smollm2-360m');
await RunAnywhere.loadTTSVoice('piper-amy-medium');

// Check voice agent readiness
print('Voice agent ready: ${RunAnywhere.isVoiceAgentReady}');

// Start voice session
if (RunAnywhere.isVoiceAgentReady) {
  final session = await RunAnywhere.startVoiceSession();

  session.events.listen((event) {
    if (event is VoiceSessionTranscribed) {
      print('User: ${event.text}');
    } else if (event is VoiceSessionResponded) {
      print('AI: ${event.text}');
    }
  });
}
```

## Audio Format Requirements

### STT Input

| Property | Requirement |
|---|---|
| Format | PCM16 (signed 16-bit) |
| Sample Rate | 16000 Hz |
| Channels | Mono (1 channel) |
| Encoding | Little-endian |

### TTS Output

| Property | Value |
|---|---|
| Format | Float32 PCM |
| Sample Rate | 22050 Hz (Piper default) |
| Channels | Mono (1 channel) |

## Troubleshooting

### STT Returns Empty Text

Possible Causes:

  1. Audio too short (< 0.5 seconds)
  2. Audio too quiet (no speech detected)
  3. Wrong audio format (not PCM16 @ 16kHz)

Solutions:

  1. Ensure audio is at least 1 second
  2. Check microphone input levels
  3. Verify audio format matches requirements
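The first two checks can be automated before calling `transcribe`. A rough sketch (the helper name and thresholds are illustrative, not values from the SDK; assumes PCM16 @ 16 kHz mono little-endian bytes):

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Returns false if the audio is probably too short or too quiet
/// for Whisper to transcribe. Thresholds are illustrative only.
bool looksTranscribable(Uint8List pcm16Bytes, {int sampleRate = 16000}) {
  final samples = pcm16Bytes.buffer.asInt16List(
      pcm16Bytes.offsetInBytes, pcm16Bytes.lengthInBytes ~/ 2);

  // 1. Length check: require at least one second of audio.
  if (samples.length < sampleRate) return false;

  // 2. Level check: RMS amplitude of the normalized signal.
  var sumSquares = 0.0;
  for (final s in samples) {
    final v = s / 32768.0;
    sumSquares += v * v;
  }
  final rms = math.sqrt(sumSquares / samples.length);
  return rms > 0.01; // near-silence falls below this
}
```

Call it before `RunAnywhere.transcribe(audioData)` and skip the request (or prompt the user to speak up) when it returns false.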

### TTS Sounds Robotic

Solutions:

  1. Use `*-medium` quality models instead of `*-low`
  2. Adjust rate/pitch parameters
  3. Try different voice models

### Model Loading Fails

Solutions:

  1. Verify model is fully downloaded
  2. Check model format compatibility
  3. Ensure sufficient memory available

### Permission Denied

iOS:

- Add `NSMicrophoneUsageDescription` to `Info.plist`
- Request permission before recording

Android:

- Add the `RECORD_AUDIO` permission to `AndroidManifest.xml`
- Use the `permission_handler` package to request it at runtime
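A minimal runtime-permission sketch using the `permission_handler` plugin (add it to `pubspec.yaml` first; the helper name is illustrative):

```dart
import 'package:permission_handler/permission_handler.dart';

/// Request microphone access, returning true when granted.
Future<bool> ensureMicPermission() async {
  var status = await Permission.microphone.status;
  if (!status.isGranted) {
    // Triggers the system dialog on first call; returns the new status.
    status = await Permission.microphone.request();
  }
  return status.isGranted;
}
```

If the user has permanently denied access, `status.isPermanentlyDenied` will be true and `openAppSettings()` can take them to the system settings screen.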

## Memory Management

```dart
// Unload the STT model to free memory
await RunAnywhere.unloadSTTModel();

// Unload the TTS voice
await RunAnywhere.unloadTTSVoice();

// Check which models are currently loaded
print('STT loaded: ${RunAnywhere.isSTTModelLoaded}');
print('TTS loaded: ${RunAnywhere.isTTSVoiceLoaded}');
```

## License

This software is licensed under the RunAnywhere License, which is based on Apache 2.0 with additional terms for commercial use. See LICENSE for details.

For commercial licensing inquiries, contact: [email protected]