sdk/runanywhere-kotlin/docs/ARCHITECTURE.md
This document describes the internal architecture, design principles, and implementation details of the RunAnywhere Kotlin SDK.
The SDK is built around a few core design principles:

- Developers call one SDK; the SDK abstracts engine complexity. All AI capabilities (LLM, STT, TTS, VAD) are accessed through the unified RunAnywhere object.
- Kotlin Multiplatform (KMP) shares code across Android and JVM targets.
- All I/O operations (network, model loading, inference) are non-blocking.
- Every operation records metadata (latency, device state, model info) for observability.
- Resources are managed aggressively to suit memory-constrained mobile devices.
- The public API mirrors the iOS RunAnywhere Swift SDK exactly.
┌─────────────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ (Your Android/JVM Application) │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ RunAnywhere Public API │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RunAnywhere Object │ │
│ │ • initialize()/reset() │ │
│ │ • isInitialized, areServicesReady │ │
│ │ • events (EventBus) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┬────────────┬────────────┬────────────┬────────────┐ │
│ │ LLM API │ STT API │ TTS API │ VAD API │ VoiceAgent │ │
│ │ (extension)│ (extension)│ (extension)│ (extension)│ (extension)│ │
│ └────────────┴────────────┴────────────┴────────────┴────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Model Management API │ │
│ │ • registerModel(), downloadModel() │ │
│ │ • loadLLMModel(), loadSTTModel(), loadTTSVoice() │ │
│ │ • availableModels(), deleteModel() │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Internal Layer │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CppBridge │ │
│ │ • JNI bindings to runanywhere-commons │ │
│ │ • Platform adapter registration │ │
│ │ • Callback bridges (events, telemetry) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Platform Services │ │
│ │ • StoragePlatform (file system access) │ │
│ │ • NetworkConnectivity │ │
│ │ • SecureStorage (KeychainManager) │ │
│ │ • DeviceInfo │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Native Layer (C++) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ runanywhere-commons │ │
│ │ • librac_commons.so - Core infrastructure │ │
│ │ • librunanywhere_jni.so - JNI bridge │ │
│ │ • Model registry, download management │ │
│ │ • Event system, telemetry │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │ │
│ ┌───────────┴───────────┐ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────┐ ┌────────────────────────────────────┐ │
│ │ runanywhere-core- │ │ runanywhere-core-onnx │ │
│ │ llamacpp │ │ │ │
│ │ │ │ • libonnxruntime.so │ │
│ │ • llama.cpp engine │ │ • libsherpa-onnx-*.so │ │
│ │ • LLM inference │ │ • STT/TTS/VAD inference │ │
│ └───────────────────────┘ └────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
com.runanywhere.sdk/
├── public/ # Public API (exported)
│ ├── RunAnywhere.kt # Main SDK entry point
│ ├── events/
│ │ ├── EventBus.kt # Event subscription system
│ │ └── SDKEvent.kt # Event type definitions
│ └── extensions/
│ ├── RunAnywhere+TextGeneration.kt # LLM APIs
│ ├── RunAnywhere+STT.kt # Speech-to-text APIs
│ ├── RunAnywhere+TTS.kt # Text-to-speech APIs
│ ├── RunAnywhere+VAD.kt # Voice activity detection
│ ├── RunAnywhere+VoiceAgent.kt # Voice pipeline orchestration
│ ├── RunAnywhere+ModelManagement.kt # Model registration/download
│ ├── LLM/LLMTypes.kt # LLM type definitions
│ ├── STT/STTTypes.kt # STT type definitions
│ ├── TTS/TTSTypes.kt # TTS type definitions
│ ├── VAD/VADTypes.kt # VAD type definitions
│ ├── VoiceAgent/VoiceAgentTypes.kt # Voice agent types
│ └── Models/ModelTypes.kt # Model type definitions
│
├── core/ # Core types and interfaces
│ ├── types/
│ │ └── ComponentTypes.kt # SDKComponent, InferenceFramework
│ └── module/
│ └── SDKModule.kt # Module registration interface
│
├── foundation/ # Foundation utilities
│ ├── SDKLogger.kt # Logging system
│ ├── errors/
│ │ ├── SDKError.kt # Error class
│ │ ├── ErrorCode.kt # Error codes
│ │ └── ErrorCategory.kt # Error categories
│ ├── device/
│ │ └── DeviceCapabilities.kt # Device info
│ └── constants/
│ └── SDKConstants.kt # SDK version, etc.
│
├── native/ # Native bridge (internal)
│ └── bridge/
│ ├── NativeCoreService.kt # JNI service interface
│ ├── BridgeResults.kt # Native call results
│ └── Capability.kt # Native capability types
│
├── data/ # Data layer
│ ├── models/
│ │ └── ModelEntity.kt # Model persistence
│ ├── network/
│ │ └── ApiClient.kt # HTTP client
│ └── repositories/
│ └── ModelRepository.kt # Model data access
│
├── storage/ # Storage layer
│ ├── PlatformStorage.kt # Cross-platform storage
│ └── FileSystem.kt # File operations
│
├── platform/ # Platform abstractions
│ ├── Checksum.kt # Hash verification
│ ├── NetworkConnectivity.kt # Network state
│ └── StoragePlatform.kt # Storage abstraction
│
└── utils/ # Utilities
├── SDKConstants.kt # Constants
└── Extensions.kt # Kotlin extensions
The SDK uses a two-phase initialization pattern so startup stays fast: Phase 1 (initialize()) performs lightweight, synchronous setup, while Phase 2 (completeServicesInitialization()) registers the heavier service callbacks asynchronously.
RunAnywhere.initialize(environment)
│
├─► Store environment
│
├─► Set log level based on environment
│
└─► CppBridge.initialize()
│
├─► Load JNI library (librunanywhere_jni.so)
│
├─► Register PlatformAdapter (file I/O, logging, keychain)
│
├─► Register Events callback (analytics)
│
└─► Initialize Device registration
Result: isInitialized = true
RunAnywhere.completeServicesInitialization()
│
├─► CppBridge.initializeServices()
│ │
│ ├─► Register ModelAssignment callbacks
│ │
│ └─► Register Platform service callbacks (LLM/TTS)
│
└─► Mark: areServicesReady = true
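The ordering rules above (Phase 2 only after Phase 1, feature calls only after Phase 2) can be sketched as a small state guard. The class and method names below are illustrative stand-ins, not the SDK's actual internals:

```kotlin
// Illustrative sketch of the two-phase guard; not the SDK's real implementation.
class TwoPhaseGuard {
    var isInitialized = false
        private set
    var areServicesReady = false
        private set

    // Phase 1: cheap, synchronous setup (store config, load JNI library).
    fun initialize() {
        check(!isInitialized) { "Already initialized" }
        isInitialized = true
    }

    // Phase 2: register service callbacks; requires Phase 1 to have run.
    fun completeServicesInitialization() {
        check(isInitialized) { "Call initialize() first" }
        areServicesReady = true
    }

    // Feature calls require both phases to have completed.
    fun requireReady() {
        check(isInitialized && areServicesReady) { "SDK services not ready" }
    }
}
```

Splitting the phases this way is what lets Phase 1 stay fast enough for the main thread while the expensive work is deferred.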
Key points: Phase 1 is cheap enough to run directly in Application.onCreate(); Phase 2 is a suspending call that can complete slightly later.

| Operation | Thread | Notes |
|---|---|---|
| RunAnywhere.initialize() | Calling thread (main) | Fast, < 5ms |
| completeServicesInitialization() | Calling thread | Suspending function |
| loadLLMModel() / loadSTTModel() | Dispatchers.IO | Async, returns immediately |
| generate() / transcribe() | Dispatchers.Default | CPU-bound inference |
| generateStream() | Dispatchers.Default | Returns Flow, collects on Default |
| downloadModel() | Dispatchers.IO | Network I/O |
| Event emissions | Internal event loop | Delivered to collectors' context |
Thread safety: mutable SDK state is guarded with synchronized blocks.

The SDK communicates with the C++ runanywhere-commons library via JNI:
// Kotlin side (jvmAndroidMain)
object CppBridge {
// Phase 1 initialization
external fun nativeInitialize(environment: Int, apiKey: String?, baseUrl: String?): Int
// Phase 2 services
external fun nativeInitializeServices(): Int
// LLM operations
external fun nativeLoadModel(modelId: String, modelPath: String): Int
external fun nativeGenerate(prompt: String, options: String): String
external fun nativeGenerateStream(prompt: String, options: String, callback: StreamCallback): Int
// STT operations
external fun nativeTranscribe(audioData: ByteArray, options: String): String
// TTS operations
external fun nativeSynthesize(text: String, options: String): ByteArray
// Shutdown
external fun nativeShutdown()
}
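Since the external functions above resolve against librunanywhere_jni.so, the JNI library must be loaded before the first native call. A defensive one-time load might look like the sketch below (the object and library names are illustrative; the SDK's actual loading happens inside CppBridge initialization):

```kotlin
// Sketch: defensively load a JNI library once, surfacing a clear failure
// if the native artifact is missing from the APK or classpath.
object NativeLoader {
    @Volatile private var loaded = false

    fun ensureLoaded(libraryName: String = "runanywhere_jni"): Result<Unit> {
        if (loaded) return Result.success(Unit)
        return runCatching { System.loadLibrary(libraryName) }
            .onSuccess { loaded = true }
    }
}
```

Returning a Result rather than letting UnsatisfiedLinkError propagate lets the caller map a missing native module to a typed SDK error instead of a crash.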
The SDK registers Kotlin callbacks with the C++ layer for platform-specific operations:
// Registered during Phase 1
object PlatformAdapter {
// File operations (called from C++)
fun readFile(path: String): ByteArray
fun writeFile(path: String, data: ByteArray)
fun fileExists(path: String): Boolean
// Logging (called from C++)
fun log(level: Int, tag: String, message: String)
// Keychain (called from C++)
fun secureStore(key: String, value: String)
fun secureRetrieve(key: String): String?
}
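On the JVM/Android side, the file-operation callbacks can be backed directly by java.io. A minimal stand-in with the same shapes as the signatures above (illustrative only, not the SDK's actual implementation):

```kotlin
import java.io.File

// Minimal java.io-backed file operations mirroring the callback
// signatures the C++ layer expects. Illustrative only.
object FilePlatform {
    fun readFile(path: String): ByteArray = File(path).readBytes()

    fun writeFile(path: String, data: ByteArray) {
        // Ensure parent directories exist before writing.
        File(path).apply { parentFile?.mkdirs() }.writeBytes(data)
    }

    fun fileExists(path: String): Boolean = File(path).exists()
}
```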
The SDK uses a modular architecture where AI backends are optional:
| Module | Native libraries | InferenceFramework |
|---|---|---|
| com.runanywhere.sdk:runanywhere-kotlin | librac_commons.so, librunanywhere_jni.so | (core, always required) |
| com.runanywhere.sdk:runanywhere-core-llamacpp | librunanywhere_llamacpp.so (~34MB) | InferenceFramework.LLAMA_CPP |
| com.runanywhere.sdk:runanywhere-core-onnx | libonnxruntime.so, libsherpa-onnx-*.so (~25MB) | InferenceFramework.ONNX |

// Check which modules are available at runtime
val hasLLM = CppBridge.isCapabilityAvailable(SDKComponent.LLM)
val hasSTT = CppBridge.isCapabilityAvailable(SDKComponent.STT)
val hasTTS = CppBridge.isCapabilityAvailable(SDKComponent.TTS)
val hasVAD = CppBridge.isCapabilityAvailable(SDKComponent.VAD)
All feature-specific APIs are implemented as extension functions on RunAnywhere:
// Definition (in RunAnywhere+TextGeneration.kt)
expect suspend fun RunAnywhere.chat(prompt: String): String
// Implementation (in RunAnywhere+TextGeneration.jvmAndroid.kt)
actual suspend fun RunAnywhere.chat(prompt: String): String {
requireInitialized()
ensureServicesReady()
return CppBridge.nativeChat(prompt)
}
Benefits:
All operations return rich result types with metadata:
data class LLMGenerationResult(
val text: String, // Generated content
val thinkingContent: String?, // Reasoning (if model supports)
val inputTokens: Int, // Prompt tokens
val tokensUsed: Int, // Output tokens
val modelUsed: String, // Model ID
val latencyMs: Double, // Total time
val tokensPerSecond: Double, // Generation speed
val timeToFirstTokenMs: Double?, // TTFT (streaming)
)
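The derived fields are related by simple arithmetic; tokensPerSecond, for example, follows from tokensUsed and the elapsed time. A sketch, assuming throughput is measured over total latency (the SDK's exact accounting, e.g. whether time-to-first-token is excluded, may differ):

```kotlin
// Sketch: generation throughput from a result's counters.
// Assumes tokens/s is measured over total latency; illustrative only.
fun tokensPerSecond(tokensUsed: Int, latencyMs: Double): Double {
    require(latencyMs > 0) { "latency must be positive" }
    return tokensUsed / (latencyMs / 1000.0)
}
```

For instance, 50 output tokens generated in 250 ms works out to 200 tokens/s.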
┌─────────────────────────────────────────────────────────────────┐
│ C++ Event Producer │
│ (runanywhere-commons generates events) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CppEventBridge (JNI) │
│ (Callback registered during Phase 1) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EventBus │
│ SharedFlow-based event distribution │
│ ┌─────────────┬─────────────┬─────────────┬─────────────┐ │
│ │ llmEvents │ sttEvents │ ttsEvents │ modelEvents │ │
│ │ (Flow) │ (Flow) │ (Flow) │ (Flow) │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
App Collectors
| Category | Events |
|---|---|
| SDK | sdk.initialized, sdk.shutdown, sdk.error |
| Model | model.download_started, model.download_progress, model.download_completed, model.loaded, model.unloaded |
| LLM | llm.generation_started, llm.stream_token, llm.generation_completed, llm.generation_failed |
| STT | stt.transcription_started, stt.partial_result, stt.transcription_completed |
| TTS | tts.synthesis_started, tts.synthesis_completed, tts.playback_started |
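Event type strings follow a category.name convention, so a collector can route on the prefix before the dot. A small helper to that effect (hypothetical, not part of the SDK API):

```kotlin
// Route a dotted event type ("llm.stream_token") to its category bucket.
// The category set mirrors the table above; UNKNOWN covers anything else.
enum class EventCategory { SDK, MODEL, LLM, STT, TTS, UNKNOWN }

fun categoryOf(eventType: String): EventCategory =
    when (eventType.substringBefore('.')) {
        "sdk" -> EventCategory.SDK
        "model" -> EventCategory.MODEL
        "llm" -> EventCategory.LLM
        "stt" -> EventCategory.STT
        "tts" -> EventCategory.TTS
        else -> EventCategory.UNKNOWN
    }
```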
// Subscribe to LLM events
lifecycleScope.launch {
RunAnywhere.events.llmEvents.collect { event ->
Log.d("LLM", "Event: ${event.type}, Latency: ${event.latencyMs}ms")
}
}
// Subscribe to all events
lifecycleScope.launch {
RunAnywhere.events.allEvents.collect { event ->
analytics.track(event.type, event.properties)
}
}
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Register │ ──► │ Download │ ──► │ Load │ ──► │ Unload │
│ (metadata) │ │ (network) │ │ (memory) │ │ (cleanup) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
ModelInfo DownloadProgress Model in RAM Memory freed
in registry events emitted ready for use model cached
// 1. Registered but not downloaded
model.isDownloaded == false
model.localPath == null
// 2. Downloaded but not loaded
model.isDownloaded == true
model.localPath != null
RunAnywhere.isLLMModelLoaded() == false
// 3. Loaded and ready
model.isDownloaded == true
RunAnywhere.isLLMModelLoaded() == true
RunAnywhere.currentLLMModelId == model.id
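The three states above form a simple linear lifecycle. One way to model the legal transitions is a sealed hierarchy; the types here are illustrative, not SDK API:

```kotlin
// Illustrative model of the three lifecycle states and their transitions.
sealed class ModelState {
    object Registered : ModelState()                      // metadata only
    data class Downloaded(val localPath: String) : ModelState()
    data class Loaded(val localPath: String) : ModelState()
}

fun ModelState.download(path: String): ModelState = when (this) {
    is ModelState.Registered -> ModelState.Downloaded(path)
    else -> this  // already on disk (or loaded); downloading is a no-op
}

fun ModelState.load(): ModelState = when (this) {
    is ModelState.Downloaded -> ModelState.Loaded(localPath)
    is ModelState.Loaded -> this
    is ModelState.Registered -> error("Model must be downloaded before loading")
}
```

Encoding the states as types makes the illegal jump (loading a model that was never downloaded) a runtime error at one well-defined point rather than a scattered set of flag checks.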
RunAnywhere.downloadModel(modelId)
│
├─► Emit: ModelEvent.DOWNLOAD_STARTED
│
├─► Fetch URL → Write chunks to temp file
│ │
│ └─► Emit: ModelEvent.DOWNLOAD_PROGRESS (0.0 → 1.0)
│
├─► Extract if archive (tar.gz, zip)
│
├─► Verify checksum (if provided)
│
├─► Move to final location
│
├─► Update model.localPath
│
└─► Emit: ModelEvent.DOWNLOAD_COMPLETED
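The checksum-verification step can be implemented with java.security.MessageDigest. A sketch assuming a hex-encoded SHA-256 digest (the hash algorithm and encoding the SDK actually uses are not specified here):

```kotlin
import java.security.MessageDigest

// Verify a downloaded payload against an expected hex-encoded SHA-256 digest.
// Assumes SHA-256/hex; the SDK's actual scheme may differ.
fun verifyChecksum(data: ByteArray, expectedHex: String): Boolean {
    val digest = MessageDigest.getInstance("SHA-256").digest(data)
    val actualHex = digest.joinToString("") { "%02x".format(it) }
    return actualHex.equals(expectedHex, ignoreCase = true)
}
```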
data class SDKError(
val code: ErrorCode, // Specific error type
val category: ErrorCategory, // Error group
val message: String, // Human-readable
val cause: Throwable? // Underlying exception
) : Exception(message, cause)
| Category | Description | Example Errors |
|---|---|---|
| INITIALIZATION | SDK startup | NOT_INITIALIZED, ALREADY_INITIALIZED |
| MODEL | Model operations | MODEL_NOT_FOUND, MODEL_LOAD_FAILED |
| LLM | Text generation | LLM_GENERATION_FAILED |
| STT | Speech-to-text | STT_TRANSCRIPTION_FAILED |
| TTS | Text-to-speech | TTS_SYNTHESIS_FAILED |
| NETWORK | Network issues | NETWORK_UNAVAILABLE, TIMEOUT |
| STORAGE | Storage issues | INSUFFICIENT_STORAGE, FILE_NOT_FOUND |
// Create errors with factory methods
throw SDKError.modelNotFound(modelId)
throw SDKError.llmGenerationFailed("Context length exceeded")
throw SDKError.networkUnavailable()
// From C++ error codes
val error = SDKError.fromRawValue(cppErrorCode, message)
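The code-to-category relationship from the table can be captured as a plain enum mapping. This fromRawValue-style lookup is a sketch: the real raw integer values come from the C++ layer and are not documented here, so both enums are abbreviated and illustrative:

```kotlin
// Illustrative mapping from error codes to categories, mirroring the table
// above. Abbreviated; the SDK's raw values and full code set may differ.
enum class ErrorCategory { INITIALIZATION, MODEL, LLM, NETWORK, STORAGE }

enum class ErrorCode(val category: ErrorCategory) {
    NOT_INITIALIZED(ErrorCategory.INITIALIZATION),
    ALREADY_INITIALIZED(ErrorCategory.INITIALIZATION),
    MODEL_NOT_FOUND(ErrorCategory.MODEL),
    MODEL_LOAD_FAILED(ErrorCategory.MODEL),
    LLM_GENERATION_FAILED(ErrorCategory.LLM),
    NETWORK_UNAVAILABLE(ErrorCategory.NETWORK),
    INSUFFICIENT_STORAGE(ErrorCategory.STORAGE),
}

// Category lookup is then a field access, no secondary table needed.
fun categoryFor(code: ErrorCode): ErrorCategory = code.category
```

Attaching the category to the code at declaration time keeps the two from drifting apart as new codes are added.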
Test business logic without native libraries:
@Test
fun testModelRegistration() {
val modelInfo = createTestModelInfo()
// Test URL parsing
assertEquals("qwen-0.5b", generateModelIdFromUrl(modelInfo.downloadURL))
// Test format detection
assertEquals(ModelFormat.GGUF, detectFormatFromUrl(modelInfo.downloadURL))
}
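The helpers exercised in the test above are straightforward string transforms. One plausible implementation, shown for illustration (the SDK's actual logic may differ):

```kotlin
// Plausible implementations of the URL helpers used in the unit test above;
// illustrative, not the SDK's actual code.
enum class ModelFormat { GGUF, ONNX, UNKNOWN }

fun generateModelIdFromUrl(url: String): String =
    url.substringAfterLast('/')
        .substringBefore('?')       // drop any query parameters
        .substringBeforeLast('.')   // drop the file extension
        .lowercase()

fun detectFormatFromUrl(url: String): ModelFormat =
    when (url.substringBefore('?').substringAfterLast('.').lowercase()) {
        "gguf" -> ModelFormat.GGUF
        "onnx" -> ModelFormat.ONNX
        else -> ModelFormat.UNKNOWN
    }
```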
Test with mocked native layer:
@Test
fun testGenerationFlow() = runTest {
// Mock CppBridge
mockkObject(CppBridge)
every { CppBridge.nativeGenerate(any(), any()) } returns """
{"text": "Hello", "tokensUsed": 5, "latencyMs": 100}
"""
// Test generation
val result = RunAnywhere.generate("Hi")
assertEquals("Hello", result.text)
}
Test on real devices with actual models:
@Test
fun testRealInference() = runTest {
// Initialize SDK
RunAnywhere.initialize(environment = SDKEnvironment.DEVELOPMENT)
// Load a small test model
RunAnywhere.loadLLMModel("test-tiny-model")
// Run inference
val result = RunAnywhere.generate("2+2=")
assertNotNull(result.text)
assertTrue(result.latencyMs > 0)
}
| Operation | Latency | Notes |
|---|---|---|
| SDK Initialize (Phase 1) | 1-5ms | Synchronous |
| SDK Initialize (Phase 2) | 50-100ms | Async |
| Model Load (0.5B) | 500-800ms | First time, cached after |
| Inference (50 tokens) | 150-300ms | Depends on model size |
| Streaming TTFT | 50-100ms | Time to first token |
| STT Transcribe (5s audio) | 200-400ms | Whisper tiny |
| TTS Synthesize (100 chars) | 100-200ms | Sherpa ONNX |
| Component | Memory |
|---|---|
| SDK (no models) | ~5MB |
| 0.5B LLM (Q8) | ~500MB |
| 0.5B LLM (Q4) | ~300MB |
| Whisper Tiny | ~75MB |
| TTS Voice | ~50MB |
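The LLM rows in the table follow a rough rule of thumb: resident size ≈ parameter count × bytes per parameter, where Q8 costs about 1.0 byte/parameter and Q4 about 0.6 bytes/parameter once quantization overhead is included. A back-of-envelope sketch (the per-parameter constants are approximations consistent with the table, not SDK values):

```kotlin
// Back-of-envelope LLM memory estimate: params × bytes-per-param.
// Roughly 1.0 B/param for Q8 and 0.6 B/param for Q4 (incl. overhead).
fun estimateModelMemoryMB(paramCount: Double, bytesPerParam: Double): Double =
    paramCount * bytesPerParam / 1_000_000.0
```

For a 0.5B-parameter model this yields ~500 MB at Q8 and ~300 MB at Q4, matching the table above.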