plugins/plugin-native-llama/README.md
Mobile llama.cpp adapter for Eliza. A thin wrapper over
llama-cpp-capacitor that
maps its contextId-based API onto Eliza's LocalInferenceLoader contract,
so the standard ActiveModelCoordinator in @elizaos/app-core can switch
between the desktop (node-llama-cpp) engine and mobile native inference
transparently.
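
As a rough illustration of that mapping, the sketch below wraps a contextId-based API behind a load/unload pair so callers never handle a context id. Everything in it is an assumption made for illustration: the `ContextPlugin` interface, its option shapes, `FakeContextPlugin`, and `SketchLoader` are hypothetical names, not the real llama-cpp-capacitor signatures or this adapter's actual code.

```typescript
// Hypothetical, simplified shape of a contextId-based native API.
// The real llama-cpp-capacitor signatures may differ; consult its README.
interface ContextPlugin {
  initContext(opts: { contextId: number; modelPath: string }): Promise<void>;
  releaseContext(opts: { contextId: number }): Promise<void>;
}

// In-memory stand-in so the sketch runs without a device.
class FakeContextPlugin implements ContextPlugin {
  readonly live = new Map<number, string>();

  async initContext(opts: { contextId: number; modelPath: string }): Promise<void> {
    this.live.set(opts.contextId, opts.modelPath);
  }

  async releaseContext(opts: { contextId: number }): Promise<void> {
    this.live.delete(opts.contextId);
  }
}

// Loader-style wrapper: it owns the context id, callers only see load/unload.
class SketchLoader {
  private readonly plugin: ContextPlugin;
  private nextId = 1;
  private activeId: number | null = null;

  constructor(plugin: ContextPlugin) {
    this.plugin = plugin;
  }

  async loadModel(args: { modelPath: string }): Promise<void> {
    // Release any previous context before allocating a new one.
    await this.unloadModel();
    const contextId = this.nextId++;
    await this.plugin.initContext({ contextId, modelPath: args.modelPath });
    this.activeId = contextId;
  }

  async unloadModel(): Promise<void> {
    if (this.activeId === null) return;
    const contextId = this.activeId;
    this.activeId = null;
    await this.plugin.releaseContext({ contextId });
  }

  currentContextId(): number | null {
    return this.activeId;
  }
}
```

Keeping the id inside the wrapper is what lets a coordinator treat desktop and mobile engines alike: both can be driven through the same load/unload calls.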

- Registers the localInferenceLoader service during the Capacitor bootstrap.
- Maps loadModel({ modelPath }) → initContext and unloadModel() → releaseContext / releaseAllContexts.
- Exposes a generate() surface matching the desktop engine.
- Forwards the @LlamaCpp_onToken stream out to Eliza's token listeners.

llama-cpp-capacitor handles iOS (arm64 + x86_64 with Metal) and Android (arm64-v8a, armeabi-v7a, x86, x86_64) itself. Desktop inference stays with the node-llama-cpp engine in @elizaos/app-core.

Install the dependency (already declared here):
bun install
Register the loader during Capacitor bootstrap. In apps/app's
Capacitor init path (currently in src/capacitor-shell.ts or the
runtime bootstrap that owns the mobile AgentRuntime):
import { registerCapacitorLlamaLoader } from "@elizaos/capacitor-llama";
// After runtime boot, before the Model Hub is mounted:
registerCapacitorLlamaLoader(runtime);
Run bunx cap sync in apps/app to pick up the native plugin. iOS and
Android builds will pull in llama-cpp-capacitor's prebuilt native
libraries automatically.
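
Once the loader is registered, generated tokens have to reach whoever is listening. A minimal sketch of that fan-out, assuming a subscribe method in the spirit of onToken(listener): the `TokenFanout` class, its `emit` method, and the returned unsubscribe function are hypothetical and are not taken from this adapter or from llama-cpp-capacitor.

```typescript
type TokenListener = (token: string) => void;

// Hypothetical fan-out: one native token event in, many listeners out.
class TokenFanout {
  private readonly listeners = new Set<TokenListener>();

  // Subscribe a listener. Returning an unsubscribe function is an
  // assumption of this sketch, not a documented adapter behaviour.
  onToken(listener: TokenListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // In a real adapter this would be called from the native event handler.
  emit(token: string): void {
    // Copy first so a listener may unsubscribe while tokens are delivered.
    for (const listener of Array.from(this.listeners)) {
      listener(token);
    }
  }
}
```

A caller would subscribe before starting generation, append each token to its output buffer, and call the returned function when the response is complete.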

- load() disposes the previous context first so we never double-allocate VRAM on device.
- Models come from the @elizaos/app-core downloader (shared with desktop). The mobile UI filters the catalog to small/tiny bucket models only, since anything larger won't realistically run on a phone.
- Tokens arrive through the native token event (@LlamaCpp_onToken). Subscribe via capacitorLlama.onToken(listener).
- For the full native API, see the llama-cpp-capacitor README. This adapter only wires the minimal slice needed for Eliza's agent runtime; extend it as the mobile product grows.

License: MIT, matching llama-cpp-capacitor and llama.cpp upstream.