docs/impl/lora_adapter_support.md
LoRA (Low-Rank Adaptation) adapter support was added to the RunAnywhere SDK across
two modules: sdk/runanywhere-commons (C/C++) and sdk/runanywhere-kotlin (Kotlin
Multiplatform). This enables users to load fine-tuned LoRA adapters (GGUF format)
alongside a base model, hot-swap adapters without reloading the base model, stack
multiple adapters with individual scales, and remove adapters at runtime.
The implementation spans 6 layers, bottom-up: C++ internal, C API, component, JNI bridge, Kotlin bridge, and Kotlin public API.
Before using LoRA adapters, a base model must already be loaded via `RunAnywhere.loadLLMModel()`.

```kotlin
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.public.extensions.loadLoraAdapter
import com.runanywhere.sdk.public.extensions.removeLoraAdapter
import com.runanywhere.sdk.public.extensions.clearLoraAdapters
import com.runanywhere.sdk.public.extensions.getLoadedLoraAdapters
import com.runanywhere.sdk.public.extensions.LLM.LoRAAdapterConfig
import com.runanywhere.sdk.public.extensions.LLM.LoRAAdapterInfo
```
LoRAAdapterConfig -- Configuration passed when loading an adapter.
```kotlin
data class LoRAAdapterConfig(
    val path: String,        // Path to the LoRA GGUF file (must not be blank)
    val scale: Float = 1.0f, // Scale factor: 0.0 = no effect, 1.0 = full effect, >1.0 = amplified
)
```
LoRAAdapterInfo -- Read-only info returned when querying loaded adapters.
```kotlin
data class LoRAAdapterInfo(
    val path: String,     // Path used when loading
    val scale: Float,     // Active scale factor
    val applied: Boolean, // Whether the adapter is currently applied to the context
)
```
Load a GGUF LoRA file and apply it to the current model. The SDK recreates the llama.cpp context internally and clears the KV cache.
```kotlin
// Load with default scale (1.0)
RunAnywhere.loadLoraAdapter(LoRAAdapterConfig(path = "/path/to/adapter.gguf"))

// Load with custom scale (0.5 = half strength)
RunAnywhere.loadLoraAdapter(
    LoRAAdapterConfig(path = "/path/to/adapter.gguf", scale = 0.5f)
)
```
All functions are suspend -- call them from a coroutine scope.
Multiple adapters can be applied simultaneously. Each adapter has its own scale. The effects combine additively at the weight level.
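For intuition, this maps onto standard LoRA weight math (a conceptual sketch, not SDK internals): with base weights $W$, low-rank factors $B_i A_i$, and scales $s_i$,

$$
W_{\text{eff}} = W + s_1 B_1 A_1 + s_2 B_2 A_2 + \cdots
$$

so each adapter contributes its own scaled delta and the deltas sum.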
```kotlin
// Load base writing style adapter
RunAnywhere.loadLoraAdapter(
    LoRAAdapterConfig(path = "/path/to/style.gguf", scale = 1.0f)
)

// Stack a domain knowledge adapter on top
RunAnywhere.loadLoraAdapter(
    LoRAAdapterConfig(path = "/path/to/domain.gguf", scale = 0.7f)
)

// Check what's loaded
val adapters = RunAnywhere.getLoadedLoraAdapters()
// adapters.size == 2

// Remove a specific adapter by path
RunAnywhere.removeLoraAdapter("/path/to/style.gguf")

// Remove all adapters at once
RunAnywhere.clearLoraAdapters()
```
After removal, the context is recreated and KV cache is cleared. Any remaining adapters are re-applied automatically.
```kotlin
val adapters: List<LoRAAdapterInfo> = RunAnywhere.getLoadedLoraAdapters()
for (adapter in adapters) {
    println("Path: ${adapter.path}")
    println("Scale: ${adapter.scale}")
    println("Applied: ${adapter.applied}")
}
```
Returns an empty list if no adapters are loaded or if no model is loaded.
All LoRA functions throw SDKError on failure:
```kotlin
try {
    RunAnywhere.loadLoraAdapter(LoRAAdapterConfig(path = "/invalid/path.gguf"))
} catch (e: SDKError) {
    // SDKError.notInitialized -- SDK not initialized
    // SDKError.llm -- C++ operation failed (bad path, incompatible adapter, etc.)
    println("LoRA error: ${e.message}")
}
```
Common failure causes:
- SDK not initialized (`SDKError.notInitialized`)
- No model loaded (`SDKError.llm` with "no model loaded")
- Invalid or unreadable adapter file (`SDKError.llm`)
- Loading the same adapter path twice (`SDKError.llm` with duplicate detection)
- Adapter incompatible with the base model (`SDKError.llm`)

A typical Android integration pattern using ViewModel and Compose:
```kotlin
class LlmViewModel : ViewModel() {
    data class UiState(
        val modelLoaded: Boolean = false,
        val loraAdapters: List<LoRAAdapterInfo> = emptyList(),
        val error: String? = null,
    )

    private val _state = MutableStateFlow(UiState())
    val state = _state.asStateFlow()

    fun loadLoraAdapter(path: String, scale: Float = 1.0f) {
        viewModelScope.launch {
            try {
                RunAnywhere.loadLoraAdapter(LoRAAdapterConfig(path, scale))
                refreshAdapterList()
            } catch (e: SDKError) {
                _state.update { it.copy(error = e.message) }
            }
        }
    }

    fun clearAdapters() {
        viewModelScope.launch {
            RunAnywhere.clearLoraAdapters()
            refreshAdapterList()
        }
    }

    private suspend fun refreshAdapterList() {
        val adapters = RunAnywhere.getLoadedLoraAdapters()
        _state.update { it.copy(loraAdapters = adapters) }
    }
}
```
For a full working Android app, see examples/android/RunAnyWhereLora/.
This section documents the C functions that back the JNI layer. Any language that can call C functions (Swift, Python, Dart, Rust, C#, etc.) can use these directly to implement LoRA support without going through JNI/Kotlin.
There are two API levels to choose from:
- Header: `include/rac/features/llm/rac_llm_component.h`
- Library: `librac_commons.so` / `RACommons.xcframework`
These are the high-level functions. They handle mutex locking, service lookup, and vtable dispatch internally. Use these unless you have a reason to call the backend directly.
#include "rac/features/llm/rac_llm_component.h"
// handle = the rac_handle_t returned by rac_llm_component_create()
// ---- Load a LoRA adapter ----
// Loads a GGUF LoRA file and applies it to the current model.
// Context is recreated internally. KV cache is cleared.
// Duplicate paths are rejected.
//
// Returns: RAC_SUCCESS, RAC_ERROR_INVALID_HANDLE, RAC_ERROR_INVALID_ARGUMENT,
// RAC_ERROR_COMPONENT_NOT_READY, RAC_ERROR_NOT_SUPPORTED,
// or backend-specific error code
rac_result_t rac_llm_component_load_lora(
rac_handle_t handle, // Component handle
const char* adapter_path, // Absolute path to LoRA .gguf file
float scale // 0.0 = no effect, 1.0 = full, >1.0 = amplified
);
// ---- Remove a specific adapter ----
// Removes the adapter that was loaded from the given path.
// Context is recreated and KV cache is cleared.
//
// Returns: RAC_SUCCESS, RAC_ERROR_NOT_FOUND, RAC_ERROR_COMPONENT_NOT_READY
rac_result_t rac_llm_component_remove_lora(
rac_handle_t handle,
const char* adapter_path // Must match the path used in load_lora
);
// ---- Clear all adapters ----
// Removes every loaded adapter. Safe to call with no adapters loaded.
//
// Returns: RAC_SUCCESS
rac_result_t rac_llm_component_clear_lora(
rac_handle_t handle
);
// ---- Query loaded adapters ----
// Returns a JSON array string describing all loaded adapters.
// Format: [{"path":"/path/to/file.gguf","scale":1.0,"applied":true}, ...]
// Caller MUST free the returned string with free().
//
// Returns: RAC_SUCCESS, RAC_ERROR_COMPONENT_NOT_READY
rac_result_t rac_llm_component_get_lora_info(
rac_handle_t handle,
char** out_json // Output: heap-allocated JSON string
);
JNI mapping (for reference -- how the Kotlin bridge calls these):
| JNI Function | C Function | Notes |
|---|---|---|
| `racLlmComponentLoadLora(long handle, String path, float scale)` | `rac_llm_component_load_lora(handle, path, scale)` | Returns int (0 = success) |
| `racLlmComponentRemoveLora(long handle, String path)` | `rac_llm_component_remove_lora(handle, path)` | Returns int |
| `racLlmComponentClearLora(long handle)` | `rac_llm_component_clear_lora(handle)` | Returns int |
| `racLlmComponentGetLoraInfo(long handle)` | `rac_llm_component_get_lora_info(handle, &json)` | Returns String? (JSON) |
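For orientation, a minimal sketch of what one of these JNI wrappers plausibly looks like (the `Java_...` symbol name and class path are assumptions; the real code lives in `runanywhere_commons_jni.cpp`):

```cpp
#include <jni.h>
#include "rac/features/llm/rac_llm_component.h"

// Hypothetical sketch of racLlmComponentLoadLora. The JNI symbol name below
// is an assumption derived from the bridge class name, not the real one.
extern "C" JNIEXPORT jint JNICALL
Java_com_runanywhere_sdk_bridge_RunAnywhereBridge_racLlmComponentLoadLora(
        JNIEnv* env, jobject /*thiz*/, jlong handle, jstring path, jfloat scale) {
    const char* c_path = env->GetStringUTFChars(path, nullptr);
    // Delegate to the component C API; 0 (RAC_SUCCESS) means success.
    const rac_result_t result = rac_llm_component_load_lora(
        (rac_handle_t)handle, c_path, scale);
    env->ReleaseStringUTFChars(path, c_path);
    return (jint)result;
}
```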
- Header: `include/rac/backends/rac_llm_llamacpp.h`
- Library: `librac_backend_llamacpp.so` / `RABackendLLAMACPP.xcframework`
These are low-level functions that talk directly to the LlamaCPP backend. Use these if you want to bypass the component layer (e.g., building a custom pipeline without the lifecycle manager). You must handle your own locking.
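Since this layer leaves locking to the caller, here is a minimal sketch of one way to serialize adapter changes against inference (the mutex and wrapper below are illustrative, not SDK API):

```cpp
#include <mutex>
#include "rac/backends/rac_llm_llamacpp.h"

// Illustrative only: a single caller-owned mutex guarding both inference
// and adapter mutation, mirroring what the component layer does for you.
static std::mutex g_llm_mutex;

rac_result_t load_lora_locked(rac_handle_t backend_impl,
                              const char* path, float scale) {
    // Never mutate adapters while a generate call may be in flight.
    std::lock_guard<std::mutex> guard(g_llm_mutex);
    return rac_llm_llamacpp_load_lora(backend_impl, path, scale);
}
```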
#include "rac/backends/rac_llm_llamacpp.h"
// handle = the backend impl pointer (NOT the component handle).
// Obtained from rac_llm_service_t.impl after creating a service.
// Load and apply a LoRA adapter. Context is recreated internally.
rac_result_t rac_llm_llamacpp_load_lora(
rac_handle_t handle,
const char* adapter_path,
float scale
);
// Remove a specific adapter by path.
rac_result_t rac_llm_llamacpp_remove_lora(
rac_handle_t handle,
const char* adapter_path
);
// Clear all adapters.
rac_result_t rac_llm_llamacpp_clear_lora(
rac_handle_t handle
);
// Get adapter info as JSON. Caller must free(*out_json).
rac_result_t rac_llm_llamacpp_get_lora_info(
rac_handle_t handle,
char** out_json
);
If you are adding LoRA support to a different backend (not LlamaCPP), implement
these 4 function pointers in your rac_llm_service_ops_t vtable:
#include "rac/features/llm/rac_llm_service.h"
typedef struct rac_llm_service_ops {
// ... existing ops (initialize, generate, generate_stream, etc.) ...
// LoRA ops -- set to NULL if your backend doesn't support LoRA
rac_result_t (*load_lora)(void* impl, const char* adapter_path, float scale);
rac_result_t (*remove_lora)(void* impl, const char* adapter_path);
rac_result_t (*clear_lora)(void* impl);
rac_result_t (*get_lora_info)(void* impl, char** out_json);
} rac_llm_service_ops_t;
The component layer checks for NULL before calling. If your backend sets
these to NULL, calls return RAC_ERROR_NOT_SUPPORTED.
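A minimal sketch of the opt-out path, assuming the ops struct is zero-initialized before wiring (the non-LoRA op names are hypothetical placeholders):

```cpp
#include <cstring>
#include "rac/features/llm/rac_llm_service.h"

// Sketch: an ops table for a backend without LoRA support. Zeroing the
// struct leaves load_lora / remove_lora / clear_lora / get_lora_info NULL,
// so component-level LoRA calls return RAC_ERROR_NOT_SUPPORTED.
static rac_llm_service_ops_t make_ops_without_lora(void) {
    rac_llm_service_ops_t ops;
    std::memset(&ops, 0, sizeof(ops));
    // ops.initialize = my_initialize;  // hypothetical backend ops
    // ops.generate   = my_generate;    // hypothetical backend ops
    // LoRA pointers intentionally stay NULL.
    return ops;
}
```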
Complete example of loading a model and applying a LoRA adapter using the component API:
#include "rac/core/rac_core.h"
#include "rac/backends/rac_llm_llamacpp.h"
#include "rac/features/llm/rac_llm_component.h"
int main() {
// 1. Initialize SDK
rac_init(NULL);
rac_backend_llamacpp_register();
// 2. Create and load model via component
rac_handle_t component = 0;
rac_llm_component_create(&component);
rac_llm_component_load_model(component, "/path/to/model.gguf",
"my-model", "My Model", NULL);
// 3. Load LoRA adapter (scale = 0.8)
rac_result_t r = rac_llm_component_load_lora(
component, "/path/to/adapter.gguf", 0.8f);
if (r != RAC_SUCCESS) {
printf("Failed to load LoRA: %s\n", rac_error_message(r));
return 1;
}
// 4. Stack a second adapter
rac_llm_component_load_lora(component, "/path/to/adapter2.gguf", 0.5f);
// 5. Query what's loaded
char* json = NULL;
rac_llm_component_get_lora_info(component, &json);
if (json) {
printf("Adapters: %s\n", json);
// Output: [{"path":"/path/to/adapter.gguf","scale":0.8,"applied":true},
// {"path":"/path/to/adapter2.gguf","scale":0.5,"applied":true}]
free(json);
}
// 6. Generate text (adapters are applied automatically)
rac_llm_options_t opts = RAC_LLM_OPTIONS_DEFAULT;
rac_llm_result_t result = {0};
rac_llm_component_generate(component, "Hello, world!", &opts, &result);
printf("Response: %s\n", result.text);
rac_llm_result_free(&result);
// 7. Remove one adapter
rac_llm_component_remove_lora(component, "/path/to/adapter.gguf");
// 8. Clear all adapters
rac_llm_component_clear_lora(component);
// 9. Cleanup
rac_llm_component_destroy(component);
rac_shutdown();
return 0;
}
For Swift SDK implementers, the pattern would be:
```swift
// The C functions are imported via the CRACommons module
import CRACommons

// Load adapter
let result = rac_llm_component_load_lora(componentHandle, path, scale)
guard result == RAC_SUCCESS else {
    throw SDKError.llm("LoRA load failed: \(String(cString: rac_error_message(result)))")
}

// Query adapters
var jsonPtr: UnsafeMutablePointer<CChar>? = nil
rac_llm_component_get_lora_info(componentHandle, &jsonPtr)
if let json = jsonPtr {
    let jsonString = String(cString: json)
    free(json)
    // Parse jsonString into Swift structs
}
```
| Code | Constant | Meaning |
|---|---|---|
| 0 | RAC_SUCCESS | Operation succeeded |
| -1 | RAC_ERROR_INVALID_HANDLE | NULL or invalid component handle |
| -2 | RAC_ERROR_INVALID_ARGUMENT | NULL adapter_path |
| -236 | RAC_ERROR_NOT_SUPPORTED | Backend does not implement LoRA (vtable entry is NULL) |
| -230 | RAC_ERROR_COMPONENT_NOT_READY | No model loaded |
| -110 | RAC_ERROR_MODEL_NOT_FOUND | Adapter file path doesn't exist |
| -600+ | Backend-specific | Duplicate path, incompatible adapter, context recreation failure |
```
Kotlin Public API   (RunAnywhere.loadLoraAdapter)
        |
        v
Kotlin Bridge       (CppBridgeLLM.loadLoraAdapter)
        |
        v
JNI Native          (RunAnywhereBridge.racLlmComponentLoadLora)
        |
        v
Component C API     (rac_llm_component_load_lora)
        |
        v  [vtable dispatch: llm_service->ops->load_lora()]
Service Vtable      (rac_llm_service_ops_t)
        |
        v
Backend C API       (rac_llm_llamacpp_load_lora)
        |
        v
C++ Internal        (LlamaCppTextGeneration::load_lora_adapter)
        |
        v
llama.cpp API       (llama_adapter_lora_init + llama_set_adapters_lora)
```
Each layer talks only to the layer directly below it; no layer is skipped.
The component layer (llm_component.cpp) does NOT directly call backend-specific
functions. Instead, it dispatches through the rac_llm_service_ops_t vtable:
```cpp
// Component dispatches through vtable (backend-agnostic)
auto* llm_service = reinterpret_cast<rac_llm_service_t*>(service);
if (!llm_service->ops || !llm_service->ops->load_lora) {
    return RAC_ERROR_NOT_SUPPORTED;
}
return llm_service->ops->load_lora(llm_service->impl, adapter_path, scale);
```
The llamacpp backend registers its LoRA vtable entries during service creation
in rac_backend_llamacpp_register.cpp. Backends that do not support LoRA leave
these pointers as NULL, and the component returns RAC_ERROR_NOT_SUPPORTED.
This keeps librac_commons.so decoupled from librac_backend_llamacpp.so.
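A sketch of the wrapper-plus-wiring pattern (wrapper names are assumptions; the real ones are in `rac_backend_llamacpp_register.cpp`):

```cpp
#include "rac/backends/rac_llm_llamacpp.h"
#include "rac/features/llm/rac_llm_service.h"

// Sketch: thin wrappers adapt the backend C API to the vtable signature,
// whose first argument is the opaque impl pointer.
static rac_result_t llamacpp_ops_load_lora(void* impl, const char* path,
                                           float scale) {
    return rac_llm_llamacpp_load_lora((rac_handle_t)impl, path, scale);
}

static rac_result_t llamacpp_ops_clear_lora(void* impl) {
    return rac_llm_llamacpp_clear_lora((rac_handle_t)impl);
}

// During registration the wrappers are wired into the ops table, e.g.:
//   g_llamacpp_ops.load_lora  = llamacpp_ops_load_lora;
//   g_llamacpp_ops.clear_lora = llamacpp_ops_clear_lora;
//   ... remove_lora / get_lora_info likewise ...
```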
The implementation uses these llama.cpp functions:
| Function | Purpose |
|---|---|
| `llama_adapter_lora_init(model, path)` | Load adapter tensors from GGUF file |
| `llama_set_adapters_lora(ctx, adapters[], n, scales[])` | Apply adapter(s) to context with scale(s) |
| `llama_memory_clear(memory, true)` | Clear KV cache after adapter changes |
| `llama_adapter_meta_val_str(adapter, key, buf, size)` | Read adapter GGUF metadata by key |
| `llama_adapter_meta_count(adapter)` | Get number of metadata entries |
| `llama_adapter_meta_key_by_index(adapter, i, buf, size)` | Read metadata key by index |
| `llama_adapter_meta_val_str_by_index(adapter, i, buf, size)` | Read metadata value by index |
Note: llama_adapter_lora_free() is deprecated in b8201 — "adapters are now freed
together with the associated model". Do NOT call it manually.
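As an illustration of the metadata functions in the table, a sketch that dumps an adapter's GGUF metadata (buffer sizes are arbitrary; signatures follow the table above):

```cpp
#include <cstdio>
#include "llama.h"

// Sketch: enumerate a loaded adapter's GGUF metadata. 'adapter' is the
// pointer returned by llama_adapter_lora_init().
static void dump_adapter_metadata(struct llama_adapter_lora* adapter) {
    char key[256];
    char val[1024];
    const int32_t n = llama_adapter_meta_count(adapter);
    for (int32_t i = 0; i < n; ++i) {
        // Skip entries whose key or value can't be read into the buffer.
        if (llama_adapter_meta_key_by_index(adapter, i, key, sizeof(key)) < 0) continue;
        if (llama_adapter_meta_val_str_by_index(adapter, i, val, sizeof(val)) < 0) continue;
        std::printf("  %s = %s\n", key, val);
    }
}
```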
Internal header dependency: The implementation includes llama-adapter.h (internal
llama.cpp header) to access adapter->ab_map.size() for tensor match validation.
This is pinned to llama.cpp b8201 via the VERSIONS file and must be re-verified on version bumps.
Per llama.cpp docs: "All adapters must be loaded before context creation." When a new adapter is loaded after the model is already running, the implementation recreates the context so the compute graph properly accounts for LoRA operations:
1. Any in-flight generation (`generate_stream()`) finishes first, since LoRA ops and inference share the same mutex
2. The old context is freed and a new one is created with the same parameters
3. All tracked adapters are re-applied via `llama_set_adapters_lora()`

This is handled by `recreate_context()` + `apply_lora_adapters()` in `llamacpp_backend.cpp`.
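A condensed sketch of that flow, assuming the member layout described in this doc and that a nonzero return from the batch call signals failure (the real logic is split across `recreate_context()` and `apply_lora_adapters()`):

```cpp
#include <vector>
#include "llama.h"

// Sketch: LoraAdapterEntry mirrors the tracking struct described in this doc.
struct LoraAdapterEntry {
    llama_adapter_lora* adapter;
    float scale;
};

bool recreate_and_apply(llama_model* model, llama_context*& ctx,
                        llama_context_params params,
                        const std::vector<LoraAdapterEntry>& adapters) {
    llama_free(ctx);                             // drop the old context
    ctx = llama_init_from_model(model, params);  // same params as before
    if (!ctx) return false;

    std::vector<llama_adapter_lora*> ptrs;
    std::vector<float> scales;
    for (const auto& e : adapters) {
        ptrs.push_back(e.adapter);
        scales.push_back(e.scale);
    }
    // Batch re-apply (b8201 API per the table above); assumed 0 on success.
    return llama_set_adapters_lora(ctx, ptrs.data(), (uint32_t)ptrs.size(),
                                   scales.data()) == 0;
}
```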
Before each generate_stream() call, the implementation checks that all loaded
adapters have applied == true. If any adapter is not applied (e.g., due to a
prior failure), it attempts to re-apply via apply_lora_adapters(). If re-apply
fails, generation is aborted with an error rather than silently ignoring the
adapter.
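A sketch of that guard (the entry layout and re-apply hook are illustrative; the real check lives in the `generate_stream()` path):

```cpp
#include <functional>
#include <vector>

// Sketch: pre-generation adapter state verification.
struct AdapterState { float scale; bool applied; };

bool ensure_adapters_applied(const std::vector<AdapterState>& adapters,
                             const std::function<bool()>& reapply_all) {
    for (const auto& a : adapters) {
        if (!a.applied) {
            // Re-apply instead of silently generating without the adapter;
            // if this fails, the caller aborts generation with an error.
            return reapply_all();
        }
    }
    return true;
}
```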
After any adapter change (load, remove, clear), the KV cache is always
cleared via llama_memory_clear(llama_get_memory(context_), true). This is
mandatory because cached key-value pairs were computed with the previous
adapter configuration and would produce incorrect results.
All LoRA operations acquire the same mutex (mtx_) used by the text generation
inference loop. This guarantees that adapters are never modified while inference
is in progress. The lock hierarchy is:
- Backend: `std::lock_guard<std::mutex>` on `mtx_` (already used by generate)
- Component: `std::lock_guard<std::mutex>` on `component->mtx`
- Kotlin bridge: `synchronized(lock)` on the `CppBridgeLLM` lock object

`load_lora_adapter()` performs multi-stage validation before touching llama.cpp:

1. Scale sanity check (`scale > 0.0f && isfinite(scale)`)
2. Opens the file with `std::ifstream` to verify it exists
3. Checks the first 4 bytes against the GGUF magic `0x46554747` ("GGUF" LE, sketched after this section)
4. After `llama_adapter_lora_init()`, checks `adapter->ab_map.size() > 0` to ensure the adapter actually matched model tensors (catches wrong-base-model errors)

If context recreation fails after an adapter is loaded, the adapter entry is
popped from the lora_adapters_ vector. Same if apply_lora_adapters() fails.
This prevents the tracking vector from going out of sync with actual context
state.
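The file-existence and magic-number stages sketched (illustrative, assumes a little-endian host; `0x46554747` is "GGUF" read as a little-endian u32):

```cpp
#include <cstdint>
#include <fstream>

// Sketch: cheap pre-validation before handing the path to llama.cpp.
bool looks_like_gguf(const char* path) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;  // stage 2: file must exist and be readable
    uint32_t magic = 0;
    f.read(reinterpret_cast<char*>(&magic), sizeof(magic));
    if (!f) return false;  // file shorter than 4 bytes
    return magic == 0x46554747u;  // stage 3: "GGUF" magic (little-endian)
}
```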
Adapters are stored in a std::vector<LoraAdapterEntry> on the
LlamaCppTextGeneration instance. When unload_model_internal() is called,
adapters are cleared from the context first, then the vector is cleared, then
the context and model are freed. This ordering prevents use-after-free.
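A sketch of that ordering (function names reflect current llama.cpp and are assumptions against b8201; `LoraAdapterEntry` as in the earlier sketch):

```cpp
#include <vector>
#include "llama.h"

// Sketch of the unload_model_internal() teardown order described above.
void unload_model_sketch(llama_context*& ctx, llama_model*& model,
                         std::vector<LoraAdapterEntry>& adapters) {
    llama_clear_adapter_lora(ctx); // 1. detach adapters from the context
    adapters.clear();              // 2. drop the tracking vector
    llama_free(ctx);               // 3. free the context
    ctx = nullptr;
    llama_model_free(model);       // 4. free the model (adapters are freed
    model = nullptr;               //    with it, per the b8201 note above)
}
```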
**C++ internal (llama.cpp backend):**

| File | Changes |
|---|---|
| `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h` | Added `LoraAdapterEntry` struct, 4 public methods (`load_lora_adapter`, `remove_lora_adapter`, `clear_lora_adapters`, `get_lora_info`), 2 private helpers (`recreate_context`, `apply_lora_adapters`), `lora_adapters_` vector member |
| `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` | Implemented 6 new methods. Modified `unload_model_internal()` to clear adapters before freeing context/model |

**Backend C API:**

| File | Changes |
|---|---|
| `sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h` | Added 4 C function declarations: `rac_llm_llamacpp_load_lora`, `rac_llm_llamacpp_remove_lora`, `rac_llm_llamacpp_clear_lora`, `rac_llm_llamacpp_get_lora_info` |
| `sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp` | Implemented 4 C functions. Pattern: validate handle, cast to impl, call C++ method, return result |

**Component + vtable layer:**

| File | Changes |
|---|---|
| `sdk/runanywhere-commons/include/rac/features/llm/rac_llm_service.h` | Added 4 optional LoRA function pointers to `rac_llm_service_ops_t` vtable: `load_lora`, `remove_lora`, `clear_lora`, `get_lora_info` |
| `sdk/runanywhere-commons/include/rac/features/llm/rac_llm_component.h` | Added 4 component-level function declarations |
| `sdk/runanywhere-commons/src/features/llm/llm_component.cpp` | Implemented 4 component functions. Dispatches through vtable with NULL checks (returns `RAC_ERROR_NOT_SUPPORTED` if backend doesn't implement LoRA) |
| `sdk/runanywhere-commons/src/backends/llamacpp/rac_backend_llamacpp_register.cpp` | Added 4 vtable wrapper functions and wired them into `g_llamacpp_ops` |

**JNI bridge:**

| File | Changes |
|---|---|
| `sdk/runanywhere-commons/src/jni/runanywhere_commons_jni.cpp` | Added 4 JNI functions: `racLlmComponentLoadLora`, `racLlmComponentRemoveLora`, `racLlmComponentClearLora`, `racLlmComponentGetLoraInfo` |

**Kotlin bridge:**

| File | Changes |
|---|---|
| `sdk/runanywhere-kotlin/src/jvmAndroidMain/.../RunAnywhereBridge.kt` | Added 4 external JNI method declarations |
| `sdk/runanywhere-kotlin/src/jvmAndroidMain/.../CppBridgeLLM.kt` | Added 4 bridge methods with synchronized access, state validation, and logging |

**Kotlin public API:**

| File | Changes |
|---|---|
| `sdk/runanywhere-kotlin/src/commonMain/.../LLMTypes.kt` | Added `LoRAAdapterConfig` and `LoRAAdapterInfo` data classes |
| `sdk/runanywhere-kotlin/src/commonMain/.../RunAnywhere+LoRA.kt` | NEW file. `expect` declarations for 4 public API functions |
| `sdk/runanywhere-kotlin/src/jvmAndroidMain/.../RunAnywhere+LoRA.jvmAndroid.kt` | NEW file. `actual` implementations with init checks, `CppBridgeLLM` delegation, JSON parsing for adapter info |

**Example app (Android):**

| File | Changes |
|---|---|
| `examples/android/RunAnywhereAI/.../data/ModelList.kt` | Switched LoRA adapter from `lora-adapter.gguf` (4.3MB, ineffective) to `qwen2.5-0.5b-abliterated-lora-f16.gguf` (17.6MB F16, abliterated). Updated catalog entry ID, name, filename, fileSize |
| `examples/android/RunAnywhereAI/.../data/LoraExamplePrompts.kt` | Updated prompt filename key to match new adapter filename |
| `examples/android/RunAnywhereAI/.../presentation/chat/ChatScreen.kt` | Updated starter prompt suggestions for LoRA demo comparison |
Follow the same 6-layer pattern:
1. C++ method on `LlamaCppTextGeneration` in `llamacpp_backend.h/.cpp`
2. Backend C API in `rac_llm_llamacpp.h/.cpp`
3. Vtable entry in `rac_llm_service_ops_t` in `rac_llm_service.h`
4. Vtable wrapper wired up in `rac_backend_llamacpp_register.cpp`
5. Component function in `rac_llm_component.h` / `llm_component.cpp` (dispatch through vtable)
6. JNI function in `runanywhere_commons_jni.cpp`
7. External declaration in `RunAnywhereBridge.kt`, bridge method in `CppBridgeLLM.kt`
8. Public API in `RunAnywhere+LoRA.kt` / `RunAnywhere+LoRA.jvmAndroid.kt`

Runtime scale adjustment could be done by calling `llama_set_adapter_lora(ctx, adapter, new_scale)` directly without context recreation. It would need a new method at each layer.
Android native build (confirmed passing):
```bash
cd sdk/runanywhere-commons
./scripts/build-android.sh
```
C++ desktop build (confirmed passing):
```bash
cd sdk/runanywhere-commons
cmake -B build/dev -DRAC_BUILD_BACKENDS=ON -DRAC_BUILD_JNI=ON
cmake --build build/dev
```
After Android build, copy .so files to jniLibs:
```bash
DIST=sdk/runanywhere-commons/dist/android
JNILIBS=sdk/runanywhere-kotlin/modules/runanywhere-core-llamacpp/src/androidMain/jniLibs/arm64-v8a
/usr/bin/cp $DIST/llamacpp/arm64-v8a/librac_backend_llamacpp.so $JNILIBS/
/usr/bin/cp $DIST/llamacpp/arm64-v8a/librac_backend_llamacpp_jni.so $JNILIBS/
/usr/bin/cp $DIST/llamacpp/arm64-v8a/librac_commons.so $JNILIBS/
/usr/bin/cp $DIST/llamacpp/arm64-v8a/libc++_shared.so $JNILIBS/
/usr/bin/cp $DIST/llamacpp/arm64-v8a/libomp.so $JNILIBS/
/usr/bin/cp $DIST/jni/arm64-v8a/librunanywhere_jni.so $JNILIBS/
```
Kotlin build:
```bash
cd sdk/runanywhere-kotlin
./scripts/sdk.sh build
```
| Date | Author | Description |
|---|---|---|
| 2026-02-19 | Claude | Initial implementation of LoRA adapter support across all 6 layers (C++ through Kotlin public API). C++ desktop build verified. |
| 2026-02-19 | Claude | Fixed architecture: Component layer now dispatches LoRA ops through vtable (rac_llm_service_ops_t) instead of calling backend directly. This decouples librac_commons.so from librac_backend_llamacpp.so. Added 4 vtable entries and wrapper functions. Fixed AttachCurrentThread cast for Android NDK C++ build. Android native build verified. |
| 2026-02-19 | Claude | Added detailed Kotlin SDK usage guide with data types, code examples, error handling, Android ViewModel pattern, and table of contents with section links. Updated "How to Extend" to include vtable step. |
| 2026-03-09 | Claude | LoRA fix & hardening. Fixed LoRA adapter having no effect — root cause: wrong adapter file (4.3MB generic vs 17.6MB abliterated F16). Updated Android app to use qwen2.5-0.5b-abliterated-lora-f16.gguf. Added C++ validation: scale check, GGUF magic verification, tensor match count via ab_map (internal header llama-adapter.h), adapter metadata logging, pre-generation adapter state verification. Updated API from deprecated llama_set_adapter_lora to llama_set_adapters_lora (batch API, b8201). Updated docs to reflect llama.cpp b8201 API changes. |