[Wishlist](https://sdk.nexa.ai/wishlist) | [X](https://x.com/nexa_ai) | [Discord](https://discord.com/invite/nexa-ai) | [Slack](https://join.slack.com/t/nexa-ai-community/shared_invite/zt-3837k9xpe-LEty0disTTUnTUQ4O3uuNw)
NexaSDK lets you build the smartest and fastest on-device AI with minimal energy use. It is a high-performance local inference framework that runs the latest multimodal AI models locally on NPU, GPU, and CPU, across Android, Windows, Linux, macOS, and iOS devices, in a few lines of code.
NexaSDK supports the latest models weeks or months before anyone else: Qwen3-VL, DeepSeek-OCR, Gemma3n (Vision), and more.
⭐ Star this repo to keep up with updates and new releases about the latest on-device AI capabilities.
| Platform | Links |
|---|---|
| 🖥️ CLI | Quick Start ｜ Docs |
| 🐍 Python | Quick Start ｜ Docs |
| 🤖 Android | Quick Start ｜ Docs |
| 🐳 Linux Docker | Quick Start ｜ Docs |
| 🍎 iOS | Quick Start ｜ Docs |
Download:
| Windows | macOS | Linux |
|---|---|---|
| arm64 (Qualcomm NPU) | arm64 (Apple Silicon) | arm64 |
| x64 (Intel/AMD NPU) | x64 | x64 |
Run your first model:

```bash
# Chat with Qwen3
nexa infer ggml-org/Qwen3-1.7B-GGUF

# Multimodal: drag images into the CLI
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

# NPU (Windows arm64 with Snapdragon X Elite)
nexa infer NexaAI/OmniNeural-4B
```
```bash
pip install nexaai
```
```python
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())

conversation = [
    LlmChatMessage(role="user", content="Hello, tell me a joke")
]

prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
```
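Under the hood, `apply_chat_template` renders the message list into the single prompt string the model was trained on. The exact template is model-specific and comes from the model files; purely as an illustration of the idea, here is a hypothetical ChatML-style rendering (the template format below is an assumption, not NexaSDK's implementation):

```python
# Hypothetical sketch of what a chat template does. The real template is
# model-specific and supplied with the model weights, not by this code.

def render_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML-style prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

conversation = [{"role": "user", "content": "Hello, tell me a joke"}]
print(render_chatml(conversation))
```

The important point is that the SDK handles this rendering for you, so the same `conversation` structure works across models with different template formats.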
Add to your `app/AndroidManifest.xml`:

```xml
<application android:extractNativeLibs="true">
```
Add to your `build.gradle.kts`:

```kotlin
dependencies {
    implementation("ai.nexa:core:0.0.19")
}
```
```kotlin
// Initialize the SDK
NexaSdk.getInstance().init(this)

// Load and run a model
VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name = "omni-neural",
            model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
            plugin_id = "npu",
            config = ModelConfig()
        )
    )
    .build()
    .onSuccess { vlm ->
        vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) }
    }
```
```bash
docker pull nexa4ai/nexasdk:latest

export NEXA_TOKEN="your_token_here"
docker run --rm -it --privileged \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
```
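If you script this invocation (for CI or batch jobs), it helps to build the argument list programmatically so flags stay in one place. A minimal sketch that only assembles the command from the pieces above; nothing is executed here, and the model argument is whatever repo you want to run:

```python
def build_docker_cmd(model: str, image: str = "nexa4ai/nexasdk:latest"):
    """Assemble the `docker run` invocation shown above as an argv list."""
    return [
        "docker", "run", "--rm", "-it", "--privileged",
        "-e", "NEXA_TOKEN",  # forwarded from the host environment
        image, "infer", model,
    ]

cmd = build_docker_cmd("NexaAI/Granite-4.0-h-350M-NPU")
print(" ".join(cmd))
# To actually launch it (requires docker and NEXA_TOKEN set):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing `-e NEXA_TOKEN` without a value forwards the variable from the host environment, so the token never appears in the command line or shell history.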
Download `NexaSdk.xcframework` and add it to your Xcode project.

```swift
import NexaSdk

// Example: speech recognition
let asr = try Asr(plugin: .ane)
try await asr.load(from: modelURL)
let result = try await asr.transcribe(options: .init(audioPath: "audio.wav"))
print(result.asrResult.transcript)
```
| Features | NexaSDK | Ollama | llama.cpp | LM Studio |
|---|---|---|---|---|
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Android/iOS SDK support | ✅ NPU/GPU/CPU support | ⚠️ | ⚠️ | ❌ |
| Linux support (Docker image) | ✅ | ✅ | ✅ | ❌ |
| Day-0 model support in GGUF, MLX, NEXA | ✅ | ❌ | ⚠️ | ❌ |
| Full multimodality support | ✅ Image, Audio, Text, Embedding, Rerank, ASR, TTS | ⚠️ | ⚠️ | ⚠️ |
| Cross-platform support | ✅ Desktop, Mobile (Android, iOS), Automotive, IoT (Linux) | ⚠️ | ⚠️ | ⚠️ |
| One line of code to run | ✅ | ✅ | ⚠️ | ❌ |
| OpenAI-compatible API + Function calling | ✅ | ✅ | ✅ | ✅ |
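An OpenAI-compatible API means the server speaks the standard `/v1/chat/completions` JSON schema, so existing OpenAI clients work unchanged. As a sketch using only the standard library, here is how such a request is built; the base URL below is a placeholder assumption, so substitute whatever address your local server reports:

```python
import json
import urllib.request

def chat_request(base_url, model, messages, max_tokens=100):
    """Build a standard OpenAI-style chat completion request (not sent here)."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Placeholder host/port; use the address your local server actually prints.
req = chat_request(
    "http://127.0.0.1:8080",
    "NexaAI/Qwen3-0.6B-GGUF",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
# To send it: urllib.request.urlopen(req) returns the JSON completion.
```

Because the schema is the standard one, the official `openai` Python client also works by pointing its `base_url` at the local server.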
We would like to thank the following projects:
NexaSDK uses a dual licensing model:
Licensed under Apache License 2.0.
For model launch partnerships, business inquiries, or any other questions, please schedule a call with us here.
Want more model support, backend support, device support or other features? We'd love to hear from you!
Feel free to submit an issue on our GitHub repository with your requests, suggestions, or feedback. Your input helps us prioritize what to build next.
Join our community:
Round 1: Build a working Android AI app that runs fully on-device on Qualcomm Hexagon NPU with NexaSDK.
Timeline (PT): Jan 15 to Feb 15
Prizes: $6,500 cash prize, Qualcomm official spotlight, flagship Snapdragon device, expert mentorship, and more
Join & details: https://sdk.nexa.ai/bounty