NexaSDK for Android was highlighted on the Qualcomm blog as "a simple way to bring on-device AI to smartphones with Snapdragon".
The Nexa AI Android SDK enables on-device AI inference for Android applications with NPU acceleration. Run Large Language Models (LLMs), Vision-Language Models (VLMs), Embeddings, Speech Recognition (ASR), Reranking, and Computer Vision models on Android devices with support for NPU, GPU, and CPU inference.
For full documentation, see Android SDK Doc.
| Component | Requirement |
|---|---|
| NPU | Qualcomm Snapdragon 8 Gen 4 (optimized) |
| GPU | Qualcomm Adreno GPU |
| CPU | ARM64-v8a |
| RAM | 4GB+ recommended |
| Storage | 100MB - 4GB (varies by model) |
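The CPU and RAM rows of the table can be checked on a connected device before installing anything. The following is a minimal sketch, assuming `adb` is on your PATH and a single device is attached; the function names (`has_arm64`, `kb_to_gib`, `check_device`) are hypothetical helpers and not part of the SDK.

```shell
#!/bin/sh
# Hypothetical helpers for checking a device against the requirements table.

# Succeed if the comma-separated ABI list contains arm64-v8a.
has_arm64() {
  case ",$1," in
    *,arm64-v8a,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Convert a MemTotal value in kB (as reported by /proc/meminfo)
# to whole GiB, rounded up.
kb_to_gib() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

# Query the connected device over adb and report ABI and RAM.
check_device() {
  abis=$(adb shell getprop ro.product.cpu.abilist | tr -d '\r')
  mem_kb=$(adb shell cat /proc/meminfo | awk '/^MemTotal:/ {print $2}')
  if has_arm64 "$abis"; then
    echo "ABI ok: $abis"
  else
    echo "ABI list lacks arm64-v8a: $abis"
  fi
  echo "RAM: about $(kb_to_gib "$mem_kb") GiB (4GB+ recommended)"
}
```

Run `check_device` after sourcing the script. It does not check for NPU or GPU support, which depends on the specific Snapdragon chipset.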
Download and install the pre-built APK:
# Download: https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public/android-demo-release/nexaai-demo-app.apk
adb install nexaai-demo-app.apk
To run the GPT-OSS model on a Qualcomm NPU:
# Download: https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public/nexa_sdk/huggingface-models/gpt-oss-android-demo/nexaai-gpt-oss-npu.apk
adb install nexaai-gpt-oss-npu.apk
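The download and install steps above can be combined into one small script. This is a sketch only, assuming `curl` and `adb` are installed and a device is connected; `apk_url` and `install_demo` are hypothetical helper names, and the two URLs are the ones given above.

```shell
#!/bin/sh
BASE_URL="https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public"

# Map a short demo name to its APK URL.
apk_url() {
  case "$1" in
    demo)    echo "$BASE_URL/android-demo-release/nexaai-demo-app.apk" ;;
    gpt-oss) echo "$BASE_URL/nexa_sdk/huggingface-models/gpt-oss-android-demo/nexaai-gpt-oss-npu.apk" ;;
    *)       echo "unknown demo: $1" >&2; return 1 ;;
  esac
}

# Download the APK into the current directory, then install it
# on the connected device.
install_demo() {
  url=$(apk_url "$1") || return 1
  file=${url##*/}   # keep only the file name after the last slash
  curl -fL -o "$file" "$url" && adb install "$file"
}
```

For example, `install_demo demo` fetches and installs the general demo app, and `install_demo gpt-oss` does the same for the GPT-OSS NPU build.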
Watch the tutorial video showing how to run the sample app in 40 seconds.
Clone the repository
git clone https://github.com/NexaAI/nexa-sdk/
Open in Android Studio
Open the bindings/android folder in Android Studio.
Download a model
Follow the Android SDK Doc to download a model.
Place the model in the app's data directory:
/data/data/com.nexa.demo/files/models/<model-name>
Build and run the app in Android Studio
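Placing a model in the app's data directory from a development machine might look like the sketch below. It assumes the app was installed as a debuggable build (so that `run-as` is permitted), that the model sits in a local directory whose name has no spaces, and that `adb` is on your PATH. `model_dest` and `push_model` are hypothetical helper names; the SDK documentation may describe a different procedure.

```shell
#!/bin/sh
APP_ID="com.nexa.demo"

# Build the on-device destination path for a model name.
model_dest() {
  printf '/data/data/%s/files/models/%s\n' "$APP_ID" "$1"
}

# Copy a local model directory into the app's private storage.
# Assumption: the installed build is debuggable, so run-as works.
push_model() {
  src=$1
  name=$(basename "$src")
  tmp="/data/local/tmp/$name"      # staging area writable by adb
  dest=$(model_dest "$name")
  adb push "$src" "$tmp" &&
    adb shell run-as "$APP_ID" mkdir -p "${dest%/*}" &&
    adb shell run-as "$APP_ID" cp -r "$tmp" "$dest" &&
    adb shell rm -rf "$tmp"
}
```

Calling `push_model ./my-model` would stage the directory under `/data/local/tmp` and then copy it to `/data/data/com.nexa.demo/files/models/my-model`. On a release build without root access, `run-as` fails, and downloading the model from inside the app is the simpler route.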
This walkthrough uses the LFM2-24B-A2B-Preview-GGUF model in the demo app.
Install the app
Install the demo app, either from the pre-built APK or by building from source as described above.
Select the model
Open the model selector (dropdown next to the model name) and choose LFM2-24B-A2B-Preview-GGUF.
Download
Tap Download to fetch the model to your device. Wait until the download finishes.
Load
Tap Load. A model load configuration dialog appears: choose CPU, GPU, or NPU (for Qualcomm NPU), then tap SURE. Once the model has loaded, the chat area becomes available.
Chat
Type your message in the input field at the bottom, then tap Send to get a response. Use Clear to clear the input or conversation as needed.