# NexaAI Python SDK

This directory contains the NexaAI Python SDK and comprehensive examples for a variety of AI inference tasks.
The easiest way to get started with NexaAI is through our interactive Jupyter notebooks in the `notebook/` directory. Each notebook includes step-by-step setup and usage instructions.

Install the SDK with:

```bash
pip install nexaai -v
```
If you prefer command-line usage, here are the basic examples:
```bash
# Text generation (LLM)
python llm.py

# Vision-language model (VLM)
python vlm.py

# Reranking
python rerank.py

# Embeddings
python embedder.py

# OCR
python cv_ocr.py

# Text-to-speech
python tts.py --text "Hello, world!"

# Speaker diarization
python diarize.py --audio path/to/audio.wav

# Text-to-image
python image_gen.py --prompt "A beautiful sunset over the ocean"

# Image-to-image
python image_gen.py --prompt "A beautiful sunset" --init-image path/to/image.png
```
The example scripts share a set of common parameters:

- `--model`: Path to the model file
- `--device`: Device to run on (`cpu`, `gpu`, etc.)
- `--max-tokens`: Maximum tokens to generate (for LLM/VLM)
- `--batch-size`: Batch size for processing
- `--system`: System message for chat models
- `--plugin-id`: Plugin ID to use (default: `cpu_gpu`)

The `--plugin-id` parameter selects the inference backend.
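As a rough illustration, the shared flags above might be wired up with `argparse` like this. This is a hypothetical sketch, not the SDK's actual code; the real example scripts may define their options differently:

```python
import argparse

# Hypothetical sketch of the flag interface shared by the example scripts
# (llm.py, vlm.py, ...); the actual scripts may differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="NexaAI example script")
    parser.add_argument("--model", help="Path to the model file")
    parser.add_argument("--device", default="cpu",
                        help="Device to run on (cpu, gpu, ...)")
    parser.add_argument("--max-tokens", type=int, default=512,
                        help="Maximum tokens to generate (LLM/VLM)")
    parser.add_argument("--batch-size", type=int, default=1,
                        help="Batch size for processing")
    parser.add_argument("--system", help="System message for chat models")
    parser.add_argument("--plugin-id", default="cpu_gpu",
                        help="Backend plugin: cpu_gpu, metal, npu, nexaml")
    return parser

args = build_parser().parse_args(["--device", "gpu", "--max-tokens", "256"])
print(args.device, args.max_tokens)  # gpu 256
```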
Supported `--plugin-id` values:

- `cpu_gpu`: Default; supports both CPU and GPU
- `metal`: Apple Silicon optimized (for supported models)
- `npu`: Qualcomm NPU optimized (for supported models)
- `nexaml`: NexaML optimized (for supported models)

| Backend | Supported Models |
|---|---|
| `cpu_gpu` | GGUF models (default backend) |
| `metal` | Models in MLX format (e.g., Qwen3-VL-4B-MLX-4bit, gpt-oss-20b-MLX-4bit) |
| `npu` | LLM: Granite-4-Micro-NPU, phi4-mini-npu-turbo, Qwen3-4B-Instruct-2507-npu, Qwen3-4B-Thinking-2507-npu, Llama3.2-3B-NPU-Turbo, jan-v1-4B-npu, qwen3-4B-npu, phi3.5-mini-npu |
| | VLM: Qwen3-VL-4B-Instruct-NPU, OmniNeural-4B, LFM2-1.2B-npu |
| | Embedder: embeddinggemma-300m-npu |
| | ASR: parakeet-tdt-0.6b-v3-npu |
| | CV: convnext-tiny-npu, paddleocr-npu, yolov12-npu |
| | Reranker: jina-v2-rerank-npu |
| `nexaml` | VLM: Qwen3-VL-4B-Instruct-GGUF:Q4_0, Qwen3-VL-4B-Thinking-GGUF:Q4_0 |
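The backend table can be summarized in a small helper. This is an illustrative sketch, not part of the SDK; the naming heuristics below are assumptions based on the model names listed, and `nexaml`-only models (e.g. the Qwen3-VL GGUF builds) would still need to be selected explicitly:

```python
# Illustrative helper (not part of the SDK): pick a --plugin-id value
# from a model name, following the backend table above.
def pick_plugin_id(model_name: str) -> str:
    name = model_name.lower()
    if "mlx" in name:
        return "metal"    # MLX-format models run on the metal backend
    if "npu" in name:
        return "npu"      # NPU-tuned models (e.g. Llama3.2-3B-NPU-Turbo)
    return "cpu_gpu"      # GGUF models use the default backend

print(pick_plugin_id("Qwen3-VL-4B-MLX-4bit"))  # metal
print(pick_plugin_id("paddleocr-npu"))         # npu
print(pick_plugin_id("my-model-GGUF"))         # cpu_gpu
```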
For detailed setup instructions, please refer to the individual notebooks in the `notebook/` directory.