# NexaAI Python SDK

This directory contains the NexaAI Python SDK and comprehensive examples for various AI inference tasks.

## Quick Start

The easiest way to get started with NexaAI is through our interactive Jupyter notebooks. You can find example notebooks in the notebook/ directory.

The notebooks cover:

- LLM (Large Language Model): Text generation and conversation
- VLM (Vision Language Model): Multimodal understanding and generation
- Embedder: Text vectorization and similarity computation
- Reranker: Document reranking
- ASR (Automatic Speech Recognition): Speech-to-text transcription
- TTS (Text-to-Speech): Text-to-speech synthesis
- Diarize: Speaker diarization
- ImageGen: Image generation from text, or image-to-image transformation
- CV (Computer Vision): OCR/text recognition

## Prerequisites

- Python 3
- Nexa CLI installed

## Installation

```bash
pip install nexaai -v
```

## Command Line Examples

If you prefer command-line usage, here are the basic examples:

### LLM

```bash
python llm.py
```

### Multi-Modal

```bash
python vlm.py
```

### Reranker

```bash
python rerank.py
```
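Conceptually, a reranker takes a query plus a list of candidate documents, scores each document's relevance to the query, and returns the documents sorted by score. The sketch below is SDK-independent and uses a toy term-overlap score as a stand-in for a real reranker model; the `rerank` function is illustrative, not part of the NexaAI API.

```python
# Toy reranker: score each document by how many query terms it shares,
# then sort the documents by that score, highest first. A real reranker
# replaces the scoring function with a learned relevance model.
def rerank(query, documents):
    query_terms = set(query.lower().split())

    def score(doc):
        return len(query_terms & set(doc.lower().split()))

    return sorted(documents, key=score, reverse=True)

docs = ["the cat sat", "dogs play fetch", "a cat and a dog"]
ranked = rerank("cat dog", docs)
print(ranked[0])  # "a cat and a dog" shares the most query terms
```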

### Embedder

```bash
python embedder.py
```
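The "similarity computation" an embedder enables is typically cosine similarity between embedding vectors. As a minimal, SDK-independent sketch (the vectors below are illustrative, not real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative embedding vectors; a real embedder would produce these
# from input text.
v1 = [0.1, 0.9, 0.0]
v2 = [0.1, 0.9, 0.0]
v3 = [0.9, 0.1, 0.0]

print(round(cosine_similarity(v1, v2), 6))  # -> 1.0 (identical vectors)
```

Identical vectors score 1.0; dissimilar vectors score lower, which is what makes embeddings useful for semantic search.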

### Computer Vision

```bash
python cv_ocr.py
```

### TTS (Text-to-Speech)

```bash
python tts.py --text "Hello, world!"
```

### Diarize

```bash
python diarize.py --audio path/to/audio.wav
```

### ImageGen

```bash
# Text-to-image
python image_gen.py --prompt "A beautiful sunset over the ocean"

# Image-to-image
python image_gen.py --prompt "A beautiful sunset" --init-image path/to/image.png
```

## Common Arguments

- `--model`: Path to the model file
- `--device`: Device to run on (cpu, gpu, etc.)
- `--max-tokens`: Maximum tokens to generate (for LLM/VLM)
- `--batch-size`: Batch size for processing
- `--system`: System message for chat models
- `--plugin-id`: Plugin ID to use (default: cpu_gpu)
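A hypothetical sketch of how an example script could wire up these common arguments with `argparse`. The flag names and the `cpu_gpu` default come from this README; the other defaults are assumptions, not the example scripts' actual values.

```python
import argparse

def build_parser():
    # Flag names mirror the "Common Arguments" list above; defaults here
    # are illustrative assumptions.
    parser = argparse.ArgumentParser(description="Common example-script arguments")
    parser.add_argument("--model", help="Path to the model file")
    parser.add_argument("--device", default="cpu", help="Device to run on (cpu, gpu, etc.)")
    parser.add_argument("--max-tokens", type=int, default=256, help="Maximum tokens to generate")
    parser.add_argument("--batch-size", type=int, default=1, help="Batch size for processing")
    parser.add_argument("--system", help="System message for chat models")
    parser.add_argument("--plugin-id", default="cpu_gpu", help="Plugin ID to use")
    return parser

# Unspecified flags fall back to their defaults, e.g. --plugin-id -> cpu_gpu.
args = build_parser().parse_args(["--model", "model.gguf", "--max-tokens", "128"])
print(args.plugin_id)
```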

## Plugin ID Options

The --plugin-id parameter supports different backends:

- `cpu_gpu`: Default, supports both CPU and GPU
- `metal`: Apple Silicon optimized (for supported models)
- `npu`: Qualcomm NPU optimized (for supported models)
- `nexaml`: NexaML optimized (for supported models)
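Since only these four backends are valid, a script can reject typos up front instead of failing later at model load. A minimal sketch, assuming this README's backend list and default (the `resolve_plugin_id` helper is hypothetical, not part of the SDK):

```python
# The valid backend IDs and the cpu_gpu default are taken from this README.
SUPPORTED_PLUGINS = {"cpu_gpu", "metal", "npu", "nexaml"}

def resolve_plugin_id(requested=None):
    """Return the requested backend, or the cpu_gpu default; reject unknowns."""
    if requested is None:
        return "cpu_gpu"
    if requested not in SUPPORTED_PLUGINS:
        raise ValueError(f"Unknown plugin ID: {requested!r}")
    return requested

print(resolve_plugin_id("metal"))  # -> metal
print(resolve_plugin_id())         # -> cpu_gpu
```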

## Supported Models by Backend

| Backend | Supported Models |
| --- | --- |
| `cpu_gpu` | GGUF models (default backend) |
| `metal` | Models in MLX format (e.g., Qwen3-VL-4B-MLX-4bit, gpt-oss-20b-MLX-4bit) |
| `npu` | LLM: Granite-4-Micro-NPU, phi4-mini-npu-turbo, Qwen3-4B-Instruct-2507-npu, Qwen3-4B-Thinking-2507-npu, Llama3.2-3B-NPU-Turbo, jan-v1-4B-npu, qwen3-4B-npu, phi3.5-mini-npu<br>VLM: Qwen3-VL-4B-Instruct-NPU, OmniNeural-4B, LFM2-1.2B-npu<br>Embedder: embeddinggemma-300m-npu<br>ASR: parakeet-tdt-0.6b-v3-npu<br>CV: convnext-tiny-npu, paddleocr-npu, yolov12-npu<br>Reranker: jina-v2-rerank-npu |
| `nexaml` | VLM: Qwen3-VL-4B-Instruct-GGUF:Q4_0, Qwen3-VL-4B-Thinking-GGUF:Q4_0 |

## Getting Started

  1. Open a notebook from the notebook/ directory
  2. Follow the setup instructions in the notebook
  3. Run the examples step by step to explore different AI capabilities
  4. Customize the examples for your specific use cases

For detailed setup instructions, please refer to the individual notebooks.