# NexaAI Python SDK

This directory contains the NexaAI Python SDK and comprehensive examples for a variety of AI inference tasks.
The easiest way to get started with NexaAI is through our interactive Jupyter notebooks in the `notebook/` directory. Each notebook includes step-by-step setup and usage instructions.

Install the SDK with:

```bash
pip install nexaai -v
```
If you prefer command-line usage, here are the basic examples:
```bash
# Text generation (LLM)
python llm.py

# Vision-language model (VLM)
python vlm.py

# Reranking
python rerank.py

# Embeddings
python embedder.py

# OCR
python cv_ocr.py

# Text-to-speech
python tts.py --text "Hello, world!"

# Speaker diarization
python diarize.py --audio path/to/audio.wav

# Text-to-image
python image_gen.py --prompt "A beautiful sunset over the ocean"

# Image-to-image
python image_gen.py --prompt "A beautiful sunset" --init-image path/to/image.png
```
The example scripts share a set of common parameters:

- `--model`: Path to the model file
- `--device`: Device to run on (`cpu`, `gpu`, etc.)
- `--max-tokens`: Maximum tokens to generate (for LLM/VLM)
- `--batch-size`: Batch size for processing
- `--system`: System message for chat models
- `--plugin-id`: Plugin ID to use (default: `cpu_gpu`)

The `--plugin-id` parameter selects the inference backend.
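As a rough illustration, the shared flags above might be wired up with `argparse` like this. This is a hypothetical sketch, not the SDK's actual code; the real example scripts may define their options differently:

```python
import argparse

# Hypothetical sketch of the flag interface shared by the example scripts
# (llm.py, vlm.py, ...); the actual scripts may differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="NexaAI example script")
    parser.add_argument("--model", help="Path to the model file")
    parser.add_argument("--device", default="cpu",
                        help="Device to run on (cpu, gpu, ...)")
    parser.add_argument("--max-tokens", type=int, default=512,
                        help="Maximum tokens to generate (LLM/VLM)")
    parser.add_argument("--batch-size", type=int, default=1,
                        help="Batch size for processing")
    parser.add_argument("--system", help="System message for chat models")
    parser.add_argument("--plugin-id", default="cpu_gpu",
                        help="Backend plugin: cpu_gpu, metal, npu, nexaml")
    return parser

args = build_parser().parse_args(["--device", "gpu", "--max-tokens", "256"])
print(args.device, args.max_tokens)  # gpu 256
```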
Supported `--plugin-id` values:

- `cpu_gpu`: Default; supports both CPU and GPU
- `metal`: Apple Silicon optimized (for supported models)
- `npu`: Qualcomm NPU optimized (for supported models)
- `nexaml`: NexaML optimized (for supported models)

| Backend | Supported Models |
|---|---|
| `cpu_gpu` | GGUF models (default backend) |
| `metal` | Models in MLX format (e.g., Qwen3-VL-4B-MLX-4bit, gpt-oss-20b-MLX-4bit) |
| `npu` | LLM: Granite-4-Micro-NPU, phi4-mini-npu-turbo, Qwen3-4B-Instruct-2507-npu, Qwen3-4B-Thinking-2507-npu, Llama3.2-3B-NPU-Turbo, jan-v1-4B-npu, qwen3-4B-npu, phi3.5-mini-npu |
| | VLM: Qwen3-VL-4B-Instruct-NPU, OmniNeural-4B, LFM2-1.2B-npu |
| | Embedder: embeddinggemma-300m-npu |
| | ASR: parakeet-tdt-0.6b-v3-npu |
| | CV: convnext-tiny-npu, paddleocr-npu, yolov12-npu |
| | Reranker: jina-v2-rerank-npu |
| `nexaml` | VLM: Qwen3-VL-4B-Instruct-GGUF:Q4_0, Qwen3-VL-4B-Thinking-GGUF:Q4_0 |
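The backend table can be summarized in a small helper. This is an illustrative sketch, not part of the SDK; the naming heuristics below are assumptions based on the model names listed, and `nexaml`-only models (e.g. the Qwen3-VL GGUF builds) would still need to be selected explicitly:

```python
# Illustrative helper (not part of the SDK): pick a --plugin-id value
# from a model name, following the backend table above.
def pick_plugin_id(model_name: str) -> str:
    name = model_name.lower()
    if "mlx" in name:
        return "metal"    # MLX-format models run on the metal backend
    if "npu" in name:
        return "npu"      # NPU-tuned models (e.g. Llama3.2-3B-NPU-Turbo)
    return "cpu_gpu"      # GGUF models use the default backend

print(pick_plugin_id("Qwen3-VL-4B-MLX-4bit"))  # metal
print(pick_plugin_id("paddleocr-npu"))         # npu
print(pick_plugin_id("my-model-GGUF"))         # cpu_gpu
```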
For detailed setup instructions, please refer to the individual notebooks in the `notebook/` directory.