cookbook/PC/RAG-VLM/README.md
This project is a lightweight Retrieval-Augmented Generation (RAG) system built on top of nexa serve with the Qwen3VL multimodal model.
The system lets you bring your own files, such as PDFs, Word docs, text files, or images, and automatically builds a small database from them. When you ask a question, the model retrieves relevant chunks from your files and responds based on the resources you provided.
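The retrieval step can be pictured with a small sketch. This toy version scores chunks with a bag-of-words cosine similarity; the actual pipeline uses a dense embedding model served by Nexa, but the top-k selection is the same idea:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" (assumption: stands in for the
    # dense embedding model the real pipeline uses).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank all chunks by similarity to the query, keep the top k.
    ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Nexa serve hosts local models behind an HTTP API.",
    "Gradio builds quick web UIs for Python functions.",
    "Qwen3VL is a multimodal model that accepts images and text.",
]
print(retrieve("which model handles images?", chunks, k=1)[0])
```

The retrieved chunks are then prepended to your question so the model answers from your documents rather than from memory alone.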
You can run the system directly from the CLI, or launch a simple Gradio UI for an interactive experience.
Before running this project, make sure you have the Nexa SDK installed. Please refer to the Nexa SDK repository for installation instructions.
Once installed, download the Qwen3VL model and the embedding model with the following commands:
```shell
nexa pull NexaAI/Qwen3-VL-4B-Instruct-GGUF
nexa pull djuna/jina-embeddings-v2-small-en-Q5_K_M-GGUF
```
After the models are ready, start the Nexa server in a separate terminal:
```shell
nexa serve
```
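Once the server is up, the pipeline talks to it over HTTP. As a rough sketch of what such a request looks like (the host, port, and route here are assumptions; check the `nexa serve` startup log for the actual address), a chat payload for Qwen3VL can be assembled like this:

```python
import json

# Assumed endpoint -- verify host/port in the `nexa serve` startup log.
URL = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "model": "NexaAI/Qwen3-VL-4B-Instruct-GGUF",
    "messages": [
        {
            "role": "user",
            "content": "Answer using only this context: <retrieved chunks>",
        }
    ],
    "stream": False,
}
body = json.dumps(payload)
print(body[:60])  # POST this body with e.g. urllib.request or requests
```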
Then, back in this project, create a new conda environment (optional) and install the dependencies:
```shell
# Create a new conda environment (optional)
conda create -n rag-nexa python=3.10 -y
conda activate rag-nexa

# Install Python dependencies
pip install gradio
pip install -r requirements.txt
```
To run the RAG pipeline from the command line:
```shell
python rag_nexa.py --data ./docs
```
This indexes the files in the ./docs folder. Supported formats: .pdf, .txt, .docx, .png, .jpg, .jpeg, .webp, .bmp. Once running, simply type your question in the terminal and the system will answer using your documents.
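Indexing works by splitting each document into overlapping chunks before embedding them. A minimal sliding-window sketch (the chunk size and overlap used by rag_nexa.py may differ):

```python
def chunk_text(text, size=200, overlap=50):
    # Split text into fixed-size windows that overlap, so sentences
    # cut at a boundary still appear whole in a neighboring chunk.
    # (Illustrative parameters; not necessarily what rag_nexa.py uses.)
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

sample = "x" * 500
print([len(c) for c in chunk_text(sample)])
```

Each chunk is embedded once at index time; at query time only the question is embedded and compared against the stored vectors.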
You can also start an interactive Gradio web UI:
```shell
python gradio_ui.py
```
Open the browser at http://127.0.0.1:7860.
The UI answers questions using the files in your ./docs folder (PDFs, docs, text, or images).
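For reference, the core wiring of such a UI is small. This is an illustrative stand-in, not the actual gradio_ui.py: the handler below just echoes the question, where the real one would run retrieval over ./docs and call Qwen3VL:

```python
def answer(question: str) -> str:
    # Placeholder handler (assumption): the real UI would retrieve
    # chunks and query the model; here we only show the wiring.
    return f"Context-grounded answer for: {question}"

def build_ui():
    import gradio as gr  # imported lazily; installed via `pip install gradio`
    return gr.Interface(fn=answer, inputs="text", outputs="text")

# build_ui().launch()  # serves at http://127.0.0.1:7860, Gradio's default port
```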