# Web-Agent-Qwen3VL
This demo integrates Nexa SDK with Web-UI to enable local multimodal LLM-driven browser automation. The agent can interact with websites, perform searches, and execute complex web tasks autonomously using the NexaAI/Qwen3-VL-4B-Instruct-GGUF model (recommended).
Built on the browser-use/web-ui project, this integration demonstrates the power of local vision-language models for web automation tasks.
```bash
git clone https://github.com/mtilyxuegao/Nexa-Web-UI.git
cd Nexa-Web-UI
```
Download and install the Nexa SDK package from the official GitHub repository:
Visit https://github.com/NexaAI/nexa-sdk/releases/tag/v0.2.49 to download the appropriate installer for your platform and install it.
Choose one of the following methods to set up your Python environment:
**Option 1: uv**

```bash
# Navigate to the web-ui directory
cd web-ui

# Create a virtual environment with uv
uv venv --python 3.11

# Activate the virtual environment
source .venv/bin/activate        # macOS/Linux
# .\.venv\Scripts\Activate.ps1   # Windows PowerShell

# Install Python dependencies
uv pip install -r requirements.txt

# (Optional) Install memory features for enhanced agent learning capabilities
# This adds ~110MB of ML dependencies (torch, transformers, etc.)
uv pip install "browser-use[memory]"

# Install Playwright browsers (Chromium only is recommended)
playwright install chromium --with-deps
```
**Option 2: conda**

```bash
# Navigate to the web-ui directory
cd web-ui

# Create and activate a conda environment
conda create -n nexa-webui python=3.11 -y
conda activate nexa-webui

# Install Python dependencies
pip install -r requirements.txt

# (Optional) Install memory features for enhanced agent learning capabilities
# This adds ~110MB of ML dependencies (torch, transformers, etc.)
pip install "browser-use[memory]"

# Install Playwright browsers (Chromium only is recommended)
playwright install chromium --with-deps
```
The project includes a preconfigured `web-ui/.env` file with the following main settings:
```bash
# LLM Provider Settings
DEFAULT_LLM=nexa
NEXA_ENDPOINT=http://127.0.0.1:8080/v1

# Other API Keys (if using other LLMs)
# OPENAI_API_KEY=your_openai_key
# ANTHROPIC_API_KEY=your_anthropic_key
```
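If you want to script around this setup, the `.env` entries above can be read with a minimal parser. This is just a sketch (the `load_env` helper is illustrative, not part of the project; real projects typically use `python-dotenv`):

```python
from pathlib import Path

def load_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings

# Example with the settings shown above:
# cfg = load_env("web-ui/.env")
# cfg["NEXA_ENDPOINT"]  -> "http://127.0.0.1:8080/v1"
```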
Download the multimodal VLM model:

```bash
nexa pull NexaAI/Qwen3-VL-4B-Instruct-GGUF
```
Optional: if you want to set up a Hugging Face token for accessing other private models:

```bash
export HUGGINGFACE_HUB_TOKEN="your_huggingface_token"
export NEXA_HFTOKEN="your_huggingface_token"
```
**Important:** before starting, make sure ports 8080 and 7788 are free:
```bash
# Kill all related processes
lsof -ti:8080,7788 | xargs kill -9 2>/dev/null
pkill -f "nexa serve"
pkill -f "webui.py"
```
```bash
# Navigate to the project root directory
cd Nexa-Web-UI

# Start the Nexa model server
nexa serve --host 127.0.0.1:8080 --keepalive 600
```
Wait until you see the message: `Localhosting on http://127.0.0.1:8080/docs/ui`
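If you automate the startup, you can wait for the server's port to accept connections before launching the UI. A minimal sketch (the `wait_for_port` helper is illustrative, not part of Nexa SDK):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP port accepts connections, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

# e.g. after launching `nexa serve`:
# wait_for_port("127.0.0.1", 8080)
```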
In a new terminal window:
```bash
# Navigate to the project root directory
cd Nexa-Web-UI

# Activate your Python environment
source web-ui/.venv/bin/activate   # or: conda activate nexa-webui

# Start the web interface
python web-ui/webui.py --ip 127.0.0.1 --port 7788
```
Wait until you see the message: `Running on local URL: http://127.0.0.1:7788`

Open your browser and visit http://127.0.0.1:7788.
Follow these steps to run the agent:

1. Set the LLM provider to `nexa`.
2. Set the model name to `NexaAI/Qwen3-VL-4B-Instruct-GGUF`.
3. Enter the task: `Go to google.com, search for 'nexa ai', and click the first element`.
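Behind the scenes, the UI talks to the local server through the OpenAI-compatible endpoint configured in `.env` (`NEXA_ENDPOINT` ends in `/v1`). A hedged sketch of an equivalent direct request body, assuming the standard `/v1/chat/completions` schema (this is not the UI's actual code path):

```python
import json

# Request body for the local server's OpenAI-compatible chat endpoint
# (assumed schema; the web UI's real payload may differ).
payload = {
    "model": "NexaAI/Qwen3-VL-4B-Instruct-GGUF",
    "messages": [
        {
            "role": "user",
            "content": "Go to google.com, search for 'nexa ai', and click the first element",
        }
    ],
}
body = json.dumps(payload)
# POST this body to http://127.0.0.1:8080/v1/chat/completions
# once `nexa serve` is running.
```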
**Expected behavior:** the agent executes the task automatically, using vision-language understanding to interact with the webpage.
If you encounter port conflicts, use the cleanup command:

```bash
lsof -ti:8080,7788 | xargs kill -9 2>/dev/null
```
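To diagnose conflicts programmatically, you can check whether a port is already taken. A small sketch (the `port_in_use` helper is illustrative, not part of the project):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting connections on host:port."""
    with socket.socket() as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

# Check the two ports this demo uses:
# port_in_use(8080), port_in_use(7788)
```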
If model downloads fail, verify that `HUGGINGFACE_HUB_TOKEN` and `NEXA_HFTOKEN` are set. If Playwright browsers fail to install:

```bash
playwright install chromium --with-deps --force
```
We thank the browser-use/web-ui project and its contributors for providing the foundation that makes this integration possible.