Documentation/installation_guide.md
Last updated: 2025-01-07
This guide provides step-by-step instructions for installing and setting up the RAG system, using either Docker or a direct (non-containerized) development setup.
Required for both approaches:
- Ollama, with the `qwen3:0.6b` and `qwen3:8b` models

Docker-specific:
- Docker and Docker Compose V2

Direct Development-specific:
- Python 3.8+ and Node.js 16+
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Windows: download from https://ollama.ai/download,
# run the installer, and follow the setup wizard

# Start the Ollama server
ollama serve

# In another terminal, install the required models
ollama pull qwen3:0.6b   # Fast model (650MB)
ollama pull qwen3:8b     # High-quality model (4.7GB)

# Verify the models are installed
ollama list

# Test Ollama
ollama run qwen3:0.6b "Hello, how are you?"
```
⚠️ **Important:** Keep Ollama running (`ollama serve`) for the entire setup process.
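If you want to script the model check rather than eyeball `ollama list`, the `/api/tags` endpoint returns the installed models as JSON. The sketch below is illustrative: it assumes the payload shape `{"models": [{"name": ...}, ...]}` and only parses a sample string, so it runs without Ollama present.

```python
import json

# Models this guide installs; adjust if you use different ones.
REQUIRED_MODELS = {"qwen3:0.6b", "qwen3:8b"}

def missing_models(tags_payload: str) -> set:
    """Given the JSON body of Ollama's /api/tags response, return the
    required models that are not yet installed."""
    installed = {m["name"] for m in json.loads(tags_payload).get("models", [])}
    return REQUIRED_MODELS - installed

# Example payload with only the fast model installed:
sample = '{"models": [{"name": "qwen3:0.6b"}]}'
print(missing_models(sample))  # {'qwen3:8b'}
```

In practice you would feed it the body of `curl http://localhost:11434/api/tags` and pull anything it reports missing.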
```bash
# Install Docker Desktop via Homebrew
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop/

# Start Docker Desktop from Applications

# Verify installation
docker --version
docker compose version
```
```bash
# Update the system
sudo apt-get update

# Install Docker using the convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker

# Install Docker Compose V2
sudo apt-get install docker-compose-plugin

# Verify installation
docker --version
docker compose version
```
```bash
# Clone the repository
git clone <your-repository-url>
cd rag_system_old

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Start the Docker containers
./start-docker.sh

# Wait for the containers to start (2-3 minutes)
sleep 120

# Verify the deployment
./start-docker.sh status

# Test all endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"

# Access the application
open http://localhost:3000
```
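The four `curl` checks above can also be wrapped in a small script if you run them often. This is a sketch, not part of the system: the endpoint list mirrors the guide, and the HTTP call is injected as a function so the logic stays testable without any services running.

```python
from typing import Callable, Dict

# The four services this guide expects to be reachable.
ENDPOINTS = {
    "Frontend": "http://localhost:3000",
    "Backend": "http://localhost:8000/health",
    "RAG API": "http://localhost:8001/models",
    "Ollama": "http://localhost:11434/api/tags",
}

def check_endpoints(fetch_ok: Callable[[str], bool]) -> Dict[str, bool]:
    """Run fetch_ok against every endpoint and collect the results."""
    return {name: fetch_ok(url) for name, url in ENDPOINTS.items()}

def report(results: Dict[str, bool]) -> str:
    """Render results in the same ✅/❌ style as the curl checks."""
    return "\n".join(
        f"{'✅' if ok else '❌'} {name}" for name, ok in results.items()
    )
```

With the standard library, `fetch_ok` could be something like `lambda url: urllib.request.urlopen(url, timeout=5).status == 200` wrapped in a try/except.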
```bash
# Clone the repository
git clone https://github.com/your-org/rag-system.git
cd rag-system

# Create a virtual environment (recommended)
python -m venv venv

# Activate the virtual environment
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

# Install Python dependencies
pip install -r requirements.txt

# Verify the Python setup
python -c "import torch; print('✅ PyTorch OK')"
python -c "import transformers; print('✅ Transformers OK')"
python -c "import lancedb; print('✅ LanceDB OK')"

# Install Node.js dependencies
npm install

# Verify the Node.js setup
node --version   # Should be 16+
npm --version
npm list --depth=0

# Ensure Ollama is running
curl http://localhost:11434/api/tags

# Start all components with one command
python run_system.py

# Or start the components manually in separate terminals:
# Terminal 1: python -m rag_system.api_server
# Terminal 2: cd backend && python server.py
# Terminal 3: npm run dev

# Check system health
python system_health_check.py

# Test endpoints
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"

# Access the application
open http://localhost:3000
```
```bash
# Clone the repository
git clone https://github.com/your-org/rag-system.git
cd rag-system

# Check the repository structure
ls -la

# Create the required directories
mkdir -p lancedb index_store shared_uploads logs backend
touch backend/chat_data.db

# Set permissions
chmod -R 755 lancedb index_store shared_uploads
chmod 664 backend/chat_data.db
```
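For a cross-platform (or re-runnable) version of the layout setup above, the same steps can be expressed with `pathlib`. This is a sketch, not project code; the directory names come from the commands above.

```python
import os
from pathlib import Path

REQUIRED_DIRS = ["lancedb", "index_store", "shared_uploads", "logs", "backend"]

def prepare_layout(root: str) -> list:
    """Create the directory layout the system expects under `root`,
    returning the paths that had to be created. Safe to re-run."""
    created = []
    for name in REQUIRED_DIRS:
        d = Path(root) / name
        if not d.exists():
            d.mkdir(parents=True)
            created.append(str(d))
    # An empty file is a valid starting point; SQLite initializes it on first use.
    db = Path(root) / "backend" / "chat_data.db"
    db.touch(exist_ok=True)
    os.chmod(db, 0o664)
    return created
```

Running it a second time returns an empty list, which makes it safe to call on every startup.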
For Docker (set automatically via `docker.env`):

```bash
OLLAMA_HOST=http://host.docker.internal:11434
NODE_ENV=production
RAG_API_URL=http://rag-api:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
For Direct Development (set automatically by `run_system.py`):

```bash
OLLAMA_HOST=http://localhost:11434
RAG_API_URL=http://localhost:8001
NEXT_PUBLIC_API_URL=http://localhost:8000
```
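The two variable sets differ only in host names. If you need to reproduce this resolution in your own tooling, a sketch (variable names and defaults taken from the lists above; the `resolve_config` helper itself is hypothetical) might look like:

```python
import os

# Defaults mirror the two deployment modes described above.
DEFAULTS = {
    "docker": {
        "OLLAMA_HOST": "http://host.docker.internal:11434",
        "RAG_API_URL": "http://rag-api:8001",
        "NEXT_PUBLIC_API_URL": "http://localhost:8000",
    },
    "direct": {
        "OLLAMA_HOST": "http://localhost:11434",
        "RAG_API_URL": "http://localhost:8001",
        "NEXT_PUBLIC_API_URL": "http://localhost:8000",
    },
}

def resolve_config(mode: str, env=os.environ) -> dict:
    """Start from the mode's defaults; explicitly set variables win."""
    return {key: env.get(key, default) for key, default in DEFAULTS[mode].items()}
```

Letting explicit environment variables override the defaults keeps the two modes consistent while still allowing, say, a remote Ollama host.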
The system defaults to these models:

- Embedding: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)
- Generation: `qwen3:0.6b` for fast responses, `qwen3:8b` for quality

```bash
# Initialize the SQLite database
python -c "
from backend.database import ChatDatabase
db = ChatDatabase()
db.init_database()
print('✅ Database initialized')
"

# Verify the database
sqlite3 backend/chat_data.db ".tables"
```
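To see what the initialization step amounts to without the project installed, here is a self-contained sketch. The `sessions`/`messages` schema is illustrative only; the real schema lives in `backend/database.py`.

```python
import sqlite3

def init_database(path: str) -> list:
    """Create a minimal chat schema if absent and return the table names.
    Illustrative stand-in for ChatDatabase.init_database()."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sessions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id INTEGER REFERENCES sessions(id),
            role TEXT,
            content TEXT
        );
    """)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    conn.close()
    return tables
```

The `sqlite_master` query is the programmatic equivalent of the `.tables` command used above.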
```bash
# For Docker deployment
./start-docker.sh status
docker compose ps

# For direct development
python system_health_check.py

# Universal health check
curl -f http://localhost:3000 && echo "✅ Frontend OK"
curl -f http://localhost:8000/health && echo "✅ Backend OK"
curl -f http://localhost:8001/models && echo "✅ RAG API OK"
curl -f http://localhost:11434/api/tags && echo "✅ Ollama OK"
```
```bash
# Test RAG system initialization
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
print('✅ RAG System initialized successfully')
"

# Test embedding generation
python -c "
from rag_system.main import get_agent
agent = get_agent('default')
embedder = agent.retrieval_pipeline._get_text_embedder()
test_emb = embedder.create_embeddings(['Hello world'])
print(f'✅ Embedding generated: {test_emb.shape}')
"
```
```bash
# Test session creation
curl -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{"title": "Test Session"}'

# Test the models endpoint
curl http://localhost:8001/models

# Test the health endpoints
curl http://localhost:8000/health
curl http://localhost:8001/health
```
```bash
# Ollama not responding?
curl http://localhost:11434/api/tags

# If that fails, restart Ollama
pkill ollama
ollama serve

# Reinstall the models if needed
ollama pull qwen3:0.6b
ollama pull qwen3:8b
```
```bash
# Docker daemon not running?
docker version

# Restart Docker Desktop (macOS/Windows)
# Or restart the docker service (Linux)
sudo systemctl restart docker

# Clear the Docker cache if the build fails
docker system prune -f
```
```bash
# Check the Python version
python --version   # Should be 3.8+

# Check the virtual environment
which python
pip list | grep torch

# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```
```bash
# Check the Node version
node --version   # Should be 16+

# Clear and reinstall
rm -rf node_modules package-lock.json
npm install
```
```bash
# Check system memory
free -h    # Linux
vm_stat    # macOS

# For Docker: increase the memory allocation
# Docker Desktop → Settings → Resources → Memory → 8GB+

# Use smaller models
ollama pull qwen3:0.6b   # Instead of qwen3:8b
```
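As a rough rule of thumb for choosing between the two models, compare free RAM against the model size plus some headroom. The sizes below are the download sizes quoted earlier in this guide; the 1.5× headroom factor is an assumption, since runtime memory use is typically higher than the on-disk size.

```python
# Approximate on-disk sizes from this guide (GB).
MODEL_SIZES_GB = {"qwen3:0.6b": 0.65, "qwen3:8b": 4.7}

def fits_in_memory(model: str, free_gb: float, headroom: float = 1.5) -> bool:
    """Rough check: does the model, plus a safety factor, fit in free RAM?"""
    return MODEL_SIZES_GB[model] * headroom <= free_gb

print(fits_in_memory("qwen3:8b", free_gb=4.0))    # False: fall back to qwen3:0.6b
print(fits_in_memory("qwen3:0.6b", free_gb=4.0))  # True
```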
```bash
# Install additional models (optional)
ollama pull nomic-embed-text   # Alternative embedding model
ollama pull llama3.1:8b        # Alternative generation model

# Test model switching
curl -X POST http://localhost:8001/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello", "model": "qwen3:8b"}'
```
```bash
# Set proper file permissions
chmod 600 backend/chat_data.db   # Restrict database access
chmod 700 lancedb/               # Restrict vector DB access

# Configure the firewall (production)
sudo ufw allow 3000/tcp   # Frontend
sudo ufw deny 8000/tcp    # Backend (internal only)
sudo ufw deny 8001/tcp    # RAG API (internal only)
```
```bash
# Create a backup script
cat > backup_system.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Back up databases and indexes
cp -r backend/chat_data.db "$BACKUP_DIR/"
cp -r lancedb "$BACKUP_DIR/"
cp -r index_store "$BACKUP_DIR/"
cp -r shared_uploads "$BACKUP_DIR/"

echo "Backup completed: $BACKUP_DIR"
EOF

chmod +x backup_system.sh
```
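The script above creates a new timestamped directory on every run, so the `backups/` folder grows without bound. A companion pruning sketch (hypothetical, not part of the project) that keeps only the N newest backups could look like this; it relies on the `YYYYmmdd_HHMMSS` naming from the script, where lexical order equals chronological order.

```python
import shutil
from pathlib import Path

def prune_backups(backup_root: str, keep: int = 5) -> list:
    """Delete all but the `keep` newest timestamped backup directories,
    returning the names of the directories removed."""
    dirs = sorted(p for p in Path(backup_root).iterdir() if p.is_dir())
    stale = dirs[:-keep] if keep else dirs   # guard: dirs[:-0] would be empty
    for d in stale:
        shutil.rmtree(d)
    return [d.name for d in stale]
```

Run it after each backup (e.g. from cron) to cap disk usage while retaining recent restore points.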
Acceptable Performance:
Optimal Performance:
See also:

- `Documentation/quick_start.md`
- `Documentation/docker_usage.md`
- `Documentation/architecture_overview.md`
- `Documentation/api_reference.md`

Congratulations! 🎉 Your RAG system is now ready to use. Visit http://localhost:3000 to start chatting with your documents.