backend/docs/ollama.md
The Ollama provider enables PentAGI to use local language models through the Ollama server.
Before configuring PentAGI, make sure the Ollama server is running (by default at http://localhost:11434) and that at least one model is installed, for example:

```bash
ollama pull gemma3:1b
```

Configure the Ollama provider using environment variables:
```bash
# Ollama server URL (default: http://localhost:11434)
OLLAMA_SERVER_URL=http://localhost:11434

# Default model for inference (optional, default: llama3.1:8b-instruct-q8_0)
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0

# Path to custom config file (optional)
OLLAMA_SERVER_CONFIG_PATH=/path/to/ollama_config.yml

# Model management settings (optional)
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=600    # Timeout for model downloads in seconds
OLLAMA_SERVER_PULL_MODELS_ENABLED=false  # Auto-download models on startup
OLLAMA_SERVER_LOAD_MODELS_ENABLED=false  # Load model list from server

# Proxy URL if needed
PROXY_URL=http://proxy:8080
```
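Before starting PentAGI, you can check that the configured server URL is reachable. This is a minimal sanity check using curl against Ollama's `/api/tags` endpoint (adjust the URL if you changed `OLLAMA_SERVER_URL`):

```bash
# List the models known to the local Ollama server.
# An empty "models" array means the server is up but no models are installed yet.
curl -s http://localhost:11434/api/tags
```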
Control how PentAGI interacts with your Ollama server:
Model Management:
- `OLLAMA_SERVER_PULL_MODELS_ENABLED=true`: Automatically downloads the models specified in the config file on startup
- `OLLAMA_SERVER_PULL_MODELS_TIMEOUT`: Maximum time to wait for model downloads (default: 600 seconds)
- `OLLAMA_SERVER_LOAD_MODELS_ENABLED=true`: Queries the Ollama server for available models via its API

Performance Note: Enabling `OLLAMA_SERVER_LOAD_MODELS_ENABLED` adds startup latency because PentAGI queries the Ollama API. Disable it if you only need the specific models from your config file.
Recommended Settings:
```bash
# Fast startup (static config)
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0
OLLAMA_SERVER_PULL_MODELS_ENABLED=false
OLLAMA_SERVER_LOAD_MODELS_ENABLED=false

# Auto-discovery (dynamic config)
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true
```
The provider dynamically loads models from your local Ollama server. Available models depend on what you have installed locally.
Popular model families include:
- Gemma 3: `gemma3:1b`, `gemma3:2b`, `gemma3:7b`, `gemma3:27b`
- Llama 3.1 / 3.2: `llama3.1:7b`, `llama3.1:8b`, `llama3.1:8b-instruct-q8_0`, `llama3.1:8b-instruct-fp16`, `llama3.1:70b`, `llama3.2:1b`, `llama3.2:3b`, `llama3.2:90b`
- Qwen 2.5: `qwen2.5:1.5b`, `qwen2.5:3b`, `qwen2.5:7b`, `qwen2.5:14b`, `qwen2.5:32b`, `qwen2.5:72b`
- DeepSeek-R1: `deepseek-r1:1.5b`, `deepseek-r1:7b`, `deepseek-r1:8b`, `deepseek-r1:14b`, `deepseek-r1:32b`
- Embeddings: `nomic-embed-text`

To see available models on your system: `ollama list`

To download new models: `ollama pull <model-name>`
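If you prefer to pre-pull a set of models up front instead of relying on auto-download, a small shell loop works. This is only a sketch; the model list below is an example, so adjust it to match your config:

```bash
# Pull a few commonly used models ahead of time (example list).
for model in llama3.1:8b-instruct-q8_0 deepseek-r1:8b nomic-embed-text; do
  ollama pull "$model"
done
```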
The provider supports all PentAGI agent types with optimized configurations:
- `simple`: General purpose chat (temperature: 0.2)
- `assistant`: AI assistant tasks (temperature: 0.2)
- `coder`: Code generation (temperature: 0.1, max tokens: 6000)
- `pentester`: Security testing (temperature: 0.3, max tokens: 8000)
- `generator`: Content generation (temperature: 0.4)
- `refiner`: Content refinement (temperature: 0.3)
- `searcher`: Information searching (temperature: 0.2, max tokens: 3000)

Create a custom config file to override the default settings:
```yaml
# Overrides for the simple agent
simple:
  model: "llama3.1:8b-instruct-q8_0"
  temperature: 0.2
  top_p: 0.3
  n: 1
  max_tokens: 4000

# Overrides for the coder agent
coder:
  model: "deepseek-r1:8b"
  temperature: 0.1
  top_p: 0.2
  n: 1
  max_tokens: 8000
```
Then set `OLLAMA_SERVER_CONFIG_PATH` to the path of this file.
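For example, a minimal sketch of wiring this up (the file location here is arbitrary; use whatever path suits your deployment):

```bash
# Point PentAGI at the custom Ollama agent config (example path).
mkdir -p ~/.config/pentagi
cp ollama_config.yml ~/.config/pentagi/ollama_config.yml
export OLLAMA_SERVER_CONFIG_PATH=~/.config/pentagi/ollama_config.yml
```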
Ollama provides free local inference - no usage costs or API limits.
```bash
# Set environment variables
export OLLAMA_SERVER_URL=http://localhost:11434

# Start PentAGI with Ollama provider
./pentagi
```
If a required model is missing, download it with `ollama pull <model-name>`.
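If the Ollama server itself is not running yet, you can start it natively or in a container. This is a sketch; adjust ports and volumes to your environment:

```bash
# Run the Ollama server natively (requires a local Ollama installation)
ollama serve

# Or run it in Docker, persisting downloaded models in a named volume
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
```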