docs/guides/ollama-guide.mdx
Before getting started, ensure your system meets the memory requirements for the models you plan to run (see Memory Requirements below).
Choose the installation method for your operating system:
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download the installer from ollama.ai
```
After installation, start the Ollama service:
```bash
# Check Ollama version - verify it's installed
ollama --version

# Start Ollama (runs in background)
ollama serve

# Verify it's running
curl http://localhost:11434
# Should return "Ollama is running"
```
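You can also check the server over its REST API; `GET /api/tags` returns the locally installed models as JSON (piping through `python -m json.tool` is optional, just for readable output):

```bash
# List installed models via the REST API
curl http://localhost:11434/api/tags | python -m json.tool
```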
Download models using the exact tag specified:
```bash
# Pull models with specific tags
ollama pull deepseek-r1:32b # 32B parameter version
ollama pull deepseek-r1:latest # Latest/default version
ollama pull mistral:latest
ollama pull qwen2.5-coder:1.5b

# List all downloaded models
ollama list
```
Common Model Tags:

- `:latest` - Default version (used if no tag is specified)
- `:32b`, `:7b`, `:1.5b` - Parameter count versions
- `:instruct`, `:base` - Model variants

There are multiple ways to configure Ollama models in Continue:
The easiest way is to use pre-configured model blocks from the Continue Mission Control in your local configuration:
```yaml
name: My Local Config
version: 0.0.1
schema: v1
models:
  - uses: ollama/deepseek-r1-32b
  - uses: ollama/qwen2.5-coder-7b
  - uses: ollama/gpt-oss-20b
```
Continue can automatically detect available Ollama models. You can configure this in your YAML:
```yaml
models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
    roles:
      - chat
      - edit
      - apply
      - rerank
      - autocomplete
```
Or use it through the GUI.
You can update `apiBase` with the IP address of a remote machine serving Ollama.
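For a remote setup, the machine running Ollama must listen on more than localhost. A minimal sketch, assuming the server's LAN address is 192.168.1.100 (substitute your own host):

```bash
# On the remote machine: listen on all interfaces, not just localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On your development machine: confirm the server is reachable
curl http://192.168.1.100:11434
# Should return "Ollama is running"
```

You can then point the model's `apiBase` at `http://192.168.1.100:11434`.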
For custom configurations or models not in Mission Control:
```yaml
models:
  - name: DeepSeek R1 32B
    provider: ollama
    model: deepseek-r1:32b # Must match exactly what `ollama list` shows
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
    capabilities: # Add if not auto-detected
      - tool_use
  - name: Qwen2.5-Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
```
Some Ollama models support tools (function calling) which is required for Agent mode. However, not all models that claim tool support work correctly:
```yaml
models:
  - name: DeepSeek R1
    provider: ollama
    model: deepseek-r1:latest
    capabilities:
      - tool_use # Add this to enable tools
```
Add `capabilities: [tool_use]` to your model config. See the Model Capabilities guide for more details.
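To test tool support outside Continue, you can send a tool definition to Ollama's `/api/chat` endpoint. A minimal sketch; `get_weather` is a made-up function used only as a probe, and a model with working tool support should answer with a `tool_calls` entry instead of plain prose:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:latest",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```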
For optimal performance, consider these advanced configuration options:
```yaml
models:
  - name: Optimized DeepSeek
    provider: ollama
    model: deepseek-r1:32b
    contextLength: 8192 # Adjust context window (default varies by model)
    completionOptions:
      temperature: 0.7 # Controls randomness (0.0-1.0)
      top_p: 0.9 # Nucleus sampling threshold
      top_k: 40 # Top-k sampling
      num_predict: 2048 # Max tokens to generate
    # Ollama-specific options (set via environment or modelfile)
    # num_gpu: 35 # Number of GPU layers to offload
    # num_thread: 8 # CPU threads to use
```
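To experiment with these sampling values before committing them to your Continue config, you can pass the same options to Ollama's `/api/generate` endpoint (the prompt below is arbitrary):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Write a one-line docstring for a binary search function.",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 2048
  },
  "stream": false
}'
```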
For GPU acceleration and memory tuning, create an Ollama Modelfile:
```
# Create custom model with optimizations
FROM deepseek-r1:32b
PARAMETER num_gpu 35
PARAMETER num_thread 8
PARAMETER num_ctx 4096
```
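To register the customized model, save the file as `Modelfile` and build it with `ollama create` (the name `deepseek-r1-tuned` is just an example):

```bash
# Build a local model from the Modelfile
ollama create deepseek-r1-tuned -f Modelfile

# Smoke-test it
ollama run deepseek-r1-tuned "Hello"
```

You can then reference `deepseek-r1-tuned` as the `model` value in your Continue config.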
Choose models based on your specific needs (see recommended models for more options):
Code Generation:

- `qwen2.5-coder:7b` - Excellent for code completion
- `codellama:13b` - Strong general coding support
- `deepseek-coder:6.7b` - Fast and efficient

Chat & Reasoning:

- `llama3.1:8b` - Latest Llama with tool support
- `mistral:7b` - Fast and versatile
- `deepseek-r1:32b` - Advanced reasoning capabilities

Autocomplete:

- `qwen2.5-coder:1.5b` - Lightweight and fast
- `starcoder2:3b` - Optimized for code completion

Memory Requirements: as a rough guide, allow at least 8 GB of RAM for 7B models, 16 GB for 13B models, and 32 GB for 33B models.
To get the best performance from Ollama:

- Use `ollama ps` to see memory usage
- Check the Ollama server logs to debug performance issues

A "model not found" error occurs when the model isn't installed locally:
Problem: Using a hub block or config that references a model not yet pulled

Solution:
```bash
# Check what models you have
ollama list

# Pull the exact model version needed
ollama pull model-name:tag # e.g., deepseek-r1:32b
```
Problem: `ollama pull deepseek-r1` installs `:latest` but the hub block expects `:32b`

Solution: Always pull with the exact tag:
```bash
# Wrong - pulls :latest
ollama pull deepseek-r1

# Right - pulls the specific version
ollama pull deepseek-r1:32b
```
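After pulling, confirm the exact tag is present (filtering `ollama list` with grep is just one quick way to check):

```bash
# Verify the :32b tag is now installed
ollama list | grep deepseek-r1
```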
Problem: Model doesn't support tools/function calling

Solutions:

- Add `capabilities: [tool_use]` to your model config
- Switch to a model this guide lists as tool-capable, such as `llama3.1:8b`

Problem: Unclear how to use hub models locally

Solution: Create a local agent file:
```yaml
# ~/.continue/configs/config.yaml
name: Local Config
version: 0.0.1
schema: v1
models:
  - uses: ollama/model-name
```
If Continue can't connect to Ollama, check the following:

- Verify the server is running: `curl http://localhost:11434`
- Check the service status: `systemctl status ollama` (Linux)
- For remote access, make Ollama listen on all interfaces: `OLLAMA_HOST=0.0.0.0:11434`
- Check loaded models: `ollama ps`

If generation is slow:

- Adjust `num_gpu` layers in the model configuration
- Check `ollama ps` for active models and memory usage

Once everything is configured, try it out in a real file:

```python
# Example: Generate a FastAPI endpoint
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class User(BaseModel):
    name: str
    email: str
    age: int

@app.post("/users/")
async def create_user(user: User):
    # Continue will help complete this implementation
    # Use Cmd+I (Mac) or Ctrl+I (Windows/Linux) to generate code
    pass
```
Use Continue with Ollama to chat about your code, make inline edits, and get autocomplete suggestions, all running locally.
Ollama with Continue provides a powerful local development environment for AI-assisted coding. You now have complete control over your AI models, ensuring privacy and enabling offline development workflows.
This guide is based on Ollama v0.11.x and Continue v1.1.x. Please check for updates regularly.