docs/providers/ollama.md
OpenClaw integrates with Ollama's native API (/api/chat) for hosted cloud models and local/self-hosted Ollama servers. You can use Ollama in three modes: Cloud + Local through a reachable Ollama host, Cloud only against https://ollama.com, or Local only against a reachable Ollama host.
Ollama provider config uses baseUrl as the canonical key. OpenClaw also accepts baseURL for compatibility with OpenAI SDK-style examples, but new config should prefer baseUrl.
- A provider-level key is sent only to that provider's Ollama host.
- `agents.*.memorySearch.remote.apiKey` is sent only to its remote embedding host.
- A pure `OLLAMA_API_KEY` env value is treated as the Ollama Cloud convention, not sent to local or self-hosted hosts by default.
Choose your preferred setup method and mode.
<Tabs> <Tab title="Onboarding (recommended)"> **Best for:** fastest path to a working Ollama cloud or local setup.<Steps>
<Step title="Run onboarding">
```bash
openclaw onboard
```
Select **Ollama** from the provider list.
</Step>
<Step title="Choose your mode">
- **Cloud + Local** — local Ollama host plus cloud models routed through that host
- **Cloud only** — hosted Ollama models via `https://ollama.com`
- **Local only** — local models only
</Step>
<Step title="Select a model">
`Cloud only` prompts for `OLLAMA_API_KEY` and suggests hosted cloud defaults. `Cloud + Local` and `Local only` ask for an Ollama base URL, discover available models, and auto-pull the selected local model if it is not available yet. When Ollama reports an installed `:latest` tag such as `gemma4:latest`, setup shows that installed model once instead of showing both `gemma4` and `gemma4:latest` or pulling the bare alias again. `Cloud + Local` also checks whether that Ollama host is signed in for cloud access.
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider ollama
```
</Step>
</Steps>
### Non-interactive mode
```bash
openclaw onboard --non-interactive \
--auth-choice ollama \
--accept-risk
```
Optionally specify a custom base URL or model:
```bash
openclaw onboard --non-interactive \
--auth-choice ollama \
--custom-base-url "http://ollama-host:11434" \
--custom-model-id "qwen3.5:27b" \
--accept-risk
```
<Steps>
<Step title="Choose cloud or local">
- **Cloud + Local**: install Ollama, sign in with `ollama signin`, and route cloud requests through that host
- **Cloud only**: use `https://ollama.com` with an `OLLAMA_API_KEY`
- **Local only**: install Ollama from [ollama.com/download](https://ollama.com/download)
</Step>
<Step title="Pull a local model (local only)">
```bash
ollama pull gemma4
# or
ollama pull gpt-oss:20b
# or
ollama pull llama3.3
```
</Step>
<Step title="Enable Ollama for OpenClaw">
For `Cloud only`, use your real `OLLAMA_API_KEY`. For host-backed setups, any placeholder value works:
```bash
# Cloud
export OLLAMA_API_KEY="your-ollama-api-key"
# Local-only
export OLLAMA_API_KEY="ollama-local"
# Or configure in your config file
openclaw config set models.providers.ollama.apiKey "OLLAMA_API_KEY"
```
</Step>
<Step title="Inspect and set your model">
```bash
openclaw models list
openclaw models set ollama/gemma4
```
Or set the default in config:
```json5
{
agents: {
defaults: {
model: { primary: "ollama/gemma4" },
},
},
}
```
</Step>
</Steps>
Use **Cloud + Local** during setup. OpenClaw prompts for the Ollama base URL, discovers local models from that host, and checks whether the host is signed in for cloud access with `ollama signin`. When the host is signed in, OpenClaw also suggests hosted cloud defaults such as `kimi-k2.5:cloud`, `minimax-m2.7:cloud`, and `glm-5.1:cloud`.
If the host is not signed in yet, OpenClaw keeps the setup local-only until you run `ollama signin`.
Use **Cloud only** during setup. OpenClaw prompts for `OLLAMA_API_KEY`, sets `baseUrl: "https://ollama.com"`, and seeds the hosted cloud model list. This path does **not** require a local Ollama server or `ollama signin`.
The cloud model list shown during `openclaw onboard` is populated live from `https://ollama.com/api/tags`, capped at 500 entries, so the picker reflects the current hosted catalog rather than a static seed. If `ollama.com` is unreachable or returns no models at setup time, OpenClaw falls back to the previous hardcoded suggestions so onboarding still completes.
OpenClaw currently suggests `gemma4` as the local default.
When you set OLLAMA_API_KEY (or an auth profile) and do not define models.providers.ollama or another custom remote provider with api: "ollama", OpenClaw discovers models from the local Ollama instance at http://127.0.0.1:11434.
| Behavior | Detail |
|---|---|
| Catalog query | Queries /api/tags |
| Capability detection | Uses best-effort /api/show lookups to read contextWindow, expanded num_ctx Modelfile parameters, and capabilities including vision/tools |
| Vision models | Models with a vision capability reported by /api/show are marked as image-capable (input: ["text", "image"]), so OpenClaw auto-injects images into the prompt |
| Reasoning detection | Uses /api/show capabilities when available, including thinking; falls back to a model-name heuristic (r1, reasoning, think) when Ollama omits capabilities |
| Token limits | Sets maxTokens to the default Ollama max-token cap used by OpenClaw |
| Costs | Sets all costs to 0 |
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance. You can use a full ref such as ollama/<pulled-model>:latest in local infer model run; OpenClaw resolves that installed model from Ollama's live catalog without requiring a hand-written models.json entry.
For signed-in Ollama hosts, some :cloud models may be usable through /api/chat
and /api/show before they appear in /api/tags. When you explicitly select a
full ollama/<model>:cloud ref, OpenClaw validates that exact missing model with
/api/show and adds it to the runtime catalog only if Ollama confirms model
metadata. Typos still fail as unknown models instead of being auto-created.
# See what models are available
ollama list
openclaw models list
For a narrow text-generation smoke test that avoids the full agent tool surface,
use local infer model run with a full Ollama model ref:
OLLAMA_API_KEY=ollama-local \
openclaw infer model run \
--local \
--model ollama/llama3.2:latest \
--prompt "Reply with exactly: pong" \
--json
That path still uses OpenClaw's configured provider, auth, and native Ollama transport, but it does not start a chat-agent turn or load MCP/tool context. If this succeeds while normal agent replies fail, troubleshoot the model's agent prompt/tool capacity next.
For a narrow vision-model smoke test on the same lean path, add one or more
image files to infer model run. This sends the prompt and image directly to
the selected Ollama vision model without loading chat tools, memory, or prior
session context:
OLLAMA_API_KEY=ollama-local \
openclaw infer model run \
--local \
--model ollama/qwen2.5vl:7b \
--prompt "Describe this image in one sentence." \
--file ./photo.jpg \
--json
model run --file accepts files detected as image/*, including common PNG,
JPEG, and WebP inputs. Non-image files are rejected before Ollama is called.
For speech recognition, use openclaw infer audio transcribe instead.
When you switch a conversation with /model ollama/<model>, OpenClaw treats
that as an exact user selection. If the configured Ollama baseUrl is
unreachable, the next reply fails with the provider error instead of silently
answering from another configured fallback model.
Isolated cron jobs do one extra local safety check before they start the agent
turn. If the selected model resolves to a local, private-network, or .local
Ollama provider and /api/tags is unreachable, OpenClaw records that cron run
as skipped with the selected ollama/<model> in the error text. The endpoint
preflight is cached for 5 minutes, so multiple cron jobs pointed at the same
stopped Ollama daemon do not all launch failing model requests.
Live-verify the local text path, native stream path, and embeddings against local Ollama with:
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA=1 OPENCLAW_LIVE_OLLAMA_WEB_SEARCH=0 \
pnpm test:live -- extensions/ollama/ollama.live.test.ts
To add a new model, simply pull it with Ollama:
ollama pull mistral
The new model will be automatically discovered and available to use.
<Note> If you set `models.providers.ollama` explicitly, or configure a custom remote provider such as `models.providers.ollama-cloud` with `api: "ollama"`, auto-discovery is skipped and you must define models manually. Loopback custom providers such as `http://127.0.0.2:11434` are still treated as local. See the explicit config section below. </Note>The bundled Ollama plugin registers Ollama as an image-capable media-understanding provider. This lets OpenClaw route explicit image-description requests and configured image-model defaults through local or hosted Ollama vision models.
For local vision, pull a model that supports images:
ollama pull qwen2.5vl:7b
export OLLAMA_API_KEY="ollama-local"
Then verify with the infer CLI:
openclaw infer image describe \
--file ./photo.jpg \
--model ollama/qwen2.5vl:7b \
--json
--model must be a full <provider/model> ref. When it is set, openclaw infer image describe runs that model directly instead of skipping description because the model supports native vision.
Use infer image describe when you want OpenClaw's image-understanding provider flow, configured agents.defaults.imageModel, and image-description output shape. Use infer model run --file when you want a raw multimodal model probe with a custom prompt and one or more images.
To make Ollama the default image-understanding model for inbound media, configure agents.defaults.imageModel:
{
agents: {
defaults: {
imageModel: {
primary: "ollama/qwen2.5vl:7b",
},
},
},
}
Prefer the full ollama/<model> ref. If the same model is listed under models.providers.ollama.models with input: ["text", "image"] and no other configured image provider exposes that bare model ID, OpenClaw also normalizes a bare imageModel ref such as qwen2.5vl:7b to ollama/qwen2.5vl:7b. If more than one configured image provider has the same bare ID, use the provider prefix explicitly.
Slow local vision models can need a longer image-understanding timeout than cloud models. They can also crash or stop when Ollama tries to allocate the full advertised vision context on constrained hardware. Set a capability timeout, and cap num_ctx on the model entry when you only need a normal image-description turn:
{
models: {
providers: {
ollama: {
models: [
{
id: "qwen2.5vl:7b",
name: "qwen2.5vl:7b",
input: ["text", "image"],
params: { num_ctx: 2048, keep_alive: "1m" },
},
],
},
},
},
tools: {
media: {
image: {
timeoutSeconds: 180,
models: [{ provider: "ollama", model: "qwen2.5vl:7b", timeoutSeconds: 300 }],
},
},
},
}
This timeout applies to inbound image understanding and to the explicit image tool the agent can call during a turn. Provider-level models.providers.ollama.timeoutSeconds still controls the underlying Ollama HTTP request guard for normal model calls.
Live-verify the explicit image tool against local Ollama with:
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA_IMAGE=1 \
pnpm test:live -- src/agents/tools/image-tool.ollama.live.test.ts
If you define models.providers.ollama.models manually, mark vision models with image input support:
{
id: "qwen2.5vl:7b",
name: "qwen2.5vl:7b",
input: ["text", "image"],
contextWindow: 128000,
maxTokens: 8192,
}
OpenClaw rejects image-description requests for models that are not marked image-capable. With implicit discovery, OpenClaw reads this from Ollama when /api/show reports a vision capability.
```bash
export OLLAMA_API_KEY="ollama-local"
```
<Tip>
If `OLLAMA_API_KEY` is set, you can omit `apiKey` in the provider entry and OpenClaw will fill it for availability checks.
</Tip>
```json5
{
models: {
providers: {
ollama: {
baseUrl: "https://ollama.com",
apiKey: "OLLAMA_API_KEY",
api: "ollama",
models: [
{
id: "kimi-k2.5:cloud",
name: "kimi-k2.5:cloud",
reasoning: false,
input: ["text", "image"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192
}
]
}
}
}
}
```
```json5
{
models: {
providers: {
ollama: {
apiKey: "ollama-local",
baseUrl: "http://ollama-host:11434", // No /v1 - use native Ollama API URL
api: "ollama", // Set explicitly to guarantee native tool-calling behavior
timeoutSeconds: 300, // Optional: give cold local models longer to connect and stream
models: [
{
id: "qwen3:32b",
name: "qwen3:32b",
params: {
keep_alive: "15m", // Optional: keep the model loaded between turns
},
},
],
},
},
},
}
```
<Warning>
Do not add `/v1` to the URL. The `/v1` path uses OpenAI-compatible mode, where tool calling is not reliable. Use the base Ollama URL without a path suffix.
</Warning>
Use these as starting points and replace model IDs with the exact names from ollama list or openclaw models list --provider ollama.
```bash
ollama serve
ollama pull gemma4
export OLLAMA_API_KEY="ollama-local"
openclaw models list --provider ollama
openclaw models set ollama/gemma4
```
This path keeps config minimal. Do not add a `models.providers.ollama` block unless you want to define models manually.
```json5
{
models: {
providers: {
ollama: {
baseUrl: "http://gpu-box.local:11434",
apiKey: "ollama-local",
api: "ollama",
timeoutSeconds: 300,
contextWindow: 32768,
maxTokens: 8192,
models: [
{
id: "qwen3.5:9b",
name: "qwen3.5:9b",
reasoning: true,
input: ["text"],
params: {
num_ctx: 32768,
thinking: false,
keep_alive: "15m",
},
},
],
},
},
},
agents: {
defaults: {
model: { primary: "ollama/qwen3.5:9b" },
},
},
}
```
`contextWindow` is the OpenClaw-side context budget. `params.num_ctx` is sent to Ollama for the request. Keep them aligned when your hardware cannot run the model's full advertised context.
```bash
export OLLAMA_API_KEY="your-ollama-api-key"
```
```json5
{
models: {
providers: {
ollama: {
baseUrl: "https://ollama.com",
apiKey: "OLLAMA_API_KEY",
api: "ollama",
models: [
{
id: "kimi-k2.5:cloud",
name: "kimi-k2.5:cloud",
reasoning: false,
input: ["text", "image"],
contextWindow: 128000,
maxTokens: 8192,
},
],
},
},
},
agents: {
defaults: {
model: { primary: "ollama/kimi-k2.5:cloud" },
},
},
}
```
```bash
ollama signin
ollama pull gemma4
```
```json5
{
models: {
providers: {
ollama: {
baseUrl: "http://127.0.0.1:11434",
apiKey: "ollama-local",
api: "ollama",
timeoutSeconds: 300,
models: [
{ id: "gemma4", name: "gemma4", input: ["text"] },
{ id: "kimi-k2.5:cloud", name: "kimi-k2.5:cloud", input: ["text", "image"] },
],
},
},
},
agents: {
defaults: {
model: {
primary: "ollama/gemma4",
fallbacks: ["ollama/kimi-k2.5:cloud"],
},
},
},
}
```
```json5
{
models: {
providers: {
"ollama-fast": {
baseUrl: "http://mini.local:11434",
apiKey: "ollama-local",
api: "ollama",
contextWindow: 32768,
models: [{ id: "gemma4", name: "gemma4", input: ["text"] }],
},
"ollama-large": {
baseUrl: "http://gpu-box.local:11434",
apiKey: "ollama-local",
api: "ollama",
timeoutSeconds: 420,
contextWindow: 131072,
maxTokens: 16384,
models: [{ id: "qwen3.5:27b", name: "qwen3.5:27b", input: ["text"] }],
},
},
},
agents: {
defaults: {
model: {
primary: "ollama-fast/gemma4",
fallbacks: ["ollama-large/qwen3.5:27b"],
},
},
},
}
```
When OpenClaw sends the request, the active provider prefix is stripped so `ollama-large/qwen3.5:27b` reaches Ollama as `qwen3.5:27b`.
```json5
{
agents: {
defaults: {
experimental: {
localModelLean: true,
},
model: { primary: "ollama/gemma4" },
},
},
models: {
providers: {
ollama: {
baseUrl: "http://127.0.0.1:11434",
apiKey: "ollama-local",
api: "ollama",
contextWindow: 32768,
models: [
{
id: "gemma4",
name: "gemma4",
input: ["text"],
params: { num_ctx: 32768 },
compat: { supportsTools: false },
},
],
},
},
},
}
```
Use `compat.supportsTools: false` only when the model or server reliably fails on tool schemas. It trades agent capability for stability.
`localModelLean` removes the browser, cron, and message tools from the agent surface, but it does not change Ollama's runtime context or thinking mode. Pair it with explicit `params.num_ctx` and `params.thinking: false` for small Qwen-style thinking models that loop or spend their response budget on hidden reasoning.
Once configured, all your Ollama models are available:
{
agents: {
defaults: {
model: {
primary: "ollama/gpt-oss:20b",
fallbacks: ["ollama/llama3.3", "ollama/qwen2.5-coder:32b"],
},
},
},
}
Custom Ollama provider ids are also supported. When a model ref uses the active
provider prefix, such as ollama-spark/qwen3:32b, OpenClaw strips only that
prefix before calling Ollama so the server receives qwen3:32b.
For slow local models, prefer provider-scoped request tuning before raising the whole agent runtime timeout:
{
models: {
providers: {
ollama: {
timeoutSeconds: 300,
models: [
{
id: "gemma4:26b",
name: "gemma4:26b",
params: { keep_alive: "15m" },
},
],
},
},
},
}
timeoutSeconds applies to the model HTTP request, including connection setup,
headers, body streaming, and the total guarded-fetch abort. params.keep_alive
is forwarded to Ollama as top-level keep_alive on native /api/chat requests;
set it per model when first-turn load time is the bottleneck.
# Ollama daemon visible to this machine
curl http://127.0.0.1:11434/api/tags
# OpenClaw catalog and selected model
openclaw models list --provider ollama
openclaw models status
# Direct model smoke
openclaw infer model run \
--model ollama/gemma4 \
--prompt "Reply with exactly: ok"
For remote hosts, replace 127.0.0.1 with the host used in baseUrl. If curl works but OpenClaw does not, check whether the Gateway runs on a different machine, container, or service account.
OpenClaw supports Ollama Web Search as a bundled web_search provider.
| Property | Detail |
|---|---|
| Host | Uses your configured Ollama host (models.providers.ollama.baseUrl when set, otherwise http://127.0.0.1:11434); https://ollama.com uses the hosted API directly |
| Auth | Key-free for signed-in local Ollama hosts; OLLAMA_API_KEY or configured provider auth for direct https://ollama.com search or auth-protected hosts |
| Requirement | Local/self-hosted hosts must be running and signed in with ollama signin; direct hosted search requires baseUrl: "https://ollama.com" plus a real Ollama API key |
Choose Ollama Web Search during openclaw onboard or openclaw configure --section web, or set:
{
tools: {
web: {
search: {
provider: "ollama",
},
},
},
}
For direct hosted search through Ollama Cloud:
{
models: {
providers: {
ollama: {
baseUrl: "https://ollama.com",
apiKey: "OLLAMA_API_KEY",
api: "ollama",
models: [{ id: "kimi-k2.5:cloud", name: "kimi-k2.5:cloud", input: ["text"] }],
},
},
},
tools: {
web: {
search: { provider: "ollama" },
},
},
}
For a signed-in local daemon, OpenClaw uses the daemon's /api/experimental/web_search proxy. For https://ollama.com, it calls the hosted /api/web_search endpoint directly.
If you need to use the OpenAI-compatible endpoint instead (for example, behind a proxy that only supports OpenAI format), set `api: "openai-completions"` explicitly:
```json5
{
models: {
providers: {
ollama: {
baseUrl: "http://ollama-host:11434/v1",
api: "openai-completions",
injectNumCtxForOpenAICompat: true, // default: true
apiKey: "ollama-local",
models: [...]
}
}
}
}
```
This mode may not support streaming and tool calling simultaneously. You may need to disable streaming with `params: { streaming: false }` in model config.
When `api: "openai-completions"` is used with Ollama, OpenClaw injects `options.num_ctx` by default so Ollama does not silently fall back to a 4096 context window. If your proxy/upstream rejects unknown `options` fields, disable this behavior:
```json5
{
models: {
providers: {
ollama: {
baseUrl: "http://ollama-host:11434/v1",
api: "openai-completions",
injectNumCtxForOpenAICompat: false,
apiKey: "ollama-local",
models: [...]
}
}
}
}
```
You can set provider-level `contextWindow`, `contextTokens`, and `maxTokens` defaults for every model under that Ollama provider, then override them per model when needed. `contextWindow` is OpenClaw's prompt and compaction budget. Native Ollama requests leave `options.num_ctx` unset unless you explicitly configure `params.num_ctx`, so Ollama can apply its own model, `OLLAMA_CONTEXT_LENGTH`, or VRAM-based default. To cap or force Ollama's per-request runtime context without rebuilding a Modelfile, set `params.num_ctx`; invalid, zero, negative, and non-finite values are ignored. The OpenAI-compatible Ollama adapter still injects `options.num_ctx` by default from the configured `params.num_ctx` or `contextWindow`; disable that with `injectNumCtxForOpenAICompat: false` if your upstream rejects `options`.
Native Ollama model entries also accept the common Ollama runtime options under `params`, including `temperature`, `top_p`, `top_k`, `min_p`, `num_predict`, `stop`, `repeat_penalty`, `num_batch`, `num_thread`, and `use_mmap`. OpenClaw forwards only Ollama request keys, so OpenClaw runtime params such as `streaming` are not leaked to Ollama. Use `params.think` or `params.thinking` to send top-level Ollama `think`; `false` disables API-level thinking for Qwen-style thinking models.
```json5
{
models: {
providers: {
ollama: {
contextWindow: 32768,
models: [
{
id: "llama3.3",
contextWindow: 131072,
maxTokens: 65536,
params: {
num_ctx: 32768,
temperature: 0.7,
top_p: 0.9,
thinking: false,
},
}
]
}
}
}
}
```
Per-model `agents.defaults.models["ollama/<model>"].params.num_ctx` works too. If both are configured, the explicit provider model entry wins over the agent default.
```bash
openclaw agent --model ollama/gemma4 --thinking off
openclaw agent --model ollama/gemma4 --thinking low
```
You can also set a model default:
```json5
{
agents: {
defaults: {
models: {
"ollama/gemma4": {
thinking: "low",
},
},
},
},
}
```
Per-model `params.think` or `params.thinking` can disable or force Ollama API thinking for a specific configured model. OpenClaw preserves those explicit model params when the active run only has the implicit default `off`; non-off runtime commands such as `/think medium` still override the active run.
```bash
ollama pull deepseek-r1:32b
```
No additional configuration is needed. OpenClaw marks them automatically.
| Property | Value |
| ------------- | ------------------- |
| Default model | `nomic-embed-text` |
| Auto-pull | Yes — the embedding model is pulled automatically if not present locally |
Query-time embeddings use retrieval prefixes for models that require or recommend them, including `nomic-embed-text`, `qwen3-embedding`, and `mxbai-embed-large`. Memory document batches stay raw so existing indexes do not need a format migration.
To select Ollama as the memory search embedding provider:
```json5
{
agents: {
defaults: {
memorySearch: {
provider: "ollama",
remote: {
// Default for Ollama. Raise on larger hosts if reindexing is too slow.
nonBatchConcurrency: 1,
},
},
},
},
}
```
For a remote embedding host, keep auth scoped to that host:
```json5
{
agents: {
defaults: {
memorySearch: {
provider: "ollama",
model: "nomic-embed-text",
remote: {
baseUrl: "http://gpu-box.local:11434",
apiKey: "ollama-local",
nonBatchConcurrency: 2,
},
},
},
},
}
```
For native `/api/chat` requests, OpenClaw also forwards thinking control directly to Ollama: `/think off` and `openclaw agent --thinking off` send top-level `think: false` unless an explicit model `params.think`/`params.thinking` value is configured, while `/think low|medium|high` send the matching top-level `think` effort string. `/think max` maps to Ollama's highest native effort, `think: "high"`.
<Tip>
If you need to use the OpenAI-compatible endpoint, see the "Legacy OpenAI-compatible mode" section above. Streaming and tool calling may not work simultaneously in that mode.
</Tip>
Common evidence:
- repeated WSL2 reboots or terminations from the Windows side
- high CPU in `app.slice` or `ollama.service` shortly after WSL2 startup
- SIGTERM from systemd rather than a Linux OOM-killer event
OpenClaw logs a startup warning when it detects WSL2, `ollama.service` enabled with `Restart=always`, and visible CUDA markers.
Mitigation:
```bash
sudo systemctl disable ollama
```
Add this to `%USERPROFILE%\.wslconfig` on the Windows side, then run `wsl --shutdown`:
```ini
[experimental]
autoMemoryReclaim=disabled
```
Set a shorter keep-alive in the Ollama service environment, or start Ollama manually only when you need it:
```bash
export OLLAMA_KEEP_ALIVE=5m
ollama serve
```
See [ollama/ollama#11317](https://github.com/ollama/ollama/issues/11317).
```bash
ollama serve
```
Verify that the API is accessible:
```bash
curl http://localhost:11434/api/tags
```
```bash
ollama list # See what's installed
ollama pull gemma4
ollama pull gpt-oss:20b
ollama pull llama3.3 # Or another model
```
```bash
# Check if Ollama is running
ps aux | grep ollama
# Or restart Ollama
ollama serve
```
```bash
openclaw gateway status --deep
curl http://ollama-host:11434/api/tags
```
Common causes:
- `baseUrl` points at `localhost`, but the Gateway runs in Docker or on another host.
- The URL uses `/v1`, which selects OpenAI-compatible behavior instead of native Ollama.
- The remote host needs firewall or LAN binding changes on the Ollama side.
- The model is present on your laptop's daemon but not on the remote daemon.
Prefer native Ollama mode:
```json5
{
models: {
providers: {
ollama: {
baseUrl: "http://ollama-host:11434",
api: "ollama",
},
},
},
}
```
If a small local model still fails on tool schemas, set `compat.supportsTools: false` on that model entry and retest.
If it happens repeatedly, capture the raw model name, the current session file, and whether the run used `Cloud + Local` or `Cloud only`, then try a fresh session and a fallback model:
```bash
openclaw infer model run --model ollama/kimi-k2.5:cloud --prompt "Reply with exactly: ok" --json
openclaw models set ollama/gemma4
```
```json5
{
models: {
providers: {
ollama: {
timeoutSeconds: 300,
models: [
{
id: "gemma4:26b",
name: "gemma4:26b",
params: { keep_alive: "15m" },
},
],
},
},
},
}
```
If the host itself is slow to accept connections, `timeoutSeconds` also extends the guarded Undici connect timeout for this provider.
```json5
{
models: {
providers: {
ollama: {
contextWindow: 32768,
maxTokens: 8192,
models: [
{
id: "qwen3.5:9b",
name: "qwen3.5:9b",
params: { num_ctx: 32768, thinking: false },
},
],
},
},
},
}
```
Lower `contextWindow` first if OpenClaw is sending too much prompt. Lower `params.num_ctx` if Ollama is loading a runtime context that is too large for the machine. Lower `maxTokens` if generation runs too long.