# inferrs
inferrs can serve local models behind an OpenAI-compatible `/v1` API. OpenClaw works with inferrs through the generic `openai-completions` path.
| Property | Value |
|---|---|
| Provider id | `inferrs` (custom; configure under `models.providers.inferrs`) |
| Plugin | none (inferrs is not a bundled OpenClaw provider plugin) |
| Auth env var | optional; any value works if your inferrs server has no auth |
| API | OpenAI-compatible (`openai-completions`) |
| Suggested base URL | `http://127.0.0.1:8080/v1` (or wherever your inferrs server listens) |
This example uses Gemma 4 on a local inferrs server.
```json5
{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
```
OpenClaw can also start inferrs on demand, launching it only when an `inferrs/...` model is selected. Add a `localService` block to the same provider entry:
```json5
{
  models: {
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "/opt/homebrew/bin/inferrs",
          args: [
            "serve",
            "google/gemma-4-E2B-it",
            "--host",
            "127.0.0.1",
            "--port",
            "8080",
            "--device",
            "metal",
          ],
          healthUrl: "http://127.0.0.1:8080/v1/models",
          readyTimeoutMs: 180000,
          idleStopMs: 0,
        },
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
```
`command` must be an absolute path. Run `which inferrs` on the Gateway host and put the resulting path in the config. For the full field reference, see Local model services.
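Before wiring this into OpenClaw, you can sanity-check both values by hand. A quick sketch, assuming the server from the config above is already running:

```bash
# Resolve the absolute path to use as localService.command
which inferrs

# Hit the same endpoint OpenClaw polls via healthUrl;
# it should return a model list that includes google/gemma-4-E2B-it
curl http://127.0.0.1:8080/v1/models
```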
<Warning>
If OpenClaw runs fail with an error like:
```text
messages[1].content: invalid type: sequence, expected a string
```
set `compat.requiresStringContent: true` in your model entry.
</Warning>
```json5
compat: {
requiresStringContent: true
}
```
With this flag set, OpenClaw flattens pure-text content parts into plain strings before sending the request.
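The failure is reproducible outside OpenClaw. A sketch against the server from the config above: the first request uses the standard OpenAI array-of-parts content shape that strict backends reject, the second uses the flattened string shape that `requiresStringContent` produces:

```bash
# Array-of-parts content; a strict backend may answer with an error like
# "invalid type: sequence, expected a string"
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]}],"stream":false}'

# Flattened string content; this is the shape OpenClaw sends with the flag set
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"Hi"}],"stream":false}'
```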
If the server still crashes or rejects requests on normal agent turns after that, try this next:
```json5
compat: {
requiresStringContent: true,
supportsTools: false
}
```
That disables OpenClaw's tool schema surface for the model and can reduce prompt
pressure on stricter local backends.
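To check whether the tool schema itself is what trips the backend, you can send a direct request that includes a minimal tool definition. A sketch, where `get_time` is a hypothetical placeholder tool, not something OpenClaw defines:

```bash
# If this request fails while the plain curl test below succeeds,
# keeping supportsTools: false is the right call for this backend
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "google/gemma-4-E2B-it",
    "messages": [{"role": "user", "content": "What time is it?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_time",
        "description": "Hypothetical placeholder tool",
        "parameters": {"type": "object", "properties": {}}
      }
    }],
    "stream": false
  }'
```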
If tiny direct requests still work but normal OpenClaw agent turns keep crashing inside `inferrs`, the remaining issue is usually upstream model or server behavior rather than OpenClaw's transport layer. To isolate it, first confirm the server answers a minimal direct request:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
-H 'content-type: application/json' \
-d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
```
Then run the same prompt through OpenClaw:

```bash
openclaw infer model run \
--model inferrs/google/gemma-4-E2B-it \
--prompt "What is 2 + 2? Reply with one short sentence." \
--json
```
If the first command works but the second fails, revisit the `compat` troubleshooting above.
Because requests go through the generic `openai-completions` path, a few OpenAI-specific behaviors do not apply:

- Native OpenAI-only request shaping is skipped: no `service_tier`, no Responses `store`, no prompt-cache hints, and no OpenAI reasoning-compat payload shaping
- Hidden OpenClaw attribution headers (`originator`, `version`, `User-Agent`) are not injected on custom `inferrs` base URLs
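If you want to confirm what actually goes over the wire, one option is to point `baseUrl` at a throwaway listener and read the raw request. A rough sketch using BSD `nc` (flags vary by platform); the listener never answers, so the OpenClaw run will hang or time out, which is fine for inspecting headers:

```bash
# Terminal 1: one-shot listener that prints the raw HTTP request it receives
nc -l 127.0.0.1 9999

# Terminal 2: temporarily set baseUrl to http://127.0.0.1:9999/v1, then:
openclaw infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "hi"
```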