docs/providers/ds4.md
ds4 serves DeepSeek V4 Flash from a local
Metal backend with an OpenAI-compatible /v1 API. OpenClaw connects to ds4
through the generic openai-completions provider family.
ds4 is not a bundled OpenClaw provider plugin. Configure it under
models.providers.ds4, then select ds4/deepseek-v4-flash.
ds4openai-completions)http://127.0.0.1:18000/v1deepseek-v4-flashtools and tool_callsthinking and reasoning_effortds4-server and the DeepSeek V4 Flash GGUF file.--ctx values allocate more
KV memory when the server starts.```bash
<DS4_DIR>/ds4-server \
--model <DS4_DIR>/ds4flash.gguf \
--host 127.0.0.1 \
--port 18000 \
--ctx 32768 \
--tokens 128
```
The response should include `deepseek-v4-flash`.
```bash
openclaw infer model run \
--local \
--model ds4/deepseek-v4-flash \
--thinking off \
--prompt "Reply with exactly: openclaw-ds4-ok" \
--json
```
Use this config when ds4 is already running on 127.0.0.1:18000.
{
agents: {
defaults: {
model: { primary: "ds4/deepseek-v4-flash" },
models: {
"ds4/deepseek-v4-flash": {
alias: "DS4 local",
},
},
},
},
models: {
mode: "merge",
providers: {
ds4: {
baseUrl: "http://127.0.0.1:18000/v1",
apiKey: "ds4-local",
api: "openai-completions",
timeoutSeconds: 300,
models: [
{
id: "deepseek-v4-flash",
name: "DeepSeek V4 Flash (ds4)",
reasoning: true,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 32768,
maxTokens: 128,
compat: {
supportsUsageInStreaming: true,
supportsReasoningEffort: true,
maxTokensField: "max_tokens",
supportsStrictMode: false,
thinkingFormat: "deepseek",
supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
},
},
],
},
},
},
}
Keep contextWindow aligned with the ds4-server --ctx value. Keep maxTokens
aligned with --tokens unless you intentionally want OpenClaw to request less
output than the server default.
OpenClaw can start ds4 only when a ds4/... model is selected. Add
localService to the same provider entry:
{
models: {
providers: {
ds4: {
baseUrl: "http://127.0.0.1:18000/v1",
apiKey: "ds4-local",
api: "openai-completions",
timeoutSeconds: 300,
localService: {
command: "<DS4_DIR>/ds4-server",
args: [
"--model",
"<DS4_DIR>/ds4flash.gguf",
"--host",
"127.0.0.1",
"--port",
"18000",
"--ctx",
"32768",
"--tokens",
"128",
],
cwd: "<DS4_DIR>",
healthUrl: "http://127.0.0.1:18000/v1/models",
readyTimeoutMs: 300000,
idleStopMs: 0,
},
models: [
{
id: "deepseek-v4-flash",
name: "DeepSeek V4 Flash (ds4)",
reasoning: true,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 32768,
maxTokens: 128,
compat: {
supportsUsageInStreaming: true,
supportsReasoningEffort: true,
maxTokensField: "max_tokens",
supportsStrictMode: false,
thinkingFormat: "deepseek",
supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
},
},
],
},
},
},
}
command must be an absolute executable path. Shell lookup and ~ expansion are
not used. See Local model services for every
localService field.
ds4 applies Think Max only when both conditions are true:
ds4-server starts with --ctx 393216 or higher.reasoning_effort: "max" or the equivalent ds4 effort field.If you run that large context, update both the server flags and OpenClaw model metadata:
{
contextWindow: 393216,
maxTokens: 384000,
compat: {
supportsUsageInStreaming: true,
supportsReasoningEffort: true,
maxTokensField: "max_tokens",
supportsStrictMode: false,
thinkingFormat: "deepseek",
supportedReasoningEfforts: ["low", "medium", "high", "xhigh", "max"],
},
}
Start with a direct HTTP check:
curl http://127.0.0.1:18000/v1/chat/completions \
-H 'content-type: application/json' \
-d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: ds4-ok"}],"max_tokens":16,"stream":false,"thinking":{"type":"disabled"}}'
Then test OpenClaw model routing:
openclaw infer model run \
--local \
--model ds4/deepseek-v4-flash \
--thinking off \
--prompt "Reply with exactly: openclaw-ds4-ok" \
--json
For a full agent and tool-call smoke, use a context of at least 32768:
openclaw agent \
--local \
--session-id ds4-tool-smoke \
--model ds4/deepseek-v4-flash \
--thinking off \
--message "Use the shell command pwd once, then reply exactly: tool-ok <output>" \
--json \
--timeout 240
Expected result:
executionTrace.winnerProvider is ds4executionTrace.winnerModel is deepseek-v4-flashtoolSummary.calls is at least 1finalAssistantVisibleText starts with tool-ok```bash
curl http://127.0.0.1:18000/v1/models
```