content/manuals/ai/model-runner/ide-integrations.md
Docker Model Runner can serve as a local backend for popular AI coding assistants and development tools. This guide shows how to configure common tools to use models running in DMR.
## Prerequisites

Before configuring any tool, enable TCP access to Docker Model Runner and pull a model:

```console
$ docker desktop enable model-runner --tcp 12434
$ docker model pull ai/qwen2.5-coder
```

> [!TIP]
> The default context size for many models (such as `gpt-oss`) is 4,096 tokens, which is limiting for coding tasks. You can repackage a model with a larger context window:
>
> ```console
> $ docker model pull ai/gpt-oss
> $ docker model package --from ai/gpt-oss --context-size 32000 gpt-oss:32k
> ```
>
> Alternatively, models like `ai/glm-4.7-flash`, `ai/qwen2.5-coder`, and `ai/devstral-small-2` come with 128K context by default and work without repackaging.

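All of the integrations below talk to the same OpenAI-compatible endpoint, so it is worth sanity-checking the setup once before configuring any tool. Here is a minimal sketch using the official `openai` Python package (an assumption; any OpenAI-compatible client works):

```python
from openai import OpenAI

# DMR ignores the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="ai/qwen2.5-coder",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

If this prints a response, the endpoint is ready for the tools below.
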
## Cline

Cline is an AI coding assistant for VS Code. In Cline's settings, select the OpenAI Compatible provider and configure:

| Setting | Value |
|---|---|
| Base URL | `http://localhost:12434/engines/v1` |
| API Key | `not-needed` (or any placeholder value) |
| Model ID | `ai/qwen2.5-coder` (or your preferred model) |

> [!IMPORTANT]
> The base URL must include `/engines/v1` at the end. Do not include a trailing slash.

If Cline fails to connect:

1. Verify DMR is running:

   ```console
   $ docker model status
   ```

2. Test the endpoint directly:

   ```console
   $ curl http://localhost:12434/engines/v1/models
   ```

3. Check that CORS is configured if you are running a web-based version.

## Continue

Continue is an open-source AI code assistant that works with VS Code and JetBrains IDEs.

Edit your Continue configuration file (`~/.continue/config.json`):

```json
{
  "models": [
    {
      "title": "Docker Model Runner",
      "provider": "openai",
      "model": "ai/qwen2.5-coder",
      "apiBase": "http://localhost:12434/engines/v1",
      "apiKey": "not-needed"
    }
  ]
}
```

Continue also supports the Ollama provider, which works with DMR:
```json
{
  "models": [
    {
      "title": "Docker Model Runner (Ollama)",
      "provider": "ollama",
      "model": "ai/qwen2.5-coder",
      "apiBase": "http://localhost:12434"
    }
  ]
}
```

## Cursor

Cursor is an AI-powered code editor.

1. Open Cursor Settings (Cmd/Ctrl + ,).
2. Navigate to Models > OpenAI API Key.
3. Configure the following:

| Setting | Value |
|---|---|
| OpenAI API Key | `not-needed` |
| Override OpenAI Base URL | `http://localhost:12434/engines/v1` |

In the model drop-down, enter your model name, for example `ai/qwen2.5-coder`.

> [!NOTE]
> Some Cursor features may require models with specific capabilities (e.g., function calling). Use capable models like `ai/qwen2.5-coder` or `ai/llama3.2` for best results.

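If you are unsure whether a model supports function calling, you can probe it directly against DMR. A minimal sketch, assuming the `openai` Python package; the `get_weather` tool is purely hypothetical:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

# A hypothetical tool definition, used only to see whether the model emits a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="ai/qwen2.5-coder",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# A capable model returns a structured tool call instead of plain text.
print(resp.choices[0].message.tool_calls)
```
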
## Zed

Zed is a high-performance code editor with AI features.

Edit your Zed settings (`~/.config/zed/settings.json`):

```json
{
  "language_models": {
    "openai": {
      "api_url": "http://localhost:12434/engines/v1",
      "available_models": [
        {
          "name": "ai/qwen2.5-coder",
          "display_name": "Qwen 2.5 Coder (DMR)",
          "max_tokens": 8192
        }
      ]
    }
  }
}
```

## Open WebUI

Open WebUI provides a ChatGPT-like interface for local models.

See Open WebUI integration for detailed setup instructions.
## Aider

Aider is an AI pair programming tool for the terminal.

Set environment variables or use command-line flags:

```console
$ export OPENAI_API_BASE=http://localhost:12434/engines/v1
$ export OPENAI_API_KEY=not-needed
$ aider --model openai/ai/qwen2.5-coder
```

Or in a single command:

```console
$ aider --openai-api-base http://localhost:12434/engines/v1 \
    --openai-api-key not-needed \
    --model openai/ai/qwen2.5-coder
```

## LangChain (Python)

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",
    model="ai/qwen2.5-coder",
)

response = llm.invoke("Write a hello world function in Python")
print(response.content)
```

## LangChain (JavaScript)

```javascript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  configuration: {
    baseURL: "http://localhost:12434/engines/v1",
  },
  apiKey: "not-needed",
  modelName: "ai/qwen2.5-coder",
});

const response = await model.invoke("Write a hello world function");
console.log(response.content);
```

## LlamaIndex

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:12434/engines/v1",
    api_key="not-needed",
    model="ai/qwen2.5-coder",
)

response = llm.complete("Write a hello world function")
print(response.text)
```

## OpenCode

OpenCode is an open-source coding assistant designed to integrate directly into developer workflows. It supports multiple model providers and exposes a flexible configuration system that makes it easy to switch between them.

Edit your OpenCode configuration file at `~/.config/opencode/opencode.json`, or create a project-specific `opencode.json` file in the root of your project:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "ai/qwen2.5-coder": {
          "name": "ai/qwen2.5-coder"
        },
        "ai/llama3.2": {
          "name": "ai/llama3.2"
        }
      }
    }
  }
}
```

You can find more details in this Docker blog post.

## Claude Code

Claude Code is Anthropic's command-line tool for agentic coding. It lives in your terminal, understands your codebase, executes routine tasks, explains complex code, and handles Git workflows through natural language commands.

Set the `ANTHROPIC_BASE_URL` environment variable to point Claude Code at DMR. On Mac or Linux, for example, to use the `gpt-oss:32k` model:

```console
$ ANTHROPIC_BASE_URL=http://localhost:12434 claude --model gpt-oss:32k
```

On Windows (PowerShell):

```powershell
$env:ANTHROPIC_BASE_URL="http://localhost:12434"
claude --model gpt-oss:32k
```

> [!TIP]
> To avoid setting the variable each time, add it to your shell profile (`~/.bashrc`, `~/.zshrc`, or equivalent):
>
> ```shell
> export ANTHROPIC_BASE_URL=http://localhost:12434
> ```

You can find more details in this Docker blog post.

> [!NOTE]
> While the other integrations on this page use the OpenAI-compatible API, DMR also exposes an Anthropic-compatible API, which Claude Code uses here.

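The same endpoint can be exercised from code. A minimal sketch, assuming the `anthropic` Python package and that DMR serves the Anthropic-compatible API at the server root, as the `ANTHROPIC_BASE_URL` above suggests:

```python
from anthropic import Anthropic

# Point the Anthropic client at DMR instead of api.anthropic.com.
client = Anthropic(base_url="http://localhost:12434", api_key="not-needed")

resp = client.messages.create(
    model="gpt-oss:32k",  # assumes the repackaged model from the tip above
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a hello world function in Python"}],
)
print(resp.content[0].text)
```
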
## Troubleshooting

If a tool cannot connect to DMR:

1. Ensure Docker Model Runner is enabled and running:

   ```console
   $ docker model status
   ```

2. Verify TCP access is enabled:

   ```console
   $ curl http://localhost:12434/engines/v1/models
   ```

3. Check if another service is using port 12434.

If you run your tool in WSL and want to connect to DMR on the host via localhost, this may not work out of the box. Configuring WSL to use mirrored networking can resolve this; see the sketch below.

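For example, mirrored networking can be enabled from the Windows side in `%UserProfile%\.wslconfig` (a sketch, assuming WSL 2.0 or later on Windows 11 22H2+):

```ini
# %UserProfile%\.wslconfig
[wsl2]
networkingMode=mirrored
```

Run `wsl --shutdown` afterwards so the setting takes effect on the next WSL start.
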
If the model isn't found:

- Verify the model is pulled:

  ```console
  $ docker model list
  ```

- Use the full model name including the namespace (for example, `ai/qwen2.5-coder`, not just `qwen2.5-coder`).

If responses are slow:

- On the first request, the model must load into memory; subsequent requests are faster.
- Consider using a smaller model or adjusting the context size:

  ```console
  $ docker model configure --context-size 4096 ai/qwen2.5-coder
  ```

- Check available system resources (RAM and GPU memory).

If using browser-based tools, add the origin (for example, `http://localhost:3000`) to the CORS allowed origins.

## Model recommendations

| Use case | Recommended model | Notes |
|---|---|---|
| Code completion | `ai/qwen2.5-coder` | Optimized for coding tasks |
| General assistant | `ai/llama3.2` | Good balance of capabilities |
| Small/fast | `ai/smollm2` | Low resource usage |
| Embeddings | `ai/all-minilm` | For RAG and semantic search |
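
For example, the embeddings row can be exercised through the same OpenAI-compatible endpoint. A minimal sketch, assuming `ai/all-minilm` is pulled and the `openai` Python package is installed:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

# Embed two snippets and report how many vectors came back and their dimensionality.
resp = client.embeddings.create(
    model="ai/all-minilm",
    input=["Docker Model Runner", "local semantic search"],
)
print(len(resp.data), "vectors of dimension", len(resp.data[0].embedding))
```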