docs/content/features/mcp.md
+++
title = "Model Context Protocol (MCP)"
weight = 20
toc = true
description = "Agentic capabilities with Model Context Protocol integration"
tags = ["MCP", "Agents", "Tools", "Advanced"]
categories = ["Features"]
+++
LocalAI now supports the Model Context Protocol (MCP), enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools.
The Model Context Protocol is a standard for connecting AI models to external tools and data sources, enabling AI agents to interact with real-time data, APIs, and specialized tools. Per request, you can use `metadata.mcp_servers` to enable only specific servers.

## Configuration

MCP support is configured in your model's YAML configuration file using the `mcp` section:
```yaml
name: my-mcp-model
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  remote: |
    {
      "mcpServers": {
        "weather-api": {
          "url": "https://api.weather.com/v1",
          "token": "your-api-token"
        },
        "search-engine": {
          "url": "https://search.example.com/mcp",
          "token": "your-search-token"
        }
      }
    }
  stdio: |
    {
      "mcpServers": {
        "file-manager": {
          "command": "python",
          "args": ["-m", "mcp_file_manager"],
          "env": {
            "API_KEY": "your-key"
          }
        },
        "database-tools": {
          "command": "node",
          "args": ["database-mcp-server.js"],
          "env": {
            "DB_URL": "postgresql://localhost/mydb"
          }
        }
      }
    }

agent:
  max_iterations: 10 # Maximum MCP tool execution loop iterations
```
### Remote MCP Servers (`remote`)

Configure HTTP-based MCP servers:

- `url`: The MCP server endpoint URL
- `token`: Bearer token for authentication (optional)

### Local MCP Servers (`stdio`)

Configure local command-based MCP servers:

- `command`: The executable command to run
- `args`: Array of command-line arguments
- `env`: Environment variables (optional)

### Agent Settings (`agent`)

- `max_iterations`: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.

## Selecting MCP Servers with `metadata`

All API endpoints support MCP server selection through the standard `metadata` field. Pass a comma-separated list of server names in `metadata.mcp_servers`; the names must match those defined in the model configuration (e.g. `"weather-api"`, `"search-engine"`). The `mcp_servers` metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard `metadata` field can use this without custom schema extensions.
## API Endpoints

MCP tools work across all three API endpoints:
### Chat Completions (`/v1/chat/completions`)

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"},
    "stream": true
  }'
```
### Messages (`/v1/messages`)

```bash
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"}
  }'
```
### Responses (`/v1/responses`)

```bash
curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "input": "What is the weather in New York?",
    "metadata": {"mcp_servers": "weather-api"}
  }'
```
## Listing MCP Servers and Tools

You can list available MCP servers and their tools for a given model:

```bash
curl http://localhost:8080/v1/mcp/servers/my-mcp-model
```
Returns:
```json
[
  {
    "name": "weather-api",
    "type": "remote",
    "tools": ["get_weather", "get_forecast"]
  },
  {
    "name": "search-engine",
    "type": "remote",
    "tools": ["web_search", "image_search"]
  }
]
```
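The same endpoint can be queried programmatically. A minimal Python sketch using `requests` (assuming the example model and servers configured above):

```python
import requests

# Query the MCP server listing for a configured model
resp = requests.get("http://localhost:8080/v1/mcp/servers/my-mcp-model")
resp.raise_for_status()

# Print each server name, its type, and the tools it exposes
for server in resp.json():
    print(f"{server['name']} ({server['type']}): {', '.join(server['tools'])}")
```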
## MCP Prompts

MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.

### Listing Prompts

```bash
curl http://localhost:8080/v1/mcp/prompts/my-mcp-model
```
Returns:
```json
[
  {
    "name": "code-review",
    "description": "Review code for best practices",
    "title": "Code Review",
    "arguments": [
      {"name": "language", "description": "Programming language", "required": true}
    ],
    "server": "dev-tools"
  }
]
```
### Expanding a Prompt

```bash
curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
  -H "Content-Type: application/json" \
  -d '{"arguments": {"language": "go"}}'
```
Returns:
```json
{
  "messages": [
    {"role": "user", "content": "Please review the following Go code for best practices..."}
  ]
}
```
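The expanded messages can be fed straight into a normal chat completion. A minimal Python sketch combining the two calls (assuming the `dev-tools` server and `code-review` prompt above), as an alternative to the metadata-based injection shown in the next section:

```python
import requests

base = "http://localhost:8080"

# Expand the MCP prompt into concrete chat messages
expanded = requests.post(
    f"{base}/v1/mcp/prompts/my-mcp-model/code-review",
    json={"arguments": {"language": "go"}},
).json()

# Append the user's actual request and run a normal chat completion
messages = expanded["messages"] + [
    {"role": "user", "content": "func add(a, b int) int { return a + b }"}
]
reply = requests.post(
    f"{base}/v1/chat/completions",
    json={"model": "my-mcp-model", "messages": messages},
).json()
print(reply["choices"][0]["message"]["content"])
```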
### Using Prompts in Chat Requests

You can inject MCP prompts into any chat request using `metadata.mcp_prompt` and `metadata.mcp_prompt_args`:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
    "metadata": {
      "mcp_servers": "dev-tools",
      "mcp_prompt": "code-review",
      "mcp_prompt_args": "{\"language\": \"go\"}"
    }
  }'
```
The prompt messages are prepended to the conversation before inference.
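For illustration, with the request above the model effectively sees a conversation like this (assuming the prompt expansion shown earlier):

```json
[
  {"role": "user", "content": "Please review the following Go code for best practices..."},
  {"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}
]
```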
## MCP Resources

MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.

### Listing Resources

```bash
curl http://localhost:8080/v1/mcp/resources/my-mcp-model
```
Returns:
```json
[
  {
    "name": "project-readme",
    "uri": "file:///README.md",
    "description": "Project documentation",
    "mimeType": "text/markdown",
    "server": "file-manager"
  }
]
```
### Reading a Resource

```bash
curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
  -H "Content-Type: application/json" \
  -d '{"uri": "file:///README.md"}'
```
Returns:
```json
{
  "uri": "file:///README.md",
  "content": "# My Project\n...",
  "mimeType": "text/markdown"
}
```
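If you prefer to fetch resource contents yourself rather than relying on the automatic injection described in the next section, a minimal Python sketch (reusing the `file-manager` example above):

```python
import requests

base = "http://localhost:8080"

# Read a resource exposed by an MCP server
doc = requests.post(
    f"{base}/v1/mcp/resources/my-mcp-model/read",
    json={"uri": "file:///README.md"},
).json()

# Include the resource content directly in the chat request
reply = requests.post(
    f"{base}/v1/chat/completions",
    json={
        "model": "my-mcp-model",
        "messages": [
            {"role": "user", "content": f"Summarize this document:\n\n{doc['content']}"}
        ],
    },
).json()
print(reply["choices"][0]["message"]["content"])
```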
### Using Resources in Chat Requests

You can inject MCP resources into chat requests using `metadata.mcp_resources` (comma-separated URIs):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Summarize this project"}],
    "metadata": {
      "mcp_servers": "file-manager",
      "mcp_resources": "file:///README.md,file:///CHANGELOG.md"
    }
  }'
```
Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp's WebUI).
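For illustration, assuming OpenAI-style content parts, the last user message from the request above would effectively become something like the following (the resource contents shown are placeholders):

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Summarize this project"},
    {"type": "text", "text": "# My Project\n..."},
    {"type": "text", "text": "# Changelog\n..."}
  ]
}
```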
## Legacy Endpoint (`/mcp/v1/chat/completions`)

The `/mcp/v1/chat/completions` endpoint is still supported for backward compatibility. It automatically enables all configured MCP servers (equivalent to not specifying `mcp_servers`).

```bash
curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [
      {"role": "user", "content": "What is the current weather in New York?"}
    ]
  }'
```
Example response:

```json
{
  "id": "chatcmpl-123",
  "created": 1699123456,
  "model": "my-mcp-model",
  "choices": [
    {
      "text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
    }
  ],
  "object": "text_completion"
}
```
## Example: Running MCP Servers with Docker

A model configuration that runs an MCP server as a Docker container:

```yaml
name: docker-agent
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "quay.io/mudler/tests:duckduckgo-localai"
          ]
        }
      }
    }

agent:
  max_iterations: 10
```
The execution loop is bounded by `agent.max_iterations` (default 10) to prevent infinite loops.
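Conceptually, the bounded loop looks roughly like this illustrative Python sketch (not LocalAI's actual implementation; `call_model` and `execute_tool` are hypothetical stand-ins for the inference call and MCP tool execution):

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    content: str
    tool_calls: list = field(default_factory=list)


def call_model(messages, tools):
    # Placeholder for the actual inference call (hypothetical helper).
    return Reply(content="done")


def execute_tool(call):
    # Placeholder for running an MCP tool (hypothetical helper).
    return "tool result"


def run_agent(messages, tools, max_iterations=10):
    """Bounded tool-execution loop, mirroring agent.max_iterations."""
    reply = call_model(messages, tools)
    for _ in range(max_iterations):
        if not reply.tool_calls:          # no more tool requests: finished
            break
        for call in reply.tool_calls:
            result = execute_tool(call)   # run the requested MCP tool
            messages.append({"role": "tool", "content": result})
        reply = call_model(messages, tools)
    return reply
```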
MCP sessions are automatically managed by LocalAI. You don't need to manually manage MCP connections; they follow the model's lifecycle automatically.
LocalAI is compatible with any MCP-compliant server.
## Client Integration

Use MCP-enabled models in your applications:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="my-mcp-model",
    messages=[
        {"role": "user", "content": "Analyze the latest research papers on AI"}
    ],
    extra_body={"metadata": {"mcp_servers": "search-engine"}}
)
```
## Example: docker-compose with Docker-in-Docker

It can be handy to install packages before starting the container in order to set up the environment. The following docker-compose example installs Docker inside the LocalAI container and runs a Docker-in-Docker sidecar, so that Docker-based MCP servers can be launched:
```yaml
services:
  local-ai:
    image: localai/localai:latest
    #image: localai/localai:latest-gpu-nvidia-cuda-13
    #image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: local-ai
    restart: always
    entrypoint: [ "/bin/bash" ]
    command: >
      -c "apt-get update &&
      apt-get install -y docker.io &&
      /entrypoint.sh"
    environment:
      - DEBUG=true
      - LOCALAI_WATCHDOG_IDLE=true
      - LOCALAI_WATCHDOG_BUSY=true
      - LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m
      - LOCALAI_WATCHDOG_BUSY_TIMEOUT=15m
      - LOCALAI_API_KEY=my-beautiful-api-key
      - DOCKER_HOST=tcp://docker:2376
      - DOCKER_TLS_VERIFY=1
      - DOCKER_CERT_PATH=/certs/client
    ports:
      - "8080:8080"
    volumes:
      - /data/models:/models
      - /data/backends:/backends
      - certs:/certs:ro
    # uncomment for nvidia
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           device_ids: ['7']
    # runtime: nvidia
  docker:
    image: docker:dind
    privileged: true
    container_name: docker
    volumes:
      - certs:/certs
    healthcheck:
      test: ["CMD", "docker", "info"]
      interval: 10s
      timeout: 5s

volumes:
  certs:
```
An example model configuration (which you can append to any existing model config) is:
```yaml
mcp:
  stdio: |
    {
      "mcpServers": {
        "weather": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "ghcr.io/mudler/mcps/weather:master"
          ]
        },
        "memory": {
          "command": "docker",
          "env": {
            "MEMORY_INDEX_PATH": "/data/memory.bleve"
          },
          "args": [
            "run", "-i", "--rm", "-v", "/host/data:/data",
            "ghcr.io/mudler/mcps/memory:master"
          ]
        },
        "ddg": {
          "command": "docker",
          "env": {
            "MAX_RESULTS": "10"
          },
          "args": [
            "run", "-i", "--rm", "-e", "MAX_RESULTS",
            "ghcr.io/mudler/mcps/duckduckgo:master"
          ]
        }
      }
    }
```
## Client-Side MCP (WebUI)

In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports client-side MCP, where the browser connects directly to MCP servers. This is inspired by llama.cpp's WebUI and works alongside server-side MCP.
- The browser connects to MCP servers using an MCP client transport (`StreamableHTTPClientTransport` or `SSEClientTransport`)
- Tools discovered this way are sent to the backend as standard `tools` in the chat request body

### CORS Proxy

Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When "Use CORS proxy" is enabled (default), requests to external MCP servers are routed through:

```
/api/cors-proxy?url=https://remote-mcp-server.example.com/sse
```
The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.
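The proxy is intended for the WebUI, but it can also be exercised directly to check connectivity. A minimal Python sketch using `requests` (the target server URL is a placeholder):

```python
import requests

# Route a request to an external MCP server through LocalAI's CORS proxy.
# The target URL below is a placeholder for an actual MCP endpoint.
target = "https://remote-mcp-server.example.com/sse"

resp = requests.get(
    "http://localhost:8080/api/cors-proxy",
    params={"url": target},  # sent as the `url` query parameter
    stream=True,             # SSE responses arrive as a stream
)
for line in resp.iter_lines():
    print(line.decode())
```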
### MCP Apps

LocalAI supports the MCP Apps extension, which allows MCP tools to declare interactive HTML UIs. When a tool has `_meta.ui.resourceUri` in its definition, calling that tool renders the app's HTML inline in the chat as a sandboxed iframe.
How it works:
- When a tool declares `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app communicates with the chat through the AppBridge protocol (JSON-RPC over postMessage)
- Tools marked with `_meta.ui.visibility: "app-only"` are hidden from the LLM and only callable by the app iframe

Requirements:
- The MCP server must implement the MCP Apps extension (`_meta.ui.resourceUri` on tools, resource serving)

### Combining Server-Side and Client-Side MCP

Both modes work simultaneously in the same chat:

- Server-side tools (from the model's `mcp` configuration) are executed by the backend
- Client-side tools are sent as `tools` in the request. When the LLM calls them, the browser executes them.

If both sides have a tool with the same name, the server-side tool takes priority.
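As an illustration of what the backend can receive when both modes are active, here is a hypothetical request body mixing a browser-provided tool definition with server-side server selection (`browser_search` is a made-up tool name):

```json
{
  "model": "my-mcp-model",
  "messages": [{"role": "user", "content": "Find recent news about MCP"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "browser_search",
        "description": "Search executed by the browser-side MCP client",
        "parameters": {
          "type": "object",
          "properties": {"query": {"type": "string"}},
          "required": ["query"]
        }
      }
    }
  ],
  "metadata": {"mcp_servers": "search-engine"}
}
```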
## Disabling MCP

MCP functionality is enabled by default (that is, when `LOCALAI_DISABLE_MCP` is not set). You can completely disable it by setting the `LOCALAI_DISABLE_MCP` environment variable to `true`, `1`, or `yes`:
```bash
export LOCALAI_DISABLE_MCP=true
```
When this environment variable is set, all MCP-related features are disabled, including the legacy `/mcp/v1/chat/completions` endpoint. For example:

```bash
# Disable MCP completely
LOCALAI_DISABLE_MCP=true localai run

# Or in Docker
docker run -e LOCALAI_DISABLE_MCP=true localai/localai:latest
```
When MCP is disabled, any model configuration with `mcp` sections will be ignored, and attempts to use the MCP endpoint will return an error indicating that MCP support is disabled.