mistral.rs can expose the loaded model as an MCP (Model Context Protocol) server: a chat tool over JSON-RPC 2.0 that any MCP client can call.
mistralrs serve -m Qwen/Qwen3-4B --mcp-port 4321
--mcp-port starts an additional listener. The port rules:
- The MCP listener shares --host with the main HTTP API.
- The main HTTP API on --port (default 1234) still runs alongside.
- --mcp-port must differ from --port.

In a TOML config, the equivalent is mcp_port under [server]:
command = "serve"
[server]
port = 1234
mcp_port = 4321
Clients connect to http://<host>:<mcp_port>/mcp. Each call is a POST /mcp with a JSON-RPC 2.0 body.
- initialize: returns {"capabilities":{"tools":{}},"instructions":...,"protocolVersion":"2025-11-25","serverInfo":{"name":"mistralrs","version":...}}.
- ping: returns {}.
- tools/list: returns the chat tool. The list is empty if the loaded model does not have text input and output modalities.
- tools/call: runs the chat tool.

Anything else returns JSON-RPC error -32601 (method not found). A body with jsonrpc other than "2.0" returns -32600; tool execution failures return -32603.
You can pass any OpenAI ChatCompletionRequest field in arguments; the advertised schema only documents the common ones. The schema:
- messages: an array of {role, content} objects with roles user, assistant, or system.
- max_tokens and temperature.
- model defaults to "default".

curl http://localhost:4321/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "chat",
"arguments": {
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50
}
}
}'
The result is MCP tool-call content:
{"content": [{"type": "text", "text": "Hello! How can I help?"}]}
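The same call can be assembled programmatically. The Python sketch below builds the tools/call body from the curl example, adds one extra ChatCompletionRequest field (top_p, chosen only for illustration), and pulls the text out of a result shaped like the one above. The function names chat_call and extract_text are hypothetical, and the sketch assumes the content object arrives as the result member of the JSON-RPC response.

```python
import json


def chat_call(messages: list[dict], req_id: int = 1, **extra) -> dict:
    """Build a tools/call request for the chat tool.

    Keyword arguments are passed through in arguments, for example
    max_tokens or any other ChatCompletionRequest field.
    """
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": "chat", "arguments": {"messages": messages, **extra}},
    }


def extract_text(result: dict) -> str:
    """Join the text of every text-typed item in a tool-call result."""
    return "".join(
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    )


request = chat_call([{"role": "user", "content": "Hello!"}], max_tokens=50, top_p=0.9)
print(json.dumps(request, indent=2))

# Sample result copied from the text above, not a live response.
sample = {"content": [{"type": "text", "text": "Hello! How can I help?"}]}
print(extract_text(sample))  # prints: Hello! How can I help?
```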
The MCP endpoint has no built-in authentication. For non-localhost use, place an authenticating proxy in front.