docs/mcp-protocol.md
NOTICE: This document was AI-assisted; when implementing a backend, always cross-check the details against the code.
In this project, MCP is used between the backend API (MCP client) and the ESP32 device (MCP server) to let the backend discover and invoke the device's capabilities (tools).
As implemented in main/protocols/protocol.cc and main/mcp_server.cc, MCP messages are wrapped in an envelope carried over the underlying transport (WebSocket or MQTT). The inner payload follows the JSON-RPC 2.0 specification.
Overall message layout:
{
"session_id": "...", // session id
"type": "mcp", // fixed value "mcp"
"payload": { // JSON-RPC 2.0 payload
"jsonrpc": "2.0",
"method": "...", // method name ("initialize", "tools/list", "tools/call", ...)
"params": { ... }, // arguments (for requests)
"id": ..., // request id (for requests and responses)
"result": { ... }, // success result (response)
"error": { ... } // error (response)
}
}
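As a rough illustration of the layout above, the following Python sketch builds a JSON-RPC 2.0 request and wraps it in the envelope. The helper names make_request and wrap_mcp_payload are hypothetical and not part of the project. It assumes that a real message carries only the fields relevant to its kind (method and params for requests, result or error for responses).

```python
import json
from typing import Optional


def make_request(method: str, params: Optional[dict], request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    request = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        request["params"] = params
    return request


def wrap_mcp_payload(session_id: str, payload: dict) -> str:
    """Wrap a JSON-RPC payload in the envelope with the fixed type "mcp"."""
    envelope = {"session_id": session_id, "type": "mcp", "payload": payload}
    return json.dumps(envelope)
```

The returned string is what a backend would hand to its WebSocket or MQTT sending code; how that transport is driven is outside the scope of this sketch.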
The payload follows standard JSON-RPC 2.0:
jsonrpc: always "2.0".
method: the method name (requests).
params: structured parameters, usually an object (requests).
id: request identifier; echoed back in responses.
result: success value (responses).
error: error information (responses).
MCP interactions are driven by the client (backend) discovering and invoking tools on the device.
Connection and capability announcement
Direction: device -> backend.
The device announces MCP support in its hello message by setting "mcp": true in the features map.
{
"type": "hello",
"version": 1,
"features": {
"mcp": true
},
"transport": "websocket",
"audio_params": { ... },
"session_id": "..."
}
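Before sending any MCP request, a backend can check the hello message for this flag. The sketch below is one defensive way to do so in Python; the function name device_supports_mcp is hypothetical, and treating a malformed or incomplete hello as "no MCP support" is an assumption rather than documented behavior.

```python
import json


def device_supports_mcp(hello_text: str) -> bool:
    """Return True only if a hello message sets features.mcp to true."""
    try:
        hello = json.loads(hello_text)
    except json.JSONDecodeError:
        return False  # assumption: unparseable input means no MCP support
    if not isinstance(hello, dict) or hello.get("type") != "hello":
        return False
    features = hello.get("features")
    return isinstance(features, dict) and features.get("mcp") is True
```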
Initialize the MCP session
When: after the backend sees that the device supports MCP. Usually the first MCP request.
Direction: backend -> device.
Method: initialize
Message (MCP payload):
{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"capabilities": {
// optional client capabilities
"vision": {
"url": "...", // camera image upload endpoint (must be an http URL, not a websocket URL)
"token": "..." // token for the upload URL
}
// ... other client capabilities
}
},
"id": 1
}
Device response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"protocolVersion": "2024-11-05",
"capabilities": {
"tools": {}
},
"serverInfo": {
"name": "...", // device name (BOARD_NAME)
"version": "..." // firmware version
}
}
}
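The exchange above can be sketched from the backend side as follows. Both function names are hypothetical. The URL check reflects the note that the vision endpoint must be an HTTP URL; accepting https as well, and sending token only when one is supplied, are assumptions to verify against the firmware.

```python
from typing import Optional


def build_initialize_request(request_id: int,
                             vision_url: Optional[str] = None,
                             vision_token: Optional[str] = None) -> dict:
    """Build the initialize request, optionally announcing the vision capability."""
    capabilities = {}
    if vision_url is not None:
        # Assumption: https is acceptable alongside http; websocket URLs are not.
        if not vision_url.startswith(("http://", "https://")):
            raise ValueError("vision url must be an HTTP(S) URL")
        vision = {"url": vision_url}
        if vision_token is not None:
            vision["token"] = vision_token
        capabilities["vision"] = vision
    return {"jsonrpc": "2.0", "method": "initialize",
            "params": {"capabilities": capabilities}, "id": request_id}


def parse_initialize_response(response: dict, request_id: int) -> dict:
    """Extract protocol version and server info from the device response."""
    if response.get("id") != request_id:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "initialize failed"))
    result = response["result"]
    server_info = result.get("serverInfo", {})
    return {
        "protocol_version": result.get("protocolVersion"),
        "server_name": server_info.get("name"),
        "server_version": server_info.get("version"),
    }
```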
Discover the tools
Direction: backend -> device.
Method: tools/list
Parameters:
cursor (string, optional): pagination cursor. Empty on the first request.
withUserTools (boolean, optional, default false): if true, the device also includes "user-only" tools (see "User-only tools" below) in the listing. This is typically used by a companion app that lets the user trigger privileged actions directly.
Message (MCP payload):
{
"jsonrpc": "2.0",
"method": "tools/list",
"params": {
"cursor": "",
"withUserTools": false
},
"id": 2
}
Device response:
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"tools": [
{
"name": "self.get_device_status",
"description": "...",
"inputSchema": { ... }
},
{
"name": "self.audio_speaker.set_volume",
"description": "...",
"inputSchema": { ... }
}
// ... more tools
],
"nextCursor": "..."
}
}
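A backend can collect the complete listing by following nextCursor across responses. The Python sketch below assumes a hypothetical send_request callable that delivers one JSON-RPC request to the device and returns the matching response as a dict. The max_pages guard is an added safety measure and not part of the protocol.

```python
from typing import Callable


def list_all_tools(send_request: Callable[[dict], dict],
                   with_user_tools: bool = False,
                   first_id: int = 2,
                   max_pages: int = 100) -> list:
    """Fetch every page of tools/list and return the combined tool list."""
    tools = []
    cursor = ""  # empty cursor on the first request
    request_id = first_id
    for _ in range(max_pages):
        request = {
            "jsonrpc": "2.0",
            "method": "tools/list",
            "params": {"cursor": cursor, "withUserTools": with_user_tools},
            "id": request_id,
        }
        response = send_request(request)
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "tools/list failed"))
        result = response["result"]
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor") or ""
        if not cursor:
            return tools  # no further pages
        request_id += 1
    raise RuntimeError("tools/list did not finish within max_pages")
```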
If nextCursor is non-empty, the backend must send another tools/list request with that cursor to fetch the next page.
Call a tool
Direction: backend -> device.
Method: tools/call
Message (MCP payload):
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "self.audio_speaker.set_volume",
"arguments": {
"volume": 50
}
},
"id": 3
}
Device response (success):
{
"jsonrpc": "2.0",
"id": 3,
"result": {
"content": [
{ "type": "text", "text": "true" }
],
"isError": false
}
}
Device response (failure):
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -32601,
"message": "Unknown tool: self.non_existent_tool"
}
}
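The request and the two response shapes above could be handled along these lines. This is a sketch with hypothetical function names. Joining all text items into one string and raising on isError are assumptions about how a backend might want to consume the result, not behavior defined by the device.

```python
def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a tools/call request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }


def read_tool_result(response: dict) -> str:
    """Return the text content of a successful call, or raise on failure."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"MCP error {err.get('code')}: {err.get('message')}")
    result = response["result"]
    # Assumption: only "text" content items are of interest here.
    text = "".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(f"tool reported failure: {text}")
    return text
```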
Device-initiated notifications
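Direction: device -> backend.
Application::SendMcpMessage is the outbound entry point.
Method: notifications/... or any custom method.
A notification carries no id, so the backend does not send a response.
Message (MCP payload):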
{
"jsonrpc": "2.0",
"method": "notifications/state_changed",
"params": {
"newState": "idle",
"oldState": "connecting"
}
}
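Because notifications arrive on the same channel as responses, a backend receive loop needs to tell them apart. The sketch below applies the JSON-RPC 2.0 rule that a message with a method but no id is a notification. The function name classify_payload and the "invalid" label are hypothetical.

```python
def classify_payload(payload: dict) -> str:
    """Classify a JSON-RPC 2.0 payload by which fields are present."""
    has_id = "id" in payload
    if "method" in payload:
        # A method with an id expects a reply; without one it is a notification.
        return "request" if has_id else "notification"
    if has_id and ("result" in payload or "error" in payload):
        return "response"
    return "invalid"
```

A backend would typically match a "response" against its pending request ids and route a "notification" to an event handler keyed by the method name.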
User-only tools
The MCP server on the device maintains two kinds of tools:
Regular tools: registered with McpServer::AddTool. Exposed to the backend (and hence the AI model) by default.
User-only tools: registered with McpServer::AddUserOnlyTool. These are hidden from standard tools/list results, because they are privileged or user-facing actions that should not be invoked autonomously by the AI. Examples include system reboot, firmware upgrade, and screen snapshot upload.
The backend opts in to user-only tools by sending tools/list with params.withUserTools = true. Typical usage: a companion app screen that exposes these actions to the end user.
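The listing shown earlier does not mark which tools are user-only, so one way for a backend to identify them is to compare the two listings. The sketch below takes the tool lists returned with withUserTools false and true and reports the names that appear only in the second. The function name and the tool names in the usage example are hypothetical, and the approach assumes the firmware adds no per-tool marker that could be read directly.

```python
def user_only_tool_names(standard_tools: list, all_tools: list) -> list:
    """Return sorted names present only when withUserTools is true."""
    standard = {tool["name"] for tool in standard_tools}
    return sorted(
        tool["name"] for tool in all_tools if tool["name"] not in standard
    )


# Usage with made-up tool names:
standard_listing = [{"name": "self.get_device_status"}]
full_listing = [{"name": "self.get_device_status"}, {"name": "example.reboot"}]
privileged = user_only_tool_names(standard_listing, full_listing)
```

A backend could then pass only the standard listing to the AI model and reserve the privileged names for actions the end user triggers explicitly.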
See MCP IoT control usage for how to register either kind of tool on the device side.
A simplified diagram of the main MCP message flow:
sequenceDiagram
participant Device as ESP32 Device
participant BackendAPI as Backend API (Client)
Note over Device, BackendAPI: Establish WebSocket / MQTT
Device->>BackendAPI: Hello (features.mcp = true)
BackendAPI->>Device: MCP Initialize request
Note over BackendAPI: method: initialize
Note over BackendAPI: params: { capabilities: ... }
Device->>BackendAPI: MCP Initialize response
Note over Device: result: { protocolVersion, serverInfo, ... }
BackendAPI->>Device: MCP tools/list request
Note over BackendAPI: params: { cursor: "", withUserTools: false }
Device->>BackendAPI: MCP tools/list response
Note over Device: result: { tools: [...], nextCursor: ... }
loop Optional pagination
BackendAPI->>Device: MCP tools/list request
Note over BackendAPI: params: { cursor: "..." }
Device->>BackendAPI: MCP tools/list response
Note over Device: result: { tools: [...], nextCursor: "" }
end
BackendAPI->>Device: MCP tools/call request
Note over BackendAPI: params: { name, arguments }
alt Call succeeds
Device->>BackendAPI: MCP tools/call success response
Note over Device: result: { content, isError: false }
else Call fails
Device->>BackendAPI: MCP tools/call error response
Note over Device: error: { code, message }
end
opt Device notification
Device->>BackendAPI: MCP notification
Note over Device: method: notifications/...
end
This document summarizes the MCP interaction flow in this project. For exact parameter shapes, behavior, and available tools, refer to McpServer::AddCommonTools / AddUserOnlyTools in main/mcp_server.cc and the per-board InitializeTools implementations.