development_docs/adding_backend_and_mcp_tools.md
This guide explains how to create tools that are accessible via both the backend (chat panel) and MCP (Model Context Protocol) server endpoints.
marimo provides a unified framework for creating tools that can be used by AI assistants to interact with notebooks. These tools are automatically registered in both:

- the backend (chat panel), and
- the MCP server.

The unified architecture means you write a tool once and it works in both contexts.
Create a new file at `marimo/_ai/_tools/tools/your_tool.py` for your tool implementation.
Create dataclasses for your tool's arguments and output. Place these at the top of your tool file.
Template:
```python
from dataclasses import dataclass, field

from marimo._ai._tools.types import SuccessResult
from marimo._types.ids import SessionId


@dataclass
class YourToolArgs:
    """Arguments for your tool."""

    session_id: SessionId
    # Add other required parameters
    optional_param: str = "default_value"


@dataclass
class YourToolOutput(SuccessResult):
    """Output from your tool."""

    # Add your output fields
    data: dict = field(default_factory=dict)
    count: int = 0
```
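The `field(default_factory=...)` defaults above are not just convention; they follow from how Python dataclasses treat mutable defaults. A minimal self-contained sketch (no marimo imports; `ExampleOutput` is illustrative only):

```python
from dataclasses import dataclass, field


@dataclass
class ExampleOutput:
    # field(default_factory=dict) builds a fresh dict per instance;
    # a plain mutable default (data: dict = {}) would raise ValueError
    # when the dataclass is defined.
    data: dict = field(default_factory=dict)
    count: int = 0


a = ExampleOutput()
b = ExampleOutput()
a.data["items"] = 3  # mutating one instance leaves the other untouched
```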
Important Type Patterns:

- Argument dataclasses must end with `Args`; output dataclasses must end with `Output`
- Output dataclasses extend `SuccessResult` (provides `status`, `next_steps`, `message`, etc.)
- Use `field(default_factory=...)` for mutable defaults (lists, dicts)
- Use typed IDs such as `SessionId` and `CellId_t` for consistency

Implement your tool class in the same file:
Template:
```python
# Copyright 2026 Marimo. All rights reserved.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import TYPE_CHECKING

from marimo._ai._tools.base import ToolBase
from marimo._ai._tools.types import SuccessResult, ToolGuidelines
from marimo._ai._tools.utils.exceptions import ToolExecutionError
from marimo._types.ids import SessionId

if TYPE_CHECKING:
    from marimo._session import Session


@dataclass
class YourToolArgs:
    """Arguments for your tool."""

    session_id: SessionId
    # Add parameters here


@dataclass
class YourToolOutput(SuccessResult):
    """Output from your tool."""

    # Add output fields here
    sample_dict: dict = field(default_factory=dict)


class YourTool(ToolBase[YourToolArgs, YourToolOutput]):
    """Brief description of what this tool does.

    More detailed explanation of the tool's purpose and functionality.
    This docstring becomes the tool's description shown to AI assistants.

    Args:
        session_id: The session ID of the notebook
        # Document other args

    Returns:
        A success result containing [describe what it returns].
    """

    guidelines = ToolGuidelines(
        when_to_use=[
            "When [describe primary use case]",
        ],
        prerequisites=[
            "You must [describe args that need additional explanation]",
        ],
        avoid_if=[
            "When [describe when not to use]",
        ],
        additional_info=(
            "Any additional context or warnings about tool usage."
        ),
    )

    def handle(self, args: YourToolArgs) -> YourToolOutput:
        """Implement your tool logic here."""
        # ToolContext provides access to sessions, notebooks, and all marimo state
        context = self.context
        session_id = args.session_id
        session = context.get_session(session_id)

        # Implement your logic
        sample_dict = self._do_work(session)

        return YourToolOutput(
            sample_dict=sample_dict,
            next_steps=[
                "Review the results",
                "Consider next actions",
            ],
            message="Optional message if the results require more explanation",
        )

    # Helper methods (prefix with _)
    def _do_work(self, session: Session) -> dict:
        """Private helper method."""
        # Implementation details
        return {}
```
ToolContext is your gateway to all marimo state—sessions, notebooks, cells, errors, and more. It's available via self.context in your tool.
Access via self.context in your handle() method:
```python
def handle(self, args: YourToolArgs) -> YourToolOutput:
    # Access ToolContext
    context = self.context

    # Use context methods
    session = context.get_session(args.session_id)
    errors = context.get_notebook_errors(args.session_id)
```
Add to `ToolContext` when the logic is a common operation that multiple tools could reuse.

Use helper methods when the logic is specific to your tool.
Example:
```python
class YourTool(ToolBase[YourToolArgs, YourToolOutput]):
    def handle(self, args: YourToolArgs) -> YourToolOutput:
        # Use ToolContext for common operations
        session = self.context.get_session(args.session_id)
        errors = self.context.get_notebook_errors(args.session_id)

        # Use helper methods for tool-specific logic
        filtered_data = self._filter_by_criteria(errors, args.criteria)

        return YourToolOutput(data=filtered_data)

    def _filter_by_criteria(self, errors: list, criteria: str) -> list:
        """Tool-specific logic as a helper method."""
        return [e for e in errors if criteria in e.message]
```
For the current and complete list of available methods, see `marimo/_ai/_tools/base.py` in the `ToolContext` class. Common methods include:

- `get_session(session_id)` - Get a notebook session
- `get_notebook_errors(session_id, include_stderr)` - Get all errors in a notebook
- `get_cell_errors(session_id, cell_id)` - Get errors for a specific cell
- `get_active_sessions_internal()` - Get list of active notebook sessions

`ToolGuidelines` help AI assistants understand when and how to use your tool. Customize based on your tool's specific use case.
Fields:

- `when_to_use`: List specific scenarios where your tool is appropriate, e.g. "When the user needs to inspect cell outputs"
- `avoid_if`: List scenarios where your tool should NOT be used, e.g. "When the session hasn't been started yet"
- `prerequisites`: Required state or information before using the tool, e.g. "Valid session ID from an active notebook" (only if accessing notebook data)
- `side_effects`: Any state changes your tool makes, e.g. "Modifies notebook cells", "Triggers cell re-execution"
- `additional_info`: Additional context or warnings (single string), e.g. "This tool provides static analysis only"

⚠️ Warning: Too many guidelines can confuse the AI agent. Less is more: only add guidelines when you clearly understand the use cases. If you're unsure, keep it minimal.
Only use try/except when you need to catch a specific error and provide tailored guidance to the AI agent. Unexpected exceptions are automatically wrapped in a `ToolExecutionError` and surfaced to the agent.

Use `ToolExecutionError` for expected failures:
```python
from marimo._ai._tools.utils.exceptions import ToolExecutionError

# Raise structured errors
raise ToolExecutionError(
    "Clear description of what went wrong",
    code="ERROR_CODE",  # Machine-readable code
    is_retryable=True,  # Can the user retry?
    suggested_fix="How to fix the issue",  # User-friendly guidance
    meta={"session_id": session_id},  # Additional context
)
```
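To see why the structured fields matter, here is a simplified, hypothetical stand-in for `ToolExecutionError` (not marimo's actual class) showing how machine-readable fields let calling code branch on the failure rather than parse a message string:

```python
class StructuredToolError(Exception):
    """Hypothetical stand-in for ToolExecutionError (illustration only)."""

    def __init__(self, message, *, code, is_retryable=False, suggested_fix="", meta=None):
        super().__init__(message)
        self.code = code
        self.is_retryable = is_retryable
        self.suggested_fix = suggested_fix
        self.meta = meta or {}


def lookup_session(session_id: str):
    # Stand-in operation that always fails, to exercise the error path
    raise StructuredToolError(
        f"Session '{session_id}' not found",
        code="SESSION_NOT_FOUND",
        is_retryable=True,
        suggested_fix="List active sessions and retry with a valid ID",
        meta={"session_id": session_id},
    )


try:
    lookup_session("abc")
except StructuredToolError as exc:
    caught = exc  # an agent can branch on caught.code / caught.is_retryable
```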
Common Error Codes:

- `SESSION_NOT_FOUND`: Session ID doesn't exist
- `CELL_NOT_FOUND`: Cell ID doesn't exist
- `BAD_ARGUMENTS`: Invalid arguments passed
- `OPERATION_FAILED`: Generic operation failure
- `UNEXPECTED_ERROR`: Uncaught exception (handled automatically)

Error Handling Best Practices:
```python
def handle(self, args: YourToolArgs) -> YourToolOutput:
    # ToolContext methods automatically raise ToolExecutionError if session not found
    session = self.context.get_session(args.session_id)

    # Validate inputs - raise ToolExecutionError directly for validation errors
    if args.count < 0:
        raise ToolExecutionError(
            "Count must be non-negative",
            code="INVALID_COUNT",
            is_retryable=False,
            suggested_fix="Provide a count >= 0",
        )

    # Only use try/except for specific expected errors where you want to guide the agent
    try:
        result = self._operation_that_might_fail()
    except ValueError as e:
        # Caught specific error - provide tailored guidance
        raise ToolExecutionError(
            f"Invalid cell ID: {e}",
            code="INVALID_CELL_ID",
            is_retryable=False,
            suggested_fix="Use get_lightweight_cell_map to find valid cell IDs",
        )

    # Don't wrap everything in try/except - unexpected errors are handled automatically
    return YourToolOutput(data=result)
```
Add your tool to the registry in `marimo/_ai/_tools/tools_registry.py`:
```python
from marimo._ai._tools.tools.your_tool import YourTool

SUPPORTED_BACKEND_AND_MCP_TOOLS: list[type[ToolBase[Any, Any]]] = [
    GetMarimoRules,
    GetActiveNotebooks,
    # ... existing tools ...
    YourTool,  # Add your tool here
]
```
That's it! Your tool is now automatically registered in both backend and MCP contexts.
Add your tool's Args and Output classes to the `TOOL_IO_CLASSES` list in `tests/_utils/test_msgspec_basestruct.py`. This ensures type compatibility between our serialization system and Pydantic (used by the Python MCP SDK).
```python
from marimo._ai._tools.tools.your_tool import (
    YourToolArgs,
    YourToolOutput,
)

TOOL_IO_CLASSES = [
    # ... existing classes ...
    YourToolArgs,
    YourToolOutput,
]
```
Create `tests/_ai/tools/tools/test_your_tool.py`:
```python
from __future__ import annotations

from unittest.mock import Mock

import pytest

from marimo._ai._tools.base import ToolContext
from marimo._ai._tools.tools.your_tool import (
    YourTool,
    YourToolArgs,
)
from marimo._ai._tools.utils.exceptions import ToolExecutionError
from marimo._types.ids import SessionId


@pytest.fixture
def tool() -> YourTool:
    """Create a YourTool instance."""
    return YourTool(ToolContext())


@pytest.fixture
def mock_context() -> Mock:
    """Create a mock ToolContext."""
    return Mock(spec=ToolContext)


def test_your_tool_basic_case(mock_context: Mock) -> None:
    """Test basic functionality."""
    # Setup mock
    mock_session = Mock()
    mock_context.get_session.return_value = mock_session

    tool = YourTool(ToolContext())
    tool.context = mock_context

    # Execute tool
    result = tool.handle(YourToolArgs(session_id=SessionId("test")))

    # Assertions
    assert result.status == "success"
    assert result.data is not None


def test_your_tool_error_handling(mock_context: Mock) -> None:
    """Test error handling."""
    # Setup mock to raise error
    mock_context.get_session.side_effect = ToolExecutionError(
        "Session not found",
        code="SESSION_NOT_FOUND",
    )

    tool = YourTool(ToolContext())
    tool.context = mock_context

    # Should raise ToolExecutionError
    with pytest.raises(ToolExecutionError) as exc_info:
        tool.handle(YourToolArgs(session_id=SessionId("invalid")))

    assert exc_info.value.code == "SESSION_NOT_FOUND"


# if necessary
def test_your_tool_with_edge_cases(mock_context: Mock) -> None:
    """Test edge cases and boundary conditions."""
    # Test your tool with edge cases
    pass
```
Run tests:
```bash
# Run all tool tests
uv run --python 3.12 --group test pytest tests/_ai/tools

# Run your specific test
uv run --python 3.12 --group test pytest tests/_ai/tools/tools/test_your_tool.py

# Run with verbose output
uv run --python 3.12 --group test pytest tests/_ai/tools/tools/test_your_tool.py -v
```
Add your tool to the user-facing documentation in `docs/guides/editor_features/tools.md`. Add a row to the appropriate category table:
```markdown
## Available tools

### [Appropriate Category]

| Tool | Description |
|------|-------------|
| **your_tool_name** | Brief description of what the tool does. Takes `param1` and `param2` parameters. Returns description of output. |
```
Choose the category that best matches your tool's purpose from the existing tables in that file.

Other best practices:

- Use typed IDs (`SessionId`, `CellId_t`, etc.)
- Move types to `marimo/_ai/_tools/types.py` if shared across many files

Design helpful outputs:
```python
return YourToolOutput(
    data=result,
    # Provide actionable next steps
    next_steps=[
        "Use get_cell_runtime_data to inspect cells",
        "Check errors with get_notebook_errors",
    ],
    # Optional user-facing message
    message="Found 5 items matching your query",
    # Optional metadata
    meta={"query_time": 0.5},
)
```
```python
# Bad: Reimplementing context logic
def handle(self, args: Args) -> Output:
    session = self.context.get_session(args.session_id)
    cell_notifications = session.session_view.cell_notifications
    errors = []
    for cell_id, op in cell_notifications.items():
        if op.output and op.output.channel == CellChannel.MARIMO_ERROR:
            errors.append(...)  # Duplicating error extraction

# Good: Using context methods
def handle(self, args: Args) -> Output:
    errors = self.context.get_notebook_errors(
        args.session_id,
        include_stderr=True,
    )
```
```python
# Bad: Using generic exceptions
if not found:
    raise ValueError("Not found")

# Good: Structured error with metadata
if not found:
    raise ToolExecutionError(
        "Cell not found in session",
        code="CELL_NOT_FOUND",
        is_retryable=False,
        suggested_fix="Use get_lightweight_cell_map to find valid cell IDs",
    )
```
```python
# Bad: Returning raw data
def handle(self, args: Args) -> Output:
    return {"data": [...], "count": 5}  # type: ignore

# Good: Structured output with SuccessResult
def handle(self, args: Args) -> Output:
    return YourToolOutput(
        data=[...],
        count=5,
        next_steps=["Review the results"],
    )
```
```python
# Bad: Using TypedDict for tool input/output
from typing import TypedDict

class YourToolArgs(TypedDict):
    session_id: str
    count: int

# Good: Using dataclasses as required
from dataclasses import dataclass

@dataclass
class YourToolArgs:
    session_id: SessionId
    count: int = 0
```
Why? The tool system requires dataclasses for proper serialization, validation, and compatibility with both backend and MCP contexts.
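The difference is visible at runtime: a dataclass carries introspectable field metadata (names, types, defaults) that a serialization layer can build on, while a `TypedDict` is a plain dict with no runtime structure. A small self-contained sketch (`DemoArgs` is illustrative only):

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class DemoArgs:
    session_id: str
    count: int = 0


# Field names, types, and defaults are available to a serialization layer
field_names = [f.name for f in fields(DemoArgs)]
# Instances convert cleanly to plain dicts, defaults included
serialized = asdict(DemoArgs(session_id="s-1"))
```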
For operations that need async/await:
```python
class AsyncTool(ToolBase[Args, Output]):
    """Tool with async operations."""

    async def handle(self, args: Args) -> Output:  # type: ignore[override]
        """Note: Add type: ignore[override] for async handle."""
        session = self.context.get_session(args.session_id)
        result = await self._async_work(session)
        return Output(result=result)
```
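Assuming the framework awaits async handlers for you, the same shape can be exercised outside marimo with `asyncio.run`. A self-contained sketch (`EchoTool` and its `handle` are illustrative, not a real marimo tool):

```python
import asyncio


class EchoTool:
    """Hypothetical tool with an async handle()."""

    async def handle(self, value: int) -> int:
        await asyncio.sleep(0)  # placeholder for real awaitable work
        return value * 2


# An event loop must drive the coroutine; the framework does this in production
result = asyncio.run(EchoTool().handle(21))
```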
Generally it's best to avoid side effects in your tool. If they can't be avoided, make sure to document them in your guidelines:
```python
guidelines = ToolGuidelines(
    side_effects=[
        "Modifies notebook cells",
        "Triggers cell re-execution",
    ],
)
```
Use nested dataclasses for complex outputs:
```python
@dataclass
class CellInfo:
    cell_id: str
    code: str


@dataclass
class ComplexOutput(SuccessResult):
    cells: list[CellInfo] = field(default_factory=list)
    summary: dict[str, Any] = field(default_factory=dict)
```
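Nested dataclasses serialize recursively: `dataclasses.asdict` (which a serialization layer can build on) converts the whole tree to plain dicts. A minimal sketch using stand-alone versions of the classes above (without the `SuccessResult` base):

```python
from dataclasses import asdict, dataclass, field


@dataclass
class CellInfo:
    cell_id: str
    code: str


@dataclass
class ComplexOutput:
    cells: list[CellInfo] = field(default_factory=list)


out = ComplexOutput(cells=[CellInfo(cell_id="c1", code="x = 1")])
tree = asdict(out)  # nested dataclasses become nested dicts
```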
Before submitting your tool:

- [ ] Tool class extends `ToolBase[ArgsT, OutT]`
- [ ] Argument dataclass ends with `Args`
- [ ] Output dataclass extends `SuccessResult` and ends with `Output`
- [ ] `handle()` method is implemented
- [ ] Tool is registered in `tools_registry.py`
- [ ] Args and Output classes are added to `TOOL_IO_CLASSES` in `tests/_utils/test_msgspec_basestruct.py`
- [ ] `ToolGuidelines` provided (only if use cases are clear)
- [ ] `ToolExecutionError` is used for expected failures only
- [ ] `ToolContext` is used appropriately
- [ ] Tool is documented in `docs/guides/editor_features/tools.md`

Key files:

- `marimo/_ai/_tools/base.py` (`ToolBase`)
- `marimo/_ai/_tools/base.py` (`ToolContext`)
- `marimo/_ai/_tools/utils/exceptions.py`
- `marimo/_ai/_tools/types.py`
- `marimo/_mcp/server/main.py`
- `marimo/_server/ai/tools/tool_manager.py`

If you have questions or run into issues:

- Browse `marimo/_ai/_tools/tools/` for examples
- Check `tests/_ai/tools/tools/` for testing patterns