# Monty

**Experimental** - This project is still in development, and not ready for prime time.
A minimal, secure Python interpreter written in Rust for use by AI.
Monty avoids the cost, latency, complexity and general faff of using a full container-based sandbox for running LLM-generated code.
Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.
What Monty can do:

- Supported standard-library modules include `sys`, `os`, `typing`, `asyncio`, `re`, `datetime`, `json`, `dataclasses` (soon)

What Monty cannot do:
In short, Monty is extremely limited and designed for one use case: running code written by agents.
For motivation on why you might want to do this, see:
In very simple terms, the idea of all the above is that LLMs can work faster, cheaper and more reliably if they're asked to write Python (or JavaScript) code, instead of relying on traditional tool calling. Monty makes that possible without the complexity of a sandbox or the risk of running code directly on the host.
Note: Monty will (soon) be used to implement code mode in Pydantic AI.
Monty can be called from Python, JavaScript/TypeScript or Rust.
To install:
```bash
uv add pydantic-monty
```
(Or `pip install pydantic-monty` for the boomers.)
Usage:
```python
from typing import Any

import pydantic_monty

code = """
async def agent(prompt: str, messages: Messages):
    while True:
        print(f'messages so far: {messages}')
        output = await call_llm(prompt, messages)
        if isinstance(output, str):
            return output
        messages.extend(output)

await agent(prompt, [])
"""
type_definitions = """
from typing import Any

Messages = list[dict[str, Any]]

async def call_llm(prompt: str, messages: Messages) -> str | Messages:
    raise NotImplementedError()

prompt: str = ''
"""

m = pydantic_monty.Monty(
    code,
    inputs=['prompt'],
    script_name='agent.py',
    type_check=True,
    type_check_stubs=type_definitions,
)

Messages = list[dict[str, Any]]

async def call_llm(prompt: str, messages: Messages) -> str | Messages:
    if len(messages) < 2:
        return [{'role': 'system', 'content': 'example response'}]
    else:
        return f'example output, message count {len(messages)}'

async def main():
    output = await m.run_async(
        inputs={'prompt': 'testing'},
        external_functions={'call_llm': call_llm},
    )
    print(output)
    #> example output, message count 2

if __name__ == '__main__':
    import asyncio

    asyncio.run(main())
```
Use `start()` and `resume()` to handle external function calls iteratively, giving you control over each call:
```python
import pydantic_monty

code = """
data = fetch(url)
len(data)
"""
m = pydantic_monty.Monty(code, inputs=['url'])

# Start execution - pauses when fetch() is called
result = m.start(inputs={'url': 'https://example.com'})
print(type(result))
#> <class 'pydantic_monty.FunctionSnapshot'>
print(result.function_name)
#> fetch
print(result.args)
#> ('https://example.com',)

# Perform the actual fetch, then resume with the result
result = result.resume({'return_value': 'hello world'})
print(type(result))
#> <class 'pydantic_monty.MontyComplete'>
print(result.output)
#> 11
```
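The `start()`/`resume()` pattern above generalizes to a small driver loop that answers every external call from a dict of local functions. A hedged sketch (`run_with_tools` is not part of the library; it assumes external calls pass positional arguments only, and it duck-types on `.function_name` where real code would check `isinstance(result, pydantic_monty.FunctionSnapshot)`):

```python
def run_with_tools(monty, inputs, tools):
    """Drive a Monty run to completion, dispatching each paused external
    call to a plain Python callable from `tools` (name -> function).

    Sketch based on the start()/resume() behaviour shown above; assumes
    external calls use positional arguments only.
    """
    result = monty.start(inputs=inputs)
    while hasattr(result, 'function_name'):
        # Call the real implementation, then feed the result back in
        return_value = tools[result.function_name](*result.args)
        result = result.resume({'return_value': return_value})
    return result.output
```

This keeps the "perform the call, resume with the result" logic in one place, so each tool stays an ordinary synchronous function.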
Both `Monty` and snapshot types like `FunctionSnapshot` can be serialized to bytes and restored later. This allows caching parsed code, or suspending execution across process boundaries:
```python
import pydantic_monty

# Serialize parsed code to avoid re-parsing
m = pydantic_monty.Monty('x + 1', inputs=['x'])
data = m.dump()

# Later, restore and run
m2 = pydantic_monty.Monty.load(data)
print(m2.run(inputs={'x': 41}))
#> 42

# Serialize execution state mid-flight
m = pydantic_monty.Monty('fetch(url)', inputs=['url'])
progress = m.start(inputs={'url': 'https://example.com'})
state = progress.dump()

# Later, restore and resume (e.g., in a different process)
progress2 = pydantic_monty.load_snapshot(state)
result = progress2.resume({'return_value': 'response data'})
print(result.output)
#> response data
```
Usage from Rust:

```rust
use monty::{MontyRun, MontyObject, NoLimitTracker, PrintWriter};

let code = r#"
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

fib(x)
"#;
let runner = MontyRun::new(code.to_owned(), "fib.py", vec!["x".to_owned()]).unwrap();
let result = runner.run(vec![MontyObject::Int(10)], NoLimitTracker, PrintWriter::Stdout).unwrap();
assert_eq!(result, MontyObject::Int(55));
```
`MontyRun` and `RunProgress` can be serialized using the `dump()` and `load()` methods:
```rust
use monty::{MontyRun, MontyObject, NoLimitTracker, PrintWriter};

// Serialize parsed code
let runner = MontyRun::new("x + 1".to_owned(), "main.py", vec!["x".to_owned()]).unwrap();
let bytes = runner.dump().unwrap();

// Later, restore and run
let runner2 = MontyRun::load(&bytes).unwrap();
let result = runner2.run(vec![MontyObject::Int(41)], NoLimitTracker, PrintWriter::Stdout).unwrap();
assert_eq!(result, MontyObject::Int(42));
```
Monty will power code-mode in Pydantic AI. Instead of making sequential tool calls, the LLM writes Python code that calls your tools as functions and Monty executes it safely.
```python
import asyncio
import json

import logfire
from httpx import AsyncClient
from pydantic_ai import Agent, RunContext
from pydantic_ai.toolsets.code_mode import CodeModeToolset
from pydantic_ai.toolsets.function import FunctionToolset
from typing_extensions import TypedDict

logfire.configure()
logfire.instrument_pydantic_ai()

class LatLng(TypedDict):
    lat: float
    lng: float

weather_toolset: FunctionToolset[AsyncClient] = FunctionToolset()

@weather_toolset.tool
async def get_lat_lng(
    ctx: RunContext[AsyncClient], location_description: str
) -> LatLng:
    """Get the latitude and longitude of a location."""
    # NOTE: the response here will be random, and is not related to the location description.
    r = await ctx.deps.get(
        'https://demo-endpoints.pydantic.workers.dev/latlng',
        params={'location': location_description},
    )
    r.raise_for_status()
    return json.loads(r.content)

@weather_toolset.tool
async def get_temp(ctx: RunContext[AsyncClient], lat: float, lng: float) -> float:
    """Get the temp at a location."""
    # NOTE: the responses here will be random, and are not related to the lat and lng.
    r = await ctx.deps.get(
        'https://demo-endpoints.pydantic.workers.dev/number',
        params={'min': 10, 'max': 30},
    )
    r.raise_for_status()
    return float(r.text)

@weather_toolset.tool
async def get_weather_description(
    ctx: RunContext[AsyncClient], lat: float, lng: float
) -> str:
    """Get the weather description at a location."""
    # NOTE: the responses here will be random, and are not related to the lat and lng.
    r = await ctx.deps.get(
        'https://demo-endpoints.pydantic.workers.dev/weather',
        params={'lat': lat, 'lng': lng},
    )
    r.raise_for_status()
    return r.text

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-5',
    # toolsets=[weather_toolset],
    toolsets=[CodeModeToolset(weather_toolset)],
    deps_type=AsyncClient,
)

async def main():
    async with AsyncClient() as client:
        await agent.run('Compare the weather of London, Paris, and Tokyo.', deps=client)

if __name__ == '__main__':
    asyncio.run(main())
```
There are generally two responses when you show people Monty:
Where X is some alternative technology. Oddly, these responses are often combined, suggesting people have not yet found an alternative that works for them, but are incredulous that there's really no good alternative to building an entire Python implementation from scratch.
I'll try to run through the most obvious alternatives, and why they aren't right for what we wanted.
NOTE: all of these technologies are impressive and widely used; this commentary on their limitations for our use case should not be read as criticism. Most of these solutions were not conceived with the goal of providing an LLM sandbox, which is why they're not necessarily great at it.
| Tech | Language completeness | Security | Start latency | FOSS | Setup complexity | File mounting | Snapshotting |
|---|---|---|---|---|---|---|---|
| Monty | partial | strict | 0.06ms | free / OSS | easy | easy | easy |
| Docker | full | good | 195ms | free / OSS | intermediate | easy | intermediate |
| Pyodide | full | poor | 2800ms | free / OSS | intermediate | easy | hard |
| starlark-rust | very limited | good | 1.7ms | free / OSS | easy | not available? | impossible? |
| WASI / Wasmer | partial, almost full | strict | 66ms | free * | intermediate | easy | intermediate |
| sandboxing service | full | strict | 1033ms | not free | intermediate | hard | intermediate |
| YOLO Python | full | non-existent | 0.1ms / 30ms | free / OSS | easy | easy / scary | hard |
See `./scripts/startup_performance.py` for the script used to calculate the startup performance numbers.
Details on each row below:
Monty: `pip install pydantic-monty` or `npm install @pydantic/monty`, ~4.5MB download; `dump()` and `load()` make it trivial to pause, resume and fork execution.

Docker: `python:3.14-alpine` is 50MB; docker can't be installed from PyPI.

starlark: see starlark-rust.
WASI / Wasmer: running Python in WebAssembly via Wasmer. The `python/python` Wasmer package has no readme, no license, no source link and no indication of how it's built; recently uploaded versions show their size as "0B" although the download is ~50MB, so the build process for the Python binary is not clear and transparent. (If I'm wrong here, please create an issue to correct me.)

Sandboxing service: services like Daytona, E2B and Modal.
You face similar challenges, with more setup complexity but lower network latency, if you set up your own sandbox with k8s.
YOLO Python: running Python directly via `exec()` (~0.1ms) or `subprocess` (~30ms).
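To make the "non-existent" security rating concrete: code run via `exec()` shares the host process, so nothing stops it reading the environment, the filesystem, or the network. A minimal illustration (the `untrusted`/`leaked` names are arbitrary, and the snippet only reads environment variable names to keep the demo harmless):

```python
# Code we pretend came from an LLM: it quietly reads host state.
untrusted = "import os\nleaked = sorted(os.environ)[:3]"

namespace: dict = {}
exec(untrusted, namespace)  # runs with full host-process privileges

# The "sandboxed" code has read the host's environment variable names.
print(namespace['leaked'])
```

Blocking this kind of access after the fact (import hooks, builtins stripping, etc.) is notoriously leaky, which is why an interpreter that simply never exposes the host is a different security posture.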