services/computer-use-mcp/README.md
AIRI-specific macOS desktop orchestration MCP service.
This package exists because AIRI already has many useful pieces in the monorepo — providers, chat UX, MCP attachment, desktop app surfaces, browser integrations, tool bridges, and workflow-related logic — but those pieces are still too easy to use as isolated features instead of one coherent agent system.
computer-use-mcp is the missing execution substrate for that gap.
The current goal is not to add "another computer use demo". The goal is to give AIRI a unified way to:
In short:
- computer-use-mcp is the local execution and workflow substrate

This package is no longer positioned as a generic remote computer-use experiment. The current v1 shape is:
computer-use-mcp provides a local macOS execution layer:
- approval_required still comes from MCP

The intended story is:
This package should not be understood as a coordinate-replay automation toy.
What makes it different:
- terminal_exec, workflows, browser_dom_*) before raw coordinate actions

That means the package is useful only when it helps AIRI turn scattered local capabilities into one observable, controllable task system.
- dry-run
- macos-local
  - NSWorkspace + CGWindowList
  - CGEvent
  - open -a and activate
- linux-x11
Desktop observation and control:
- desktop_get_capabilities
- desktop_observe_windows
- desktop_screenshot
- desktop_open_app
- desktop_focus_app
- desktop_click
- desktop_type_text
- desktop_press_keys
- desktop_scroll
- desktop_wait

Terminal orchestration:
- terminal_exec
- terminal_get_state
- terminal_reset_state

Clipboard bridge:
- secret_read_env_value
- clipboard_read_text
- clipboard_write_text

Browser DOM bridge:
- browser_agent_get_status
- browser_agent_run
- browser_dom_get_bridge_status
- browser_dom_get_active_tab
- browser_dom_read_page
- browser_dom_find_elements
- browser_dom_click
- browser_dom_read_input_value
- browser_dom_set_input_value
- browser_dom_check_checkbox
- browser_dom_select_option
- browser_dom_wait_for_element
- browser_dom_get_element_attributes
- browser_dom_get_computed_styles
- browser_dom_trigger_event

Approval and audit helpers:
- desktop_list_pending_actions
- desktop_approve_pending_action
- desktop_reject_pending_action
- desktop_get_session_trace

Workflow orchestration:
- workflow_open_workspace
- workflow_validate_workspace
  - pwd, inspects local changes, and runs a validation command such as pnpm typecheck
- workflow_run_tests
- workflow_inspect_failure
- workflow_browse_and_act
- workflow_resume
  - approval_required

The current macOS v1 boundary is intentionally narrow and explicit:
- allowApps is not used as a hard gate for click/type/scroll
- denyApps still blocks sensitive foreground apps
- COMPUTER_USE_OPENABLE_APPS only gates desktop_open_app and desktop_focus_app

Core:
- COMPUTER_USE_EXECUTOR
  - dry-run, macos-local, or linux-x11
- COMPUTER_USE_APPROVAL_MODE
  - actions (default), all, never
- COMPUTER_USE_SESSION_ROOT
  - audit.jsonl
- COMPUTER_USE_TIMEOUT_MS
- COMPUTER_USE_DEFAULT_CAPTURE_AFTER
- COMPUTER_USE_MAX_OPERATIONS
- COMPUTER_USE_MAX_OPERATION_UNITS
- COMPUTER_USE_MAX_PENDING_ACTIONS

macOS orchestration:
- COMPUTER_USE_OPENABLE_APPS
  - Terminal,Cursor,Google Chrome
- COMPUTER_USE_DENY_APPS
  - 1Password, Keychain, System Settings, Activity Monitor, AIRI
- COMPUTER_USE_DENY_WINDOW_TITLES
- COMPUTER_USE_TERMINAL_SHELL
  - /bin/zsh
- COMPUTER_USE_ALLOWED_BOUNDS

Browser DOM bridge:
- COMPUTER_USE_BROWSER_DOM_BRIDGE_ENABLED
  - true
- COMPUTER_USE_BROWSER_DOM_BRIDGE_HOST
  - 127.0.0.1
- COMPUTER_USE_BROWSER_DOM_BRIDGE_PORT
  - 8765
- COMPUTER_USE_BROWSER_DOM_BRIDGE_TIMEOUT_MS
  - 10000

Autonomous browser agent:
- COMPUTER_USE_BROWSER_AGENT_ROOT
  - src/bin/computer_use
- COMPUTER_USE_PYTHON
  - browser_agent_run; defaults to the embedded .venv/bin/python when present, otherwise python3

Legacy remote runner:
- COMPUTER_USE_REMOTE_SSH_HOST
- COMPUTER_USE_REMOTE_SSH_USER
- COMPUTER_USE_REMOTE_SSH_PORT
- COMPUTER_USE_REMOTE_RUNNER_COMMAND
- COMPUTER_USE_REMOTE_DISPLAY_SIZE
- COMPUTER_USE_REMOTE_OBSERVATION_BASE_URL
- COMPUTER_USE_REMOTE_OBSERVATION_SERVE_PORT
- COMPUTER_USE_REMOTE_OBSERVATION_TOKEN

Binary overrides:
- COMPUTER_USE_SWIFT_BINARY
- COMPUTER_USE_OSASCRIPT_BINARY
- COMPUTER_USE_SCREENSHOT_BINARY
- COMPUTER_USE_OPEN_BINARY
- COMPUTER_USE_SSH_BINARY
- COMPUTER_USE_TAR_BINARY

AIRI still connects through mcp.json.
Example local macOS entry:
{
  "mcpServers": {
    "computer_use": {
      "command": "pnpm",
      "args": [
        "-F",
        "@proj-airi/computer-use-mcp",
        "start"
      ],
      "cwd": "/path/to/your/airi/repo",
      "env": {
        "COMPUTER_USE_EXECUTOR": "macos-local",
        "COMPUTER_USE_APPROVAL_MODE": "actions",
        "COMPUTER_USE_OPENABLE_APPS": "Terminal,Cursor,Google Chrome"
      }
    }
  }
}
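
To make the env block above more concrete, here is a minimal sketch of how those three variables could be validated before the service starts. It is not the package's actual loader: the names `parseConfig`, `pick`, and `ComputerUseConfig` are hypothetical, and the `dry-run` fallback for an unset executor is an assumption (only the `actions` default for the approval mode is stated in this README).

```typescript
// Hypothetical sketch; not the real computer-use-mcp config loader.
type Executor = 'dry-run' | 'macos-local' | 'linux-x11';
type ApprovalMode = 'actions' | 'all' | 'never';

interface ComputerUseConfig {
  executor: Executor;
  approvalMode: ApprovalMode;
  openableApps: string[];
}

const EXECUTORS: readonly Executor[] = ['dry-run', 'macos-local', 'linux-x11'];
const APPROVAL_MODES: readonly ApprovalMode[] = ['actions', 'all', 'never'];

// Return the fallback when the variable is unset or blank,
// and reject any value that is not in the allowed list.
function pick<T extends string>(
  name: string,
  raw: string | undefined,
  allowed: readonly T[],
  fallback: T,
): T {
  const value = (raw ?? '').trim();
  if (value === '') {
    return fallback;
  }
  const match = allowed.find(candidate => candidate === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(', ')}; got "${value}"`);
  }
  return match;
}

function parseConfig(env: Record<string, string | undefined>): ComputerUseConfig {
  return {
    // Assumption: fall back to dry-run as the least invasive executor.
    executor: pick('COMPUTER_USE_EXECUTOR', env['COMPUTER_USE_EXECUTOR'], EXECUTORS, 'dry-run'),
    // "actions" is the documented default approval mode.
    approvalMode: pick('COMPUTER_USE_APPROVAL_MODE', env['COMPUTER_USE_APPROVAL_MODE'], APPROVAL_MODES, 'actions'),
    // Comma-separated app names; surrounding whitespace and empty entries are dropped.
    openableApps: (env['COMPUTER_USE_OPENABLE_APPS'] ?? '')
      .split(',')
      .map(name => name.trim())
      .filter(name => name.length > 0),
  };
}
```

In a Node process this would be called as `parseConfig(process.env)`. Failing fast on an unknown executor or approval mode keeps a typo in mcp.json from silently selecting a different execution path.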
On the AIRI desktop side, approvals are handled like this:
- computer_use::* tool
- approval_required
- desktop_approve_pending_action or desktop_reject_pending_action

For browser DOM automation, computer-use-mcp also exposes a local WebSocket bridge that matches the user's Chrome extension bridge pattern:
- computer-use-mcp listens on ws://127.0.0.1:8765 by default
- browser_dom_* MCP tools against the active browser tab

If you override COMPUTER_USE_BROWSER_DOM_BRIDGE_HOST or
COMPUTER_USE_BROWSER_DOM_BRIDGE_PORT, mirror the same endpoint in the Chrome
extension via chrome.storage.local.set({ browserDomBridgeHost, browserDomBridgePort })
so the background worker reconnects to the correct socket.
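
The override rule above can be sketched as a small helper that derives both the WebSocket URL and the matching extension settings from the same two variables, so the two sides cannot drift apart. This is an illustration only: `resolveBridgeEndpoint` and `BridgeEndpoint` are hypothetical names, and storing the port as a number in `chrome.storage.local` is an assumption about the extension.

```typescript
// Hypothetical sketch; not part of the computer-use-mcp API.
interface BridgeEndpoint {
  host: string;
  port: number;
  url: string;
  // Keys mirror the chrome.storage.local.set call described above.
  extensionSettings: { browserDomBridgeHost: string; browserDomBridgePort: number };
}

function resolveBridgeEndpoint(env: Record<string, string | undefined>): BridgeEndpoint {
  // Documented defaults: 127.0.0.1 and 8765.
  const host = (env['COMPUTER_USE_BROWSER_DOM_BRIDGE_HOST'] ?? '').trim() || '127.0.0.1';
  const rawPort = (env['COMPUTER_USE_BROWSER_DOM_BRIDGE_PORT'] ?? '').trim() || '8765';

  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`COMPUTER_USE_BROWSER_DOM_BRIDGE_PORT is not a valid port: "${rawPort}"`);
  }

  return {
    host,
    port,
    url: `ws://${host}:${port}`,
    extensionSettings: { browserDomBridgeHost: host, browserDomBridgePort: port },
  };
}
```

On the extension side, the `extensionSettings` object is what would be passed to `chrome.storage.local.set(...)`. Deriving it from the same input as the server URL is the design point here; the exact value types the extension expects should be checked against its source.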
Use the two surfaces differently:
- desktop_* for AIRI itself, native macOS apps, Electron windows, Finder, Terminal, VS Code
- browser_dom_* for real browser pages, cross-frame DOM reads, form filling, selector-based interaction, and iframe-heavy flows
- browser_agent_run for goal-driven browser tasks where AIRI should delegate the web exploration loop instead of manually hard-coding each browser step

Validation:
- pnpm -F @proj-airi/computer-use-mcp typecheck
- pnpm -F @proj-airi/computer-use-mcp test
- pnpm -F @proj-airi/computer-use-mcp smoke:stdio
- pnpm -F @proj-airi/computer-use-mcp smoke:macos
- pnpm -F @proj-airi/computer-use-mcp e2e:airi-chat
- pnpm -F @proj-airi/computer-use-mcp e2e:airi-discord

Legacy remote validation remains available:
- pnpm -F @proj-airi/computer-use-mcp bootstrap:remote
- pnpm -F @proj-airi/computer-use-mcp smoke:remote

If you want to record a convincing demo, show the system as an orchestrated task runner instead of a flashy cursor dance.
Recommended recording structure:
- computer-use-mcp service.
- report.json, audit.jsonl, or screenshots so the demo finishes with evidence rather than just screen motion.

Good first demos:
- pwd, inspect local changes, and run pnpm typecheck
- terminal_exec
- browser_dom_* only when the task truly moves into a browser page

For a management-readable AIRI demo, the Discord settings flow is more representative than a generic hello-world reply:
- services/discord-bot
- /settings/modules/messaging-discord
- report.json, screenshots, audit log, and discord-bot.log

Notes:
- AIRI_E2E_DISCORD_TOKEN
- AIRI_E2E_DISCORD_TOKEN_SOURCE=portal or auto and let AIRI retrieve the token from the live browser / Discord Developer Portal session
- clipboard_read_text / clipboard_write_text are the intended bridge when AIRI must move a copied token from the browser back into AIRI settings
- AIRI_E2E_DISCORD_ALLOW_LOGIN_FAILURE=true

Less convincing demos: