docs/agent-browser-port-spec.md
Last updated: February 13, 2026
Source inventory snapshot: vercel-labs/agent-browser @ 03a8cb9
This document tracks implemented behavior and remaining parity gaps for the cmux browser port.
agent-browser command surface (where meaningful for WKWebView).surface_id identity.As of February 12, 2026:
./scripts/run-tests-v1.sh passes on cmux-vm../scripts/run-tests-v2.sh passes on cmux-vm.test_browser_api_comprehensive.py, test_browser_api_p0.py, test_browser_api_extended_families.py, test_browser_api_unsupported_matrix.py, and test_browser_cli_agent_port.py.tests/test_visual_screenshots.py and tests_v2/test_visual_screenshots.py both report D12 (Nested: Close Top of T-shape) as a known non-blocking VM failure when it reproduces (VIEW_DETACHED).window: native macOS window.workspace: sidebar entry within a window (often called "tab" in UI).pane: split region inside a workspace.surface: tab within a pane (terminal or browser). This is the primary automation target.panel: internal implementation term; CLI/API should prefer surface.Terminology decision:
surface and pane.--panel as compatibility alias in CLI until v1 is retired.system.identify is the canonical "where am I?" call for agents and should remain first-class.
Required response fields for agent workflows:
focused.window_idfocused.workspace_idfocused.pane_idfocused.surface_idcaller validation result when caller context is suppliedRecommended extension for browser workflows:
focused.surface_typefocused.browser.urlfocused.browser.titlefocused.browser.loadingcli/src/commands.rs)open|goto|navigatebackforwardreloadclickdblclickfilltypehoverfocuscheckuncheckselectdraguploaddownloadpress|keykeydownkeyupscrollscrollintoview|scrollintowaitscreenshotpdfsnapshotevalclose|quit|exitconnectgetisfindmousesetnetworkstoragecookiestabwindowframedialogtracerecordconsoleerrorshighlightstatetapswipedeviceget: text|html|value|attr|url|title|count|box|stylesis: visible|enabled|checkedfind: role|text|label|placeholder|alt|title|testid|first|last|nthmouse: move|down|up|wheelset: viewport|device|geo|geolocation|offline|headers|credentials|auth|medianetwork: route|unroute|requestsstorage: local|session + get|set|clearcookies: default get, plus set|cleartab: default list, plus new|list|close|<index>window: newframe: <selector>|maindialog: accept|dismisstrace: start|stoprecord: start|stop|restartstate: save|loaddevice: list--json--full|-f--headed--debug--session--headers--executable-path--extension (repeatable)--cdp--profile--state--proxy--proxy-bypass--args--user-agent-p|--provider--ignore-https-errors--allow-file-access--devicesrc/protocol.tsCounts:
Protocol-only action names:
addinitscriptaddscriptaddstylebringtofrontclearclipboardcontentdispatchevalhandleexposehar_starthar_stopinnertextinput_keyboardinput_mouseinput_touchinserttextkeyboardlocalemultiselectpausepermissionsresponsebodyscreencast_startscreencast_stopselectallsetcontentsetvaluetimezoneuseragentvideo_startvideo_stopsystem.pingsystem.capabilitiessystem.identifywindow.list|current|focus|create|closeworkspace.list|create|select|current|close|move_to_windowpane.list|focus|surfaces|createsurface.list|focus|split|create|close|drag_to_split|refresh|health|send_text|send_key|trigger_flashbrowser.open_split|navigate|back|forward|reload|url.get|focus_webview|is_webview_focusedP0 (core parity for daily automation):
browser.snapshotbrowser.evalbrowser.waitbrowser.clickbrowser.dblclickbrowser.typebrowser.fillbrowser.press|keydown|keyupbrowser.hover|focusbrowser.check|uncheckbrowser.selectbrowser.scroll|scroll_into_viewbrowser.get.* (url|title|text|html|value|attr|count|box|styles)browser.is.* (visible|enabled|checked)browser.screenshotbrowser.focus_webview and browser.is_webview_focused (already present, keep)P1 (important but not blocking initial parity):
browser.find.* locators (role|text|label|placeholder|alt|title|testid|nth|first|last)browser.frame.selectbrowser.frame.mainbrowser.dialog.respondbrowser.download.waitbrowser.tab.* compatibility aliases mapped to cmux surfacesbrowser.console.listbrowser.errors.listbrowser.highlightbrowser.state.save|load (browser state in cmux context)P2 (advanced parity / optional):
route|unroute|requests|responsebody)viewport|media|offline|geolocation|permissions|headers|credentials|useragent|locale|timezone|device)addinitscript|addscript|addstyle|dispatch|expose|evalhandle)input_mouse|input_keyboard|input_touch)window_id, workspace_id, pane_id, surface_id@e1) are session-local and ephemeralsurface_idindex for debugging/order, but requests should accept IDsPrimary form:
cmux browser --surface <surface-id> <agent-browser-style-command...>
Shorthand:
cmux browser <surface-id> <agent-browser-style-command...>
Agent discovery:
cmux identify
cmux capabilities
cmux browser identify --surface <surface-id> # wrapper over system.identify + browser fields
Flash:
cmux trigger-flash [--workspace <id>] [--surface <id>]
Compatibility:
--panel as alias for --surface during migration.Required capabilities:
Proposed methods:
surface.move with surface_id + destination (pane_id or workspace_id/window_id) + placement (before_surface_id|after_surface_id|start|end)surface.reorder with surface_id + sibling anchor (before_surface_id|after_surface_id)workspace.reorder with workspace_id + anchor (before_workspace_id|after_workspace_id)Hard invariant:
surface_id must remain unchanged after all move/reorder operations.browser.* methods.invalid_params, not_found, invalid_state).browser command group in CLI/cmux.swift that accepts agent-browser-style command grammar.--surface mandatory targeting (with fallback from system.identify when explicitly desired).window/pane/workspace/surface (window:N, workspace:N, pane:N, surface:N).--id-format refs|uuids|both across relevant CLI commands (--json default refs, plain-text default refs).browser.snapshot (with refs).browser.eval.browser.wait variants: selector, timeout, URL pattern, load state, function, text.click, dblclick, hover, focus.type, fill, press, keydown, keyup.check, uncheck, select.scroll, scroll_into_view.browser.find.role.browser.find.text.browser.find.label.browser.find.placeholder.browser.find.alt.browser.find.title.browser.find.testid.browser.find.nth|first|last.frame.select, frame.main).accept, dismiss, optional prompt text).surface.move with handle-based destination rules.surface.reorder within pane.workspace.reorder.move-surface, reorder-surface, reorder-workspace).window_id/workspace_id/pane_id/surface_id.surface_id stability.WKWebView; implement supported subset.WKWebView.not_supported.docs/v2-api-migration.md with browser parity status.docs-site.agent-browsersrc/browser.test.ts -> ported/adapted into:
tests_v2/test_browser_api_p0.pytests_v2/test_browser_api_comprehensive.pytests_v2/test_browser_api_unsupported_matrix.pysrc/actions.test.ts -> adapted negative coverage in tests_v2/test_browser_api_comprehensive.py (invalid_params, not_found, timeout).src/protocol.test.ts -> adapted browser command/shape validation in tests_v2/test_browser_api_unsupported_matrix.py and existing CLI/cmux.swift command grammar checks.test/file-access.test.ts and test/launch-options.test.ts -> partially applicable to WKWebView; currently tracked as follow-up parity work (not blocking current browser method coverage).src/daemon.test.ts, src/stream-server.test.ts, test/serverless.test.ts, src/ios-manager.test.ts -> out-of-scope for cmux browser parity (different transport/runtime).tests_v2/test_browser_api_p0.pytests_v2/test_browser_api_comprehensive.pytests_v2/test_browser_api_unsupported_matrix.pytests_v2/test_browser_goto_split.pytests_v2/test_browser_panel_stability.pytests_v2/test_browser_custom_keybinds.pysurface_id stability pre/post operation.focused and caller fields.tests_v2/ pass.Planned verification commands at implementation completion:
ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v2.sh'ssh cmux-vm 'cd /Users/cmux/GhosttyTabs && ./scripts/run-tests-v1.sh'cmux browser tab ... maps to browser surface tabs only (no separate workspace-level tab meaning inside browser namespace).browser.snapshot and selector not_found diagnostics intentionally mirror agent-browser semantics for agent usability.surface:N, pane:N, workspace:N, window:N.--id-format uuids|both).--id-format refs|uuids|both for output shaping.browser fill accepts empty text and treats it as a clear operation.snapshot_after (--snapshot-after in CLI), returning post_action_snapshot (+ refs/title/url).new-pane/new-surface plain output prefers short surface:N refs under default CLI ID formatting.not_supported errors vs best-effort fallback for commands that cannot be implemented on WKWebView with correct semantics.cmux browser or gate them behind a second rollout phase.