packages/cua-driver/rust/PARITY.md
All API surfaces (CLI subcommands, stdio MCP protocol, daemon lifecycle) are
covered by tests/integration/test_api_parity.py — 111 tests per binary,
222 total, parametrized so both binaries run the identical suite.
# Rust binary only
cd libs/cua-driver/rust/tests/integration
./run_tests.sh test_api_parity -v
# Both binaries side-by-side
./run_tests.sh --parity -v
| Gap | Rust | Swift |
|---|---|---|
| type_text_chars | ✅ | missing |
| get_accessibility_tree | ✅ | missing |
| page tool | ✅ (cross-platform: Apple-Events on macOS, UIA+CDP on Windows, AT-SPI+CDP on Linux) | ✅ (macOS only) |
| --version flag | ✅ | ✅ |
| call check_permissions JSON | ✅ JSON | human-readable text |
| call screenshot (no window_id) | ✅ full-display default | error (requires window_id) |

Tracking surface-by-surface line-by-line behavioral comparison between the Rust port and the Swift reference. Each entry lists Swift source location, Rust source location, divergences (intentional vs. accidental), and the deterministic test that locks the verified behavior in.
Format per entry:
## <surface>
- Swift: <path:line>
- Rust: macOS=<path:line>, windows=<path:line>, linux=<path:line>
- Status: VERIFIED | INTENTIONAL_DIVERGENCE | OPEN
- Test: <path>
- Notes: ...
## move_cursor
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/MoveCursorTool.swift:6-60
- Rust: macOS=crates/platform-macos/src/tools/move_cursor.rs, windows=crates/platform-windows/src/tools/impl_.rs (MoveCursorTool), linux=crates/platform-linux/src/tools/impl_.rs (MoveCursorTool)
- Test: crates/platform-windows/examples/cursor_visibility.rs

Swift's move_cursor calls CGWarpMouseCursorPosition, which warps the
real OS cursor instantly. The Rust port repurposes the same tool name
to drive the agent overlay (animated, non-warping arrow) instead.
Rationale: the entire premise of cua-driver-rs is background automation
that never steals focus and never moves the user's physical mouse. Porting
the Swift cursor-warp behavior would directly violate Swift's own
focus-guard / no-cursor-warp invariants enforced elsewhere (click,
type_text, etc.). The Rust port treats move_cursor as "show the agent's
attention" — visual only.
Consequences:
- The Rust schema adds cursor_id: string (multi-cursor support
  doesn't exist in Swift).
- Rust accepts floats (number) for x/y; Swift only accepts integers.
  Rust's looser type accepts every Swift-valid integer plus fractional pixel
  targets used by HiDPI flows.
- The Rust response text is "Agent cursor '<id>' moved to (X.X, Y.Y)." instead of
  "✅ Moved cursor to (X, Y)."; the Swift wording would be misleading
  given the different semantics.

All three Rust platforms now send:
OverlayCommand::MoveTo { x, y, end_heading_radians: FRAC_PI_4 }
FRAC_PI_4 (π/4) matches Swift AgentCursor.animateAndWait(endAngleDegrees: 45) so the overlay arrow always settles pointing upper-left. Linux was
previously sending 0.0 (left-pointing); fixed in this commit.
The deterministic test (cursor_visibility.rs) drives the live daemon via
the named pipe, sets a magenta gradient, sends move_cursor, and polls
screenshots until the cursor centroid settles. Asserts the final centroid
is within 100 px of the requested target and that ≥50 magenta pixels are
rendered. Hard 4 s timeout. Verified on Windows; should run on macOS/Linux
once the daemon pipe is exposed there (both use a Unix socket rather than a
named pipe).
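
The pass/fail rule of that test can be sketched in a few lines of Python. This is an illustrative approximation, not the code in cursor_visibility.rs: the `is_magenta` tolerance and the row-of-RGB-tuples frame layout are assumptions, while the 100 px and 50-pixel thresholds come from the description above.

```python
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0..255


def is_magenta(px: Pixel, tol: int = 60) -> bool:
    # Assumed tolerance: a gradient will not be pure (255, 0, 255).
    r, g, b = px
    return r >= 255 - tol and b >= 255 - tol and g <= tol


def magenta_centroid(rows: List[List[Pixel]]) -> Optional[Tuple[float, float, int]]:
    """Return (cx, cy, pixel_count) for magenta pixels, or None if there are none."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(rows):
        for x, px in enumerate(row):
            if is_magenta(px):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return sum_x / count, sum_y / count, count


def cursor_settled_at(rows: List[List[Pixel]], target: Tuple[float, float],
                      max_dist: float = 100.0, min_pixels: int = 50) -> bool:
    """True when enough magenta pixels are rendered and their centroid is near target."""
    found = magenta_centroid(rows)
    if found is None:
        return False
    cx, cy, count = found
    return count >= min_pixels and math.hypot(cx - target[0], cy - target[1]) <= max_dist
```

A real harness would call `cursor_settled_at` on each polled screenshot until it returns true or the 4 s deadline expires.
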
## get_cursor_position
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/GetCursorPositionTool.swift:6-37
- Rust: macOS=crates/platform-macos/src/tools/get_cursor_position.rs, windows=crates/platform-windows/src/tools/impl_.rs (GetCursorPositionTool), linux=crates/platform-linux/src/tools/impl_.rs (GetCursorPositionTool)
- Test: crates/platform-windows/examples/get_cursor_position_parity.rs

- Text format: Swift returns "✅ Cursor at (X, Y)"; Rust on
  every platform was returning "Cursor: (X, Y)" (no checkmark, wrong
  word). All three platforms now match Swift exactly.
- Integer coordinates: Swift truncates with Int(pos.x); macOS Rust was
  returning floats formatted "({x:.1}, {y:.1})". Now truncates to
  integers like Swift, consistent with Windows/Linux Rust.
- Description: "Return the current mouse cursor position in screen points (origin top-left).".
- structuredContent: { x: int, y: int } is included alongside the text
  response. Swift returns text only. This is a backwards-compatible MCP
  enrichment: tools that read structured content get integers; tools
  that read text get Swift's exact format. The test asserts both views
  agree and both agree with the platform's native GetCursorPos call
  within ±5 px.

| Platform | Swift | Rust |
|---|---|---|
| macOS | CGEvent(source: nil).location | CGEvent::new(CGEventSource(HIDSystemState)).location() |
| Windows | n/a | GetCursorPos (Win32) |
| Linux | n/a | xproto::query_pointer on the root window |
All three return screen-coordinate space, top-left origin, matching the documented Swift behavior.
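
The "both views agree" assertion can be illustrated with a short Python sketch. It assumes the text is exactly "✅ Cursor at (X, Y)" with an optional trailing period; `check_cursor_views` and its arguments are hypothetical names, not part of the parity example.

```python
import re
from typing import Dict, Tuple

# Assumed shape of the text response; the trailing period is tolerated.
CURSOR_TEXT = re.compile(r"^✅ Cursor at \((-?\d+), (-?\d+)\)\.?$")


def check_cursor_views(text: str, structured: Dict[str, int],
                       native: Tuple[int, int], tolerance: int = 5) -> bool:
    """True when text and structuredContent carry the same integers and both
    lie within `tolerance` px of the natively sampled cursor position."""
    m = CURSOR_TEXT.match(text)
    if m is None:
        return False
    tx, ty = int(m.group(1)), int(m.group(2))
    if (tx, ty) != (structured["x"], structured["y"]):
        return False
    return abs(tx - native[0]) <= tolerance and abs(ty - native[1]) <= tolerance
```

The tolerance exists because the cursor can move between the tool call and the native sample.
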
## get_screen_size
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/GetScreenSizeTool.swift:6-46
  + libs/cua-driver/Sources/CuaDriverCore/Capture/ScreenInfo.swift
- Rust: macOS=crates/platform-macos/src/tools/get_screen_size.rs, windows=crates/platform-windows/src/tools/impl_.rs (GetScreenSizeTool), linux=crates/platform-linux/src/tools/impl_.rs (GetScreenSizeTool)
- Test: crates/platform-windows/examples/get_screen_size_parity.rs

- scale_factor: Swift's whole reason for this tool is to
  expose the backing scale factor (Retina = 2.0). All three Rust
  platforms were dropping it. Now all three return it:
  - macOS: NSScreen.mainScreen.backingScaleFactor via objc2-app-kit.
  - Windows: GetDpiForSystem() / 96.0.
  - Linux: 1.0 (X11 has no per-monitor scale; recorded as a
    known limitation rather than silently dropping the field).
- Text format: was "Screen: W×H" (no checkmark, no scale, ×
  unicode); now "✅ Main display: WxH points @ Sx" matching Swift.
- macOS returns ToolResult::error("No main display detected.") when NSScreen.mainScreen is nil, matching Swift's
  isError: true response.
- The structuredContent.scale_factor key is snake_case to match
  Swift's ScreenSize.CodingKeys.scaleFactor = "scale_factor".

get_screen_size_parity.exe round-trips the pipe and asserts:
- The text is "✅ Main display: 1680x1050 points @ 1x" exactly.
- structuredContent.width / height / scale_factor agree with the text.
- The values agree with GetSystemMetrics + GetDpiForSystem / 96.

structuredContent: { width, height, scale_factor } is included alongside
the text response. Swift returns text only. Same rationale as
get_cursor_position: a backwards-compatible MCP enrichment whose text format
still matches Swift exactly.
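
As a rough illustration of those assertions, the sketch below parses the text line, derives the Windows scale factor from a system DPI, and compares both views. The helper names are hypothetical, and the regex assumes fractional scales would render as plain decimals (only the "@ 1x" form is documented above).

```python
import re
from typing import Dict, Optional, Tuple

SCREEN_TEXT = re.compile(r"^✅ Main display: (\d+)x(\d+) points @ (\d+(?:\.\d+)?)x$")


def parse_screen_text(text: str) -> Optional[Tuple[int, int, float]]:
    """Extract (width, height, scale_factor) from the text response, or None."""
    m = SCREEN_TEXT.match(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def windows_scale_factor(system_dpi: int) -> float:
    # 96 DPI is the Windows baseline, so GetDpiForSystem() / 96.0 gives the scale.
    return system_dpi / 96.0


def views_agree(text: str, structured: Dict[str, float]) -> bool:
    """True when the text line and structuredContent describe the same display."""
    parsed = parse_screen_text(text)
    if parsed is None:
        return False
    width, height, scale = parsed
    return (width == structured["width"]
            and height == structured["height"]
            and abs(scale - structured["scale_factor"]) < 1e-9)
```
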
## check_permissions
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/CheckPermissionsTool.swift:6-59
  + libs/cua-driver/Sources/CuaDriverCore/Permissions/Permissions.swift
- Rust: macOS=crates/platform-macos/src/tools/check_permissions.rs (FIXED, pending macOS run), windows=crates/platform-windows/src/tools/impl_.rs (CheckPermissionsTool), linux=crates/platform-linux/src/tools/impl_.rs (CheckPermissionsTool)

- Added the prompt: boolean parameter matching Swift's
  inputSchema.properties.prompt.
- Swift defaults to prompt: true and raises
  the macOS Accessibility + Screen Recording TCC prompts via
  AXIsProcessTrustedWithOptions({"AXTrustedCheckOptionPrompt": true})
  and CGRequestScreenCaptureAccess(). Rust previously never prompted.
  Both APIs are now wired up via link(name = "ApplicationServices"|"CoreGraphics").
- Text format: was "Accessibility API: ✅ granted\nScreen Recording: ✅ granted";
  now matches Swift exactly: "✅ Accessibility: granted.\n✅ Screen Recording: granted.".
- read_only: false (was true) matching Swift's
  readOnlyHint: false, because the default path can mutate state by raising a dialog.
- Swift checks Screen Recording via SCShareableContent.excludingDesktopWindows
  (ScreenCaptureKit), which is hard to call from Rust without large bindings.
  Approximation: CGPreflightScreenCaptureAccess() (preflight C API) with
  fallback to the existing window-enumeration heuristic. May report
  false negatives for some subprocess-launched apps (same caveat that
  prompted Swift to switch to SCShareableContent). Documented as a
  known approximation.

check_permissions on macOS is fundamentally about TCC (Apple's per-app
permission database). Neither Windows nor Linux have an equivalent. The
Rust ports retain the tool name and the read-only-status structure, but
return platform-specific content:
These divergences are intentional and not fixable without making the tool no-op on Windows/Linux (worse UX than returning the platform-specific status).
## Permissions gate (serve)
- Swift: libs/cua-driver/Sources/CuaDriverCore/Permissions/PermissionsGate.swift
  (SwiftUI panel + AppKit window + 1 Hz polling)
- Rust: macOS=crates/platform-macos/src/permissions/gate.rs
  (CLI banner + auto-open System Settings + 1 Hz polling)
- Windows/Linux: the --no-permissions-gate
  flag is accepted and silently ignored on those platforms for CLI uniformity.

Swift surfaces a branded SwiftUI window on first launch. The Rust port ships a terminal-driven banner instead. Rationale:
- Callers run cua-driver serve from a shell (Claude Code, Cursor, Codex), which
  already has a terminal attached.
- For headless use, --no-permissions-gate + CUA_DRIVER_RS_PERMISSIONS_GATE=0 is the
  straight-line approach. Replicating Swift's window only to suppress
  it under headless would be more code with no UX upside.

The CLI gate still preserves the substantive Swift behaviours:
- Opens both x-apple.systempreferences: URLs at once so the user can
  grant both in a single Settings visit. (Swift's "chain to next pane
  when one flips green" trick is unnecessary when both panes are
  pre-opened.)
- Polls at 1 Hz, like Swift's Timer.

| Signal | Effect |
|---|---|
| --no-permissions-gate flag | gate skipped |
| CUA_DRIVER_RS_PERMISSIONS_GATE=0 | gate skipped |
| CUA_DRIVER_RS_PERMISSIONS_GATE=false | gate skipped |
| CUA_DRIVER_RS_PERMISSIONS_GATE=no | gate skipped |
| CUA_DRIVER_RS_PERMISSIONS_GATE=off | gate skipped |
| any other env value | gate active |
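
The table amounts to a small decision function. The Python sketch below is a hypothetical rendering of it, not the Rust source: it assumes exact, case-sensitive matching of the env value and treats an unset variable as "gate active".

```python
from typing import List, Mapping

# The four env values the table lists as skipping the gate.
DISABLING_VALUES = {"0", "false", "no", "off"}


def permissions_gate_active(argv: List[str], env: Mapping[str, str]) -> bool:
    """Return False when either signal from the table asks to skip the gate."""
    if "--no-permissions-gate" in argv:
        return False
    value = env.get("CUA_DRIVER_RS_PERMISSIONS_GATE")
    if value is None:
        # Assumption: an unset variable leaves the gate on.
        return True
    return value not in DISABLING_VALUES
```

For example, `permissions_gate_active(["serve"], {"CUA_DRIVER_RS_PERMISSIONS_GATE": "1"})` keeps the gate on, since "1" is not one of the disabling values.
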
Default deadline is 10 minutes; on timeout the gate logs an error and
serve continues to start, mirroring the Swift "user closed the
panel" path (individual tool calls then fail with the underlying TCC
error).
A native NSAlert via objc2 is tracked as a follow-up if the
terminal-only flow proves insufficient; the CLI is the MVP.
## list_apps
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ListAppsTool.swift:6-71
  + libs/cua-driver/Sources/CuaDriverCore/Apps/AppInfo.swift
- Rust: macOS=crates/platform-macos/src/tools/list_apps.rs + apps.rs:format_app_list, windows=crates/platform-windows/src/tools/impl_.rs (ListAppsTool)
  + crates/platform-windows/src/win32/installed_apps.rs, linux=crates/platform-linux/src/tools/impl_.rs (ListAppsTool)
  + crates/platform-linux/src/installed_apps.rs
- Status: unified shape (pid, name, bundle_id, kind,
  launch_path, last_used, windows, running, active).
  Validated against list_apps_parity example.
- Test: tests/integration/test_api_parity.py::RustParityTests::test_call_list_apps_*
  + crates/platform-windows/examples/list_apps_parity.rs

Single flat array. Every entry carries the same fields on every
platform; values that don't apply to a given platform are null.
{
"apps": [
{
"pid": 47291, // 0 when running=false
"name": "Visual Studio Code",
"bundle_id": "com.microsoft.VSCode", // macOS bundle id, Win32 .exe path, or Linux desktop file id
"running": true,
"active": false, // true for the system-frontmost app (only one at a time)
"kind": "desktop", // "desktop" | "uwp" | null
"launch_path": "/Applications/Visual Studio Code.app",
"last_used": "2026-05-15T12:34:56Z", // RFC3339, null if unreadable
"windows": [] // reserved — kept cheap; query list_windows for per-window state
}
]
}
- macOS: running set from NSWorkspace (via the existing
  AppleScript bridge), installed set from a filesystem scan of
  /Applications, /Applications/Utilities, /System/Applications,
  /System/Applications/Utilities, ~/Applications. Bundle metadata
  read from each .app/Contents/Info.plist via plutil. Merged by
  bundle id; launch_path and last_used (bundle mtime) are
  backfilled onto running entries when the bundle id matches.
- Windows: running set from top-level windows (EnumWindows → owner pids) + CreateToolhelp32Snapshot for the
  pid→exe table. Installed set is the union of Start-Menu
  .lnk shortcuts (resolved via IShellLinkW::GetPath) and WinRT
  Management::Deployment::PackageManager::FindPackagesWithPackageTypes(Main).
  Merged by exe basename. UWP entries carry
  launch_path = "shell:appsFolder\\{PackageFamilyName}!App".
- Linux: running set from /proc/<pid>/status + cmdline.
  Installed set from XDG .desktop files in $XDG_DATA_HOME/applications
  and each $XDG_DATA_DIRS entry's applications/ subdir. Entries
  with NoDisplay=true, Hidden=true, or Type!=Application are
  filtered. Merged by exe basename (after stripping env-style
  prefixes and Exec= field codes).

The unified shape is additive. pid, name, bundle_id,
running, active keep their pre-change positions and types.
The keys launch_path, kind, last_used, and windows are
additions. The legacy processes key (Linux/Windows) remains as a thin
alias over the running subset so older callers that read the old
running-only shape keep working.
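
A minimal Python sketch of the merge-and-backfill idea, under stated assumptions: `merge_apps` is a hypothetical helper, the merge key is whatever sits in bundle_id (bundle id on macOS, exe basename elsewhere), and the exact contents of the legacy processes alias are a guess at "the running subset".

```python
from typing import Any, Dict, List

App = Dict[str, Any]


def _blank_entry() -> App:
    # Fields that may not apply on a given platform default to null / empty.
    return {"kind": None, "launch_path": None, "last_used": None, "windows": []}


def merge_apps(running: List[App], installed: List[App]) -> Dict[str, List[App]]:
    """Merge running and installed sets into one flat list keyed by bundle_id."""
    installed_by_key = {app["bundle_id"]: app for app in installed}
    merged: List[App] = []
    seen = set()
    for proc in running:
        entry = _blank_entry()
        entry.update(proc)
        entry["running"] = True
        match = installed_by_key.get(proc["bundle_id"])
        if match is not None:
            # Backfill install metadata onto the running entry.
            for key in ("launch_path", "last_used"):
                if entry[key] is None:
                    entry[key] = match.get(key)
        seen.add(proc["bundle_id"])
        merged.append(entry)
    for app in installed:
        if app["bundle_id"] in seen:
            continue
        entry = _blank_entry()
        entry.update(app)
        entry.update({"pid": 0, "running": False, "active": False})
        merged.append(entry)
    # Legacy alias: only the running subset (assumed shape).
    return {"apps": merged, "processes": [a for a in merged if a["running"]]}
```

Running entries come first and installed-only entries follow with pid 0, which mirrors the "pid is 0 when running=false" note in the JSON shape.
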
macOS (verified live):
cargo build --release -p cua-driver
./target/release/cua-driver call list_apps | python3 -c "
import json, sys
d = json.load(sys.stdin)
running = [a for a in d['apps'] if a['running']]
installed = [a for a in d['apps'] if not a['running']]
print(f'{len(d[\"apps\"])} total: {len(running)} running, {len(installed)} installed-only')
for a in d['apps'][:1]:
    for k in ('pid','name','bundle_id','running','active','kind','launch_path','last_used','windows'):
        assert k in a, f'missing field {k!r}'
print('shape OK')
"
Windows (cross-target check on macOS host succeeds; live run on a Windows host):
cargo build --release -p cua-driver
.\target\release\cua-driver.exe call list_apps | ConvertFrom-Json | ForEach-Object {
$_.apps | Group-Object kind | Format-Table Name, Count
}
Expect at least one desktop group (Start-Menu hits) and on Win10+
at least one uwp group (WinRT packages). Every entry should have
a launch_path that's either an absolute .exe path or
shell:appsFolder\....
Linux:
cargo build --release -p cua-driver
./target/release/cua-driver call list_apps | jq '.apps | group_by(.running) | map({running: .[0].running, n: length})'
Expect two groups: one with running: true (live processes that
matched a .desktop launcher), one with running: false (the
installed-but-not-running tail).
## list_windows
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ListWindowsTool.swift:6-245
- Rust: macOS=crates/platform-macos/src/tools/list_windows.rs (TBD audit), windows=crates/platform-windows/src/tools/impl_.rs (ListWindowsTool), linux=crates/platform-linux/src/tools/impl_.rs (ListWindowsTool, blocked behind Linux compile fix)
- Test: crates/platform-windows/examples/list_windows_parity.rs

crate::win32::list_windows walks EnumWindows first; that is the Win32
window manager's canonical top-to-bottom z-order, so it's the authoritative
source for both window membership and ordering. It then asks UI Automation
for any top-level windows EnumWindows missed
(AutomationElement::RootElement.FindAll(TreeScope::Children, TrueCondition),
filtered to IsOffscreen == false) and appends those at the end of the merged
list, deduped by HWND. UIA elements contribute their NativeWindowHandle as
the canonical HWND so downstream code keyed on (pid, window_id) keeps working
unchanged.
Listability filter (shared). Both sources gate each HWND through the same
crate::win32::windows::is_listable_top_level predicate — visible
(IsWindowVisible), not minimized (!IsIconic), owner-less (GW_OWNER null,
so no tool-tips / owned pop-ups) and not DWM-cloaked (DWMWA_CLOAKED == 0, so
no suspended-UWP background frames). The window title is read for display
only and is not a filter: empty-caption top-level windows (WPF
HwndWrapper[App.exe;;<guid>], borderless / custom-chrome apps) are listed.
This fixes trycua/cua#2020, where a non-empty-title check at both sources hid
such windows from the agent even though debug_window_info could still see
them.
Why UIA at all: modern apps (WebView2-hosted Notepad, packaged-UWP frames,
some Electron containers) sometimes hide their visible window inside a host
HWND that EnumWindows either skips or surfaces with the wrong title/bounds.
UIA's desktop-children walk surfaces the real interactable window — but only
when EnumWindows didn't.
Why EnumWindows-first (not UIA-first): UIA's FindAll(TreeScope::Children)
gives no z-order guarantee; using it as the primary source would produce
non-deterministic ordering. EnumWindows order IS the Win32 z-order.
filter_pid is applied to the merged list, so a UWP app's pid that
previously returned empty now returns its real window.
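
The merge order described above can be sketched as follows. This is a simplified Python model, not the Rust implementation: windows are plain dicts with window_id and pid keys, and both input lists are assumed to have already passed the listability filter.

```python
from typing import Any, Dict, List, Optional

Window = Dict[str, Any]


def merge_window_sources(enum_windows: List[Window], uia_windows: List[Window],
                         filter_pid: Optional[int] = None) -> List[Window]:
    """EnumWindows results first (z-order preserved), then UIA-only extras,
    deduplicated by window handle; the pid filter runs on the merged list."""
    merged: List[Window] = []
    seen = set()
    for window in list(enum_windows) + list(uia_windows):
        hwnd = window["window_id"]
        if hwnd in seen:
            continue  # UIA re-reported a window EnumWindows already listed
        seen.add(hwnd)
        merged.append(window)
    if filter_pid is not None:
        merged = [w for w in merged if w["pid"] == filter_pid]
    return merged
```

Filtering after the merge is what lets a pid whose only window came from the UIA pass still show up.
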
Swift returns per-record: {window_id, pid, app_name, title, bounds {x,y,width,height}, layer, z_index, is_on_screen, on_current_space?, space_ids?}
plus top-level {windows, current_space_id}. Windows Rust was returning
a flat {window_id, pid, title, x, y, width, height}. Now matches Swift:
- app_name: populated by joining against the process table
  (crate::win32::list_processes).
- bounds: {x, y, width, height} replaces the flat x, y, width,
  height siblings of title.
- layer: 0. Swift filters to layer-0 (normal windows); Windows
  has no layer concept so hard-coded 0.
- z_index: derived from EnumWindows order (top-to-bottom),
  inverted so higher = closer to front per Swift convention.
- is_on_screen: true. Currently always true because the Win32
  list_windows source filter only returns IsWindowVisible && !IsIconic
  windows. See limitation below.
- current_space_id: null. Windows has no Spaces.
- on_current_space / space_ids omitted, matching Swift's
  else-branch when SkyLight SPIs are unavailable; the text header
  explicitly says so.
- Text header: "✅ Found N window(s) across M app(s); X on-screen. (SkyLight Space SPIs unavailable — ...)".
  Per-record line: "- {app_name} (pid {pid}) {"title"|(no title)} [window_id: {id}]{[off-screen]?}".
- Empty result: "⚠️ No windows found for pid X. ..." with a hint
  about the current frontmost app, matching Swift's warning.
- structuredContent._legacy_windows keeps the old flat shape (no app_name,
  flat x/y/width/height) for any pre-existing callers; remove once they migrate.

Windows Rust does not yet enumerate off-screen / minimized windows.
Swift's default (on_screen_only: false) returns them; Windows currently
filters them out at both enumeration sources (UIA's IsOffscreen flag and
EnumWindows's IsWindowVisible && !IsIconic callback). The
on_screen_only schema field is accepted but has no effect. Follow-up to
refactor list_windows to return everything and filter at the tool layer.
## click
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ClickTool.swift:29-595
- Rust: macOS=crates/platform-macos/src/tools/click.rs (focus-suppression wrap VERIFIED; full audit pending), windows=crates/platform-windows/src/tools/impl_.rs (ClickTool), linux=crates/platform-linux/src/tools/impl_.rs (ClickTool)
- Test: crates/platform-windows/examples/click_parity.rs

- Pixel path text: was "✅ Clicked at (X.X, Y.Y) × N.";
  now "✅ Posted click to pid X.", "✅ Posted double-click to pid X.",
  "✅ Posted triple-click to pid X." matching Swift's performPixelClick.
- Element path text: was "✅ Clicked element [N] at screen (X,Y).";
  now "✅ Performed {action} on [N] (screen (X,Y))." matching Swift's
  "✅ Performed AXPress on [N] {role} \"{title}\"." shape (UIA has no
  readily-available role/name on the cached element, so we emit
  element_index + screen coords for traceability; future work: populate
  the UIA cache with name/control-type and include them).
- Missing-target error: "Provide element_index or (x, y) to address the click target." Windows previously said
  "Provide element_index or (x + y). pid is always required." Now
  matches Swift verbatim.

Swift's click takes {action: enum, modifier: [string], debug_image_out: string};
Windows's click takes {button: enum} instead. Rationale:
- button: left|right|middle is a Windows convenience. Swift exposes
  right-click as a separate right_click tool (which we also have as
  right_click), so button: right overlaps that tool. Windows
  keeps both shapes; the standalone tool matches Swift's per-tool
  decomposition while button gives single-call flexibility.
- action: enum is AX-specific (AXPress / AXShowMenu / AXPick /
  AXConfirm / AXCancel / AXOpen). UIA has no clean 1:1 mapping (Invoke
  pattern handles most cases; ShowMenu doesn't exist as a UIA pattern).
  Not exposed on Windows yet; a future Windows port could map a subset
  to UIA patterns (Invoke ≈ press, but show_menu has no analogue).
- modifier: [string] is not yet implemented on Windows; UIA's
  background-click via PostMessage doesn't propagate modifier-key state
  cleanly (would require synthesizing WM_KEYDOWN(VK_CONTROL) first).
  Follow-up.
- debug_image_out: also follow-up.

click_parity.exe against Chrome (pid 62156, window_id 4464038):
- Missing target: "Provide element_index or (x, y) to address the click target." ✓
- Single click: "✅ Posted click to pid 62156." ✓
- Double click (count: 2): "✅ Posted double-click to pid 62156." ✓

## launch_app
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/LaunchAppTool.swift:6-490
- Rust: macOS=crates/platform-macos/src/tools/launch_app.rs (full focus-steal contract), windows=crates/platform-windows/src/tools/impl_.rs (LaunchAppTool), linux=crates/platform-linux/src/tools/impl_.rs (TBD audit)
- Status: Windows launches use ShellExecuteExW + SW_SHOWNOACTIVATE (no focus steal, matches
  macOS oapp) and now schedule a best-effort polling
  GetForegroundWindow/SetForegroundWindow restore (≤3s, 100ms cadence)
  that flips the user's prior foreground back if the spawned app
  activates. URLs-only invocations skip the restore (the user explicitly
  asked the default browser to come up with that page). UWP launches use
  IApplicationActivationManager + best-effort GetForegroundWindow
  snapshot/restore (best-effort because SetForegroundWindow is subject
  to Windows' foreground-lock restrictions; visual confirmation in
  Session 1+ recommended).
- Test: crates/platform-windows/examples/launch_app_parity.rs (accepts
  Session-0 fast-fail for UWP path the same way it accepts "not installed"), tests/integration/test_focus_steal_parity.py + crates/platform-macos/src/focus_steal.rs (Rust unit tests)

1. pid capture: was using ShellExecuteW, which returns only an
   HINSTANCE that's useless for pid lookup. Now uses
   ShellExecuteExW with SEE_MASK_NOCLOSEPROCESS so we read the
   spawned process handle and call GetProcessId → real pid in the
   response, matching Swift's AppLauncher.launch.info.pid.
2. Structured response: was missing entirely. Now returns
   {pid, bundle_id, name, running, active, windows} matching Swift's
   LaunchResult shape exactly. bundle_id is null on Windows.
3. Text format: was "Launched 'X' (no focus steal)."; now
   "✅ Launched <name> (pid <N>) in background." + a Windows: block
   listing per-window "- <title|(no title)> [window_id: ID]" lines
   and a → Call get_window_state(...) hint, matching Swift verbatim.
4. bundle_id parameter: accepted as an alias for name (Windows
   has no bundle-identifier concept), with one extra meaning on Win11:
   when the value matches an AUMID ({PackageFamilyName}!{ApplicationId},
   contains !), launch_app routes through
   IApplicationActivationManager::ActivateApplication and returns
   the real packaged-app pid. Without this routing, Win11 launches of
   built-in apps (Notepad, Calculator, Paint, …) return the pid of
   the ~7 KB System32 stub that exits within milliseconds, which is useless
   for list_windows, get_window_state, etc.

   4a. name → packaged-app lookup (Win11): name is now first
   resolved against shell:AppsFolder (the Start Menu's "all apps"
   index, cached for the lifetime of the driver process). On a hit
   the lookup yields an AUMID and goes through the packaged path
   (see #4); on a miss it falls back to ShellExecuteExW's PATH
   search. The .exe suffix is stripped before matching so "notepad"
   and "notepad.exe" both resolve to the packaged Notepad.

   4b. aumid parameter: optional explicit AUMID; cleaner than
   overloading bundle_id when the caller has the AUMID in hand.
   Takes precedence over bundle_id/name.
5. additional_arguments: honored, passed as lpParameters to
   ShellExecuteExW (Win32 path) or as the activation arguments
   string to ActivateApplication (packaged path).
6. Window-resolution retry: ports Swift's 5-attempt 100/200ms
   retry to absorb Win32 window-creation lag after ShellExecuteEx
   returns.
7. Error wording: "Provide either bundle_id or name to identify the app to launch." matches Swift's errorResult text.
8. active: false is hardcoded; SW_SHOWNOACTIVATE is the
   Windows-equivalent of Swift's background-launch invariant.
9. Description: multi-paragraph port from Swift with explicit Windows-specific notes (path takes precedence; bundle_id alias).

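
The identifier routing in items 4, 4a and 4b can be sketched as a small resolver. Everything here is illustrative: `resolve_launch_route` and the `apps_folder_index` dict are hypothetical stand-ins for the cached shell:AppsFolder lookup, the case-insensitive match is an assumption, and the sketch ignores the path and urls fields entirely.

```python
from typing import Dict, Mapping, Optional, Tuple


def strip_exe(name: str) -> str:
    """Drop a trailing .exe so "notepad" and "notepad.exe" match the same entry."""
    return name[:-4] if name.lower().endswith(".exe") else name


def resolve_launch_route(args: Mapping[str, Optional[str]],
                         apps_folder_index: Dict[str, str]) -> Tuple[str, str]:
    """Return ("packaged", aumid) or ("win32", name) for a launch_app call."""
    aumid = args.get("aumid")
    if aumid:
        return ("packaged", aumid)  # explicit AUMID wins over bundle_id/name
    bundle_id = args.get("bundle_id")
    if bundle_id and "!" in bundle_id:
        return ("packaged", bundle_id)  # value looks like an AUMID
    # Assumption: bundle_id (as an alias for name) is tried before name.
    name = bundle_id or args.get("name")
    if not name:
        raise ValueError("Provide either bundle_id or name to identify the app to launch.")
    hit = apps_folder_index.get(strip_exe(name).lower())
    if hit is not None:
        return ("packaged", hit)
    return ("win32", name)  # fall back to ShellExecuteExW's PATH search
```

With an index of `{"notepad": "Microsoft.WindowsNotepad_8wekyb3d8bbwe!App"}`, both "notepad" and "notepad.exe" route to the packaged path, while an unknown name falls through to the Win32 path.
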
Polling foreground-restore for the Win32 path — mirrors the macOS
FocusRestoreGuard. LaunchAppTool::invoke captures
GetForegroundWindow() before the launch dispatch and, for the
ShellExecuteExW branch, spawns a tokio task that polls every 100ms
(up to 3s) for "the spawned app actually grabbed foreground" and then
flips the prior HWND back via SetForegroundWindow. The UWP/AUMID
branch keeps its existing synchronous restore in
launch_uwp::restore_foreground_best_effort — the polling task is
gated on the Win32 branch to avoid double-restoring.
URLs-only invocations ({urls: [...]} with no app-identifying field)
skip the restore: the user asked for that page to come up in the
default browser, restoring would hide it. The decision is a pure
function (should_restore_foreground_after_launch) with unit
coverage in launch_focus_restore_decision_tests.
SetForegroundWindow from non-UIAccess processes is restricted by
Windows' foreground-lock; failures are logged at tracing::trace!
and not surfaced — the launch itself already succeeded.
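
A compact Python model of the decision and the polling loop described above. The function name mirrors should_restore_foreground_after_launch, but the body is a guess at its logic: the set of app-identifying fields (including path) is an assumption, and "the spawned app grabbed foreground" is approximated as "the foreground window changed".

```python
from typing import Any, Callable, Mapping

# Assumed set of fields that identify an app (path is mentioned in the tool description).
APP_FIELDS = ("aumid", "bundle_id", "name", "path")


def should_restore_foreground_after_launch(args: Mapping[str, Any],
                                           used_win32_path: bool) -> bool:
    """Restore only on the Win32 branch, and never for URLs-only invocations."""
    has_app = any(args.get(field) for field in APP_FIELDS)
    urls_only = bool(args.get("urls")) and not has_app
    return used_win32_path and not urls_only


def poll_and_restore(prior_hwnd: int,
                     get_foreground: Callable[[], int],
                     set_foreground: Callable[[int], None],
                     sleep: Callable[[float], None],
                     interval_ms: int = 100,
                     deadline_ms: int = 3000) -> bool:
    """Poll every interval_ms up to deadline_ms; flip the prior window back
    once something else has taken the foreground. Returns True if it restored."""
    for _ in range(deadline_ms // interval_ms):
        sleep(interval_ms / 1000)
        if get_foreground() != prior_hwnd:
            set_foreground(prior_hwnd)
            return True
    return False
```

Injecting `get_foreground`, `set_foreground` and `sleep` keeps the loop testable without touching Win32; in the real driver those would wrap GetForegroundWindow and SetForegroundWindow.
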
electron_debugging_port, webkit_inspector_port,
creates_new_application_instance: Swift-specific. Accepted in
the schema so cross-platform callers can pass them; currently
no-ops on Windows. Documented as follow-up.

launch_app_parity.exe launches Notepad:
- "✅ Launched notepad.exe (pid 30612) in background." ✓
- structuredContent.pid is the actual ShellExecuteEx pid (Win10)
  or the real packaged-app pid (Win11, after AppsFolder lookup) ✓
- bundle_id: null on Win10 / unpackaged Notepad; AUMID string on
  Win11 packaged Notepad ✓
- running: true, active: false ✓
- Launches with bundle_id="Microsoft.WindowsNotepad_8wekyb3d8bbwe!App"
  and asserts bundle_id round-trips identically. ✓

macOS:
- apps::launch_app, launch_app_by_name, and
  launch_with_urls_* no longer call Command::new("open"). All paths now
  call apps::nsworkspace::open_application /
  open_urls_with_application directly via objc2-app-kit's
  NSWorkspace.openApplication(at:configuration:completionHandler:)
  (no-URL) and open(_:withApplicationAt:configuration:) (URL-handoff)
  variants. Matches Swift AppLauncher.swift:106-131 byte-for-byte in
  the launch semantics.
- activates = false + addsToRecentItems = false: set on every
  launch via NSWorkspaceOpenConfiguration. Mirrors Swift
  AppLauncher.swift:65-66.
- oapp AppleEvent descriptor on the no-URL path: a hand-rolled
  extern_methods! block in apps/nsworkspace.rs binds
  NSAppleEventDescriptor.init(eventClass:eventID:targetDescriptor:returnID:transactionID:)
  (not exposed by objc2-foundation 0.2.2). Without this, cold-launches
  of state-restored apps (Calculator-class) are windowless. Matches
  Swift AppLauncher.swift:85-103.
- LaunchAppTool::invoke captures the
  prior frontmost pid, arms a wildcard suppression entry, launches,
  swaps to a targeted entry (with overlap, not drop-then-begin, to
  avoid the hoang17 PR #1521 race), sleeps 500ms, drops the lease,
  and belt-and-braces re-activates the prior frontmost if the target
  is still on top. Matches Swift LaunchAppTool.swift:181-281.
- The NSRunningApplication
  returned by openApplication is used directly; no list_running_apps
  scan-and-match race. Matches Swift.

tests/integration/test_focus_steal_parity.py covers:
- urls=["about:blank"]: frontmost preserved ✓
- oapp AppleEvent) ✓
- focus_steal::tests::deadline_reaps_leaked_entry)

## press_key
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/PressKeyTool.swift:20-202
- Rust: macOS=crates/platform-macos/src/tools/press_key.rs (focus-suppression wrap VERIFIED; full audit pending), windows=crates/platform-windows/src/tools/impl_.rs (PressKeyTool), linux=crates/platform-linux/src/tools/impl_.rs (TBD)
- Test: crates/platform-windows/examples/press_key_parity.rs

- Text format: was "Pressed key 'KEY'."; now matches Swift verbatim
  "✅ Pressed KEY on pid X." (no quotes around key, pid included).
- Missing pid: "Missing required integer field pid." (was generic)
- Missing key: "Missing required string field key." (was generic)
- element_index without window_id: "window_id is required when element_index is used — the element_index cache is scoped per (pid, window_id). Pass the same window_id you used in get_window_state." (was previously not validated; added the
  same Swift guard).
- element_index field: accepted; currently no-op
  on Windows since UIA SetFocus isn't wired up yet (documented).

element_index is accepted but currently no-op (Swift focuses the
element via AXSetAttribute(kAXFocused, true) first; Windows UIA
IUIAutomationElement::SetFocus exists but is not yet wired up
here). Documented as a follow-up; using press_key without
element_index already works for scroll keys / shortcuts on the
already-focused element.

press_key_parity.exe against Chrome:
- Missing key: "Missing required string field key." ✓
- "✅ Pressed end on pid 85676." ✓

## hotkey
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/HotkeyTool.swift:6-142
- Rust: macOS=crates/platform-macos/src/tools/hotkey.rs (focus-suppression wrap VERIFIED; full audit pending), windows=crates/platform-windows/src/tools/impl_.rs (HotkeyTool), linux=crates/platform-linux/src/tools/impl_.rs (TBD)
- Test: crates/platform-windows/examples/hotkey_parity.rs

- Text format: was "Pressed CTRL+C on pid X." (no checkmark,
  uppercase mods only); now "✅ Pressed K1+K2+... on pid X."
  matching Swift's keys.joined(separator: "+") of the full keys
  array as the caller supplied it (preserves case and order).
- Missing pid: "Missing required integer field pid." (Swift's wording).
- Missing keys: "Missing required array field keys." (was
  a Rust-specific "Provide 'keys' array..." fallback).
- Invalid keys: "keys must be a non-empty array of strings.".

hotkey_parity.exe against Chrome:
- ctrl+end → "✅ Pressed ctrl+end on pid 85676." ✓

## double_click
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/DoubleClickTool.swift:28-327
- Rust: windows=crates/platform-windows/src/tools/impl_.rs (DoubleClickTool)
- Test: crates/platform-windows/examples/double_click_parity.rs

- Pixel path text: was "✅ Double-clicked at (X.X, Y.Y).";
  now "✅ Posted double-click to pid X at window-pixel (a, b) → screen-point (c, d)."
  matching Swift verbatim.
- Element path text: was "✅ Double-clicked element [N] at screen (X,Y).";
  now "✅ Posted double-click to [N] at screen-point (X, Y)." matching
  Swift's pixel-fallback element wording. (UIA role/title placeholder
  pending element-cache enrichment.)
- Error wording:
  - "Provide both x and y together, not just one."
  - "Provide either element_index or (x, y), not both."
  - "Provide element_index or (x, y) to address the double-click target."
  - "window_id is required when element_index is used — the element_index cache is scoped per (pid, window_id). Pass the same window_id you used in get_window_state."
  - "Missing required integer field pid."
    matches Swift (schema-layer catches this first; tool-layer fallback uses
    Swift wording).
- modifier schema field: accepted for parity (no-op on Windows;
  PostMessage doesn't propagate modifier-key state, same caveat as
  click).

double_click_parity.exe against Chrome:
- "✅ Posted double-click to pid 85676 at window-pixel (300, 300) → screen-point (462, 456)." ✓

## right_click
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/RightClickTool.swift:27-324
- Rust: windows=crates/platform-windows/src/tools/impl_.rs (RightClickTool); macOS/linux OPEN
- Test: crates/platform-windows/examples/right_click_parity.rs

- Pixel path text: was "✅ Right-clicked at (X.X, Y.Y)."; now
  "✅ Posted right-click to pid X at window-pixel (a, b) → screen-point (c, d)."
  matching Swift.
- Element path text: was "✅ Right-clicked element [N] at screen (X,Y).";
  now "✅ Shown menu for [N] (screen (X, Y))." matching Swift's
  AXShowMenu text (UIA role/title placeholder pending element-cache enrichment).

right_click_parity.exe against Chrome:
- "✅ Posted right-click to pid 62156 at window-pixel (300, 300) → screen-point (301, 368)." ✓

## screenshot
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ScreenshotTool.swift:5-170
- Rust: windows=crates/platform-windows/src/tools/impl_.rs (ScreenshotTool); macOS/linux OPEN
- Test: crates/platform-windows/examples/screenshot_parity.rs

- Text format: was "Screenshot (window): WxH png."; now matches
  Swift verbatim: "✅ Window screenshot — WxH png [window_id: ID]"
  (em-dash, checkmark, window-id suffix). Display fallback uses
  "✅ Display screenshot — WxH png" (Rust-only, see intentional below).
- idempotent: was true; Swift uses false (a fresh pixel grab
  every call). Now matches Swift.
- max_image_dimension default: was 0 (no cap), Swift uses 1568.
  Now 1568 on all 3 Rust platforms (matches
  CuaDriverConfig.defaultMaxImageDimension). The 0 default was
  producing 10MB screenshots on the Windows VM; 1568 caps the long
  edge before encoding.
- window_id: Swift requires it; Windows allows omission
  for whole-display capture (screenshot_display_bytes). Useful
  Windows-only convenience that Swift can't easily provide because
  macOS Screen Recording requires per-window grants. Schema accepts
  both shapes; description explains.
- format: Swift defaults png; all 3 Rust platforms now
  default jpeg. Rationale: agents typically want compact images for
  vision-model context windows; PNG is lossless but multi-MB on screen
  content. Schema still accepts both; callers wanting PNG pass
  {"format":"png"}. Swift may follow; tracked as a follow-up parity
  question.
- quality: Swift defaults 95; Rust defaults 85
  (already Linux's default and the macOS Claude-Code-compat tool's
  default). 85 is the typical sweet spot for screen content. Diverges
  from Swift only when both sides actually emit JPEG.

screenshot_parity.exe:
- "✅ Window screenshot — 1087x644 png [window_id: 4464038]" ✓
- width, height, format ✓
- mimeType: image/jpeg, format: "jpeg" ✓

## scroll
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ScrollTool.swift:23-211
- Rust: windows=crates/platform-windows/src/tools/impl_.rs (ScrollTool); macOS/linux OPEN
- Test: crates/platform-windows/examples/scroll_parity.rs

- Text format: was "Scrolled DIR Nx GRAN."; now matches Swift
shape "✅ Scrolled pid X DIR via Nx SB_LINEDOWN message(s)." (Swift uses key
names; Rust uses Win32 SB_* constants since the actual transport is
WM_VSCROLL/WM_HSCROLL, not keystrokes — text reflects mechanism)."Missing required integer field pid." /
"Missing required string field direction." matches Swift.element_index — accepted; currently no-op on Windows (UIA
SetFocus not wired up).WM_VSCROLL/WM_HSCROLL messages with
SB_LINE*/SB_PAGE* codes. Swift uses synthesized keystrokes
(PageDown/arrow keys) via auth-signed SLEventPostToPid because
CGEventCreateScrollWheelEvent2 is dropped by Chromium on macOS.
Windows doesn't have that constraint; scroll-message events work
reliably in the background.
scroll_parity.exe against Chrome:
"✅ Scrolled pid 62156 down via 2× SB_PAGEDOWN message(s)." ✓type_textlibs/cua-driver/Sources/CuaDriverServer/Tools/TypeTextTool.swift:13-225crates/platform-windows/src/tools/impl_.rs (TypeTextTool); macOS/linux OPENcrates/platform-windows/examples/type_text_parity.rs"Typed N character(s)."; now matches Swift's
CGEvent-fallback shape "✅ Typed N char(s) on pid X via PostMessage (Yms delay)."
(Swift says CGEvent; Windows says PostMessage since that's the
actual transport)."Missing required integer field pid." /
"Missing required string field text." matches Swift.delay_ms field added (0–200, default 30, matches Swift).AXSetAttribute(kAXSelectedText)
first for bulk insert, falls back to CGEvent when AX rejects. Windows
always takes the character-by-character path (no UIA equivalent of
AXSelectedText wired up yet). User-visible behavior matches Swift's
fallback path, just without the fast-path optimization.element_index accepted; no-op on Windows (no UIA SetFocus path).type_text_parity.exe against Chrome:
"✅ Typed 5 char(s) on pid 62156 via PostMessage (10ms delay)." ✓set_valuelibs/cua-driver/Sources/CuaDriverServer/Tools/SetValueTool.swift:8-336crates/platform-windows/src/tools/impl_.rs (SetValueTool); macOS/linux OPENcrates/platform-windows/examples/set_value_parity.rs"Set value of element N."; now matches Swift's
default-write shape "✅ Set AXValue on [N] (UIA ValuePattern)." (UIA
role/title placeholder pending element-cache enrichment).idempotent: true — was false; Swift uses true (setting same
value twice is idempotent).IUIAutomationSelectionPattern /
IUIAutomationSelectionItemPattern which would be the equivalent;
documented as a follow-up. For now set_value on a combo box
delegates to IUIAutomationValuePattern::SetValue which works on
most native ComboBoxes.set_value_parity.exe:
value + required ✓"Element 99999 not in cache." ✓get_config + set_configlibs/cua-driver/Sources/CuaDriverServer/Tools/GetConfigTool.swift:13-79
libs/cua-driver/Sources/CuaDriverServer/Tools/SetConfigTool.swift:25-167crates/platform-windows/src/tools/impl_.rs (GetConfigTool, SetConfigTool); macOS/linux OPENcrates/platform-windows/examples/config_parity.rs"cua-driver-rs configuration"; now matches
Swift's "✅ <pretty JSON>" format with schema_version, version,
platform, capture_mode, max_image_dimension.{key: <dotted-path>, value: <json>}
shape AND keeps the legacy per-field shape. Unknown keys return
"Unknown config key 'X'. Known: capture_mode, max_image_dimension."."Config updated: ..."; now echoes the
full updated config in the same pretty-JSON "✅ {...}" format as
get_config, matching Swift's "echo full config after write" pattern.agent_cursor.* subtree in the Windows config struct. Swift
exposes agent_cursor.enabled + agent_cursor.motion.* (start_handle,
end_handle, arc_size, arc_flow, spring) as persistent config; Windows
currently has only capture_mode and max_image_dimension. Cursor
config lives separately in cursor-overlay crate config and is set
via CLI flags, not this tool yet. Documented as a follow-up.config_parity.exe:
"✅ {<pretty JSON>}" with all keys ✓get_agent_cursor_statelibs/cua-driver/Sources/CuaDriverServer/Tools/GetAgentCursorStateTool.swift:9-68crates/platform-windows/src/tools/impl_.rs (GetAgentCursorStateTool); macOS/linux OPENcrates/platform-windows/examples/get_agent_cursor_state_parity.rs"N cursor instance(s)."; now matches Swift's
single-line camelCase output:
"✅ cursor: enabled=true startHandle=0.3 endHandle=0.3 arcSize=0.25 arcFlow=0 spring=0.72 glideDurationMs=0 dwellAfterClickMs=80 idleHideMs=20000"current_motion() helper added to overlay.rs — mirrors macOS
current_motion() so the tool can snapshot live motion values.cursors array — Rust supports multiple cursor
instances via CursorRegistry; Swift has a single AgentCursor.shared.
Included in structuredContent as an extra field; text format keeps
Swift's single-cursor vocabulary.get_agent_cursor_state_parity.exe:
"✅ cursor: enabled=true startHandle=0.3 endHandle=0.3 arcSize=0.25 arcFlow=0 spring=0.72 glideDurationMs=0 dwellAfterClickMs=80 idleHideMs=20000" ✓cursors array ✓set_agent_cursor_enabled + set_agent_cursor_motionlibs/cua-driver/Sources/CuaDriverServer/Tools/SetAgentCursorEnabledTool.swift:8-85
libs/cua-driver/Sources/CuaDriverServer/Tools/SetAgentCursorMotionTool.swift:11-187crates/platform-windows/src/tools/impl_.rs; macOS/linux OPENcrates/platform-windows/examples/agent_cursor_setters_parity.rs"Agent cursor 'default' enabled.";
now matches Swift verbatim "✅ Agent cursor enabled." (or "disabled").
"Missing required parameter: enabled"; now Swift's "Missing required boolean field `enabled`.".
MotionConfig::with_overrides() and sends OverlayCommand::SetMotion
to the live render state — matches Swift's
AgentCursor.shared.defaultMotionOptions = opts."Cursor 'default' config updated.";
now matches Swift's "✅ cursor motion: startHandle=X endHandle=Y arcSize=Z arcFlow=W spring=S glideDurationMs=N dwellAfterClickMs=N idleHideMs=N".{"glide_duration_ms": 500}) now
coerced to f64 instead of silently ignored (mirrors Swift's number()).set_agent_cursor_style tool, matching Swift's
SetAgentCursorStyleTool surface).cursor_id parameter for both tools — selects an instance from
the Rust-only multi-cursor registry. Swift has a single
AgentCursor.shared.agent_cursor_setters_parity.exe:
enabled: true → "✅ Agent cursor enabled." ✓enabled: false → "✅ Agent cursor disabled." ✓"✅ cursor motion: startHandle=0.4 endHandle=0.3 arcSize=0.3 arcFlow=0 spring=0.8 glideDurationMs=500 dwellAfterClickMs=80 idleHideMs=20000" ✓get_recording_state + set_recordinglibs/cua-driver/Sources/CuaDriverServer/Tools/GetRecordingStateTool.swift:11-72
libs/cua-driver/Sources/CuaDriverServer/Tools/SetRecordingTool.swift:12-160crates/mcp-server/src/recording_tools.rs (cross-platform)crates/platform-windows/examples/recording_parity.rs"recording: enabled output_dir=X next_turn=N"
(double-space-separated); now Swift's "✅ recording: enabled output_dir=X next_turn=N"
single-space. Disabled case: "✅ recording: disabled"."Recording enabled → X" (Unicode arrow);
now Swift's "✅ Recording enabled -> X" (ASCII arrow). Disabled:
"✅ Recording disabled.".
"Missing required boolean field `enabled`." and "output_dir is required when enabling recording.".
idempotent: true — was false; Swift uses true.
recording_parity.exe:
"✅ recording: disabled" ✓"✅ Recording enabled -> <path>" ✓"✅ recording: enabled output_dir=<path> next_turn=1" ✓"✅ Recording disabled." ✓list-toolslibs/cua-driver/Sources/CuaDriverCLI/CallCommand.swift:298-322libs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_list_toolscrates/platform-windows/examples/list_tools_parity.rsSort order — Swift sorts tools alphabetically by name
(tools.sorted(by: { $0.name < $1.name })). Rust was iterating in
registration order (HashMap-backed Vec). Now sorts alphabetically
before printing, matching Swift's output byte-for-byte modulo per-tool
description content.
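A minimal sketch of the sort step, assuming a hypothetical `ToolInfo` record (the real registry type is not shown here). Tool names are ASCII, so Rust's byte-wise `str` ordering should agree with Swift's `<` on the same names; that equivalence is an assumption for non-ASCII names.

```rust
/// Hypothetical, simplified stand-in for a registered tool.
#[derive(Debug, Clone)]
pub struct ToolInfo {
    pub name: String,
    pub description: String,
}

/// Sort tools alphabetically by name before printing, so output no longer
/// depends on registration order.
pub fn sort_tools_by_name(tools: &mut [ToolInfo]) {
    tools.sort_by(|a, b| a.name.cmp(&b.name));
}

/// Collect the names in their current order (handy for assertions).
pub fn tool_names(tools: &[ToolInfo]) -> Vec<&str> {
    tools.iter().map(|t| t.name.as_str()).collect()
}
```

Sorting at print time rather than at registration keeps the registry's internal order free to change without affecting CLI output.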
list_tools_parity.exe:
name: <first sentence> or just name ✓describelibs/cua-driver/Sources/CuaDriverCLI/CallCommand.swift:324-366
printUnknownTool:552libs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_describecrates/platform-windows/examples/describe_parity.rsUnknown-tool listing order — Swift sorts available tools
alphabetically in the error stderr block. Rust was using
tool_names() which is registration order. Now sorts to match.
Exit code 64 (EX_USAGE) and known-tool output shape (name: X
description: + input_schema: pretty JSON) already matched
Swift.describe_parity.exe:
describe click: stdout starts with "name: click\n", contains
"description:" + pretty-printed "input_schema:" ✓describe this_tool_does_not_exist: exit code 64, stderr lists
31 tools alphabetically with "Unknown tool:" + "Available tools:"
preamble ✓get_window_statelibs/cua-driver/Sources/CuaDriverServer/Tools/GetWindowStateTool.swift:5-endcrates/platform-windows/src/tools/impl_.rs (GetWindowStateTool); macOS/linux OPENcrates/platform-windows/examples/get_window_state_parity.rs"Missing required integer field pid.""Missing required integer field window_id. Use list_windowsto enumerate the target app's windows, or readlaunch_app's windows array.""No window with window_id N exists. Call list_windows({pid: P}) for candidates.""window_id N belongs to pid Q, not pid P. Call list_windows({pid: P}) to get this pid's own windows."idempotent: false — was true; Swift uses false (each call
is a fresh snapshot).capture_mode semantics, query filter behavior, and Windows-specific
notes (UIA instead of AX, no Spaces, no javascript / screenshot_out_file
fields).javascript — macOS-only AppleScript hook for Chromium/Safari.screenshot_out_file — could be added later; not currently
implemented on Windows.get_window_state_parity.exe:
"window_id N belongs to pid Q, not pid P..." ✓"No window with window_id N exists..." ✓draglibs/cua-driver/Sources/CuaDriverServer/Tools/DragTool.swift:21-327crates/platform-windows/src/tools/impl_.rs (DragTool); macOS/linux OPENcrates/platform-windows/examples/drag_parity.rs"✅ Posted drag (BUTTON) to pid P from (x,y) → (x,y) in Nms / Ssteps.";
now Swift's "✅ Posted drag[ (BUTTON button)] to pid P from window-pixel (a, b) → (c, d), screen (e, f) → (g, h) in Nms / Ssteps."
(window-pixel + screen-coord pair; button suffix only for non-left)."Missing: from_y" (one field at a
time); now Swift's "from_x, from_y, to_x, and to_y are all required (window-local pixels)." even if only one is missing."Missing required integer field pid."
matches Swift.
drag_parity.exe:
"✅ Posted drag to pid 62156 from window-pixel (100, 100) → (102, 102), screen (101, 168) → (103, 170) in 50ms / 2 steps." ✓

## replay_trajectory
- Swift: libs/cua-driver/Sources/CuaDriverServer/Tools/ReplayTrajectoryTool.swift:18-end
- Rust: crates/mcp-server/src/recording_tools.rs (cross-platform)
- Test: crates/platform-windows/examples/replay_trajectory_parity.rs
- Was "Missing required parameter: dir"; now Swift's "Missing required string field `dir`.". Empty-string dir now also rejected (was silently passing the empty path through).
- open_world: false — was true; Swift uses false (replays only recorded actions, no fresh world interactions).
- replay_trajectory_parity.exe:
  - "Missing required string field `dir`." ✓
  - "Trajectory directory does not exist: <path>" ✓

## mcp (TCC auto-relaunch / daemon proxy)
- Swift:
  - libs/cua-driver/Sources/CuaDriverCLI/CuaDriverCommand.swift — MCPCommand, shouldUseDaemonProxy, runViaDaemonProxy, launchDaemonViaOpen, waitForDaemon.
  - libs/cua-driver/Sources/CuaDriverCLI/BundleHelpers.swift — isExecutableInsideCuaDriverApp().
  - libs/cua-driver/Sources/CuaDriverServer/CuaDriverMCPServer.swift — makeProxy (the actor that re-implements ListTools / CallTool over the daemon UDS).
- Rust:
  - libs/cua-driver/rust/crates/cua-driver/src/bundle.rs — is_executable_inside_cuadriver_app, parent_is_not_launchd, is_env_truthy.
  - libs/cua-driver/rust/crates/cua-driver/src/cli.rs — should_use_daemon_proxy, launch_daemon_and_wait, run_mcp_via_daemon_proxy.
  - libs/cua-driver/rust/crates/cua-driver/src/proxy.rs — run_proxy (the stdio loop forwarding tools/list and tools/call through the daemon socket).
  - libs/cua-driver/rust/scripts/CuaDriverBundle/Contents/Info.plist — skeleton (Info.plist + empty MacOS/) that CD assembles into the release-tarball CuaDriver.app that the auto-relaunch path lands in. Stored under a non-.app directory so LaunchServices on developer machines doesn't surface a ghost entry alongside the real install.
  - libs/cua-driver/rust/scripts/install.sh — drops the bundle to /Applications/CuaDriver.app and symlinks the bin into it.

When cua-driver-rs mcp is invoked from an IDE terminal (Claude
Code, Cursor, VS Code, Warp), macOS attributes the spawned process
to the parent terminal's TCC responsibility chain — not to
com.trycua.driver. AX probes against the process silently
fail because the user granted Accessibility to the bundle, not to
the IDE terminal. The Swift driver hit the same pathology and fixed
it in PR #1479; the Rust port hit it on the macOS GA flip path and
fixed it here. See issue #1525 for the full background.
Swift CuaDriver.app → com.trycua.driver.
Rust CuaDriverRs.app → its own bundle identifier, distinct from com.trycua.driver.
The two bundles coexist on disk and in TCC; a user can grant
Accessibility + Screen Recording to each independently. The Rust
port has its own bundle name + identifier so:
open -n -g -a CuaDriverRs --args serve never accidentally
relaunches into the Swift bundle (and vice versa).

- --no-daemon-relaunch flag — same flag Swift exposes.
- CUA_DRIVER_RS_MCP_NO_RELAUNCH=1 env var — Rust-specific name (Swift uses CUA_DRIVER_MCP_NO_RELAUNCH).
- --socket <path> flag — override the daemon UDS path used by the proxy.
- CUA_DRIVER_RS_MCP_FORCE_PROXY=1 env var (Rust-only) — force proxy mode without the bundle-context check. Useful when wrapping the binary in a custom .app, or for manual smoke-testing of the proxy path against a daemon you've already started by hand. Skips the open -a step entirely; caller must supply a daemon on --socket.

The daemon's list method now returns full ToolDef
(input_schema + annotation hints), not just {name, description}.
The proxy uses this to build a complete tools/list from one
round-trip instead of N+1 list+describe calls. Backwards compatible:
older clients that only read name/description still work.
1. cua-driver serve --socket /tmp/test.sock &
2. CUA_DRIVER_RS_MCP_FORCE_PROXY=1 cua-driver mcp --socket /tmp/test.sock
3. From an MCP client, run the standard initialize → tools/list → tools/call get_screen_size handshake. Expect identical envelope shape to the in-process path. Concretely:
tools/list response (the daemon caches and returns it once at
proxy startup — same shape as the in-process server's tools/list):
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"tools": [
{ "name": "check_permissions", "description": "…", "inputSchema": {…}, "annotations": {…} },
{ "name": "click", "description": "…", "inputSchema": {…}, "annotations": {…} },
{ "name": "double_click", "…": "…" },
{ "name": "drag", "…": "…" },
{ "name": "get_accessibility_tree", "…": "…" },
{ "name": "get_config", "…": "…" },
{ "name": "get_cursor_position", "…": "…" },
{ "name": "get_recording_state", "…": "…" },
{ "name": "get_screen_size", "…": "…" },
{ "name": "get_window_state", "…": "…" },
{ "name": "hotkey", "…": "…" },
{ "name": "launch_app", "…": "…" },
{ "name": "list_apps", "…": "…" },
{ "name": "list_windows", "…": "…" },
{ "name": "page", "…": "…" },
{ "name": "press_key", "…": "…" },
{ "name": "replay_trajectory", "…": "…" },
{ "name": "right_click", "…": "…" },
{ "name": "screenshot", "…": "…" },
{ "name": "scroll", "…": "…" },
{ "name": "set_config", "…": "…" },
{ "name": "set_recording", "…": "…" },
{ "name": "set_value", "…": "…" },
{ "name": "type_text", "…": "…" },
{ "name": "zoom", "…": "…" }
// …plus the agent_cursor.* family when overlay is enabled.
// For an exact snapshot run: `cua-driver list-tools`
]
}
}
tools/call get_screen_size request + response:
// → stdin
{"jsonrpc":"2.0","id":2,"method":"tools/call",
"params":{"name":"get_screen_size","arguments":{}}}
// ← stdout
{"jsonrpc":"2.0","id":2,"result":{
"content":[{"type":"text","text":"{\"width\":1920,\"height\":1080}"}],
"structuredContent":{"width":1920,"height":1080},
"isError":false
}}
The result envelope is identical to the in-process path —
structuredContent + text mirror, no proxy-specific wrapping.
Without spawning the daemon first, repeat step 2. Expect
non-zero exit and a "daemon not reachable" diagnostic on stderr
(the fail-fast contract that matches Swift makeProxy). Exact
stderr text emitted by main.rs's proxy-error branch (wrapping
proxy::run_proxy's pre-check):
cua-driver-rs: cua-driver-rs daemon not reachable on /tmp/test.sock. Start it with `open -n -g -a CuaDriverRs --args serve` and retry.
Process exits with status 1 before reading any MCP request on
stdin. With CUA_DRIVER_RS_MCP_FORCE_PROXY=1 set, cli.rs's
run_mcp_via_daemon_proxy emits the more specific:
cua-driver-rs: CUA_DRIVER_RS_MCP_FORCE_PROXY=1 but no daemon listening on /tmp/test.sock. Start one with `cua-driver serve --socket /tmp/test.sock` and retry.
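The fail-fast pre-check can be sketched as a plain connect attempt on the daemon socket before any MCP request is read. This is an illustration only: `daemon_precheck` is a hypothetical name, the real check lives in proxy::run_proxy, and the sketch assumes a Unix domain socket (the Windows pipe variant is not covered).

```rust
use std::os::unix::net::UnixStream;

/// Try to connect to the daemon socket; on failure return the diagnostic
/// that the caller would print to stderr before exiting with status 1.
pub fn daemon_precheck(socket: &str) -> Result<(), String> {
    match UnixStream::connect(socket) {
        // The connection is dropped right away; this only proves reachability.
        Ok(_stream) => Ok(()),
        Err(_) => Err(format!(
            "cua-driver-rs daemon not reachable on {socket}. \
             Start it with `open -n -g -a CuaDriverRs --args serve` and retry."
        )),
    }
}
```

Checking reachability up front means a missing daemon surfaces as one clear stderr line instead of a hung or half-completed MCP handshake.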
## status + stop
- Swift: libs/cua-driver/Sources/CuaDriverCLI/ServeCommand.swift:368-470
- Rust: libs/cua-driver/rust/crates/cua-driver/src/serve.rs::run_status_cmd, run_stop_cmd
- Test: crates/platform-windows/examples/daemon_lifecycle_parity.rs

stop silent on success — Rust was printing a daemon-stopped line
on stdout after a successful stop. Swift's stop exits silently with
status 0. Now matches Swift byte-for-byte.

- status output: "Cua Driver daemon is running\n socket: <path>\n pid: <N>\n" ✓
- status exit code: 0 when running, 1 when not ✓
- stop exit code: 0 when ran, 1 when no daemon ✓
- "Cua Driver daemon is not running" ✓

daemon_lifecycle_parity.exe:
- status running: 3-line output ✓
- stop running: silent stdout (was printing extra line) ✓
- status not-running: exit 1 + stderr message ✓
- stop not-running: exit 1 + stderr message ✓

## type_text_chars → type_text
- Swift: libs/cua-driver/Sources/CuaDriverServer/ToolRegistry.swift:55-70
- Rust: libs/cua-driver/rust/crates/cua-driver/src/serve.rs (both pipe variants), libs/cua-driver/rust/crates/mcp-server/src/tool.rs::ToolRegistry::invoke
- Test: crates/platform-windows/examples/type_text_chars_alias_parity.rs

Swift treats type_text_chars as a deprecated alias for type_text:
the aliased name is NOT in tools/list, but invoking it with the old name
works (resolves to type_text) AND emits a stderr deprecation warning.
Rust was previously registering type_text_chars as a fully-fledged
separate tool with its own description and a different text format.
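The alias behavior described here (hidden from tools/list, still callable, warns on stderr) can be sketched as a small resolution step that runs before the registry lookup. `resolve_alias` is a hypothetical helper and the warning wording is an assumption, not the text the binary actually prints.

```rust
/// Resolve a deprecated tool name to its canonical name.
/// Returns the name to look up plus an optional deprecation warning that the
/// caller should write to stderr (never into the tool result).
pub fn resolve_alias(name: &str) -> (&str, Option<String>) {
    match name {
        "type_text_chars" => (
            "type_text",
            // Assumed wording, for illustration only.
            Some("warning: `type_text_chars` is deprecated; use `type_text`".to_string()),
        ),
        other => (other, None),
    }
}
```

Because the alias is resolved by name and never registered, it cannot appear in tools/list, which is the property the parity test checks.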
Changes:
TypeTextCharsTool from the registration in
platform-windows/src/tools/impl_.rs::build_registry (the struct is
kept in the crate for now via a no-op binding to avoid the dead-code
warning during incremental cleanup).serve.rs: both "call" dispatch sites (Unix-socket + Windows-pipe
variants) now resolve type_text_chars → type_text before the
registry lookup, with the stderr deprecation message.mcp-server/src/tool.rs::ToolRegistry::invoke: also resolves the
alias as a defense-in-depth for direct in-process callers.type_text_chars_alias_parity.exe:
tools/list does NOT contain "type_text_chars" ✓type_text_chars with a 1-char text resolves to type_text's
response: "✅ Typed 1 char(s) on pid 62156 via PostMessage (30ms delay)." ✓set_agent_cursor_stylelibs/cua-driver/Sources/CuaDriverServer/Tools/SetAgentCursorStyleTool.swift:10-111crates/platform-windows/src/tools/impl_.rs (SetAgentCursorStyleTool); macOS/linux OPENcrates/platform-windows/examples/set_agent_cursor_style_parity.rs"cursor style: gradient_colors=[X, Y] bloom_color=Z image_path=(unchanged)"
(always lists all three fields with (unchanged)/(reverted) placeholders,
space after comma). Now Swift's exact wording: "✅ cursor style: gradient_colors=[X,Y] bloom_color=Z"
(omit fields not provided; no space after comma; image_path only when set)."cursor style: gradient_colors=(unchanged) bloom_color=(unchanged) image_path=(unchanged)";
now "✅ cursor style: reverted to default" matching Swift's parts.isEmpty branch.set_agent_cursor_style_parity.exe:
"✅ cursor style: gradient_colors=[#FF6B6B,#FF8E53] bloom_color=#A855F7" ✓"✅ cursor style: reverted to default" ✓zoomlibs/cua-driver/Sources/CuaDriverServer/Tools/ZoomTool.swift:12-endcrates/platform-windows/src/tools/impl_.rs (ZoomTool); macOS/linux OPENcrates/platform-windows/examples/zoom_parity.rs"Zoom (X,Y)–(X,Y) → WxH px JPEG."; now Swift's
verbatim multi-line message starting "✅ Zoomed region captured at native resolution." with the from_zoom=true integration hint."Missing required integer field pid." /
"Missing required integer field window_id." /
"Missing required region coordinates (x1, y1, x2, y2)." /
"Invalid region: x2 must be > x1 and y2 must be > y1." /
"Zoom region too wide: N px > 500 px max. ..." — all match Swift.pid now also required (Swift requires it).idempotent: false — was true; Swift uses false.window_id required — Swift uses pid+frontmost-window; Windows
uses HWND directly since there's no clean Win32 analogue without an
explicit HWND.zoom_parity.exe: invalid-region, too-wide, and real-zoom paths all OK.
## mcp-config
- Swift: libs/cua-driver/Sources/CuaDriverCLI/CuaDriverCommand.swift:37-150
- Rust: libs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_mcp_config
- Test: crates/platform-windows/examples/mcp_config_parity.rs

- (no --client): generic mcpServers JSON snippet ✓
- --client claude: claude mcp add --transport stdio cua-driver -- <bin> mcp ✓
- --client codex: codex mcp add cua-driver -- <bin> mcp ✓
- --client cursor: JSON with type: stdio ✓
- --client openclaw: openclaw mcp set ... ✓
- --client opencode: opencode.json snippet with type=local ✓
- --client hermes: YAML snippet ✓
- --client pi: shell-tool fallback message ✓

mcp_config_parity.exe runs all 8 client variants + the unknown-client
error path against the in-process binary, asserting each output contains
the right needles. All pass on first try.
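For the two clients whose full command line is spelled out above, the output can be sketched as a simple match on the client name. `mcp_add_command` is a hypothetical helper; the other clients (JSON, YAML, and fallback-message variants) are deliberately left out because their exact output is not reproduced in this document.

```rust
/// Build the one-line registration command for clients configured via a
/// shell command. Returns None for clients this sketch does not cover.
pub fn mcp_add_command(client: &str, bin: &str) -> Option<String> {
    match client {
        "claude" => Some(format!(
            "claude mcp add --transport stdio cua-driver -- {bin} mcp"
        )),
        "codex" => Some(format!("codex mcp add cua-driver -- {bin} mcp")),
        _ => None,
    }
}
```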
updatelibs/cua-driver/Sources/CuaDriverCLI/CuaDriverCommand.swift:638-686libs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_update_cmdcrates/platform-windows/examples/update_parity.rsRust was looking for release tag prefix cua-driver-v (Swift's prefix)
when fetching the latest version from trycua/cua. That would match the
Swift cua-driver-v0.1.9 release and report it as an available upgrade
for the Rust port — confusing users into installing the WRONG binary.
Now uses prefix cua-driver-rs-v (Rust port's actual tag prefix).
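A minimal sketch of the prefix filter, assuming tags carry a plain `major.minor.patch` version after the prefix. `latest_rust_release` and `parse_version` are hypothetical names; how the real update command fetches and compares tags is not shown here.

```rust
/// Tag prefix used by Rust-port releases (from the text above).
const RUST_TAG_PREFIX: &str = "cua-driver-rs-v";

/// Parse "major.minor.patch"; anything else is rejected.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Pick the highest-versioned tag that belongs to the Rust port.
/// Swift tags such as "cua-driver-v0.1.9" fail `strip_prefix` and are ignored.
pub fn latest_rust_release<'a>(tags: &[&'a str]) -> Option<&'a str> {
    tags.iter()
        .copied()
        .filter_map(|tag| {
            let version = tag.strip_prefix(RUST_TAG_PREFIX)?;
            Some((parse_version(version)?, tag))
        })
        .max_by_key(|(version, _)| *version)
        .map(|(_, tag)| tag)
}
```

Comparing parsed tuples rather than strings keeps 0.1.10 ahead of 0.1.2, which a lexical comparison would get wrong.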
update_parity.exe:
- Current version: and Checking for updates… ✓
- Already up to date., New version available: <v>, or Could not reach GitHub ✓
- cua-driver-v0.1.9 tag is no longer mis-matched as a Rust-port release.

## doctor
- Swift: libs/cua-driver/Sources/CuaDriverCLI/DoctorCommand.swift (legacy-cleanup only — single "Nothing to clean — install is up to date." codepath, no diagnostics)
- Rust: libs/cua-driver/rust/crates/cua-driver/src/doctor.rs + libs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_doctor_cmd
- Test: crates/cua-driver/src/doctor.rs #[cfg(test)] mod tests (5 unit tests: text rendering, JSON rendering, status-tag mapping, cross-platform probe smoke path)

Probe-runner that emits a structured report. Each probe is one line
tagged [ok], [warn], or [err] so the output is grep-friendly.
--json switches to a machine-readable shape for scripting (also
suppresses the update-available banner so JSON output stays parseable).
Cross-platform probes:
- binary — version + <arch>-<os> target triple
- install dir — current_exe() resolved through symlinks
- home dir — ~/.cua-driver-rs existence + cached release-dir count
- telemetry — env-var opt-out state + install-id file presence (presence only — UUID value never read)

Windows probes:
- interactive session — ProcessIdToSessionId(GetCurrentProcessId()). Session 0 → [warn] with explicit guidance ("re-run from an interactive logon — RDP, console, or a scheduled task in the user's session"). Sessions ≥1 → OpenWindowStationW(WinSta0) + GetForegroundWindow() confirmation probe → [ok] when both succeed.
- UI Automation — CoCreateInstance(CUIAutomation) succeeds → [ok], else [err].
- EnumWindows visible — top-level visible-window count. When zero and Session 0 was warned above, the probe appends a "consistent with Session 0" detail so the two findings read as one.

Linux probes:
- display server — DISPLAY / WAYLAND_DISPLAY matrix (X11 only, Wayland only with XWayland hint, both set, neither set).
- X11 connection — quick handshake via platform_linux::x11::list_windows(None).
- AT-SPI — AT_SPI_BUS env var, fallback to gdbus introspect --session --dest org.a11y.Bus.

macOS probes:
- legacy LaunchAgent — opportunistic removal of ~/Library/LaunchAgents/com.trycua.cua_driver_updater.plist (preserves the old DoctorCommand cleanup behavior, now as a structured probe).
- legacy update script — opportunistic removal of /usr/local/bin/cua-driver-update (root-owned path gets a [warn] with the sudo rm command).
- TCC + cdhash report — pointer to cua-driver diagnose for the full bundle / signature / TCC dump.

Exit code 0 when every probe is [ok] or [warn]. Non-zero only when at
least one [err] probe failed (e.g. current_exe() returned an error,
or CoCreateInstance(CUIAutomation) failed on Windows). Warnings
deliberately do not fail the run because misconfigured environments
are sometimes the expected state — CI invocations of doctor to
render the report still expect exit 0.
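The exit-code rule reduces to "fail only on [err]". A minimal sketch, assuming a hypothetical `ProbeStatus` enum and using 1 as the non-zero code (the text only says "non-zero"):

```rust
/// Outcome of a single doctor probe.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProbeStatus {
    Ok,
    Warn,
    Err,
}

impl ProbeStatus {
    /// Grep-friendly tag printed at the start of each probe line.
    pub fn tag(self) -> &'static str {
        match self {
            ProbeStatus::Ok => "[ok]",
            ProbeStatus::Warn => "[warn]",
            ProbeStatus::Err => "[err]",
        }
    }
}

/// 0 when every probe is ok or warn; 1 (assumed value) when any probe erred.
pub fn doctor_exit_code(statuses: &[ProbeStatus]) -> i32 {
    if statuses.iter().any(|s| *s == ProbeStatus::Err) {
        1
    } else {
        0
    }
}
```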
Swift's DoctorCommand only handles legacy install-bit
cleanup on macOS. The Rust port runs on Windows + Linux as well, where
the analogous misconfigurations (Session 0 on Windows, missing
DISPLAY on Linux, no AT-SPI on Linux) are the source of "the tools
are broken" reports. Folding diagnostics into doctor mirrors what
users instinctively try first when something silently returns empty
arrays.
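The DISPLAY / WAYLAND_DISPLAY matrix used by the Linux display-server probe has four cells, which a sketch can make explicit. `DisplayServer` and `classify_display` are hypothetical names, and treating an empty variable as unset is an assumption; which probe status each cell maps to is not specified here.

```rust
/// The four cells of the DISPLAY / WAYLAND_DISPLAY matrix.
#[derive(Debug, PartialEq, Eq)]
pub enum DisplayServer {
    X11Only,
    /// Wayland without DISPLAY; the probe adds an XWayland hint in this case.
    WaylandOnly,
    Both,
    Neither,
}

/// Classify the session from the two environment variable values.
/// Assumption: an empty string counts as "not set".
pub fn classify_display(display: Option<&str>, wayland: Option<&str>) -> DisplayServer {
    let is_set = |v: Option<&str>| v.map_or(false, |s| !s.is_empty());
    match (is_set(display), is_set(wayland)) {
        (true, false) => DisplayServer::X11Only,
        (false, true) => DisplayServer::WaylandOnly,
        (true, true) => DisplayServer::Both,
        (false, false) => DisplayServer::Neither,
    }
}
```

A caller would pass `std::env::var("DISPLAY").ok().as_deref()` and the matching lookup for `WAYLAND_DISPLAY`; keeping the classifier pure makes all four cells testable without touching the process environment.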
dump-docslibs/cua-driver/Sources/CuaDriverCLI/DumpDocsCommand.swiftlibs/cua-driver/rust/crates/cua-driver/src/cli.rs::run_dump_docs_with_typecrates/platform-windows/examples/dump_docs_parity.rs{mcp_tools: [...]}; now matches Swift's
CombinedDocs shape: --type=all → {cli: {...}, mcp: {...}},
--type=mcp → {version, tools: [...]}, --type=cli → CLI section.--type flag added (cli | mcp | all, default all) matching
Swift's flag.version field added to the MCP section matching Swift's
MCPDocumentation.cli.rs so there's
no equivalent introspection. Stub returns {version, commands: [], _note: "..."} directing users to --help for CLI docs. Full CLI
introspection would require either clap migration or a parallel
hand-maintained doc table.read_only, destructive, idempotent per
tool. Swift only emits name, description, input_schema. Rust keeps
the extras as a documented enrichment.dump_docs_parity.exe:
--type=all): {cli, mcp} with 30 tools ✓--type=mcp: top-level {version, tools} ✓--type=cli: stub section ✓--pretty: multi-line JSON (991 lines) ✓Not an MCP tool — an installer-text contract. The hint text printed at the end of every cua-driver-rs install (Try-it / agent skill pack / MCP setup per client / docs link) is sourced from a single shared file:
libs/cua-driver/scripts/post-install-hints.txt
with {{BINARY}} placeholder. Each installer substitutes {{BINARY}} for the installed binary path, prints it, then appends
an OS-specific autostart hint inline:
- libs/cua-driver/scripts/_install-rust.sh — curl from raw.githubusercontent.com (remote install path) + bash sed.
- libs/cua-driver/scripts/install.ps1 — Invoke-WebRequest from raw.githubusercontent.com + PowerShell -replace.
- libs/cua-driver/scripts/install-local.sh — direct disk read from ../cua-driver/scripts/post-install-hints.txt + sed.
- libs/cua-driver/rust/scripts/install-local.ps1 — direct disk read from ..\cua-driver\scripts\post-install-hints.txt + -replace.

If the .txt is unreachable (network failure on remote installs, repo layout change on local), each installer falls back to a one-line essentials string so the user always gets enough to recover.
Why not a CLI subcommand: an earlier draft of this work added
cua-driver post-install to the Rust binary and had all 4 installers
delegate via & $installedBinary post-install. Reverted — the
chicken-and-egg risk (failed binary install = no hints either) made
the .txt approach the safer choice. The .txt has no runtime
dependency; even a totally broken binary install still prints hints.
Why OS-specific hints stay inline: each script targets one OS (install.ps1 = Windows; install-local.sh on macOS vs Linux is the only branching case). The OS-specific block is 4-6 lines, naturally fits in the script that targets that OS, and is the only part that would need conditional rendering in a single-file design.
Status: VERIFIED on macOS via bash libs/cua-driver/scripts/install-local.sh
end-to-end. Windows VM verification pending.
Cross-cutting infrastructure (not an MCP tool) used by launch_app and
by the 7 action tools (click, type_text, hotkey, press_key,
drag, scroll, set_value) via the
Per-action focus suppression wrap.
Catches apps that self-activate during launch (Chrome, Electron,
Safari) or as a side-effect of an action (Safari opening a new tab in
response to AXPress, autocomplete pulling itself forward), and
re-activates the prior frontmost app before the user perceives the
steal.
- Swift: `libs/cua-driver/Sources/CuaDriverCore/Focus/SystemFocusStealPreventer.swift`
- Rust: `crates/platform-macos/src/focus_steal.rs`
- Test: `crates/platform-macos/src/focus_steal.rs` `#[cfg(test)] mod tests`
  (dispatcher add/remove/match, deadline reap, janitor start/stop
  lifecycle).
- Test: `tests/integration/test_focus_steal_parity.py`, run against
  both Swift and Rust binaries with `expectedFailure` on the Swift
  Safari-URL case for the known Cryptex+oapp LaunchServices bug.
- Notes:
  - `OnceLock<Arc<FocusStealPreventer>>`: mirrors Swift's
    `AppStateRegistry.systemFocusStealPreventer`.
  - `NSWorkspaceDidActivateApplicationNotification` observer, registered
    on a fresh background `NSOperationQueue` (not `mainQueue`). This is
    load-bearing: the binary's `Call` and `--no-overlay` `Serve` modes
    don't run an `NSApplication` main-thread run loop, so an observer on
    `mainQueue` would silently no-op. Background queue means the block
    fires regardless of run-loop state.
  - `Mutex<HashMap<Uuid, SuppressionEntry>>`, where each entry is
    `(target_pid: Option<i32>, restore_to: i32, deadline: Instant, origin)`.
    `target_pid = None` is the wildcard (catches any activation other
    than `restore_to`).
  - `with_suppression(target_pid, restore_to, origin, async fn)`.
  - `begin_suppression(...) -> SuppressionLease`; `Drop` calls
    `end_suppression` synchronously.
  - Deadline: `Instant::now() + 5s`. Entries are pruned on every
    observer fire and by a 1s tokio interval task gated by a
    `watch::Sender` (start on first-add, stop on last-remove). Mirrors
    PR #1521's deadline + janitor layers; RAII (`SuppressionLease`)
    subsumes Swift's `withSuppression` (layer 1) + `SuppressionLease`
    (layer 2) into one Rust idiom.
  - Restore: resolves
    `NSRunningApplication::runningApplicationWithProcessIdentifier(restore_to)`
    and calls `activate(options:[])`. `-[NSRunningApplication activate:]`
    is documented thread-safe, so no main-thread hop is needed.
  - `FocusGuard.withFocusSuppressed` ships layer 3 only (see
    Per-action focus suppression). The reactive suppressor is wired up
    across the 7 action tools; the enablement
    (`AXManualAccessibility`/`AXEnhancedUserInterface`) and
    synthetic-focus (`AXFocused`/`AXMain` write+restore) layers are
    deferred because the AX assertion + attribute-write plumbing isn't
    yet ported. Empirically, the layer-3 guard combined with
    `WindowChangeDetector`'s wildcard catches the majority of
    side-effects on real-world workflows.
  - `WindowChangeDetector` ported and wired (see Per-action focus
    suppression).
  - `tokio::sync::watch`: Swift uses `Task` cancellation; Rust's tokio
    idiom is the watch-channel select pattern. Behavior is identical:
    idle dispatcher → janitor sleeps; new entry → janitor wakes; map
    drains → janitor exits and waits for the next add.

## Per-action focus suppression

Per-action wrap around the 7 macOS action tools (`click`, `type_text`,
`hotkey`, `press_key`, `drag`, `scroll`, `set_value`) that catches
side-effects of dispatching an action on a backgrounded app: Safari
opening a new tab in response to `AXPress`, a "Sign In" button opening
a sheet, an autocomplete popover floating into view, etc. Mirrors
Swift's `WindowChangeDetector` + `FocusGuard` cross-cutting helpers
wired into `ClickTool` / `TypeTextTool` / `SetValueTool`.
- Swift: `libs/cua-driver/Sources/CuaDriverServer/Tools/WindowChangeDetector.swift`,
  `libs/cua-driver/Sources/CuaDriverCore/Focus/FocusGuard.swift`
- Rust: `crates/platform-macos/src/window_change_detector.rs`,
  `crates/platform-macos/src/focus_guard.rs`
- Test: `window_change_detector::tests` (8 cases: diff +
  `result_suffix` branches) and `focus_guard::tests` (4 cases:
  arm/skip lifecycle).
- Test: `tests/integration/test_focus_steal_parity.py` and
  `tests/integration/test_api_parity.py`, confirmed no regression
  after the action-tool wiring (identical pass/fail to `main`).

Each wrapped action follows the same shape:
let prior = apps::frontmost_pid();
let snapshot = WindowChangeDetector::snapshot(); // arms wildcard
let result = focus_guard::with_focus_suppressed(
Some(target_pid), prior, "<tool>.<origin>",
|| async { do_action(...).await }
).await;
let changes = snapshot.detect_async().await; // drops wildcard
// append changes.result_suffix() to success text
Two suppression entries are armed in series:
- Wildcard (armed in `snapshot()`, dropped in `detect()`):
  `target_pid = None`, `restore_to` = current frontmost. Catches any
  activation other than the prior frontmost during the full
  snapshot → action → detect window. Mirrors Swift's
  `WindowChangeDetector.snapshot()` wildcard.
- Targeted (`FocusGuard::with_focus_suppressed` around the action
  call): `target_pid = Some(action_pid)`, `restore_to` = prior
  frontmost. Catches a target-self-activation triggered by the AX call
  itself. Skipped when target == frontmost (no point fighting
  ourselves). A 50ms post-action settle runs before the lease drops,
  giving any in-flight reflex activation time to be observed.

The `Changes.result_suffix()` wording matches Swift's
`WindowChangeDetector.Changes.resultSuffix`:

- `"\n\n🪟 Action opened new window(s): App (\"Title\")."` (multiple
  windows grouped by app, titles in quotes, joined with `, ` (comma
  followed by a space); multiple apps joined with `; ` (semicolon
  followed by a space); alphabetical by app name).
- `"\n\n🔀 Action caused a different app to become frontmost."`

MCP callers can string-match on either lead emoji to detect a
side-effect without per-binary special-casing.
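The grouping rules for the "opened new window(s)" suffix can be sketched with the standard library alone. This is an illustration, not the port's actual code: the `OpenedWindow` struct and the `opened_suffix` function are hypothetical names, and the sketch assumes that all titles for one app share a single pair of parentheses and that "alphabetical" means plain byte-order sorting of app names.

```rust
use std::collections::BTreeMap;

/// A newly opened window: owning app name plus window title.
/// Hypothetical shape; the real `Changes` type is not shown in this document.
pub struct OpenedWindow {
    pub app: String,
    pub title: String,
}

/// Builds the "opened new window(s)" suffix, or an empty string when
/// nothing opened.
pub fn opened_suffix(windows: &[OpenedWindow]) -> String {
    if windows.is_empty() {
        return String::new();
    }
    // BTreeMap iterates in key order, which gives the by-app-name sort.
    let mut by_app: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for w in windows {
        by_app.entry(w.app.as_str()).or_default().push(w.title.as_str());
    }
    let groups: Vec<String> = by_app
        .iter()
        .map(|(app, titles)| {
            // Titles are quoted and joined with ", " inside one group.
            let quoted: Vec<String> = titles.iter().map(|t| format!("\"{t}\"")).collect();
            format!("{app} ({})", quoted.join(", "))
        })
        .collect();
    // Apps are joined with "; ".
    format!("\n\n🪟 Action opened new window(s): {}.", groups.join("; "))
}
```

Because the suffix always starts with the same emoji, a caller can detect the side-effect with a simple `contains('🪟')` check regardless of how many apps or windows are listed.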
- `AXManualAccessibility` / `AXEnhancedUserInterface` assertion and
  `AXFocused`/`AXMain` write+restore aren't ported yet. The Rust port
  ships only layer 3 (reactive suppressor). Empirically the wildcard +
  targeted reactive pair catches the majority of side-effects; layers
  1+2 are a follow-up when the AX assertion plumbing lands.
- `Snapshot::diff` is `pub(crate)` so unit tests can pin down
  opened/closed semantics without driving the live `CGWindowList`
  enumerator. Swift tests the same way via `currentWindowIds()`
  indirection.
- `detect_async()` runs on `spawn_blocking`: Swift's poll loop uses
  `Task.sleep`; Rust's blocking `std::thread::sleep` is cheap to
  offload via `tokio::task::spawn_blocking` and keeps the runtime
  responsive to other in-flight work.

## Telemetry

- Swift: `libs/cua-driver/Sources/CuaDriverCore/Telemetry/TelemetryClient.swift`
- Rust: `crates/cua-driver/src/telemetry.rs`
- Test: `crates/cua-driver/src/telemetry.rs` `#[cfg(test)] mod tests`
  (8 unit tests: env parsing, opt-out default, CI detection,
  payload shape, payload-key collision, install-id idempotent
  persistence, ISO-8601 format, arch mapping)
- Endpoint: `https://eu.i.posthog.com/capture/`
- Project API key: `phc_eSkLnbLxsnYFaXksif1ksbrNzYlJShr35miFLDppF14`
  (public: ingest-only, can't read events)
- Event names: `cua_driver_install`, `cua_driver_mcp`,
  `cua_driver_serve`, `cua_driver_stop`, `cua_driver_status`,
  `cua_driver_list_tools`, `cua_driver_describe`,
  `cua_driver_recording`, `cua_driver_config`,
  `cua_driver_mcp_config`, `cua_driver_dump_docs`,
  `cua_driver_update`, `cua_driver_doctor`, `cua_driver_diagnose`,
  `cua_driver_api_<tool>` (per-tool `call` invocations).

Keeping the endpoint + names identical means Rust + Swift events land
in the same PostHog project; `$lib = "cua-driver-rs"` vs
`"cua-driver-swift"` is the only signal to split them.
Each event sends:
| Key | Value | Source |
|---|---|---|
| `cua_driver_version` | `CARGO_PKG_VERSION` (e.g. `"0.1.3"`) | build-time |
| `os` | `"macos"` / `"linux"` / `"windows"` | `std::env::consts::OS` |
| `os_version` | OS-reported version string | `sw_vers -productVersion` / `/etc/os-release` / `cmd /c ver` |
| `arch` | `"arm64"` / `"x86_64"` (`aarch64` → `arm64`) | `std::env::consts::ARCH` |
| `is_ci` | bool | env-var probe (see below) |
| `$lib` | `"cua-driver-rs"` | hard-coded |
| `$lib_version` | `CARGO_PKG_VERSION` | build-time |
CI-environment detection probes the same vars Swift does: CI,
CONTINUOUS_INTEGRATION, GITHUB_ACTIONS, GITLAB_CI, JENKINS_URL,
CIRCLECI.
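A minimal sketch of such an env-var probe, using only the standard library. The function names (`is_ci_with`, `is_ci`) are hypothetical, and treating "set to a non-empty value" as the trigger is an assumption; the document only lists which variables are probed, not how their values are interpreted.

```rust
/// Env vars probed for CI detection, as listed above.
const CI_ENV_VARS: [&str; 6] = [
    "CI",
    "CONTINUOUS_INTEGRATION",
    "GITHUB_ACTIONS",
    "GITLAB_CI",
    "JENKINS_URL",
    "CIRCLECI",
];

/// Returns true when any probed variable resolves to a non-empty value.
/// The lookup is injected so the probe can be exercised without touching
/// the real process environment. (Assumption: any non-empty value counts.)
pub fn is_ci_with<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    CI_ENV_VARS
        .iter()
        .any(|&name| lookup(name).map_or(false, |v| !v.is_empty()))
}

/// Probe against the real process environment.
pub fn is_ci() -> bool {
    is_ci_with(|name| std::env::var(name).ok())
}
```

Injecting the lookup keeps unit tests deterministic: the test passes a closure instead of mutating process-wide environment state.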
Verified by unit test `build_payload_contains_required_keys`, which
asserts that none of `$user`, `username`, `home_dir`, `cwd`, `argv`
ever appear in a serialized payload:

- No `$USER` / `$HOME`.
- `call <tool>` reports only the tool name, as
  `cua_driver_api_<tool>`.

Set `CUA_DRIVER_RS_TELEMETRY_ENABLED=false` (or `0`, `no`, `off`) to
disable ALL telemetry from the binary. Unset defaults to enabled
(matches Swift's persisted-flag default of `true`).
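The opt-out rule can be sketched as a small pure function. This is an illustration under stated assumptions, not the port's parser: `flag_enabled` and `telemetry_enabled` are hypothetical names, and the trimming, the case-insensitive comparison, and the "any other value means enabled" fallback are assumptions that go beyond what the text above states.

```rust
/// Interprets an opt-out style flag: unset means enabled, and the values
/// false / 0 / no / off disable it.
/// Assumptions: surrounding whitespace is ignored, matching is
/// case-insensitive, and any unrecognized value leaves the flag enabled.
pub fn flag_enabled(raw: Option<&str>) -> bool {
    match raw {
        None => true,
        Some(v) => !matches!(
            v.trim().to_ascii_lowercase().as_str(),
            "false" | "0" | "no" | "off"
        ),
    }
}

/// Reads the documented env var from the real process environment.
pub fn telemetry_enabled() -> bool {
    flag_enabled(std::env::var("CUA_DRIVER_RS_TELEMETRY_ENABLED").ok().as_deref())
}
```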
The only path that ignores the opt-out is capture_install(),
which fires the one-shot cua_driver_install ping from install.sh's
post-install hook. Rationale: an opt-out user is still a counted
install in the adoption metric; every subsequent event from the binary
respects the flag normally. Guarded by ~/.cua-driver-rs/.installation_recorded
so re-running install.sh doesn't re-fire it.
The Rust port is deliberately partitioned from the Swift port at the filesystem + env-var layer:
| Layer | Swift | Rust |
|---|---|---|
| Install dir | ~/.cua-driver/ | ~/.cua-driver-rs/ |
| Install UUID | ~/.cua-driver/.telemetry_id | ~/.cua-driver-rs/.telemetry_id |
| Install marker | ~/.cua-driver/.installation_recorded | ~/.cua-driver-rs/.installation_recorded |
| Opt-out env var | CUA_DRIVER_TELEMETRY_ENABLED | CUA_DRIVER_RS_TELEMETRY_ENABLED |
This means:
ureq v3 with default features (rustls + gzip + json). Single POST,
3-second timeout, fire-and-forget. Sent from tokio::task::spawn_blocking
when a runtime is live (MCP server, serve daemon), else from a
short-lived OS thread (synchronous CLI subcommands like list-tools).
Network errors, timeouts, and 4xx/5xx responses are logged via
tracing::debug!(target: "cua_driver::telemetry", …) only — never
surfaced to stdout/stderr unless CUA_DRIVER_RS_TELEMETRY_DEBUG=true.
- Swift's persisted `telemetryEnabled` setting via
  `ConfigStore.loadSync()`: Rust honors only the env var. The Rust
  port has no `ConfigStore` analogue yet, and YAGNI suggests waiting
  until someone files a request.
- No `GUI_LAUNCH` emission. Swift fires `cua_driver_gui_launch` when
  the binary is launched bare (Finder / Dock double-click). Rust has no
  GUI surface yet, so the constant is reserved but unused.
- `is_ci` uses env-var probing only. Same probe list as Swift; no
  extra Rust-specific signals.

## Version check (`mcp` / `serve` / `doctor`)

- Rust: `crates/cua-driver/src/version_check.rs`
- Test: `crates/cua-driver/src/version_check.rs` `#[cfg(test)] mod tests`
  (22 unit tests: semver edge cases, cache round-trip in a
  tempdir, dismissal persistence, 20-hour refresh threshold,
  env-var + config opt-out, JSON release-list filtering,
  banner format, ISO-8601 timestamp)

On the long-running interactive entry points (`mcp`, `serve`,
`doctor`) the binary kicks off a background HTTP check against
`https://api.github.com/repos/trycua/cua/releases?per_page=40`,
filters to the `cua-driver-rs-v*` tag prefix, and prints a two-line
banner to stderr if the highest non-draft non-prerelease release
is strictly newer than `CARGO_PKG_VERSION` and the user hasn't
previously dismissed it:
✨ cua-driver v0.1.4 is available (you have v0.1.3).
Update with: cua-driver update
Release notes: https://github.com/trycua/cua/releases/tag/cua-driver-rs-v0.1.4
The check runs on tokio::task::spawn_blocking when a runtime is
live, else a short-lived OS thread, so the daemon's start-up path
is never delayed by network latency.
The latest-version answer is cached on disk at
~/.cua-driver-rs/version_check.json:
{
"last_checked_unix": 1700000000,
"last_checked_at": "2023-11-14T22:13:20Z",
"latest_version": "0.1.4",
"dismissed_versions": []
}
Refreshed only when the cached last_checked_unix is more than 20
hours old, bounding outbound requests to roughly one per machine per
day even on a hot reload loop. Failed HTTP fetches fall back to the
cached value (better an old banner than none on a brief network blip).
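The refresh rule and the fall-back-to-cache behavior can be sketched as two pure functions. The names `needs_refresh` and `effective_latest` are hypothetical, and treating a backwards-moving clock as "fresh" is an assumption of this sketch rather than documented behavior.

```rust
/// Refresh threshold from the text above: 20 hours, expressed in seconds.
const REFRESH_AFTER_SECS: u64 = 20 * 60 * 60;

/// Returns true only when the cached check is more than 20 hours old.
/// Assumption: a clock that moved backwards (now < last check) counts as fresh.
pub fn needs_refresh(last_checked_unix: u64, now_unix: u64) -> bool {
    now_unix.saturating_sub(last_checked_unix) > REFRESH_AFTER_SECS
}

/// Chooses the version to compare against: the fetched value when a fetch
/// succeeded, otherwise whatever the cache already held.
pub fn effective_latest(cached: Option<&str>, fetched: Option<&str>) -> Option<String> {
    fetched.or(cached).map(str::to_owned)
}
```

With `last_checked_unix = 1700000000` from the cache example, a process starting exactly 20 hours later (72,000 seconds) still uses the cache; one second after that it refetches.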
Only mcp, serve, and doctor call maybe_announce_update().
The following entry points are NOT instrumented because they're
routinely piped from scripts and a banner would corrupt their
parseable output:
- `--version` / `-V`
- `list-tools`
- `describe <tool>`
- `call <tool>`
- `dump-docs`
- `mcp-config`
- `update`, `stop`, `status`, `recording`, `config`, `diagnose`,
  `telemetry install-event`

The check is skipped when any of the following holds:

- `CUA_DRIVER_RS_UPDATE_CHECK=false` (also `0`, `no`, `off`;
  case-insensitive): single-invocation off.
- `update_check_enabled = false` in `~/.cua-driver/config.json`:
  persistent off. Set via
  `cua-driver config set update_check_enabled false`.
- A `CARGO_PKG_VERSION` that carries a semver pre-release suffix
  (`-dev`, `-rc.1`, `-beta`) short-circuits the check entirely. There
  is no matching published release for a source / development build
  to recommend.

`ureq` v3 with `Accept: application/vnd.github+json` and a
`cua-driver-rs/<version>` user-agent. 4-second timeout. Single GET,
fire-and-forget: the response body is read into `serde_json::Value`
on the background task, never touching the foreground startup path.
Network errors, timeouts, 4xx/5xx responses, and JSON-parse failures
are logged via `tracing::debug!(target: "cua_driver::version_check", …)`
only and never surfaced to stderr.
`cua-driver update`: `run_update_cmd` calls into the same
`version_check::fetch_latest_version()` and `version_check::is_newer()`
helpers so the proactive banner and the manual subcommand agree on
tag-filtering rules and semver compare semantics. This also removes
the prior shell-out to `curl` from `cli::run_update_cmd`.
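To make "tag-filtering rules and semver compare semantics" concrete, here is a std-only sketch. It is not the body of the real `version_check` helpers: `parse_release` and `highest_release` are hypothetical names, `is_newer` only borrows the documented helper's name, and the sketch filters pre-releases by tag shape alone, whereas the real check is described as also using the release's draft/prerelease status.

```rust
/// Tag prefix that identifies Rust driver releases, as described above.
const TAG_PREFIX: &str = "cua-driver-rs-v";

/// Parses a plain "MAJOR.MINOR.PATCH" version. Anything else, including a
/// pre-release suffix such as "-dev" or "-rc.1", yields None.
fn parse_release(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True only when both versions are plain releases and `candidate` is
/// strictly greater. A pre-release `current` short-circuits to false.
pub fn is_newer(candidate: &str, current: &str) -> bool {
    match (parse_release(candidate), parse_release(current)) {
        (Some(c), Some(cur)) => c > cur,
        _ => false,
    }
}

/// Picks the highest plain release among tags carrying the Rust prefix.
pub fn highest_release<'a>(tags: &[&'a str]) -> Option<&'a str> {
    tags.iter()
        .copied()
        .filter_map(|t| t.strip_prefix(TAG_PREFIX))
        .filter_map(|v| parse_release(v).map(|key| (key, v)))
        .max_by_key(|(key, _)| *key)
        .map(|(_, v)| v)
}
```

Comparing numeric tuples rather than strings is what makes `0.10.0` sort above `0.9.9`; a lexical compare would get that case wrong.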
version_check::dismiss_version(&str) appends a version string to
the dismissed_versions list on disk. No call site in the current
binary (banner is informational only today); kept public so a future
interactive prompt path (TUI helper, GUI extra) can persist the
"skip until next version" choice without re-implementing the cache
layer.
This section covers the cross-platform Rust installer's runtime behaviors: how it picks a version, where it lands files, and how it keeps the on-disk state bounded across repeated upgrades.
### Version resolution (env > baked > API)

Both Rust installers (`scripts/install.sh`, `scripts/install.ps1`)
resolve the release tag in the same priority order as the Swift
`cua-driver` installer:

1. Env: `CUA_DRIVER_RS_VERSION` (`$env:CUA_DRIVER_RS_VERSION` on
   Windows).
2. Baked: the value written by the `Bake version into install scripts`
   step after each `cua-driver-rs-v*` tag push. Matches the Swift
   driver's `CUA_DRIVER_BAKED_VERSION` shape.
3. API: the GitHub releases lookup, used as the fallback.

The Swift installer adopted this pattern after we hit two failure
modes the API-only chain couldn't dodge:

- One turned a `curl … | bash` install into a hard failure.
- The releases list mixes `cua-driver-v*` (Swift) and
  `cua-driver-rs-v*` (Rust) tags, plus unrelated tags from other libs.
  As release cadence grows, the first page of `/releases?per_page=N`
  is no longer guaranteed to include any matching tag, and a
  single-page fetch will silently resolve nothing.

Baking the version turns both of those into non-issues for the
common-case install (curl-against-main / irm-against-main). The API
path is still exercised by dev installs from un-baked branches, so
the pagination fix in `install.ps1` (commit 3425af0b) stays
valuable; `install.sh` keeps its single-page fetch for now, which
is a follow-up candidate once the baked default is shipped (it is
fallback-only and not hit by the default curl-from-main path).
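The priority order can be sketched in a few lines. The installers themselves are shell and PowerShell scripts, so this Rust version is only an illustration of the decision logic: `Source`, `non_empty`, and `resolve_version` are hypothetical names, and treating an empty or whitespace-only value as "unset" is an assumption.

```rust
/// Where the resolved version came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Env,
    Baked,
    Api,
}

/// Assumption: empty or whitespace-only values are treated as unset.
fn non_empty(v: Option<&str>) -> Option<String> {
    v.map(str::trim).filter(|s| !s.is_empty()).map(str::to_owned)
}

/// Resolves env > baked > API. The API lookup is a closure so it is only
/// invoked when neither of the cheaper sources produced a value.
pub fn resolve_version(
    env_override: Option<&str>,
    baked: Option<&str>,
    api_lookup: impl FnOnce() -> Option<String>,
) -> Option<(String, Source)> {
    if let Some(v) = non_empty(env_override) {
        return Some((v, Source::Env));
    }
    if let Some(v) = non_empty(baked) {
        return Some((v, Source::Baked));
    }
    api_lookup().map(|v| (v, Source::Api))
}
```

Passing the API lookup lazily mirrors the point of baking: the common-case install never makes the network call at all.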
# ~~~ BAKED_VERSION: auto-updated by CD workflow after each release — do not edit ~~~
CUA_DRIVER_RS_BAKED_VERSION="<version>"
# ~~~ END_BAKED_VERSION ~~~
The PowerShell variant swaps the bash assignment for
$Script:CuaDriverRsBakedVersion = "<version>" but reuses the same
marker comments. The CD step's sed patterns key on the assignment
line, not the markers, so the markers are a human cue only.
.github/workflows/cd-rust-cua-driver.yml runs a Bake version into install scripts step at the end of the release job after each
cua-driver-rs-v* tag push. It runs on ubuntu-latest (GNU sed)
and the equivalent Swift step runs on macos-15 (BSD sed), so the
two workflows use slightly different sed -i syntax:
| Workflow | Runner | sed -i form |
|---|---|---|
| `cd-rust-cua-driver.yml` (this) | `ubuntu-latest` | `sed -i 's/.../.../'` (GNU) |
| `cd-swift-cua-driver.yml` | `macos-15` | `sed -i '' 's/.../.../'` (BSD) |
Both push the rewritten files back to main using a GitHub App
token (RELEASE_APP_ID + RELEASE_APP_PRIVATE_KEY) so the push
bypasses the "Changes must be made through a pull request" ruleset
on main — the default GITHUB_TOKEN (github-actions[bot]) is
rejected by that ruleset.
The commit author is trycua-release[bot] and the message is
chore(cua-driver-rs): bake version <V> into install scripts [skip ci]
(the [skip ci] suppresses the recursive CD trigger from the
bake-push hitting main).
### Version GC (`CUA_DRIVER_RS_KEEP_VERSIONS`)

Each install drops the binary into a fresh
`$HOME_DIR/packages/releases/<version>-<target>/` directory and
retargets `current` at it. Old per-version dirs are kept on disk so
rollback is an `ln -sfn` / junction-retarget away with no re-download.
Disk usage grows ~15 MB per upgrade, so the installer runs a
post-install GC pass to trim the oldest dirs back to a configurable cap.

- `CUA_DRIVER_RS_KEEP_VERSIONS=<N>` (env). `<N>` is any
  non-negative integer; `0` disables GC entirely (legacy behavior:
  retains every version forever). Non-integer or negative values fall
  back to the default with a `warning:` log.
- Only dirs matching the current `${TARGET}` triple are eligible to
  prune. A multi-arch dev with both `aarch64-apple-darwin` and
  `x86_64-unknown-linux-gnu` under one `$HOME_DIR` (rare but possible,
  e.g. shared home over NFS) has each target's history GC'd
  independently.
- The dir `current` resolves to is never pruned, even when it is older
  than the keep window (e.g. user rolled back to an old version).
  Worst-case post-GC dir count is `keep + 1`; common-case is exactly
  `keep`.
- `prune_old_releases` (sh) / `Invoke-OldReleasesGc` (ps1) is invoked
  only after `current` has been retargeted at the new install, so the
  about-to-be-active version is never a deletion candidate.
- The macOS `/Applications/CuaDriver.app` install is an in-place
  replacement (no per-version directory accumulation), so the GC pass
  is a no-op there by construction (the Darwin branch never enters the
  versioned-dirs install path).

Per-script implementation:

- `install.sh`: `prune_old_releases` uses `ls -dt "$RELEASES_DIR"/*/`
  for mtime-sorted candidates, filters by `*-$TARGET`, skips the dir
  that `readlink "$CURRENT_LINK"` resolves to, and removes the excess
  past the keep window with `xargs -0 rm -rf`.
- `install.ps1`: `Invoke-OldReleasesGc` uses
  `Get-ChildItem -Directory | Where-Object Name -like "*-$target" | Sort-Object LastWriteTime -Descending`,
  resolves `Get-JunctionTarget $CurrentDir` to find and exempt the
  active install, and removes the excess with
  `Remove-Item -Recurse -Force`.

Linux (pin three versions, observe GC):
# Install three pinned versions in sequence. Each one drops a new
# per-version dir under ~/.cua-driver-rs/packages/releases/ and
# retargets `current`.
CUA_DRIVER_RS_VERSION=0.1.4 bash install.sh
CUA_DRIVER_RS_VERSION=0.2.0 bash install.sh
CUA_DRIVER_RS_VERSION=0.2.1 bash install.sh
ls ~/.cua-driver-rs/packages/releases/
# → 0.1.4-…, 0.2.0-…, 0.2.1-… (3 dirs, GC saw ≤ default-5, no-op)
# Force keep=2 — GC trims to 2 newest, current always preserved.
CUA_DRIVER_RS_VERSION=0.2.1 CUA_DRIVER_RS_KEEP_VERSIONS=2 bash install.sh
ls ~/.cua-driver-rs/packages/releases/
# → 0.2.0-…, 0.2.1-… (0.1.4 pruned)
# keep=0 — GC disabled.
CUA_DRIVER_RS_VERSION=0.2.1 CUA_DRIVER_RS_KEEP_VERSIONS=0 bash install.sh
# (logs "version GC disabled", retains all on-disk dirs)
Windows (equivalent in PowerShell):
$env:CUA_DRIVER_RS_VERSION = "0.1.4"; irm <url>/install.ps1 | iex
$env:CUA_DRIVER_RS_VERSION = "0.2.0"; irm <url>/install.ps1 | iex
$env:CUA_DRIVER_RS_VERSION = "0.2.1"; irm <url>/install.ps1 | iex
Get-ChildItem ~\.cua-driver-rs\packages\releases\
# → 3 dirs
$env:CUA_DRIVER_RS_KEEP_VERSIONS = "2"
$env:CUA_DRIVER_RS_VERSION = "0.2.1"; irm <url>/install.ps1 | iex
Get-ChildItem ~\.cua-driver-rs\packages\releases\
# → 2 dirs (0.1.4 pruned)
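The selection step of the GC pass can be sketched as a pure function over directory names and modification times. This is an illustration of the rules described above, not a port of `prune_old_releases` or `Invoke-OldReleasesGc`: `ReleaseDir` and `dirs_to_prune` are hypothetical names, and mtimes are simplified to plain integers.

```rust
/// One per-version release directory (hypothetical shape for this sketch).
pub struct ReleaseDir {
    pub name: String,
    /// Modification time as seconds since the epoch (simplified).
    pub mtime: u64,
}

/// Returns the names of the directories the GC pass would delete.
/// - keep == 0 disables GC entirely.
/// - Only dirs ending in "-<target>" are candidates.
/// - The newest `keep` candidates survive.
/// - The dir `current` points at is never deleted, even outside the window.
pub fn dirs_to_prune(
    mut dirs: Vec<ReleaseDir>,
    target: &str,
    current: Option<&str>,
    keep: usize,
) -> Vec<String> {
    if keep == 0 {
        return Vec::new();
    }
    let suffix = format!("-{target}");
    dirs.retain(|d| d.name.ends_with(&suffix));
    // Newest first, matching the mtime-sorted listing in the scripts.
    dirs.sort_by(|a, b| b.mtime.cmp(&a.mtime));
    dirs.into_iter()
        .skip(keep)
        .filter(|d| Some(d.name.as_str()) != current)
        .map(|d| d.name)
        .collect()
}
```

When `current` is older than the keep window, it is skipped by the final filter rather than counted against `keep`, which is where the `keep + 1` worst case comes from.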
Two installs running at the same time (user clicks the one-liner while
a CI script is also installing; two terminals; cron-driven reinstall
racing a manual one) can race on the atomic current swap and leave
the visible binary pointing at a partially-populated release dir.
Serialize installs per $HOME_DIR with a process-level mutex.
| Platform | Mutex |
|---|---|
| Linux (and Linux-via-WSL) | mkdir $HOME_DIR/packages/.install.lock.d — atomic on POSIX, no flock dependency. First install wins; concurrent attempts get EEXIST and poll. |
| Windows | System.IO.FileStream opened on $HomeDir\install.lock with FileShare::None. Windows kernel rejects a second open until the first handle closes, so the open call itself is the acquisition. |
Both primitives are unprivileged — no admin / sudo / Developer Mode.
- Waiters log
  `another cua-driver-rs install is already in progress (lock at <path>); waiting...`
  exactly once.
- The stale timeout is a named constant (`LOCK_STALE_AFTER_SECONDS` /
  `$Script:LockStaleAfterSeconds`), not a magic number, so future
  tuning is grep-able.
- After `LOCK_STALE_AFTER_SECONDS` of waiting we log
  `lock appears stale (>600s), forcing release` and `rm -rf` /
  `Remove-Item` the lock entry, then retry. The alternative (hang
  forever) leaves users wedged with no obvious recovery path.

After acquiring, the installer writes a tiny info blob into the lock so a user investigating a stuck install can see who holds it:
$ cat ~/.cua-driver-rs/packages/.install.lock.d/info
pid=43210
started=2026-05-17T09:14:22Z
argv=install.sh
PS> Get-Content $env:USERPROFILE\.cua-driver-rs\install.lock
pid=43210
started=2026-05-17T09:14:22.123Z
invocation=install.ps1 -Release latest
Both scripts release the lock unconditionally:
- `install.sh`: `trap cleanup_on_exit EXIT` plus per-signal traps
  for `INT` and `TERM` that release then re-raise so the exit code
  reflects the signal.
- `install.ps1`: top-level `try { ... } finally { Release-InstallLock }`
  wrapping the whole `Main` block. PowerShell's `finally` fires on
  normal exit, exceptions, `exit`, and Ctrl-C (pipeline-stop).

A half-finished install with a held lock would wedge every subsequent
install for the full 600s stale window, so the cleanup wiring is
non-optional.
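For readers more at home in Rust than in shell, the same mkdir-as-mutex pattern with stale recovery and unconditional release can be sketched with the standard library. This is an analogue of the installer logic, not code from either script: `InstallLock` and `acquire` are hypothetical names, the info blob is reduced to a `pid=` line, and `Drop` stands in for the `trap` / `finally` cleanup.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Holds the lock directory; dropping the value releases the lock.
pub struct InstallLock {
    dir: PathBuf,
}

impl InstallLock {
    /// Creates `dir` atomically, polling while another holder exists and
    /// force-removing the lock once `stale_after` has elapsed.
    pub fn acquire(dir: &Path, stale_after: Duration, poll: Duration) -> io::Result<InstallLock> {
        let mut waiting_since = Instant::now();
        let mut announced = false;
        loop {
            match fs::create_dir(dir) {
                Ok(()) => {
                    // Diagnostic only, so a failed write does not fail the acquire.
                    let _ = fs::write(dir.join("info"), format!("pid={}\n", std::process::id()));
                    return Ok(InstallLock { dir: dir.to_path_buf() });
                }
                Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
                    if !announced {
                        eprintln!(
                            "another install is already in progress (lock at {}); waiting...",
                            dir.display()
                        );
                        announced = true;
                    }
                    if waiting_since.elapsed() >= stale_after {
                        eprintln!(
                            "lock appears stale (>{}s), forcing release",
                            stale_after.as_secs()
                        );
                        let _ = fs::remove_dir_all(dir);
                        // Restart the clock so a new live holder is not evicted at once.
                        waiting_since = Instant::now();
                        continue;
                    }
                    thread::sleep(poll);
                }
                Err(e) => return Err(e),
            }
        }
    }
}

impl Drop for InstallLock {
    fn drop(&mut self) {
        // Runs on normal return and on unwinding panics alike.
        let _ = fs::remove_dir_all(&self.dir);
    }
}
```

One difference worth noting: `Drop` does not run when the process is killed by a signal, which is why the shell script needs explicit `INT` / `TERM` traps and why the stale timeout exists as a backstop.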
Linux — concurrent contention:
# Shell 1
bash install.sh # holds the lock, takes ~20s
# Shell 2 (kick off while shell 1 is still running)
bash install.sh
# → "another cua-driver-rs install is already in progress (lock at
# ~/.cua-driver-rs/packages/.install.lock.d); waiting..."
# (blocks until shell 1 finishes, then proceeds normally)
Linux — stale lock recovery:
# Simulate a dead holder.
mkdir ~/.cua-driver-rs/packages/.install.lock.d
echo "pid=99999" > ~/.cua-driver-rs/packages/.install.lock.d/info
bash install.sh
# → "another cua-driver-rs install is already in progress…; waiting..."
# (waits 600s, then:)
# → "lock appears stale (>600s), forcing release"
# (proceeds normally)
Windows — equivalent in two PowerShell windows. The stale-recovery
test in PowerShell is New-Item -Path $env:USERPROFILE\.cua-driver-rs\install.lock
plus Remove-Item to clear after; same 600s wait.