docs/src/app/changelog/page.mdx
doctor command - Added agent-browser doctor for one-shot diagnosis of an install. Checks environment, Chrome, running daemons, config files, security, providers, and network connectivity; auto-cleans stale daemon sidecar files on every run; and performs a live headless launch test. Supports --offline to skip network probes, --quick to skip the launch test, --fix for opt-in repairs (install missing Chrome, close version-mismatched daemons, prune expired state files), and --json for structured output (#1254)t1, t2, t3 that don't shift when other tabs close or popups appear. Tabs can be created with a memorable label via tab new --label <name> [<url>], and labels are interchangeable with t<N> ids everywhere a tab ref is accepted (tab <id|label>, tab close <id|label>). Bare-integer input is rejected with a teaching error so agents can't mistake stable handles for positional indices (#892, #1249, #1250)core skill - Renamed the built-in agent-browser skill to core and replaced its ~40-line discovery stub with a ~420-line usage guide covering the core snapshot-ref-act loop, reading, interacting, waiting, common workflows, troubleshooting, and global flags. agent-browser skills get core now returns content agents can use directly; --full adds references and templates. Added a hidden: frontmatter flag so the original agent-browser stub stays reachable for npx skills add discovery without polluting skills list (#1253)agent-browser.schema.json describing every config option with types and descriptions, enabling IDE autocomplete and validation when referenced via $schema in agent-browser.json or ~/.agent-browser/config.json. The schema is served from the docs site at https://agent-browser.dev/schema.json (#1242, #1248)--state / AGENT_BROWSER_STATE not actually loading saved browser state (cookies and localStorage) at launch. The flag had been fully plumbed through parsing, env propagation, and validation since the native Rust rewrite, but the load step was never wired up. Storage state now loads after launch across all four paths: explicit launch, auto-connect, provider, and local Chrome (#1241)--help output now shows the skills section first so agents discover skills get core (the canonical usage guide) before the core command list (#1251)--auto-connect CDP discovery preferring HTTP endpoint discovery over the DevToolsActivePort websocket path, which could fail on some setups. The CLI now reads the websocket path from DevToolsActivePort first and only falls back to HTTP discovery (#1218)get box and get styles printing no data in text mode (#1231, #1233)skills command - Added agent-browser skills command for discovering and installing agent skills, with built-in evaluation support for testing skills against live browser sessions (#1225, #1227)--ignore-https-errors not being re-applied to recording contexts, causing TLS errors during screen recordings (#1178)<label> wraps a display:none <input type="radio"> or <input type="checkbox">. Chrome excludes these inputs from the accessibility tree entirely, making it impossible for AI agents to identify radio buttons and checkboxes via refs. Hidden inputs inside elements are now detected during cursor-interactive scanning and their parent nodes are promoted to the correct role with proper name and checked state (#1085)PR_SET_PDEATHSIG tracking the blocking thread that spawned Chrome rather than the daemon process. When Tokio reaped the idle thread, the kernel sent SIGKILL to Chrome even though the daemon was still alive (#1157, #1173)rust-embed, eliminating the need for dashboard install. The dashboard is available immediately after installing agent-browser (#1169)chat command for AI-powered browser automation. Supports single-shot mode (chat "open google.com") and an interactive REPL. The AI agent can execute any agent-browser command via tool calls. Requires AI_GATEWAY_API_KEY. Configure the model with --model or AI_GATEWAY_MODEL (#1160, #1163)snapshot --urls - New -u/--urls flag to include href URLs for link elements in snapshot output, giving agents direct access to link targets without additional queries (#1160)batch command now accepts commands as inline arguments in addition to reading from stdin, simplifying single-invocation multi-command workflows (#1160)getByRole matching wrong elements (e.g. <link> stylesheet elements instead of <a> anchors) by rewriting the implementation to use the CDP accessibility tree with ref-based element resolution instead of CSS selectors (#1145)upload command not supporting accessibility tree refs (@eN) for file upload element selection (#1156)AGENT_BROWSER_DEFAULT_TIMEOUT not being applied to wait commands. The environment variable now propagates to all wait variants (wait, wait --url, wait --text, wait --load, wait --fn, wait --download) (#1153)--profile <name> now resolves Chrome profile names (e.g. Default, Profile 1) and copies the profile to a temp directory to reuse login state, cookies, and extensions without modifying the original. Added profiles command to list available Chrome profiles with --json support (#1131)--ignore-https-errors not passing --ignore-certificate-errors as a Chrome launch flag, causing TLS errors like ERR_SSL_PROTOCOL_ERROR to be rejected at the network layer before CDP could intervene (#1132)PR_SET_PDEATHSIG ensures Chrome is killed even if the daemon is OOM-killed (#1137)Runtime.runIfWaitingForDebugger (#1133).version sidecar file and auto-restarts on version mismatch (#1134)close --all failed to clean up zombie daemons and stale files. Unreachable daemons are now force-killed and orphaned socket/pid files are removed (#1136)config.json) between consecutive launch commands (#996)auto_launch() not honouring AGENT_BROWSER_PROVIDER for cloud providers, causing non-launch commands to fall back to local Chrome instead of connecting via the provider API (#1126)--provider agentcore or AGENT_BROWSER_PROVIDER=agentcore. Uses lightweight manual SigV4 signing for authentication with support for the full AWS credential provider chain (environment variables, AWS CLI, SSO, IAM roles). Configure with AGENTCORE_REGION, AGENTCORE_PROFILE_ID, and AGENTCORE_BROWSER_ID environment variables. Returns session ID and Live View URL in the launch response (#397)waitpid(-1) race condition in the SIGCHLD handler that stole exit statuses from Rust's Child handles, leaving the daemon in a broken state. Replaced the global signal handler with targeted crash detection via the existing drain interval (#1098)mouseMoved events during the drag omitted the buttons bitmask, causing the browser to see event.buttons === 0 and never fire dragstart/dragover/drop (#1087)Content-Type header, causing session creation to fail (#1092)wait_for_lifecycle waited for page load events that remote providers may not emit. Navigation with --provider now automatically sets waitUntil=none (#1092)alert() and beforeunload dialogs are now automatically accepted to prevent the agent from blocking indefinitely. confirm and prompt dialogs still require explicit dialog accept/dismiss commands. Disable with --no-auto-dialog flag or AGENT_BROWSER_NO_AUTO_DIALOG environment variable (#1075)~/.cache/puppeteer/chrome/ (or PUPPETEER_CACHE_DIR) for Chrome binaries, so users with an existing Puppeteer installation can use agent-browser without a separate install step (#1088)console.log of objects now shows the actual object preview (e.g. {userId: "abc", count: 42}) instead of "Object". JSON output includes a raw args array for programmatic access (#1040)wait_for_lifecycle waited for a Page.loadEventFired that never fires on same-document navigations (#1059)Network.getAllCookies and collects localStorage from all visited origins (#1064)tab list when using --cdp mode. Tabs opened by the user or another CDP client are now detected and tracked (#1042)dashboard install now takes effect immediately on a running server (#1066)dashboard) that shows live browser viewports, command activity feeds, console output, network requests, storage, and extensions for all sessions. Manage it with dashboard start, dashboard stop, and dashboard install. The dashboard runs as a standalone background process and all sessions stream to it automatically (#1034)agent-browser dashboard install
agent-browser dashboard start
agent-browser open example.com # session appears in dashboard
agent-browser dashboard stop
stream enable, stream disable, and stream status commands to control WebSocket streaming at runtime. Streaming is now always enabled by default; AGENT_BROWSER_STREAM_PORT overrides the port instead of toggling the feature (#951)close --all flag to close every active browser session at once.port file (#1041)ref= check required a leading [ that those elements lack (#1008)Browser.setDownloadBehavior set at launch only applied to the default context. The download behavior is now re-applied when a new recording context is created (#1019)type through text input - Fixed keyboard type subaction to correctly route through the text input handler, and added support for an insertText subaction using Input.insertText (#1014)--clear flag in console command - Fixed the console command to accept and process a clear parameter, allowing console event history to be cleared (#1015)dialog status command to check whether a JavaScript dialog is currently open (#999)warning field when a JavaScript dialog is pending, indicating the dialog type and message (#999)HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, and their lowercase variants), with NO_PROXY/no_proxy respected for bypass rules (#1000)--with-deps - Installing with --with-deps now includes CJK and emoji font packages on Linux (Debian, RPM, and yum-based distros) to prevent missing glyphs when rendering international content (#1002)state show always failing with "Missing 'path' parameter" due to a mismatched JSON field name (#994)console command returning only Done due to a JSON field name mismatch in the response (#986)sessionId mismatch (#998)Fetch.authRequired event rather than passing them inline (#1000)Control+a, Shift+Enter, Control+Shift+a) not being handled correctly when using press. Modifier keys are now parsed and forwarded as CDP modifier bitmasks rather than treated as part of the key name (#980)--cdp HTTP URLs (e.g. http://host:9222?mode=Hello). Query strings are now preserved and forwarded to the remote CDP endpoint (#982)Target.setAutoAttach (#949)network request <requestId> command to view full request/response detail, and new filtering options for network requests including --type (e.g. xhr,fetch), --method (e.g. POST), and --status (e.g. 2xx, 400-499) (#935)agent-browser network requests --type xhr,fetch --method POST
agent-browser network request <requestId>
-C flag unnecessary (#968)find command flags such as --exact and --name leaking into fill values when used with fill actions (#955)session_name is provided (#677, #964)keyDown events (#972)-C / --cursor flag for snapshot is deprecated; cursor-interactive elements are now included by default and the flag has no additional effect (#968)agent-browser auth login now navigates with load, waits for usable login form selectors, and uses staged username detection (targeted email/username selectors first, then broad text-input fallback). This reduces SPA timing failures, avoids false matches on unrelated text fields, and prevents networkidle hangs on pages with continuous background requests.SO_KEEPALIVE to prevent CDP connections from being silently dropped by intermediate proxies during idle periods (#936)xpath= selector prefix (#908)down, move, and up events (#872)--enable-unsafe-swiftshader flag in Chrome headless mode (#915)--headers persistence - Restored correct persistence of origin-scoped headers set via --headers across navigation commands (#894)upgrade command - Added agent-browser upgrade to self-update the CLI; automatically detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command (#898)agent-browser upgrade
batch -- Execute multiple commands in a single invocation. Pipe a JSON array of string arrays to stdin and receive results sequentially. Supports --bail to stop on first error and --json for structured output.echo '[["open","example.com"],["snapshot"]]' | agent-browser batch --json
network har start/stop -- Capture and export network traffic in HAR 1.2 format.agent-browser network har start
# ... interact with the page ...
agent-browser network har stop ./trace.har
--idle-timeout flag -- Automatically shut down the daemon after a period of inactivity. Accepts human-friendly formats such as 10s, 3m, 1h, or raw milliseconds. Also available as AGENT_BROWSER_IDLE_TIMEOUT_MS.--full/-f refactored to command-level flag -- Moved from a global flag to a per-command flag for clearer scoping.--user-data-dir support and configurable launch timeout for more reliable browser startup. Chrome now retries launching up to 3 times on transient startup failures.--auto-connect commands -- Multiple consecutive auto-connect commands no longer require a full browser relaunch; external connections are correctly identified and reused.snapshot -C and screenshot --annotate now batch CDP calls instead of issuing sequential round-trips per element, preventing timeouts on high-latency WSS connections.check/uncheck falling back to JS .click() for overlay-based controlstype commandchrome://, devtools://) from auto-connect discoverysnapshot --selector scoping to the matched element's subtreeunshare)agent-browser is now 100% native Rust. The Node.js/Playwright daemon has been completely removed -- no Node.js runtime or Playwright dependency is required to run the daemon. The Rust native daemon is now the only implementation.
<table> <thead> <tr> <th>Metric</th> <th>Node.js</th> <th>Rust</th> <th></th> </tr> </thead> <tbody> <tr> <td>Cold start</td> <td>1002ms</td> <td>617ms</td> <td>1.6x faster</td> </tr> <tr> <td>Daemon memory</td> <td>143 MB</td> <td>8 MB</td> <td>18x less</td> </tr> <tr> <td>Install size</td> <td>710 MB</td> <td>7 MB</td> <td>99x smaller</td> </tr> </tbody> </table>npm install -g agent-browser # 7 MB install
agent-browser install # download Chrome
agent-browser open example.com
agent-browser snapshot
--headed false flag not being respected in CLIto_ai_friendly_error incorrectly catching non-element errors--provider browserless flag or AGENT_BROWSER_PROVIDER=browserless environment variable.export BROWSERLESS_API_KEY="your-api-key"
agent-browser --provider browserless open example.com
agent-browser --provider browserless screenshot ./page.png
clipboard command -- Read from and write to the browser clipboard. Supports read, write, copy (simulates Ctrl+C), and paste (simulates Ctrl+V) operations.agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy
agent-browser clipboard paste
--screenshot-dir, --screenshot-quality, and --screenshot-format. Also available as environment variables AGENT_BROWSER_SCREENSHOT_DIR, AGENT_BROWSER_SCREENSHOT_QUALITY, and AGENT_BROWSER_SCREENSHOT_FORMAT.agent-browser screenshot --screenshot-dir ./shots
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
wait --text not working in native daemon pathBrowserManager.navigate() and package entry pointconfig.jsonbrowser.getLocator() for selector operationsinspect command -- Opens Chrome DevTools for the active page by launching a local proxy server that forwards the DevTools frontend to the browser's CDP WebSocket. Agent commands continue to work while DevTools is open.agent-browser open example.com
agent-browser inspect # opens DevTools in your browser
agent-browser click "Submit" # commands still work while DevTools is open
get cdp-url subcommand -- Retrieve the Chrome DevTools Protocol WebSocket URL for the active page, useful for connecting external debugging tools.agent-browser get cdp-url
--annotate flag overlays numbered labels on interactive elements in screenshots.KERNEL_API_KEY to be set, making it easier to use Kernel with pre-configured environments.BROWSERBASE_PROJECT_ID requirement, reducing setup friction for Browserbase users.toWellFormed()scale parameter.--engine <name> flag to select the browser engine (chrome by default, or lightpanda), implying --native mode. Configurable via AGENT_BROWSER_ENGINE environment variable.agent-browser --engine lightpanda open example.com
dismiss subcommand in dialog command parsing.reqwest for more reliable CDP port discovery.--native, AGENT_BROWSER_NATIVE=1, or "native": true in your config file. Supports 150+ commands with full parity to the default Node.js daemon.# Via flag
agent-browser --native open example.com
# Via environment variable
export AGENT_BROWSER_NATIVE=1
agent-browser open example.com
Or add to agent-browser.json:
{"native": true}
All core commands work in native mode: navigation, interaction (click, fill, type, press, hover, scroll, drag), observation (snapshot, screenshot, eval), state management (cookies, storage, state save/load), tabs, emulation (viewport, device, timezone, locale, geolocation), streaming, diffing, recording, and profiling.
The native daemon also includes a WebDriver backend for Safari and iOS support.
agent-browser close before switching between modes.See the Native Mode page for full details.
auth save, auth login, auth list, auth show, auth delete. Passwords can be piped via stdin (--password-stdin) to avoid shell history exposure.--content-boundaries wraps page-sourced output in structural delimiters with a per-process CSPRNG nonce, so LLMs can distinguish trusted tool output from untrusted page content. In --json mode, a _boundary object is injected with nonce and origin fields.--allowed-domains restricts navigation, sub-resource requests, WebSocket connections, and EventSource streams to trusted domains. Supports exact match and wildcard prefix patterns (e.g., *.example.com).--action-policy gates actions using a static JSON policy file with allow/deny lists across 13 action categories. Auth vault operations bypass policy enforcement.--confirm-actions requires explicit approval for sensitive action categories. New confirm and deny commands for orchestrator use. --confirm-interactive enables human-in-the-loop terminal prompts (auto-denies if stdin is not a TTY). Pending confirmations auto-deny after 60 seconds.--max-output truncates large page outputs to prevent LLM context flooding.--download-path option -- Set a default download directory via flag, AGENT_BROWSER_DOWNLOAD_PATH env var, or downloadPath config key. Without it, downloads go to a temporary directory deleted when the browser closes.--selector flag for scroll -- Scroll within a specific container element instead of the page: agent-browser scroll down 500 --selector "div.scroll-container"# Auth vault
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
# Security flags
agent-browser --content-boundaries --allowed-domains "example.com,*.example.com" --max-output 50000 open https://example.com
# Download path
agent-browser --download-path ./downloads open https://example.com
# Scroll within container
agent-browser scroll down 500 --selector "div.content"
Six new environment variables for security configuration: AGENT_BROWSER_CONTENT_BOUNDARIES, AGENT_BROWSER_MAX_OUTPUT, AGENT_BROWSER_ALLOWED_DOMAINS, AGENT_BROWSER_ACTION_POLICY, AGENT_BROWSER_CONFIRM_ACTIONS, AGENT_BROWSER_CONFIRM_INTERACTIVE.
keyboard command -- Type with real keystrokes, insert text, and press shortcuts at the currently focused element without needing a selector (keyboard type, keyboard inserttext).--color-scheme flag -- Persistent dark/light mode preference across browser sessions via flag or AGENT_BROWSER_COLOR_SCHEME env var.agent-browser keyboard type "Hello world"
agent-browser keyboard inserttext "pasted text"
agent-browser --color-scheme dark open https://example.com
AGENT_BROWSER_DEFAULT_TIMEOUT).--annotate flag warning appearing when not explicitly passed via CLI.agent-browser diff snapshot
agent-browser diff screenshot --baseline before.png
agent-browser diff url https://staging.example.com https://prod.example.com
--annotate flag overlays numbered labels on interactive elements and prints a legend mapping each label to its element ref. Enables multimodal AI models to reason about visual layout while using the same @eN refs for subsequent interactions. Also settable via AGENT_BROWSER_ANNOTATE env var.agent-browser screenshot --annotate
&& across README, CLI help output, docs, and skill files.~/.agent-browser/config.json) and project (./agent-browser.json) directories with priority-based merging.profiler start and profiler stop.--extension flag to load browser extensions.state save and state load commands for auth state persistence.--device flag for device emulation.--new-tab option for click commands.--cdp now accepts WebSocket URLs in addition to ports.--session-name flag--new-tab option for click commands to open links in new tabs# Persist session state
agent-browser --session-name myapp open https://example.com
# Manage saved states
agent-browser state list
agent-browser state show myapp
agent-browser state clear myapp
--allow-file-access flag - Enable opening and interacting with local file:// URLs (PDFs, HTML files) by passing Chromium flags that allow JavaScript access to local files-C/--cursor flag for snapshots - Include cursor-interactive elements like divs with onclick handlers or cursor:pointer stylesagent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser snapshot -C
# List available iOS simulators
agent-browser device list
# Launch on iOS device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Touch interactions
agent-browser tap @e1
agent-browser swipe up
--stdin flag for eval command to read JavaScript from stdin, enabling heredoc usage for multiline scripts--stdin flag for eval command to read JavaScript from stdin-b/--base64 flag to avoid shell escaping issues# Via -p flag
agent-browser -p kernel open https://example.com
# Via environment variable
export AGENT_BROWSER_PROVIDER=kernel
export KERNEL_API_KEY=your-api-key
agent-browser open https://example.com
# With persistent profile
export KERNEL_PROFILE_NAME=my-profile
agent-browser open https://example.com
agent-browser --ignore-https-errors open https://localhost:3000
cookies set command with additional flags for setting cookies before page loadagent-browser cookies set session_id "abc123" --url https://app.example.com --httpOnly --secure
agent-browser cookies set token "xyz" --domain .example.com --path /api --expires 1735689600
target="_blank" linkscheck command hanging indefinitelyset device not applying deviceScaleFactor - HiDPI screenshots now work correctly# Via -p flag (recommended)
agent-browser -p browserbase open https://example.com
agent-browser -p browseruse open https://example.com
# Via environment variable
export AGENT_BROWSER_PROVIDER=browserbase
agent-browser open https://example.com
agent-browser --profile ~/.myapp-profile open myapp.com
# Login persists across restarts
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot
download command - Trigger downloads and wait for completionagent-browser download @e1 ./file.pdf
agent-browser wait --download ./output.zip --timeout 30000
agent-browser --args "--disable-gpu,--no-sandbox" open example.com
agent-browser --user-agent "Custom UA" open example.com
agent-browser --proxy-bypass "localhost,*.internal" open example.com
connect command~/.agent-browser instead of TMPDIR)agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser record stop
connect command - Connect to a browser via CDP and persist the connection for subsequent commandsagent-browser connect 9222
agent-browser snapshot # No --cdp needed after connect
--proxy flag - Configure browser proxy with optional authenticationagent-browser --proxy http://user:[email protected]:8080 open example.com
get styles command - Extract computed styles from elementsagent-browser get styles "button"
.claude-plugin/marketplace.json for Claude Code integrationnetwork requests now shows method, URL, and resource type--version flag - Display CLI versionlibasound2t64 on newer Ubuntu versions (24.04+)get value commandtab new commandabout:, data:, and file: URL schemesAGENT_BROWSER_HEADED environment variablehead/tailThese changes align the CLI with the daemon protocol for consistency:
select command now uses values field (supports multiple selections)frame main uses mainframe actionmouse wheel uses wheel actionset media uses emulatemedia actionmessages fieldconnect workflow