Browse CLI

Browser automation CLI for AI agents. Built on Stagehand, providing raw browser control without requiring LLM integration.

Installation

bash

npm install -g @browserbasehq/browse-cli

Requires Chrome/Chromium installed on the system.

Quick Start

bash

# Navigate to a URL (auto-starts browser daemon)
browse open https://example.com

# Take a snapshot to get element refs
browse snapshot -c

# Click an element by ref
browse click @0-5

# Type text
browse type "Hello, world!"

# Take a screenshot
browse screenshot ./page.png

# Stop the browser
browse stop

How It Works

Browse uses a daemon architecture for fast, stateful interactions:

First command auto-starts a Chrome browser daemon
Subsequent commands reuse the same browser session
State persists between commands (cookies, refs, etc.)
Multiple sessions supported via --session or BROWSE_SESSION env var

Self-Healing Sessions

The CLI automatically recovers from stale sessions. If the daemon or Chrome crashes:

Detects the failure
Cleans up stale processes and files
Restarts the daemon
Retries the command

Agents don't need to handle recovery - commands "just work".

Commands

bash

browse open <url> [--wait load|domcontentloaded|networkidle] [-t|--timeout ms]
browse reload
browse back
browse forward

The --timeout flag (default: 30000ms) controls how long to wait for the page load state. Use longer timeouts for slow-loading pages:

bash

browse open https://slow-site.com --timeout 60000

Click Actions

bash

browse click <ref> [-b left|right|middle] [-c count]  # Click by ref (e.g., @0-5)
browse click_xy <x> <y> [--button] [--xpath]          # Click at coordinates

Coordinate Actions

bash

browse hover <x> <y> [--xpath]
browse scroll <x> <y> <deltaX> <deltaY> [--xpath]
browse drag <fromX> <fromY> <toX> <toY> [--steps n] [--xpath]

Keyboard

bash

browse type <text> [-d delay] [--mistakes]
browse press <key>  # e.g., Enter, Tab, Cmd+A

Forms

bash

browse fill <selector> <value> [--no-press-enter]
browse select <selector> <values...>
browse highlight <selector> [-d duration]

Page Info

bash

browse get url
browse get title
browse get text <selector>
browse get html <selector>
browse get value <selector>
browse get box <selector>  # Returns center coordinates

browse snapshot [-c|--compact]  # Accessibility tree with refs
browse screenshot [path] [-f|--full-page] [-t png|jpeg]

Waiting

bash

browse wait load [state]
browse wait selector <selector> [-t timeout] [-s visible|hidden|attached|detached]
browse wait timeout <ms>

Multi-Tab

bash

browse pages          # List all tabs
browse newpage [url]  # Open new tab
browse tab_switch <n> # Switch to tab by index
browse tab_close [n]  # Close tab (default: last)

Network Capture

Capture HTTP requests to the filesystem for inspection:

bash

browse network on     # Start capturing requests
browse network off    # Stop capturing
browse network path   # Get capture directory path
browse network clear  # Clear captured requests

Captured requests are saved as directories:

/tmp/browse-default-network/
  001-GET-api.github.com-repos/
    request.json      # method, url, headers, body
    response.json     # status, headers, body, duration

Daemon Control

bash

browse start          # Explicitly start daemon
browse stop [--force] # Stop daemon
browse status         # Check daemon status
browse env [target]   # Show or switch environment: local | remote

Environment Switching (Local vs Remote)

Use environment switching when an agent should keep the same command flow, but the browser runtime needs to change:

local runs Chrome on your machine (best for local debugging/dev loops)
remote runs a Browserbase session (best for anti-bot hardening and cloud runs)

bash

# Show active environment (if running) and desired environment for next start
browse env

# Switch current session to Browserbase (restarts daemon if needed)
browse env remote

# Switch back to local Chrome
browse env local

Behavior details:

Environment is scoped per --session
browse env <target> persists an override and restarts the daemon
browse stop clears the override so next start falls back to env-var-based auto detection
Auto detection defaults to:
- remote when BROWSERBASE_API_KEY is set
- local otherwise

Global Options

Option	Description
`--session <name>`	Session name for multiple browsers (default: "default")
`--headless`	Run Chrome in headless mode
`--headed`	Run Chrome with visible window (default)
`--ws <url>`	Connect to existing Chrome via CDP WebSocket
`--json`	Output as JSON

Environment Variables

Variable	Description
`BROWSE_SESSION`	Default session name (alternative to `--session`)
`BROWSERBASE_API_KEY`	Browserbase API key (required for `browse env remote`)
`BROWSERBASE_PROJECT_ID`	Browserbase project ID (optional, passed through if set)

Element References

After running browse snapshot, you can reference elements by their ref ID:

bash

# Get snapshot with refs
browse snapshot -c

# Output includes refs like [0-5], [1-2], etc.
# RootWebArea "Example" url="https://example.com"
#   [0-0] link "Home"
#   [0-1] link "About"
#   [0-2] button "Sign In"

# Click using ref (multiple formats supported)
browse click @0-2       # @ prefix
browse click 0-2        # Plain ref
browse click ref=0-2    # Explicit prefix

The full snapshot output includes mappings:

xpathMap: Cross-frame XPath selectors
cssMap: Fast CSS selectors when available
urlMap: Extracted URLs from links

Multiple Sessions

Run multiple browser instances simultaneously:

bash

# Terminal 1
BROWSE_SESSION=session1 browse open https://google.com

# Terminal 2
BROWSE_SESSION=session2 browse open https://github.com

# Or use --session flag
browse --session work open https://slack.com
browse --session personal open https://twitter.com

Direct CDP Connection

Connect to an existing Chrome instance:

bash

# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222

# Connect via WebSocket
browse --ws ws://localhost:9222/devtools/browser/... open https://example.com

Optimal AI Workflow

Navigate to target page (browser auto-starts)
Snapshot to get the accessibility tree with refs
Click/Fill using refs directly (e.g., @0-5)
Re-snapshot after actions to verify state changes
Stop when done

bash

browse open https://example.com
browse snapshot -c
# [0-5] textbox: Search
# [0-8] button: Submit
browse fill @0-5 "my query"
browse click @0-8
browse snapshot -c  # Verify result
browse stop

Troubleshooting

Chrome not found

The CLI uses your system Chrome/Chromium. If not found:

bash

# macOS - Install Chrome or set path
export CHROME_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome

# Linux - Install chromium
sudo apt install chromium-browser

Stale daemon

If the daemon becomes unresponsive:

bash

browse stop --force

Permission denied on socket

bash

# Clean up stale socket files
rm /tmp/browse-*.sock /tmp/browse-*.pid

Platform Support

macOS (Intel and Apple Silicon)
Linux (x64 and arm64)

Windows support requires WSL or TCP socket implementation.

Development

bash

# Clone and setup (in monorepo)
cd packages/cli
pnpm install         # Install dependencies first!
pnpm run build       # Build the CLI

# Run without building (for development)
pnpm run dev -- <command>

# Or with tsx directly
npx tsx src/index.ts <command>

# Run linting and formatting
pnpm run lint
pnpm run format

License

MIT - see LICENSE

Stagehand - AI web browser automation framework
Browserbase - Cloud browser infrastructure

Browse CLI

Browse CLI

Installation

Quick Start

How It Works

Self-Healing Sessions

Commands

Navigation

Click Actions

Coordinate Actions

Keyboard

Forms

Page Info

Waiting

Multi-Tab

Network Capture

Daemon Control

Environment Switching (Local vs Remote)

Global Options

Environment Variables

Element References

Multiple Sessions

Direct CDP Connection

Optimal AI Workflow

Troubleshooting

Chrome not found

Stale daemon

Permission denied on socket

Platform Support

Development

License

Related