packages/cli/README.md
Browser automation CLI for AI agents. Built on Stagehand, providing raw browser control without requiring LLM integration.
npm install -g @browserbasehq/browse-cli
Requires Chrome/Chromium installed on the system.
# Navigate to a URL (auto-starts browser daemon)
browse open https://example.com
# Take a snapshot to get element refs
browse snapshot -c
# Click an element by ref
browse click @0-5
# Type text
browse type "Hello, world!"
# Take a screenshot
browse screenshot ./page.png
# Stop the browser
browse stop
Browse uses a daemon architecture for fast, stateful interactions:
--session or BROWSE_SESSION env varThe CLI automatically recovers from stale sessions. If the daemon or Chrome crashes:
Agents don't need to handle recovery - commands "just work".
browse open <url> [--wait load|domcontentloaded|networkidle] [-t|--timeout ms]
browse reload
browse back
browse forward
The --timeout flag (default: 30000ms) controls how long to wait for the page load state. Use longer timeouts for slow-loading pages:
browse open https://slow-site.com --timeout 60000
browse click <ref> [-b left|right|middle] [-c count] # Click by ref (e.g., @0-5)
browse click_xy <x> <y> [--button] [--xpath] # Click at coordinates
browse hover <x> <y> [--xpath]
browse scroll <x> <y> <deltaX> <deltaY> [--xpath]
browse drag <fromX> <fromY> <toX> <toY> [--steps n] [--xpath]
browse type <text> [-d delay] [--mistakes]
browse press <key> # e.g., Enter, Tab, Cmd+A
browse fill <selector> <value> [--no-press-enter]
browse select <selector> <values...>
browse highlight <selector> [-d duration]
browse get url
browse get title
browse get text <selector>
browse get html <selector>
browse get value <selector>
browse get box <selector> # Returns center coordinates
browse snapshot [-c|--compact] # Accessibility tree with refs
browse screenshot [path] [-f|--full-page] [-t png|jpeg]
browse wait load [state]
browse wait selector <selector> [-t timeout] [-s visible|hidden|attached|detached]
browse wait timeout <ms>
browse pages # List all tabs
browse newpage [url] # Open new tab
browse tab_switch <n> # Switch to tab by index
browse tab_close [n] # Close tab (default: last)
Capture HTTP requests to the filesystem for inspection:
browse network on # Start capturing requests
browse network off # Stop capturing
browse network path # Get capture directory path
browse network clear # Clear captured requests
Captured requests are saved as directories:
/tmp/browse-default-network/
001-GET-api.github.com-repos/
request.json # method, url, headers, body
response.json # status, headers, body, duration
browse start # Explicitly start daemon
browse stop [--force] # Stop daemon
browse status # Check daemon status
browse env [target] # Show or switch environment: local | remote
Use environment switching when an agent should keep the same command flow, but the browser runtime needs to change:
local runs Chrome on your machine (best for local debugging/dev loops)remote runs a Browserbase session (best for anti-bot hardening and cloud runs)# Show active environment (if running) and desired environment for next start
browse env
# Switch current session to Browserbase (restarts daemon if needed)
browse env remote
# Switch back to local Chrome
browse env local
Behavior details:
--sessionbrowse env <target> persists an override and restarts the daemonbrowse stop clears the override so next start falls back to env-var-based auto detectionremote when BROWSERBASE_API_KEY is setlocal otherwise| Option | Description |
|---|---|
--session <name> | Session name for multiple browsers (default: "default") |
--headless | Run Chrome in headless mode |
--headed | Run Chrome with visible window (default) |
--ws <url> | Connect to existing Chrome via CDP WebSocket |
--json | Output as JSON |
| Variable | Description |
|---|---|
BROWSE_SESSION | Default session name (alternative to --session) |
BROWSERBASE_API_KEY | Browserbase API key (required for browse env remote) |
BROWSERBASE_PROJECT_ID | Browserbase project ID (optional, passed through if set) |
After running browse snapshot, you can reference elements by their ref ID:
# Get snapshot with refs
browse snapshot -c
# Output includes refs like [0-5], [1-2], etc.
# RootWebArea "Example" url="https://example.com"
# [0-0] link "Home"
# [0-1] link "About"
# [0-2] button "Sign In"
# Click using ref (multiple formats supported)
browse click @0-2 # @ prefix
browse click 0-2 # Plain ref
browse click ref=0-2 # Explicit prefix
The full snapshot output includes mappings:
Run multiple browser instances simultaneously:
# Terminal 1
BROWSE_SESSION=session1 browse open https://google.com
# Terminal 2
BROWSE_SESSION=session2 browse open https://github.com
# Or use --session flag
browse --session work open https://slack.com
browse --session personal open https://twitter.com
Connect to an existing Chrome instance:
# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222
# Connect via WebSocket
browse --ws ws://localhost:9222/devtools/browser/... open https://example.com
@0-5)browse open https://example.com
browse snapshot -c
# [0-5] textbox: Search
# [0-8] button: Submit
browse fill @0-5 "my query"
browse click @0-8
browse snapshot -c # Verify result
browse stop
The CLI uses your system Chrome/Chromium. If not found:
# macOS - Install Chrome or set path
export CHROME_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
# Linux - Install chromium
sudo apt install chromium-browser
If the daemon becomes unresponsive:
browse stop --force
# Clean up stale socket files
rm /tmp/browse-*.sock /tmp/browse-*.pid
Windows support requires WSL or TCP socket implementation.
# Clone and setup (in monorepo)
cd packages/cli
pnpm install # Install dependencies first!
pnpm run build # Build the CLI
# Run without building (for development)
pnpm run dev -- <command>
# Or with tsx directly
npx tsx src/index.ts <command>
# Run linting and formatting
pnpm run lint
pnpm run format
MIT - see LICENSE