Back to Chatgpt On Wechat

browser - Browser

docs/en/tools/browser.mdx

2.0.94.9 KB
Original Source

Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure.

Installation

<Tabs> <Tab title="CLI install (recommended)"> ```bash cow install-browser ```
This command will:
- Install the `playwright` Python package (with auto-fallback for older systems)
- Install system dependencies on Linux
- Download the Chromium browser (Linux servers automatically use the headless build)
- Detect China-mainland networks and use mirror acceleration
</Tab> <Tab title="Manual install"> ```bash pip install playwright playwright install chromium ```
On Linux servers, install system dependencies as well:
```bash
sudo playwright install-deps chromium
```

On older systems (e.g. Ubuntu 18.04, glibc < 2.28), install a compatible version:
```bash
pip install playwright==1.28.0
python -m playwright install chromium
```

To accelerate the Chromium download from China:
```bash
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
python -m playwright install chromium
```
</Tab> </Tabs> <Note> 1. Supported on Ubuntu 20.04+, Debian 10+, macOS and Windows. Older systems such as Ubuntu 18.04 will fall back to a compatible version automatically. 2. The browser tool has heavy dependencies (~300MB) and is optional. For lightweight web content retrieval, use the `web_fetch` tool. </Note>

Workflow

A typical browser workflow for the Agent:

  1. navigate — Open the target URL
  2. snapshot — Get a compact DOM with auto-numbered interactive elements (ref)
  3. click / fill / select — Operate elements by ref
  4. snapshot — Snapshot again to verify the result

Supported Actions

ActionDescriptionKey parameters
navigateOpen URLurl
snapshotGet structured page text (primary way)selector (optional)
clickClick an elementref or selector
fillFill text into an inputref or selector, text
selectSelect a dropdown optionref or selector, value
scrollScroll the pagedirection (up/down/left/right)
screenshotSave a screenshot to the workspacefull_page
waitWait for an element or timeoutselector, timeout
pressPress a key (Enter, Tab, etc.)key
back / forwardBrowser back / forward-
get_textGet an element's text contentselector
evaluateRun JavaScriptscript

Use Cases

  • Access a URL to retrieve dynamic page content
  • Fill in forms and log in
  • Operate web elements (click buttons, select options, etc.)
  • Verify the result of a deployed web page
  • Scrape content that requires JS rendering

Run Mode

The browser picks a mode based on the runtime environment:

EnvironmentMode
macOS / WindowsHeaded (browser window visible)
Linux desktop (with DISPLAY)Headed
Linux server (no DISPLAY)Headless

You can override it in config.json:

json
{
  "tools": {
    "browser": {
      "headless": true
    }
  }
}

Persistent Login

Log in to a target site once and the Agent can keep using it. Two ways are supported:

Option 1: Persistent mode (default)

Works out of the box. Login state is saved under ~/.cow/browser_profile. No configuration needed.

To disable persistence and start with a clean environment every time:

json
{
  "tools": {
    "browser": {
      "persistent": false
    }
  }
}

Option 2: CDP mode (attach to real Chrome)

Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection.

Launch Chrome with a debugging port and a dedicated user data directory:

<Tabs> <Tab title="macOS"> ```bash "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Linux"> ```bash google-chrome \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Windows"> ```powershell & "C:\Program Files\Google\Chrome\Application\chrome.exe" ` --remote-debugging-port=9222 ` --user-data-dir="$env:USERPROFILE\.cow\chrome-cdp" ``` </Tab> </Tabs>

Then point the Agent at the endpoint in config.json:

json
{
  "tools": {
    "browser": {
      "cdp_endpoint": "http://localhost:9222"
    }
  }
}
<Note> Chrome 137+ requires `--remote-debugging-port` to be paired with a dedicated `--user-data-dir`. As a result, the CDP-launched Chrome **cannot directly reuse the login state of your daily Chrome**; you'll need to log in once inside this dedicated profile. </Note>