browser - Browser - Chatgpt On Wechat

Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure.

Installation

<Tabs> <Tab title="CLI install (recommended)"> ```bash cow install-browser ```

This command will:
- Install the `playwright` Python package (with auto-fallback for older systems)
- Install system dependencies on Linux
- Download the Chromium browser (Linux servers automatically use the headless build)
- Detect China-mainland networks and use mirror acceleration

</Tab> <Tab title="Manual install"> ```bash pip install playwright playwright install chromium ```

On Linux servers, install system dependencies as well:
```bash
sudo playwright install-deps chromium
```

On older systems (e.g. Ubuntu 18.04, glibc < 2.28), install a compatible version:
```bash
pip install playwright==1.28.0
python -m playwright install chromium
```

To accelerate the Chromium download from China:
```bash
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
python -m playwright install chromium
```

</Tab> </Tabs> <Note> 1. Supported on Ubuntu 20.04+, Debian 10+, macOS and Windows. Older systems such as Ubuntu 18.04 will fall back to a compatible version automatically. 2. The browser tool has heavy dependencies (~300MB) and is optional. For lightweight web content retrieval, use the `web_fetch` tool. </Note>

Workflow

A typical browser workflow for the Agent:

navigate — Open the target URL
snapshot — Get a compact DOM with auto-numbered interactive elements (ref)
click / fill / select — Operate elements by ref
snapshot — Snapshot again to verify the result

Supported Actions

Action	Description	Key parameters
`navigate`	Open URL	`url`
`snapshot`	Get structured page text (primary way)	`selector` (optional)
`click`	Click an element	`ref` or `selector`
`fill`	Fill text into an input	`ref` or `selector`, `text`
`select`	Select a dropdown option	`ref` or `selector`, `value`
`scroll`	Scroll the page	`direction` (up/down/left/right)
`screenshot`	Save a screenshot to the workspace	`full_page`
`wait`	Wait for an element or timeout	`selector`, `timeout`
`press`	Press a key (Enter, Tab, etc.)	`key`
`back` / `forward`	Browser back / forward	-
`get_text`	Get an element's text content	`selector`
`evaluate`	Run JavaScript	`script`

Use Cases

Access a URL to retrieve dynamic page content
Fill in forms and log in
Operate web elements (click buttons, select options, etc.)
Verify the result of a deployed web page
Scrape content that requires JS rendering

Run Mode

The browser picks a mode based on the runtime environment:

Environment	Mode
macOS / Windows	Headed (browser window visible)
Linux desktop (with DISPLAY)	Headed
Linux server (no DISPLAY)	Headless

You can override it in config.json:

json

{
  "tools": {
    "browser": {
      "headless": true
    }
  }
}

Log in to a target site once and the Agent can keep using it. Two ways are supported:

Option 1: Persistent mode (default)

Works out of the box. Login state is saved under ~/.cow/browser_profile. No configuration needed.

To disable persistence and start with a clean environment every time:

json

{
  "tools": {
    "browser": {
      "persistent": false
    }
  }
}

Option 2: CDP mode (attach to real Chrome)

Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection.

Launch Chrome with a debugging port and a dedicated user data directory:

<Tabs> <Tab title="macOS"> ```bash "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Linux"> ```bash google-chrome \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Windows"> ```powershell & "C:\Program Files\Google\Chrome\Application\chrome.exe" ` --remote-debugging-port=9222 ` --user-data-dir="$env:USERPROFILE\.cow\chrome-cdp" ``` </Tab> </Tabs>

Then point the Agent at the endpoint in config.json:

json

{
  "tools": {
    "browser": {
      "cdp_endpoint": "http://localhost:9222"
    }
  }
}

<Note> Chrome 137+ requires `--remote-debugging-port` to be paired with a dedicated `--user-data-dir`. As a result, the CDP-launched Chrome **cannot directly reuse the login state of your daily Chrome**; you'll need to log in once inside this dedicated profile. </Note>

browser - Browser

Installation

Workflow

Supported Actions

Use Cases

Run Mode

Persistent Login

Option 1: Persistent mode (default)

Option 2: CDP mode (attach to real Chrome)