docs/en/tools/browser.mdx
Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure.
This command will:
- Install the `playwright` Python package (with auto-fallback for older systems)
- Install system dependencies on Linux
- Download the Chromium browser (Linux servers automatically use the headless build)
- Detect China-mainland networks and use mirror acceleration
On Linux servers, install system dependencies as well:
```bash
sudo playwright install-deps chromium
```
On older systems (e.g. Ubuntu 18.04, glibc < 2.28), install a compatible version:
```bash
pip install playwright==1.28.0
python -m playwright install chromium
```
To accelerate the Chromium download from China:
```bash
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
python -m playwright install chromium
```
A typical browser workflow for the Agent:
navigate — Open the target URLsnapshot — Get a compact DOM with auto-numbered interactive elements (ref)click / fill / select — Operate elements by refsnapshot — Snapshot again to verify the result| Action | Description | Key parameters |
|---|---|---|
navigate | Open URL | url |
snapshot | Get structured page text (primary way) | selector (optional) |
click | Click an element | ref or selector |
fill | Fill text into an input | ref or selector, text |
select | Select a dropdown option | ref or selector, value |
scroll | Scroll the page | direction (up/down/left/right) |
screenshot | Save a screenshot to the workspace | full_page |
wait | Wait for an element or timeout | selector, timeout |
press | Press a key (Enter, Tab, etc.) | key |
back / forward | Browser back / forward | - |
get_text | Get an element's text content | selector |
evaluate | Run JavaScript | script |
The browser picks a mode based on the runtime environment:
| Environment | Mode |
|---|---|
| macOS / Windows | Headed (browser window visible) |
| Linux desktop (with DISPLAY) | Headed |
| Linux server (no DISPLAY) | Headless |
You can override it in config.json:
{
"tools": {
"browser": {
"headless": true
}
}
}
Log in to a target site once and the Agent can keep using it. Two ways are supported:
Works out of the box. Login state is saved under ~/.cow/browser_profile. No configuration needed.
To disable persistence and start with a clean environment every time:
{
"tools": {
"browser": {
"persistent": false
}
}
}
Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection.
Launch Chrome with a debugging port and a dedicated user data directory:
<Tabs> <Tab title="macOS"> ```bash "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Linux"> ```bash google-chrome \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` </Tab> <Tab title="Windows"> ```powershell & "C:\Program Files\Google\Chrome\Application\chrome.exe" ` --remote-debugging-port=9222 ` --user-data-dir="$env:USERPROFILE\.cow\chrome-cdp" ``` </Tab> </Tabs>Then point the Agent at the endpoint in config.json:
{
"tools": {
"browser": {
"cdp_endpoint": "http://localhost:9222"
}
}
}