browser-use-demo/README.md
A complete reference implementation for building browser automation with Claude using Playwright. This demo provides a containerized Streamlit interface showcasing how to give Claude the ability to navigate websites, interact with DOM elements, extract content, and fill forms.
This demo implements a custom browser tool that enables Claude to interact with web browsers. It provides:
ref parameter works across different screen sizes and layouts, unlike pixel coordinates that break when windows resizeClone the repository:
git clone https://github.com/anthropics/claude-quickstarts.git
cd claude-quickstarts/browser-use-demo
Configure environment:
cp .env.example .env
# Edit .env file and add your ANTHROPIC_API_KEY
The display resolution is set to 1920x1080 (16:9) for optimal coordinate accuracy.
.env.example for more options and coordinate scaling details# For production use:
docker-compose up --build
# For development with file watching (auto-sync changes):
docker-compose up --build --watch
https://github.com/user-attachments/assets/4fb72078-6902-4b63-bcd1-5f2c4cd60582
Once the demo is running, try these prompts in the Streamlit interface:
Note that the current Playwright implementation hits CAPTCHAs when searching Google.com. To avoid this, we recommend that you specify the website in the prompt (ie. navigate to Anthropic.com and search for x).
Browser automation poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the tool to interact with the internet. To minimize risks, consider taking precautions such as:
In some circumstances, Claude will follow commands found in content even if it conflicts with the user's instructions. For example, instructions on webpages or contained in images may override user instructions or cause Claude to make mistakes. We suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection.
Finally, please inform end users of relevant risks and obtain their consent prior to enabling browser automation in your own products.
This demo runs a browser in a containerized environment. While isolated, please note:
This demo shows how to build browser automation with Claude using Playwright. All browser actions (navigate, click, type, scroll, form_input, etc.) are implemented as methods in browser.py using Playwright's async API.
Element references: JavaScript utilities generate ref identifiers for reliable element targeting across screen sizes (replacing brittle pixel coordinates).
Tool setup:
browser_tool = BrowserTool()
def to_params(self):
return {
"name": "browser",
"description": BROWSER_TOOL_DESCRIPTION,
"input_schema": BROWSER_TOOL_INPUT_SCHEMA,
}
The browser tools implementation includes automatic coordinate scaling to ensure accurate interactions:
How it works:
See browser_use_demo/tools/coordinate_scaling.py for the implementation.
This demo uses a custom tool definition with an explicit input schema, giving you full control over the tool interface. The BROWSER_TOOL_DESCRIPTION and BROWSER_TOOL_INPUT_SCHEMA constants in browser.py provide a complete example you can use as a starting point for your own browser automation tools.
To modify this demo:
browser_use_demo/tools/browser.py to add features or change behaviorTo use as a template for your own project:
┌──────────────────────────────────┐
│ Docker Container │
│ │
│ ┌─────────────────────────────┐ │
│ │ Streamlit Interface │ │ ← User interacts here
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ Claude API + Browser Tool │ │ ← Claude controls browser
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ Playwright + Chromium │ │ ← Browser automation
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ XVFB Virtual Display │ │ ← Virtual display
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ VNC/NoVNC Server │ │ ← Visual access
│ └─────────────────────────────┘ │
└──────────────────────────────────┘
This browser automation demo is specifically optimized for web automation with DOM-aware features like element targeting, page reading, and form manipulation. While it shares many capabilities with the computer use demo, browser automation adds web-specific actions and the ability to target elements by reference instead of just coordinates. Computer use provides general desktop control for any application, while browser automation focuses exclusively on browser-based tasks.
These web-specific actions are not available in computer use:
These actions work similarly to their computer use counterparts. The key difference is that browser automation allows targeting by element reference (ref) as an alternative to coordinates:
Mouse Actions (accept either ref or coordinate):
Keyboard Actions:
Other:
These desktop-level actions from computer use are not in this browser demo:
This is less relevant for browser automation since the ref parameter provides reliable element-based targeting, replacing the need for cursor tracking. Note that hover provides similar functionality to mouse_move for triggering hover states.
Browser not visible?
API errors?
Browser actions failing?
This software includes components from Microsoft Playwright. See the NOTICE file for details.
Built with: