packages/kilo-docs/pages/automate/tools/browser-action.md
The browser_action tool enables web automation and interaction via a Puppeteer-controlled browser. It allows Kilo Code to launch browsers, navigate to websites, click elements, type text, and scroll pages with visual feedback through screenshots.
The tool accepts these parameters:
action (required): The action to perform:
- launch: Start a new browser session at a URL
- click: Click at specific x,y coordinates
- type: Type text via the keyboard
- scroll_down: Scroll down one page height
- scroll_up: Scroll up one page height
- close: End the browser session

url (optional): The URL to navigate to when using the launch action
coordinate (optional): The x,y coordinates for the click action (e.g., "450,300")
text (optional): The text to type when using the type action

This tool creates an automated browser session that Kilo Code can control to navigate websites, interact with elements, and perform tasks that require browser automation. Each action provides a screenshot of the current state, enabling visual verification of the process.
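The pairing of actions and parameters described above can be sketched as a small validator. This is a minimal Python illustration, not Kilo Code's implementation: the names `REQUIRED_PARAM`, `parse_coordinate`, and `validate_call` are hypothetical, and the sketch assumes each parameter is needed whenever its matching action is used.

```python
from typing import Dict, Optional, Tuple

# Assumption: each action needs only the parameter the list above ties to it.
REQUIRED_PARAM: Dict[str, Optional[str]] = {
    "launch": "url",
    "click": "coordinate",
    "type": "text",
    "scroll_down": None,
    "scroll_up": None,
    "close": None,
}


def parse_coordinate(value: str) -> Tuple[int, int]:
    """Parse an "x,y" string such as "450,300" into a pair of integers."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'x,y', got {value!r}")
    # int() raises ValueError on non-numeric input, which is the error we want.
    return int(parts[0].strip()), int(parts[1].strip())


def validate_call(action: str, **params: str) -> None:
    """Raise ValueError if the action is unknown or its parameter is missing."""
    if action not in REQUIRED_PARAM:
        raise ValueError(f"unknown action: {action}")
    needed = REQUIRED_PARAM[action]
    if needed is not None and not params.get(needed):
        raise ValueError(f"{action} requires the {needed} parameter")
    if action == "click":
        # Also check that the coordinate string is well formed.
        parse_coordinate(params["coordinate"])
```

Calling `validate_call("click", coordinate="450,300")` passes silently, while `validate_call("click")` raises a `ValueError` naming the missing parameter.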
The tool operates in two distinct modes:
browser_action tool can be used

When the browser_action tool is invoked, it follows this process:
Action Validation and Browser Management:
- launch: Initializes a browser session (either a local Puppeteer instance or remote Chrome)
- close: Terminates or disconnects from the browser appropriately

Page Interaction and Stability:
- waitTillHTMLStable algorithm

Visual Feedback:
Session Management:
Browser interactions must follow this specific sequence:
1. launch action
2. click, type, and scroll actions can be performed
3. close action

type action, and clicks submit.

Launching a browser and navigating to a website:
<browser_action>
<action>launch</action>
<url>https://example.com</url>
</browser_action>
Clicking at specific coordinates (e.g., a button):
<browser_action>
<action>click</action>
<coordinate>450,300</coordinate>
</browser_action>
Typing text into a focused input field:
<browser_action>
<action>type</action>
<text>Hello, World!</text>
</browser_action>
Scrolling down to see more content:
<browser_action>
<action>scroll_down</action>
</browser_action>
Closing the browser session:
<browser_action>
<action>close</action>
</browser_action>
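
The required ordering (a launch action first, interaction actions in between, a close action last) can be sketched as a small tracker that also renders calls in the format shown above. This is a hedged Python illustration only: `SessionSequence` and `render_action` are hypothetical names, and Kilo Code's actual session handling may differ.

```python
def render_action(action: str, **params: str) -> str:
    """Render one call in the XML-style format used by the examples above."""
    lines = ["<browser_action>", f"<action>{action}</action>"]
    for name, value in params.items():
        lines.append(f"<{name}>{value}</{name}>")
    lines.append("</browser_action>")
    return "\n".join(lines)


class SessionSequence:
    """Tracks whether a session is open and rejects actions sent before launch."""

    def __init__(self) -> None:
        self.is_open = False

    def check(self, action: str) -> None:
        if action == "launch":
            self.is_open = True
        elif not self.is_open:
            # click, type, scroll, and close only make sense in an open session.
            raise RuntimeError(f"{action} requires a launch action first")
        elif action == "close":
            self.is_open = False


if __name__ == "__main__":
    sequence = SessionSequence()
    calls = [
        ("launch", {"url": "https://example.com"}),
        ("click", {"coordinate": "450,300"}),
        ("type", {"text": "Hello, World!"}),
        ("close", {}),
    ]
    for action, params in calls:
        sequence.check(action)  # raises if the ordering is violated
        print(render_action(action, **params))
```

Keeping the ordering check separate from rendering means the same tracker could guard any transport that sends these calls.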