skills/open-source/references/agent.md
from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatBrowserUse(),
)

async def main():
    history = await agent.run(max_steps=500)

- task: The task to automate
- llm: LLM instance (see models.md)
- max_steps (default: 500): Maximum agent steps
- tools: Registry of tools the agent can call
- skills (or skill_ids): List of skill IDs to load (e.g., ['skill-uuid'] or ['*'] for all). Requires BROWSER_USE_API_KEY
- browser: Browser object for browser settings
- output_model_schema: Pydantic model class for structured output validation
- use_vision (default: True): True always includes screenshots, "auto" includes the screenshot tool but only uses vision when requested, False never includes them
- vision_detail_level (default: 'auto'): 'low', 'high', or 'auto'
- page_extraction_llm: Separate LLM for page content extraction (default: same as llm)
- fallback_llm: Backup LLM used when the primary fails. The primary exhausts its retry logic (5 attempts with exponential backoff) first. Triggers on: 429 (rate limit), 401 (auth), 402 (payment), 500/502/503/504 (server errors). Once switched, the fallback is used for the rest of the run
- initial_actions: Actions to run before the main task, without the LLM
- max_actions_per_step (default: 5): Max actions per step (e.g., fill 5 form fields at once)
- max_failures (default: 5): Max retries for steps with errors
- final_response_after_failure (default: True): Force one final model call after max_failures
- use_thinking (default: True): Enable explicit reasoning steps
- flash_mode (default: False): Fast mode that skips evaluation, next goal, and thinking, and uses memory only. Overrides use_thinking
- override_system_message: Completely replace the default system prompt
- extend_system_message: Add instructions to the default system prompt
- save_conversation_path: Path to save conversation history
- save_conversation_path_encoding (default: 'utf-8')
- available_file_paths: File paths the agent can access
- sensitive_data: Dict of sensitive data (see examples.md for patterns)
- generate_gif (default: False): Generate a GIF of actions. Set to True or a string path
- include_attributes: HTML attributes to include in page analysis
- max_history_items: Max steps to keep in LLM memory (None = all)
- llm_timeout (default: auto-detected per model; Groq: 30s, Gemini: 75s, Gemini 3 Pro: 90s, o3/Claude/DeepSeek: 90s, others: 75s): Seconds for LLM calls
- step_timeout (default: 180): Seconds for each step
- directly_open_url (default: True): Auto-open URLs detected in the task
- calculate_cost (default: False): Track API costs (access via history.usage)
- display_files_in_done_text (default: True)
- controller → alias for tools
- browser_session → alias for browser

run() returns an AgentHistoryList:
history = await agent.run()
# Basic access
history.urls() # Visited URLs
history.screenshot_paths() # Screenshot file paths
history.screenshots() # Screenshots as base64
history.action_names() # Executed action names
history.extracted_content() # Extracted content from all actions
history.errors() # Errors (None for clean steps)
history.model_actions() # All actions with parameters
history.model_outputs() # All model outputs
history.last_action() # Last action
# Analysis
history.final_result() # Final extracted content (last step)
history.is_done() # Agent completed?
history.is_successful() # Completed successfully? (None if not done)
history.has_errors() # Any errors?
history.model_thoughts() # Reasoning (AgentBrain objects)
history.action_results() # All ActionResult objects
history.action_history() # Truncated action history
history.number_of_steps() # Step count
history.total_duration_seconds() # Total duration
# Structured output
history.structured_output # Parsed structured output (if output_model_schema set)
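
As a rough illustration of how these accessors combine, the sketch below condenses a finished run into a small report. It only calls methods listed above. The names summarize_history and FakeHistory are hypothetical and exist only so the snippet runs without a browser, and it assumes errors() yields one entry per step, as the "None for clean steps" note suggests.

```python
from typing import Any


def summarize_history(history: Any) -> dict:
    """Condense an AgentHistoryList-like object into a plain dict."""
    # Assumption: errors() has one entry per step, None when the step was clean
    real_errors = [e for e in history.errors() if e is not None]
    visited = history.urls()
    return {
        "done": history.is_done(),
        "successful": history.is_successful(),  # None if the agent never finished
        "steps": history.number_of_steps(),
        "last_url": visited[-1] if visited else None,
        "errors": real_errors,
        "result": history.final_result(),
    }


class FakeHistory:
    """Hypothetical stand-in with the same method names, for illustration only."""

    def is_done(self): return True
    def is_successful(self): return True
    def number_of_steps(self): return 3
    def urls(self): return ["https://example.com", "https://example.com/news"]
    def errors(self): return [None, "Element not found", None]
    def final_result(self): return "example result"


report = summarize_history(FakeHistory())
print(report["steps"], report["errors"])
```

With a real agent, the same helper would be called as summarize_history(await agent.run()).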
Use output_model_schema with a Pydantic model:
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str

agent = Agent(task="...", llm=llm, output_model_schema=SearchResult)
history = await agent.run()
result = history.structured_output # SearchResult instance
# Good
task = """
1. Go to https://quotes.toscrape.com/
2. Use extract action with the query "first 3 quotes with their authors"
3. Save results to quotes.csv using write_file action
"""
# Bad
task = "Go to web and make money"
task = """
1. Use search action to find "Python tutorials"
2. Use click to open first result in a new tab
3. Use scroll action to scroll down 2 pages
4. Use extract to extract the names of the first 5 items
"""
task = """
If the submit button cannot be clicked:
1. Use send_keys action with "Tab Tab Enter"
2. Or use send_keys with "ArrowDown ArrowDown Enter"
"""
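from browser_use import Tools
tools = Tools()

@tools.action("Get 2FA code from authenticator app")
async def get_2fa_code():
    pass  # placeholder: return the current code from your authenticator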
task = """
Login with 2FA:
1. Enter username/password
2. When prompted for 2FA, use get_2fa_code action
3. NEVER try to extract 2FA codes from the page manually
"""
task = """
1. Go to openai.com to find their CEO
2. If navigation fails due to anti-bot protection:
   - Use Google search to find the CEO
3. If page times out, use go_back and try alternative approach
"""
Two hooks available via agent.run():
| Hook | When Called |
|---|---|
| on_step_start | Before agent processes current state |
| on_step_end | After agent executes all actions for step |

async def my_hook(agent: Agent):
    state = await agent.browser_session.get_browser_state_summary()
    print(f'Current URL: {state.url}')

await agent.run(on_step_start=my_hook, on_step_end=my_hook)
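
Because the two hooks bracket each step, a pair of them can measure how long steps take. The sketch below is one way to do that under the hook signature shown above (an async callable receiving the agent). StepTimer and demo are hypothetical names, and the demo drives the hooks with a placeholder object instead of a real Agent so it runs without a browser.

```python
import asyncio
import time
from typing import Any, List, Optional


class StepTimer:
    """Collects wall-clock durations between on_step_start and on_step_end."""

    def __init__(self) -> None:
        self.durations: List[float] = []
        self._started_at: Optional[float] = None

    async def on_start(self, agent: Any) -> None:
        # Remember when the step began
        self._started_at = time.perf_counter()

    async def on_end(self, agent: Any) -> None:
        # Record elapsed time only if a matching start was seen
        if self._started_at is not None:
            self.durations.append(time.perf_counter() - self._started_at)
            self._started_at = None


async def demo() -> List[float]:
    timer = StepTimer()
    fake_agent = object()  # stand-in; a real run passes the Agent instance
    for _ in range(2):
        await timer.on_start(fake_agent)
        await asyncio.sleep(0.01)  # pretend the step does some work
        await timer.on_end(fake_agent)
    return timer.durations


durations = asyncio.run(demo())
print(len(durations))
```

With a real agent, the bound methods would be passed directly: await agent.run(on_step_start=timer.on_start, on_step_end=timer.on_end).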
Full access to Agent instance:
- agent.task: current task; agent.add_new_task(...): queue a new task
- agent.tools: Tools() object and Registry
  - agent.tools.registry.execute_action('click', {'index': 123}, browser_session=agent.browser_session)
- agent.sensitive_data: sensitive data dict (mutable)
- agent.settings: all config options
- agent.llm: direct LLM access
- agent.state: internal state (thoughts, outputs, actions)
- agent.history: execution history
  - .model_thoughts(), .model_outputs(), .model_actions()
  - .extracted_content(), .urls()
- agent.browser_session: BrowserSession + CDP
  - .agent_focus_target_id: current target ID
  - .get_or_create_cdp_session(): CDP session
  - .get_tabs(), .get_current_page_url(), .get_current_page_title()
- agent.pause() / agent.resume(): control execution

async def my_hook(agent: Agent):
    cdp_session = await agent.browser_session.get_or_create_cdp_session()
    doc = await cdp_session.cdp_client.send.DOM.getDocument(session_id=cdp_session.session_id)
    html = await cdp_session.cdp_client.send.DOM.getOuterHTML(
        params={'nodeId': doc['root']['nodeId']}, session_id=cdp_session.session_id
    )

Tips:
- Increase step_timeout if hooks take a long time to run

Fine-tune timeouts via environment variables (values in seconds):
| Variable | Default |
|---|---|
| TIMEOUT_NavigateToUrlEvent | 30.0 |
| TIMEOUT_ClickElementEvent | 15.0 |
| TIMEOUT_ClickCoordinateEvent | 15.0 |
| TIMEOUT_TypeTextEvent | 60.0 |
| TIMEOUT_ScrollEvent | 8.0 |
| TIMEOUT_ScrollToTextEvent | 15.0 |
| TIMEOUT_SendKeysEvent | 60.0 |
| TIMEOUT_UploadFileEvent | 30.0 |
| TIMEOUT_GetDropdownOptionsEvent | 15.0 |
| TIMEOUT_SelectDropdownOptionEvent | 8.0 |
| TIMEOUT_GoBackEvent | 15.0 |
| TIMEOUT_GoForwardEvent | 15.0 |
| TIMEOUT_RefreshEvent | 15.0 |
| TIMEOUT_WaitEvent | 60.0 |
| TIMEOUT_ScreenshotEvent | 15.0 |
| TIMEOUT_BrowserStateRequestEvent | 30.0 |

| Variable | Default |
|---|---|
| TIMEOUT_BrowserStartEvent | 30.0 |
| TIMEOUT_BrowserStopEvent | 45.0 |
| TIMEOUT_BrowserLaunchEvent | 30.0 |
| TIMEOUT_BrowserKillEvent | 30.0 |
| TIMEOUT_BrowserConnectedEvent | 30.0 |

| Variable | Default |
|---|---|
| TIMEOUT_SwitchTabEvent | 10.0 |
| TIMEOUT_CloseTabEvent | 10.0 |
| TIMEOUT_TabCreatedEvent | 30.0 |
| TIMEOUT_TabClosedEvent | 10.0 |

| Variable | Default |
|---|---|
| TIMEOUT_SaveStorageStateEvent | 45.0 |
| TIMEOUT_LoadStorageStateEvent | 45.0 |
| TIMEOUT_FileDownloadedEvent | 30.0 |
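
These overrides can also be set from Python rather than the shell. The sketch below is a minimal, hedged example: apply_timeouts is a hypothetical helper, the chosen values are arbitrary, and it assumes the variables are read when browser_use is imported, so the call should come before that import.

```python
import os


def apply_timeouts(overrides: dict) -> None:
    """Export TIMEOUT_* overrides (in seconds) into the process environment."""
    for name, seconds in overrides.items():
        if not name.startswith("TIMEOUT_"):
            raise ValueError(f"not a timeout variable: {name}")
        if seconds <= 0:
            raise ValueError(f"{name} must be positive, got {seconds}")
        # Environment values are strings; store the number as a float literal
        os.environ[name] = str(float(seconds))


# Assumption: run this before importing browser_use so the values are picked up
apply_timeouts({
    "TIMEOUT_NavigateToUrlEvent": 60,
    "TIMEOUT_ScreenshotEvent": 30,
})
print(os.environ["TIMEOUT_NavigateToUrlEvent"])
```

Validating names and values up front keeps a typo from silently leaving a default in place.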