Snapshot and Refs

skill-data/core/references/snapshot-refs.md

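Compact element references (refs) let an AI agent act on a page from a short snapshot instead of the full DOM, cutting context usage from roughly 3000-5000 tokens to roughly 200-400.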

Related: commands.md for the full command reference, SKILL.md for the quick start.

How Refs Work

Traditional approach:

Full DOM/HTML → AI parses → CSS selector → Action (~3000-5000 tokens)

agent-browser approach:

Compact snapshot → @refs assigned → Direct interaction (~200-400 tokens)
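
As a rough worked example of what those two ranges imply, the sketch below computes the percentage of tokens saved at each end. It uses only the figures quoted above, which are estimates rather than measurements; `savings_pct` is a hypothetical helper name.

```bash
# savings_pct FULL_TOKENS SNAPSHOT_TOKENS
# Prints the percentage of tokens saved, rounded to a whole number.
savings_pct() {
  awk -v full="$1" -v snap="$2" 'BEGIN { printf "%.0f\n", (1 - snap / full) * 100 }'
}

savings_pct 3000 200   # low end of both ranges  -> 93
savings_pct 5000 400   # high end of both ranges -> 92
```

Both ends of the quoted ranges come out at a reduction of a little over 90%.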

The Snapshot Command

bash
# Basic snapshot (shows page structure)
agent-browser snapshot

# Interactive snapshot (-i flag) - RECOMMENDED
agent-browser snapshot -i

Snapshot Output Format

Page: Example Site - Home
URL: https://example.com

@e1 [header]
  @e2 [nav]
    @e3 [a] "Home"
    @e4 [a] "Products"
    @e5 [a] "About"
  @e6 [button] "Sign In"

@e7 [main]
  @e8 [h1] "Welcome"
  @e9 [form]
    @e10 [input type="email"] placeholder="Email"
    @e11 [input type="password"] placeholder="Password"
    @e12 [button type="submit"] "Log In"

@e13 [footer]
  @e14 [a] "Privacy Policy"

Using Refs

Once you have refs, interact directly:

bash
# Click the "Sign In" button
agent-browser click @e6

# Fill email input
agent-browser fill @e10 "user@example.com"

# Fill password
agent-browser fill @e11 "password123"

# Submit the form
agent-browser click @e12
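
Ref numbers differ from page to page, so a script may prefer to look a ref up by its visible text instead of hard-coding `@e6`. The following is a minimal sketch, assuming the snapshot text matches the format shown earlier and has been saved to a file; `ref_for_text` and `snapshot.txt` are hypothetical names, not part of the CLI.

```bash
# ref_for_text LABEL
# Reads snapshot text on stdin and prints the first ref whose quoted visible
# text is exactly LABEL. Exit status is 1 when no line matches.
ref_for_text() {
  awk -v label="$1" '
    BEGIN { needle = "] \"" label "\"" }          # e.g.  ] "Sign In"
    $1 ~ /^@e[0-9]+$/ && index($0, needle) > 0 { print $1; found = 1; exit }
    END { exit !found }
  '
}

# Possible usage (snapshot.txt is an assumed file holding a saved snapshot):
#   agent-browser snapshot -i > snapshot.txt
#   agent-browser click "$(ref_for_text "Sign In" < snapshot.txt)"
```

The lookup is only as fresh as the saved snapshot, so it should be repeated after every re-snapshot.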

Ref Lifecycle

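IMPORTANT: Refs are valid only for the snapshot that produced them. When the page changes, the same ref may disappear or point to a different element, so re-snapshot before using refs again.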

bash
# Get initial snapshot
agent-browser snapshot -i
# @e1 [button] "Next"

# Click triggers page change
agent-browser click @e1

# MUST re-snapshot to get new refs!
agent-browser snapshot -i
# @e1 [h1] "Page 2"  ← Different element now!
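
To make the reuse of `@e1` visible in a script, one option is to save each snapshot to a file and compare what a ref described before and after the action. This is a sketch under that assumption; `describe_ref`, `ref_is_stale`, and the file names in the usage comment are hypothetical.

```bash
# describe_ref REF FILE
# Prints the snapshot line for REF with its indentation removed.
describe_ref() {
  awk -v ref="$1" '$1 == ref { sub(/^[ \t]+/, ""); print; exit }' "$2"
}

# ref_is_stale REF OLD_SNAPSHOT NEW_SNAPSHOT
# Succeeds (exit 0) when REF is missing from the new snapshot or now
# describes a different element than it did in the old one.
ref_is_stale() {
  [ "$(describe_ref "$1" "$2")" != "$(describe_ref "$1" "$3")" ]
}

# Possible usage (before.txt and after.txt are assumed snapshot files):
#   if ref_is_stale @e1 before.txt after.txt; then echo "re-read refs"; fi
```

A textual comparison like this cannot prove a ref is still safe to use; it only catches the obvious case where the description changed.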

Best Practices

1. Always Snapshot Before Interacting

bash
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG
agent-browser open https://example.com
agent-browser click @e1            # Ref doesn't exist yet!

2. Re-Snapshot After Navigation

bash
agent-browser click @e5            # Navigates to new page
agent-browser snapshot -i          # Get new refs
agent-browser click @e1            # Use new refs

3. Re-Snapshot After Dynamic Changes

bash
agent-browser click @e1            # Opens dropdown
agent-browser snapshot -i          # See dropdown items
agent-browser click @e7            # Select item
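
Practices 2 and 3 share one pattern: act, then snapshot again before touching another ref. A small wrapper can make that hard to forget. The sketch below assumes the CLI exits with a non-zero status when a click fails; `click_and_resnapshot` and the `AGENT_BROWSER` variable are illustrative conventions, not something agent-browser itself defines.

```bash
# click_and_resnapshot REF
# Clicks REF and, if the click succeeded, prints a fresh interactive snapshot.
# AGENT_BROWSER (hypothetical) lets a caller substitute another binary or a stub.
click_and_resnapshot() {
  "${AGENT_BROWSER:-agent-browser}" click "$1" &&
    "${AGENT_BROWSER:-agent-browser}" snapshot -i
}

# Possible usage:
#   click_and_resnapshot @e5    # navigates, then shows the new page's refs
```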

4. Snapshot Specific Regions

On complex pages, pass a ref to snapshot to limit the output to that element's subtree:

bash
# Snapshot just the form
agent-browser snapshot @e9
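
When a full snapshot is already saved, the same narrowing can be approximated offline by cutting out a ref's subtree by indentation. This is a text-processing sketch, not how the CLI scopes snapshots; it assumes nesting is expressed with leading spaces, as in the sample output, and that a blank line ends a top-level block. `subtree` is a hypothetical helper name.

```bash
# subtree REF
# Reads a full snapshot on stdin and prints REF's line plus every line
# nested more deeply beneath it.
subtree() {
  awk -v ref="$1" '
    function indent(s) { return (match(s, /^ +/) ? RLENGTH : 0) }
    inside && (NF == 0 || indent($0) <= depth) { exit }   # left the subtree
    $1 == ref { inside = 1; depth = indent($0) }
    inside { print }
  '
}

# Possible usage (snapshot.txt is an assumed file holding a saved snapshot):
#   subtree @e9 < snapshot.txt
```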

Ref Notation Details

@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │               │
│    │   │             │               └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ HTML tag name
└─ Unique ref ID
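
The notation is regular enough to split with a single `sed` expression. The sketch below pulls out the ref, the tag name, and the visible text (empty when the element has none), separated by `|`. It assumes lines follow the layout in the diagram above; `split_ref_line` is a hypothetical helper name.

```bash
# split_ref_line
# Reads snapshot lines on stdin and prints "ref|tag|visible text" for each
# line that starts with a ref. Lines without a ref are dropped.
split_ref_line() {
  sed -n 's/^[[:space:]]*\(@e[0-9][0-9]*\) \[\([^] ]*\)[^]]*\] *\("\([^"]*\)"\)\{0,1\}.*/\1|\2|\4/p'
}

# Possible usage:
#   agent-browser snapshot -i | split_ref_line
```

Attributes inside the brackets and any trailing attributes are discarded here; a fuller parser would keep them.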

Common Patterns

@e1 [button] "Submit"                    # Button with text
@e2 [input type="email"]                 # Email input
@e3 [input type="password"]              # Password input
@e4 [a href="/page"] "Link Text"         # Anchor link
@e5 [select]                             # Dropdown
@e6 [textarea] placeholder="Message"     # Text area
@e7 [div class="modal"]                  # Container (when relevant)
@e8 [img alt="Logo"]                     # Image
@e9 [checkbox] checked                   # Checked checkbox
@e10 [radio] selected                    # Selected radio
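
Because the tag is always the first token inside the brackets, these patterns are easy to filter. As an illustration, the sketch below lists the refs of every element with a given tag; `refs_with_tag` is a hypothetical helper that works on snapshot text, not a CLI feature.

```bash
# refs_with_tag TAG
# Reads snapshot text on stdin and prints the ref of each element whose
# tag name equals TAG.
refs_with_tag() {
  awk -v tag="$1" '
    $1 ~ /^@e[0-9]+$/ {
      t = $2
      gsub(/\[|\]/, "", t)        # "[button]" -> "button", "[input" -> "input"
      if (t == tag) print $1
    }
  '
}

# Possible usage:
#   agent-browser snapshot -i | refs_with_tag input
```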

Iframes

Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each Iframe node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like click, fill, and type work without manually switching frames.

bash
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [input] "Expiry"
#   @e5 [button] "Pay"
# @e6 [button] "Cancel"

# Interact with iframe elements directly using their refs
agent-browser fill @e3 "4111111111111111"
agent-browser fill @e4 "12/28"
agent-browser click @e5

Key details:

  • Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
  • Cross-origin iframes that block accessibility tree access are silently skipped
  • Empty iframes or iframes with no interactive content are omitted from the output
  • To scope a snapshot to a single iframe, use frame @ref then snapshot -i
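
Since iframe content is inlined beneath its Iframe node, a script can tell which refs live inside a frame purely from indentation. The sketch below assumes the layout shown in the example output, with frame children indented under a line whose tag is `[Iframe]`; `iframe_refs` is a hypothetical helper name.

```bash
# iframe_refs
# Reads snapshot text on stdin and prints the refs nested under [Iframe] nodes.
iframe_refs() {
  awk '
    function indent(s) { return (match(s, /^ +/) ? RLENGTH : 0) }
    NF == 0 { in_frame = 0; next }                       # blank line ends a block
    in_frame && indent($0) <= depth { in_frame = 0 }     # back at frame level
    in_frame { print $1; next }
    $2 == "[Iframe]" { in_frame = 1; depth = indent($0) }
  '
}

# Possible usage:
#   agent-browser snapshot -i | iframe_refs
```

Knowing which refs belong to a frame can help when deciding whether a missing element is simply inside a cross-origin iframe that was skipped.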

Troubleshooting

"Ref not found" Error

bash
# Ref may have changed - re-snapshot
agent-browser snapshot -i

Element Not Visible in Snapshot

bash
# Scroll down to reveal element
agent-browser scroll down 1000
agent-browser snapshot -i

# Or wait for dynamic content
agent-browser wait 1000
agent-browser snapshot -i
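
A fixed one-second wait can be too short or needlessly long. One alternative is to poll: snapshot, check for the expected text, wait, and try again. The sketch below uses only the `snapshot -i` and `wait 1000` commands shown above; `wait_for_text`, the default of 5 attempts, and the `AGENT_BROWSER` variable are assumptions made for illustration.

```bash
# wait_for_text TEXT [ATTEMPTS]
# Re-snapshots until TEXT appears in the output, pausing 1000 ms between
# attempts. Prints the matching snapshot on success; returns 1 when the
# attempts run out. AGENT_BROWSER (hypothetical) can point at another binary.
wait_for_text() {
  local ab attempts snap
  ab=${AGENT_BROWSER:-agent-browser}
  attempts=${2:-5}
  while [ "$attempts" -gt 0 ]; do
    snap=$("$ab" snapshot -i)
    case $snap in
      *"$1"*) printf '%s\n' "$snap"; return 0 ;;
    esac
    "$ab" wait 1000
    attempts=$((attempts - 1))
  done
  return 1
}

# Possible usage:
#   wait_for_text "Load more" 10 && echo "content is present"
```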

Too Many Elements

bash
# Snapshot specific container
agent-browser snapshot @e5

# Or use get text for content-only extraction
agent-browser get text @e5
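
To decide when a snapshot is "too many", a script can count the refs it contains and scope down only past some limit. This is a minimal sketch; `ref_count` is a hypothetical helper and the threshold of 100 in the usage comment is arbitrary.

```bash
# ref_count
# Reads snapshot text on stdin and prints how many lines carry a ref.
ref_count() {
  grep -cE '^[[:space:]]*@e[0-9]+ '
}

# Possible usage (the limit of 100 is an arbitrary example):
#   if [ "$(agent-browser snapshot -i | ref_count)" -gt 100 ]; then
#     agent-browser snapshot @e5
#   fi
```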