skill-data/core/references/snapshot-refs.md
Compact element references that reduce context usage dramatically for AI agents.
Related: commands.md for full command reference, SKILL.md for quick start.
Traditional approach:
Full DOM/HTML → AI parses → CSS selector → Action (~3000-5000 tokens)
agent-browser approach:
Compact snapshot → @refs assigned → Direct interaction (~200-400 tokens)
# Basic snapshot (shows page structure)
agent-browser snapshot
# Interactive snapshot (-i flag) - RECOMMENDED
agent-browser snapshot -i
Page: Example Site - Home
URL: https://example.com
@e1 [header]
@e2 [nav]
@e3 [a] "Home"
@e4 [a] "Products"
@e5 [a] "About"
@e6 [button] "Sign In"
@e7 [main]
@e8 [h1] "Welcome"
@e9 [form]
@e10 [input type="email"] placeholder="Email"
@e11 [input type="password"] placeholder="Password"
@e12 [button type="submit"] "Log In"
@e13 [footer]
@e14 [a] "Privacy Policy"
Once you have refs, interact directly:
# Click the "Sign In" button
agent-browser click @e6
# Fill email input
agent-browser fill @e10 "[email protected]"
# Fill password
agent-browser fill @e11 "password123"
# Submit the form
agent-browser click @e12
IMPORTANT: Refs are invalidated when the page changes!
# Get initial snapshot
agent-browser snapshot -i
# @e1 [button] "Next"
# Click triggers page change
agent-browser click @e1
# MUST re-snapshot to get new refs!
agent-browser snapshot -i
# @e1 [h1] "Page 2" ← Different element now!
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i # Get refs first
agent-browser click @e1 # Use ref
# WRONG
agent-browser open https://example.com
agent-browser click @e1 # Ref doesn't exist yet!
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # Get new refs
agent-browser click @e1 # Use new refs
agent-browser click @e1 # Opens dropdown
agent-browser snapshot -i # See dropdown items
agent-browser click @e7 # Select item
For complex pages, snapshot specific areas:
# Snapshot just the form
agent-browser snapshot @e9
@e1 [tag type="value"] "text content" placeholder="hint"
│ │ │ │ │
│ │ │ │ └─ Additional attributes
│ │ │ └─ Visible text
│ │ └─ Key attributes shown
│ └─ HTML tag name
└─ Unique ref ID
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a href="/page"] "Link Text" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [div class="modal"] # Container (when relevant)
@e8 [img alt="Logo"] # Image
@e9 [checkbox] checked # Checked checkbox
@e10 [radio] selected # Selected radio
Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each Iframe node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like click, fill, and type work without manually switching frames.
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [input] "Expiry"
# @e5 [button] "Pay"
# @e6 [button] "Cancel"
# Interact with iframe elements directly using their refs
agent-browser fill @e3 "4111111111111111"
agent-browser fill @e4 "12/28"
agent-browser click @e5
Key details:
frame @ref then snapshot -i# Ref may have changed - re-snapshot
agent-browser snapshot -i
# Scroll down to reveal element
agent-browser scroll down 1000
agent-browser snapshot -i
# Or wait for dynamic content
agent-browser wait 1000
agent-browser snapshot -i
# Snapshot specific container
agent-browser snapshot @e5
# Or use get text for content-only extraction
agent-browser get text @e5