docs/src/app/snapshots/page.mdx
The snapshot command returns a compact accessibility tree with refs for element interaction.
Filter output to reduce size:
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact (remove empty elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
The default text output is compact and AI-friendly:
agent-browser snapshot -i
# Output:
# @e1 [heading] "Example Domain" [level=1]
# @e2 [button] "Submit"
# @e3 [input type="email"] placeholder="Email"
# @e4 [link] "Learn more"
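Because each line of the text output starts with its ref, a script can look up a ref by role and accessible name. The following is a minimal sketch, assuming the line format shown above; `find_ref` is a hypothetical helper, not part of agent-browser, and it only matches single-word roles such as `button` or `link`.

```shell
# find_ref ROLE NAME: read snapshot text on stdin and print the ref of the
# first line of the form: @ref [ROLE] "NAME".
# Hypothetical helper; assumes the text format shown above.
find_ref() {
  awk -v role="[$1]" -v name="\"$2\"" '
    { sub(/^# /, "") }                     # drop the "# " used in the listing above
    $2 == role && index($0, name) { print $1; exit }
  '
}

# Sample text copied from the output above. In practice, pipe in the output
# of `agent-browser snapshot -i` instead.
snapshot='@e1 [heading] "Example Domain" [level=1]
@e2 [button] "Submit"
@e3 [input type="email"] placeholder="Email"
@e4 [link] "Learn more"'

printf '%s\n' "$snapshot" | find_ref button "Submit"   # prints @e2
```

Matching on role and name rather than hard-coding `@e2` keeps a script working when ref numbers shift between snapshots.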
Refs from the snapshot map directly to commands:
agent-browser click @e2 # Click the Submit button
agent-browser fill @e3 "user@example.com" # Fill the email input
agent-browser get text @e1 # Get heading text
Refs are invalidated when the page changes. Always re-snapshot after navigation or DOM updates:
agent-browser click @e4 # Navigates to new page
agent-browser snapshot -i # Get fresh refs
agent-browser click @e1 # Use new refs
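One way to make the re-snapshot habit hard to forget is to wrap the two steps in a single function. This is only a sketch: `click_and_resnapshot` is a hypothetical name, and it uses nothing beyond the `click` and `snapshot -i` commands shown above.

```shell
# click_and_resnapshot REF: click an element, then print a fresh interactive
# snapshot so that later commands do not reuse stale refs.
# Hypothetical wrapper; the snapshot runs only if the click succeeds.
click_and_resnapshot() {
  agent-browser click "$1" && agent-browser snapshot -i
}

# Usage (requires a running agent-browser session):
#   click_and_resnapshot @e4
```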
For visual context alongside text snapshots, use screenshot --annotate to overlay numbered labels on interactive elements. In native mode, annotated screenshots currently work on the CDP-backed browser path (Chromium/Lightpanda); the Safari/WebDriver backend does not yet support --annotate.
Each label [N] maps to ref @eN:
agent-browser screenshot --annotate ./page.png
# -> Screenshot saved to ./page.png
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2
Annotated screenshots also cache refs, so you can interact with elements immediately. This is useful when the text snapshot alone is insufficient, for example with unlabeled icons, canvas content, or visual layout verification.
Snapshots automatically detect and inline iframe content. Each Iframe node in the main frame is resolved and its child accessibility tree is included directly beneath it. Refs assigned to elements inside iframes carry frame context, so interactions work without switching frames first.
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [button] "Pay"
agent-browser fill @e3 "4111111111111111"
agent-browser click @e4
Only one level of iframe nesting is expanded. Cross-origin iframes that block accessibility tree access and empty iframes are silently omitted.
To scope a snapshot to a single iframe, switch into it first:
agent-browser frame @e2
agent-browser snapshot -i # Only elements inside that iframe
agent-browser frame main # Return to main frame
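The switch, snapshot, and return sequence above can be bundled so the session always ends up back in the main frame. A minimal sketch, assuming only the `frame` and `snapshot -i` commands shown above; `snapshot_frame` is a hypothetical helper name.

```shell
# snapshot_frame REF: print an interactive snapshot scoped to one iframe,
# then return to the main frame even if the snapshot fails.
# Hypothetical helper built from the commands shown above.
snapshot_frame() {
  agent-browser frame "$1" || return   # stop if the frame switch fails
  agent-browser snapshot -i
  rc=$?                                # remember the snapshot's exit status
  agent-browser frame main
  return "$rc"
}

# Usage (requires a running agent-browser session):
#   snapshot_frame @e2
```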
- -i to reduce output to actionable elements
- -s for specific page sections
- -d to limit depth on complex pages
- screenshot --annotate when visual context is needed alongside refs

For programmatic parsing in scripts:
agent-browser snapshot --json
# {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
Note: JSON uses more tokens than text output. The default text format is preferred for AI agents.
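As a rough illustration of script-side parsing, the refs map can be flattened into one line per ref with jq. This sketch assumes jq is installed and that the response has the shape shown above; `list_refs` is a hypothetical helper, and the sample JSON is illustrative rather than real output.

```shell
# list_refs: read `agent-browser snapshot --json` output on stdin and print
# one tab-separated line per ref: @ref, role, name.
# Assumes jq is installed and the JSON shape shown above.
list_refs() {
  jq -r '.data.refs | to_entries[] | "@\(.key)\t\(.value.role)\t\(.value.name)"'
}

# Illustrative sample in the documented shape. In practice:
#   agent-browser snapshot --json | list_refs
sample='{"success":true,"data":{"snapshot":"","refs":{"e1":{"role":"heading","name":"Title"},"e2":{"role":"button","name":"Submit"}}}}'
printf '%s\n' "$sample" | list_refs
```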