v3/docs/adr/ADR-175-page-agent-browser-intent-layer.md
plugins/ruflo-browser/docs/adrs/0001-browser-skills-architecture.md

github.com/alibaba/page-agent (npm page-agent, MIT)

The ruflo-browser plugin drives a real browser through the agent-browser CLI (Playwright under the hood), exposing 23 low-level, selector-based MCP tools (browser_click, browser_fill, browser_type, browser_snapshot with @e1 refs, browser_eval, browser_screenshot, session record/replay → RVF containers + ruvector trajectories + AgentDB selector memory + AIDefence gating). Every action requires a CSS selector or element ref; the model must read a snapshot and orchestrate each step imperatively. There is no natural-language "perform this task" action.
page-agent is the inverse architecture: injected in-page JavaScript that serialises the DOM to text (no screenshots, no vision model, no headless process of its own) and lets an LLM execute a multi-step intent in-page — agent.execute('fill out and submit the checkout form'). It takes an OpenAI-compatible endpoint ({ baseURL, apiKey, model }), ships as an npm package and an injectable IIFE bundle, and is MIT-licensed.
The two are complementary, not competing: the Playwright harness owns navigation, session capture, screenshots, cookies, and gating; page-agent adds the missing intent layer inside the page it controls.
Add an optional natural-language intent layer to the browser plugin: a browser_act MCP tool that injects page-agent into the agent-browser-controlled page, executes an NL task in-page, routes page-agent's LLM through ruflo's own model layer, captures the resulting action trajectory into memory (feeding ADR-174 distillation), and gates the result through AIDefence.
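The model-routing part of this decision can be illustrated with a minimal sketch. It assumes a tiered provider table in which the cheapest usable tier is preferred. `ProviderEntry`, `Tier`, `resolveIntentModel`, the endpoint URLs, and the model names are all hypothetical; only the `{ baseURL, apiKey, model }` output shape comes from this ADR.

```typescript
// Hypothetical provider table entry; ruflo's real router config may differ.
type Tier = "cheap" | "standard" | "premium";

interface ProviderEntry {
  tier: Tier;
  baseURL: string;   // OpenAI-compatible endpoint
  apiKeyEnv: string; // name of the environment variable holding the key
  model: string;
}

// The shape this ADR says page-agent takes.
interface IntentModelConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

// Assumed preference order: cheapest tier first.
const TIER_ORDER: Tier[] = ["cheap", "standard", "premium"];

// Returns the first provider, cheapest tier first, whose key is present.
// Returns null when nothing is usable, so the caller can degrade.
function resolveIntentModel(
  providers: ProviderEntry[],
  env: Record<string, string | undefined>,
): IntentModelConfig | null {
  for (const tier of TIER_ORDER) {
    for (const p of providers) {
      if (p.tier !== tier) continue;
      const apiKey = env[p.apiKeyEnv];
      if (apiKey) return { baseURL: p.baseURL, apiKey, model: p.model };
    }
  }
  return null;
}
```

Returning `null` rather than throwing keeps the decision about degrading in the caller, which matches the tool's never-throw contract.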
| Layer | Owner | Responsibility |
|---|---|---|
| Outer harness | agent-browser / Playwright (existing) | navigate, session → RVF, screenshots, cookies, AIDefence |
| In-page intent | page-agent (new) | one NL call reasons over DOM-as-text and performs the multi-step action |
page-agent goes in optionalDependencies. When it is absent, browser_act returns { degraded: true, reason, hint } and never throws. The CLI is fully functional without it (the 23 selector tools are unaffected). Removability is part of the contract.

{ baseURL, apiKey, model } are resolved from ruflo's existing provider/router configuration (the same OpenAI-compatible surface the rest of the CLI uses), so browser intents are cost-governed. Because page-agent is text-DOM (no vision), it routes naturally to the cheap tier (Haiku/local); most intents are near-$0.

The demo bundle (dist/iife/page-agent.demo.js) auto-POSTs page content to Alibaba's public sandbox on inject (https://page-ag-testing-*.<region>.fcapp.run, DEMO_MODEL). We strip that tail, but the strip is best-effort text matching. The load-bearing guarantee is a fail-closed content firewall (findDemoLeak): if any known demo-endpoint signature survives the strip, browser_act refuses to inject and degrades, so an upstream bundle change that moves the marker can never silently re-enable the leak. No fork is required: the library core ESM is clean, and the leak lives only in the demo bundle. A fork would mean maintaining a parallel 3-package copy forever to delete a few demo lines; the firewall achieves the security goal at near-zero maintenance.

execute() records the intent + action trajectory as a browser-namespace memory entry (best-effort, never fatal), so browser intents feed the distillation/self-learning loop: over time the SONA/MoE model learns which intent→action sequences succeed. This folds the plugin's existing "AgentDB selector memory" into the same learning loop.

browser_act is an added capability, not a replacement.

browser_act unlocks several follow-on integrations, sequenced by leverage:
1. ruflo-testgen / production-validator with execute('log in, add an item, verify the total') instead of manual selector orchestration. Cheapest real win.
2. browser memory → distilled reasoning_patterns → auto-generated browser-* skills. The most strategic (compounds with ADR-174).
3. @metaharness/redblue live-web adversarial tests, scored + human-gated.
4. execute() as a planning operator so ruflo-goals objectives span code and web.
5. ruflo-federation, zero-trust-gated.

Remove page-agent from optionalDependencies; browser_act degrades to { degraded: true } and every other browser tool is unaffected. No schema, no persisted state, no change to the 23 selector tools. The trajectory-recording side effect is additive to the browser memory namespace and never mutates existing data.
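The degraded path and the fail-closed firewall can be sketched together. This is an illustration under stated assumptions, not the shipped code: the signature list is inferred from the two markers named in this ADR, and the `findDemoLeak` signature, `stripDemoTail`, `prepareInjection`, the strip marker, and the success shape are all hypothetical. Only the `{ degraded: true, reason, hint }` shape is taken from the ADR.

```typescript
// Assumed signatures, derived from the endpoint pattern and DEMO_MODEL marker.
const DEMO_LEAK_SIGNATURES: RegExp[] = [
  /page-ag-testing-[\w.-]+\.fcapp\.run/,
  /DEMO_MODEL/,
];

type PrepareResult =
  | { degraded: true; reason: string; hint: string }
  | { degraded: false; script: string }; // success shape is hypothetical

// Returns the first surviving demo signature, or null if the text is clean.
function findDemoLeak(bundle: string): string | null {
  for (const sig of DEMO_LEAK_SIGNATURES) {
    const m = sig.exec(bundle);
    if (m) return m[0];
  }
  return null;
}

// Best-effort strip: drop everything from the marker onward, if it is found.
function stripDemoTail(bundle: string, marker: string): string {
  const at = bundle.indexOf(marker);
  return at === -1 ? bundle : bundle.slice(0, at);
}

// bundle === null models "page-agent is not installed".
function prepareInjection(bundle: string | null, marker: string): PrepareResult {
  if (bundle === null) {
    return {
      degraded: true,
      reason: "page-agent is not installed",
      hint: "install the optional page-agent dependency to enable browser_act",
    };
  }
  const stripped = stripDemoTail(bundle, marker);
  const leak = findDemoLeak(stripped);
  if (leak !== null) {
    // Fail closed: refuse to inject rather than trust the strip.
    return {
      degraded: true,
      reason: `demo endpoint signature survived strip: ${leak}`,
      hint: "the upstream bundle layout may have changed; update the strip marker",
    };
  }
  return { degraded: false, script: stripped };
}
```

The strip and the check are deliberately independent: if an upstream change moves the marker, the strip silently does nothing, and the firewall still catches the leftover signature.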
browser_act MCP tool + optional-dependency wiring + model routing + trajectory capture + AIDefence gating + browser-intent skill + degraded-path tests — implemented alongside this ADR.
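The trajectory-capture item can be sketched as follows, assuming a simple synchronous memory store. `ActionStep`, `TrajectoryEntry`, `MemoryStore`, `buildEntry`, and `recordBestEffort` are hypothetical names for illustration; the ADR specifies only that the entry lives in the browser namespace and that recording is best-effort and never fatal.

```typescript
// Hypothetical shapes; ruflo's actual memory API is not described in this ADR.
interface ActionStep {
  action: string;  // e.g. "click", "fill"
  target?: string; // element description, if any
  ok: boolean;
}

interface TrajectoryEntry {
  namespace: "browser";
  key: string;
  intent: string;
  steps: ActionStep[];
  success: boolean;
  recordedAt: string; // ISO 8601 timestamp
}

interface MemoryStore {
  put(entry: TrajectoryEntry): void;
}

// Builds the entry; a run counts as successful only if every step succeeded.
function buildEntry(intent: string, steps: ActionStep[], now: Date): TrajectoryEntry {
  return {
    namespace: "browser",
    key: `intent:${now.getTime()}`,
    intent,
    steps,
    success: steps.length > 0 && steps.every((s) => s.ok),
    recordedAt: now.toISOString(),
  };
}

// Best-effort write: a failing store is swallowed and reported as false,
// so recording can never fail the intent itself.
function recordBestEffort(store: MemoryStore, entry: TrajectoryEntry): boolean {
  try {
    store.put(entry);
    return true;
  } catch {
    return false;
  }
}
```

Because the write only ever adds a new keyed entry, removing the feature leaves existing memory untouched.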