docs/plans/2026-03-12-page-agent-design.md
This document describes the page agent as currently implemented. It is the durable design reference for the in-page agent and intentionally omits earlier phase-by-phase implementation planning.
Provide an app-wide AI assistant inside the Bytebase console that can:
The page agent complements the existing SQL-editor AI experience; it does not replace it.
The implementation has four main parts.
The floating agent window is mounted from frontend/src/layouts/BodyLayout.vue, so it is available across the dashboard instead of being tied to a single page.
Entry points:
- frontend/src/layouts/BodyLayout.vue mounts AgentWindow and registers the keyboard shortcut.
- frontend/src/views/DashboardHeader.vue exposes the header toggle button.
- frontend/src/plugins/agent/index.ts defines the Ctrl/Cmd + Shift + A shortcut.
- frontend/src/plugins/agent/store/agent.ts stores UI state and conversation state.

The Pinia store persists the conversation history plus the window layout state used by the floating window (position and size) in localStorage.
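As a rough illustration of persisting the floating window layout in localStorage, the sketch below saves and restores a position-and-size record and falls back to defaults when the stored value is missing or malformed. The names WindowLayout, StorageLike, LAYOUT_KEY, DEFAULT_LAYOUT, saveLayout, and loadLayout are hypothetical, as are the default numbers; the actual Pinia store may persist its state differently.

```typescript
// Hypothetical layout record; field names are assumptions for illustration.
interface WindowLayout {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Minimal subset of the Web Storage interface, so the sketch also runs outside a browser.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key and defaults, not the ones used by the real store.
const LAYOUT_KEY = "example.page-agent.layout";
const DEFAULT_LAYOUT: WindowLayout = { x: 24, y: 24, width: 400, height: 560 };

function saveLayout(storage: StorageLike, layout: WindowLayout): void {
  storage.setItem(LAYOUT_KEY, JSON.stringify(layout));
}

function loadLayout(storage: StorageLike): WindowLayout {
  const raw = storage.getItem(LAYOUT_KEY);
  if (raw === null) return { ...DEFAULT_LAYOUT };
  try {
    const parsed = JSON.parse(raw);
    // Accept the stored value only if every field is a finite number.
    const valid = ["x", "y", "width", "height"].every(
      (k) => typeof parsed?.[k] === "number" && Number.isFinite(parsed[k])
    );
    return valid
      ? { x: parsed.x, y: parsed.y, width: parsed.width, height: parsed.height }
      : { ...DEFAULT_LAYOUT };
  } catch {
    // Malformed JSON: ignore the stored value.
    return { ...DEFAULT_LAYOUT };
  }
}
```

In a browser, `window.localStorage` satisfies StorageLike, so `loadLayout(window.localStorage)` would work unchanged under these assumptions.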
The agent loop runs in the frontend in frontend/src/plugins/agent/logic/agentLoop.ts.
High-level flow:
- AIService.Chat.

This keeps page-aware operations in the frontend while the backend remains the provider-facing proxy.
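The general shape of a frontend-driven tool-calling loop can be sketched as follows. This is a minimal illustration, not the code in agentLoop.ts: the types AgentMessage, ToolCall, ChatFn, and ToolExecutor, the function runAgentLoop, and the maxSteps cap are all assumptions introduced here. The chat function stands in for the backend call, and the executor stands in for local tool execution.

```typescript
// Hypothetical message shapes; the real contract is defined in ai_service.proto.
type Role = "system" | "user" | "assistant" | "tool";

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface AgentMessage {
  role: Role;
  content: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}

// Stand-in for the backend chat call.
type ChatFn = (messages: AgentMessage[]) => Promise<AgentMessage>;
// Stand-in for tool execution inside the page agent runtime.
type ToolExecutor = (name: string, args: unknown) => Promise<string>;

async function runAgentLoop(
  messages: AgentMessage[],
  chat: ChatFn,
  executeTool: ToolExecutor,
  maxSteps = 10 // assumed safety cap on model round trips
): Promise<AgentMessage[]> {
  const history = [...messages];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await chat(history);
    history.push(reply);
    const calls = reply.toolCalls ?? [];
    if (calls.length === 0) break; // no tool calls: treat as the final answer
    for (const call of calls) {
      let result: string;
      try {
        result = await executeTool(call.name, JSON.parse(call.arguments));
      } catch (err) {
        // Report failures back to the model instead of aborting the loop.
        result = `Error: ${err instanceof Error ? err.message : String(err)}`;
      }
      history.push({ role: "tool", content: result, toolCallId: call.id });
    }
  }
  return history;
}
```

Feeding tool errors back as tool messages, rather than throwing, lets the model retry or change approach; whether the real loop does this is not stated in this document.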
The implemented tool set is six tools:
- search_api
- call_api
- navigate
- get_page_state
- dom_action
- get_skill

Tool definitions live in frontend/src/plugins/agent/logic/tools/index.ts, and tool execution is local to the page agent runtime.
The backend contract is defined in proto/v1/v1/ai_service.proto, with backend handling in backend/api/v1/ai_service.go.
The backend is the source of truth for provider integration and normalizes tool-calling responses from supported AI providers. The frontend never talks directly to model providers.
search_api

search_api is a structured OpenAPI index browser, not a free-text keyword search tool.
Current modes:
- operationId,

The implementation is backed by the generated OpenAPI index used by the frontend tool code. The expected workflow is:

1. search_api(service="..."),
2. search_api(operationId="..."),
3. call_api(...).

call_api

call_api executes a Bytebase API operation by operationId with an optional JSON body. It is the direct bridge from the agent to Bytebase APIs already available to the current signed-in user.
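The search_api to call_api workflow can be sketched with an in-memory index. Everything below is illustrative: the OperationEntry shape, the exampleIndex entries, and the helpers searchApi and planCall are hypothetical and do not reflect the generated OpenAPI index or the real tool code. The point is the structured lookup (by service, then by operationId) instead of free-text search.

```typescript
// Hypothetical shape of one entry in an OpenAPI-derived index.
interface OperationEntry {
  operationId: string;
  service: string;
  method: string;
  path: string;
  summary: string;
}

// Illustrative data only; these are not real Bytebase operations.
const exampleIndex: OperationEntry[] = [
  { operationId: "ExampleService_ListThings", service: "ExampleService", method: "GET", path: "/v1/things", summary: "List things" },
  { operationId: "ExampleService_GetThing", service: "ExampleService", method: "GET", path: "/v1/things/{thing}", summary: "Get one thing" },
];

type SearchArgs = { service?: string; operationId?: string };

// Structured lookup: exact match on operationId or service, never keyword search.
function searchApi(args: SearchArgs, entries: OperationEntry[] = exampleIndex): string {
  if (args.operationId) {
    const op = entries.find((e) => e.operationId === args.operationId);
    return op ? JSON.stringify(op) : `Unknown operationId: ${args.operationId}`;
  }
  if (args.service) {
    const ops = entries.filter((e) => e.service === args.service);
    return ops.length > 0
      ? ops.map((e) => `${e.operationId}: ${e.summary}`).join("\n")
      : `Unknown service: ${args.service}`;
  }
  return "Provide service or operationId";
}

// Resolves an operationId to a request description without sending it.
function planCall(
  operationId: string,
  body?: unknown,
  entries: OperationEntry[] = exampleIndex
): { method: string; path: string; body: unknown } {
  const op = entries.find((e) => e.operationId === operationId);
  if (!op) throw new Error(`Unknown operationId: ${operationId}`);
  return { method: op.method, path: op.path, body: body ?? null };
}
```

A model following the workflow would first call `searchApi({ service: "ExampleService" })`, then `searchApi({ operationId: "ExampleService_GetThing" })`, and only then issue the call.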
navigate

navigate uses Vue Router to either:

- list=true.

The prompt instructs the model to list routes first when unsure instead of guessing paths.
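A minimal sketch of a navigate-style tool with a route-listing mode is shown below. RouterLike mirrors only the two Vue Router methods the sketch relies on, `getRoutes()` and `push()`, so the example runs without the library. The function names and the `path` argument are assumptions; only `list=true` comes from the description above.

```typescript
// Minimal subset of the Vue Router surface used by this sketch.
interface RouteLike {
  path: string;
  name?: string | symbol;
}

interface RouterLike {
  getRoutes(): RouteLike[];
  push(to: string): Promise<unknown>;
}

// Returns the registered route paths, sorted, one per line.
function listRoutes(router: RouterLike): string {
  return router
    .getRoutes()
    .map((r) => r.path)
    .sort()
    .join("\n");
}

// Hypothetical tool entry point: list routes, or navigate to a path.
async function navigateTool(
  router: RouterLike,
  args: { path?: string; list?: boolean }
): Promise<string> {
  if (args.list) return listRoutes(router);
  if (!args.path) return "Provide path or list=true";
  try {
    await router.push(args.path);
    return `Navigated to ${args.path}`;
  } catch (err) {
    return `Navigation failed: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```

Listing first gives the model concrete paths to choose from, which is the behavior the prompt asks for when the target route is uncertain.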
get_page_state

get_page_state is the read tool for current-page context.
Current modes:
- semantic mode returns route information plus structured context when available,
- mode: "dom" returns the same base page state plus DOM snapshot information, including snapshot-local element refs such as e1 and e2.

There is no standalone get_dom_tree tool. DOM inspection is part of get_page_state(mode="dom").
In semantic mode the implementation currently extracts a narrow context set:
- user
- project
- database
- issue

Those values are populated from route-aware store lookups in frontend/src/plugins/agent/logic/context.ts.
dom_action

dom_action is the browser-side UI interaction tool. Supported actions include reading DOM elements from the current snapshot and performing interactions such as click, input, select, and scroll.
The intended workflow is:
1. get_page_state(mode="dom"),
2. [e1],
3. dom_action(ref="e1", action="click") or another action using that ref.

These refs are derived from the current DOM snapshot. They are not durable IDs and should be treated as valid only for the snapshot that returned them.
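One way to make refs snapshot-local is to rebuild the ref table on every snapshot and reject lookups against an older one. The sketch below is an assumption about how this could work, not the implementation: the SnapshotRefs class and the numeric snapshotId are hypothetical, and the class is generic over the element type so it runs without a DOM.

```typescript
// Hypothetical ref table; T would be Element in a browser.
class SnapshotRefs<T> {
  private current = new Map<string, T>();
  private snapshotId = 0;

  // Taking a new snapshot discards every ref from the previous one.
  take(elements: T[]): { snapshotId: number; refs: string[] } {
    this.snapshotId += 1;
    this.current = new Map(
      elements.map((el, i) => [`e${i + 1}`, el] as [string, T])
    );
    return { snapshotId: this.snapshotId, refs: Array.from(this.current.keys()) };
  }

  // Resolves a ref only if it belongs to the most recent snapshot.
  resolve(ref: string, snapshotId: number): T {
    if (snapshotId !== this.snapshotId) {
      throw new Error(`Stale snapshot ${snapshotId}; take a new DOM snapshot first`);
    }
    const el = this.current.get(ref);
    if (el === undefined) throw new Error(`Unknown ref ${ref}`);
    return el;
  }
}
```

Under this scheme a ref such as e1 from an earlier snapshot can never silently point at a different element after the page changes; the caller has to request a fresh snapshot.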
get_skill

get_skill loads reusable workflow guidance shipped with the page agent.
Current skills are:
- query
- database-change
- grant-permission

This keeps step-by-step workflow guidance out of the main prompt until it is needed.
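The on-demand loading idea can be sketched as a lookup keyed by skill name. The three names come from the list above; the placeholder bodies, the error wording, and the helpers isSkillName and getSkill are assumptions, and the shipped skill text is not reproduced here.

```typescript
const SKILL_NAMES = ["query", "database-change", "grant-permission"] as const;
type SkillName = (typeof SKILL_NAMES)[number];

// Placeholder bodies standing in for the real workflow guidance.
const skillBodies: Record<SkillName, string> = {
  "query": "<workflow guidance for query>",
  "database-change": "<workflow guidance for database-change>",
  "grant-permission": "<workflow guidance for grant-permission>",
};

function isSkillName(name: string): name is SkillName {
  return (SKILL_NAMES as readonly string[]).indexOf(name) !== -1;
}

// Returns the guidance for a known skill, or a message listing valid names.
function getSkill(name: string): string {
  if (!isSkillName(name)) {
    return `Unknown skill "${name}". Available: ${SKILL_NAMES.join(", ")}`;
  }
  return skillBodies[name];
}
```

Returning the list of valid names on a miss gives the model enough information to correct itself in the next step.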
Prompt construction lives in frontend/src/plugins/agent/logic/prompt.ts.
The current system prompt includes:
Important current behavior:
- get_page_state first,
- get_skill before common multi-step workflows,

The implemented guidance prefers DOM interaction on pages with unsaved or in-progress UI state, and prefers API access for persisted data, cross-resource lookup, or bulk operations.
proto/v1/v1/ai_service.proto is the source of truth.
At a high level:
- AIService.Chat accepts conversation messages and tool_definitions.
- AIChatMessageRole enum rather than raw role strings.
- tool_calls.
- tool_call_id.

This contract is what the frontend agent loop serializes to in agentLoop.ts.
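To illustrate how an enum role, tool_calls, and tool_call_id fit together, the sketch below models messages on the client side and checks that every tool result refers to a call the assistant actually issued. The ChatRole values, the WireMessage field names, and findOrphanToolResults are hypothetical; the authoritative names and numbering are in ai_service.proto, and the pairing check is an assumption about how such a contract is typically used.

```typescript
// Hypothetical mirror of a role enum; real value names live in the proto file.
enum ChatRole {
  System = "SYSTEM",
  User = "USER",
  Assistant = "ASSISTANT",
  Tool = "TOOL",
}

interface WireToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface WireMessage {
  role: ChatRole;
  content: string;
  toolCalls: WireToolCall[]; // populated on assistant messages that request tools
  toolCallId?: string; // set on tool messages, pointing at the originating call
}

// Returns the tool_call_id of every tool message with no matching earlier call.
function findOrphanToolResults(messages: WireMessage[]): string[] {
  const issued = new Set<string>();
  const orphans: string[] = [];
  for (const m of messages) {
    if (m.role === ChatRole.Assistant) {
      for (const c of m.toolCalls) issued.add(c.id);
    } else if (m.role === ChatRole.Tool) {
      if (!m.toolCallId || !issued.has(m.toolCallId)) {
        orphans.push(m.toolCallId ?? "(missing)");
      }
    }
  }
  return orphans;
}
```

A check like this could run before serializing a conversation, since providers generally reject tool results that do not correspond to a prior tool call.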
Primary implementation files:
- frontend/src/layouts/BodyLayout.vue
- frontend/src/views/DashboardHeader.vue
- frontend/src/plugins/agent/index.ts
- frontend/src/plugins/agent/store/agent.ts
- frontend/src/plugins/agent/logic/agentLoop.ts
- frontend/src/plugins/agent/logic/prompt.ts
- frontend/src/plugins/agent/logic/context.ts
- frontend/src/plugins/agent/logic/tools/index.ts
- frontend/src/plugins/agent/logic/tools/searchApi.ts
- frontend/src/plugins/agent/logic/tools/pageState.ts
- proto/v1/v1/ai_service.proto

This document intentionally excludes older phase plans, handoff notes, and exploratory design alternatives that were useful during implementation but are no longer the durable source of truth.