chrome/browser/resources/ai_overlay_dialog/README.md
This directory contains the WebUI implementation of the AI Overlay Dialog.
┌──────────────────┐ ┌─────────────┐
│ AudioCapturer │ │ AudioPlayer │
└─────────▲────────┘ └───────▲─────┘
│ │
│ │
│ │
│ │
│ │
┌───────┴──────────────────┴────┐
│ │
│ AIOverlayDialogElement │
UI AND IO │ │ └───────────────┬───────────────┘ │ ──────────────────────────────────┼────────────────────────────────────────── │ │ LOGIC │ ┌───────────────▼───────────────┐ ┌──────────────────┐ │ │ │ │ │ Conversation ┼─────► ApiSession │ │ │ │ │ └───────────────────────────────┘ └──────────────────┘
(diagram can be modified using asciiflow.com)
AiOverlayDialogElement (ai_overlay_dialog.ts): The main custom element
(<ai-overlay-dialog>) and entry point. It instantiates a Conversation and
provides the audio I/O components to it.Conversation (conversation.ts): The central coordinator class that
bridges the UI/IO with the API session and implements the core logic and
state tracking. This is a long lived object that persists across page loads
and tabs.ApiSession (api_session.ts): Manages the WebSocket connection to the
AI backend and handles communication. This object is recreated whenever page
context is updated. In practice, this means whenever the active tab changes
or is navigated.Persona (persona.ts): Defines the system instruction for the AI.Conversation is the source of truth for the conversation state which can be in one of three states:
The UI in app.ts receives changes in the Conversation state but uses it's own, similar but distinct, state machine to update the UI.
The personality and behavior of the model are defined by "System Instructions." These instructions use a bespoke templating syntax that allows you to inject dynamic information about the user's current web page and browser state.
Processing works by applying each step to the whole document. The order is: conditionals, then numbering, then variable substitution.
You can insert live information into your instructions using ${variable_name}.
Example:
The current page is "${title}" at ${url}.
You can include or exclude text based on whether a variable evaluates to a truthy value.
?${variable}[Text if true]{else}[Text if false]{}??${variable}[Text if true]{}? (Shows nothing if false)Example:
?${isLoaded}[Currently on ${title}.]{else}[The page is still loading.]{}?
Nesting conditionals is not supported.
When writing a list of instructions or tools, you can use #{1} to have the
numbers generated automatically. Each time you use #{1} in your template, it
will be replaced by the next number in sequence (1, 2, 3, etc.).
If you need multiple independent lists, you can use a different group number
(like #{2}).
Example:
#{1}. Foo. #{1}. Bar. #{2}. Reticulating #{2}. Splines #{1}. Baz.
Renders as:
- Foo.
- Bar.
- Reticulating
- Splines
- Use the page content.
Provide an API key by starting chrome with chrome:
--enable-features=AiOverlayDialog:api_key/<YOUR API KEY>
Provide a .wav file to play (via button in UI) if you have no microphone ('+' is
replaced with '/' to allow specifying paths since slashes are used in
--enable-features syntax):
--enable-features=AiOverlayDialog:mock_audio_path/+home+bokan+sounds+hello.wav
You can provide both; feature params are also separated with '/'. E.g.
--enable-features=AiOverlayDialog:api_key/123/mock_audio_path/+hello.wav
Provide a path to the bundle of resources using:
--ttc-bundle-url=https://example.com/foo
Resources will be fetched from the given path at the specified URL.