chrome/browser/dictation/README.md
(Googlers only for now): https://docs.google.com/document/d/1Zri5oR3P3D5LOq9Gjs4AeHFrMMOX7bTtctxT6KrGeSQ/edit?usp=sharing
The top-level lifecycle of a Dictation interaction. A session begins when the user invokes the feature (e.g., via a context menu) and ends when the UI is dismissed or the task is completed. A session coordinates one or more transcription streams.
A single period of audio capture and transcription.
The current stream in a session that is primary to the user's interaction. The "Attached" status determines which stream's state (e.g., volume levels, processing status) is reflected in the session's UI.
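The relationship between a session, its streams, and the attached stream can be sketched roughly as follows. This is a minimal illustration, not the actual Chromium implementation: `Session`, `StreamUiState`, `StartStream`, `UpdateStream`, and `UiState` are hypothetical names, and the sketch assumes that starting a new stream makes it the attached one.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical types for illustration only; not the real Dictation classes.
struct StreamUiState {
  float volume_level = 0.0f;
  bool is_processing = false;
};

class Session {
 public:
  // Starts a new stream and (by assumption) makes it the attached one.
  std::size_t StartStream() {
    streams_.push_back(StreamUiState{});
    attached_ = streams_.size() - 1;
    return *attached_;
  }

  // Records a state update for any stream, attached or not.
  void UpdateStream(std::size_t id, StreamUiState state) {
    streams_.at(id) = state;
  }

  // The session UI reflects only the attached stream's state.
  std::optional<StreamUiState> UiState() const {
    if (!attached_) return std::nullopt;
    return streams_[*attached_];
  }

 private:
  std::vector<StreamUiState> streams_;
  std::optional<std::size_t> attached_;
};
```

In this sketch, updates to a non-attached stream are still stored but never reach the UI, which is one plausible reading of the "Attached" status described above.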
A state where a stream has stopped capturing audio (the microphone is off) but is still awaiting final data from the backend. This might include:
Multiple streams can be in this state simultaneously as they flush their remaining backend data.
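One way to picture this state is as the middle step of a small per-stream state machine. The sketch below is an assumption-laden illustration: the enum values, `StopCapture`, `OnBackendFlushed`, and `CountAwaitingFinalData` are hypothetical names chosen here, not identifiers from the Dictation code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical state names; the real implementation may differ.
enum class StreamState { kCapturing, kAwaitingFinalData, kDone };

struct Stream {
  StreamState state = StreamState::kCapturing;

  // The microphone turns off, but backend results may still arrive.
  void StopCapture() {
    if (state == StreamState::kCapturing)
      state = StreamState::kAwaitingFinalData;
  }

  // The backend signals that it has no more data for this stream.
  void OnBackendFlushed() {
    if (state == StreamState::kAwaitingFinalData)
      state = StreamState::kDone;
  }

  bool IsMicrophoneOn() const { return state == StreamState::kCapturing; }
};

// Counts streams whose microphone is off but which still await backend data.
inline int CountAwaitingFinalData(const std::vector<Stream>& streams) {
  return static_cast<int>(std::count_if(
      streams.begin(), streams.end(), [](const Stream& s) {
        return s.state == StreamState::kAwaitingFinalData;
      }));
}
```

Because each stream carries its own state, nothing in this model prevents several streams from waiting on the backend at the same time.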
An abstraction representing an editable field (e.g., an HTML `<input>`, `<textarea>`, EditContext, or contenteditable element).
The final operation, in which the stable, transformed transcription is inserted into the target. A commit occurs only once the text is finalized (i.e., after all transcriptions and transformations are complete). For the MVP, this is an atomic operation at the end of a stream.
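The finalization gate and the single-step insertion might look something like the sketch below. `Target`, `PendingTranscription`, and `Commit` are hypothetical stand-ins introduced for illustration, and the sketch assumes a target can be modeled as a plain string that text is appended to.

```cpp
#include <string>

// Hypothetical stand-in for an editable field.
struct Target {
  std::string text;
  void InsertText(const std::string& s) { text += s; }
};

// Hypothetical record of one stream's output before it is committed.
struct PendingTranscription {
  std::string transformed_text;
  bool transcription_complete = false;
  bool transformations_complete = false;

  bool IsFinalized() const {
    return transcription_complete && transformations_complete;
  }
};

// Inserts the whole text in one operation. Returns false and leaves the
// target untouched if the text is not yet finalized.
inline bool Commit(const PendingTranscription& pending, Target& target) {
  if (!pending.IsFinalized()) return false;
  target.InsertText(pending.transformed_text);
  return true;
}
```

Under these assumptions the target never observes partial or unstable text: it either receives the complete transformed transcription or nothing at all.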