chrome/browser/dictation/README.md
(Googlers only for now): https://docs.google.com/document/d/1Zri5oR3P3D5LOq9Gjs4AeHFrMMOX7bTtctxT6KrGeSQ/edit?usp=sharing
Session: The top-level lifecycle of a Dictation interaction. A session begins when the user invokes the feature (e.g., via a context menu) and ends when the UI is dismissed or the task is completed. A session coordinates one or more speech recognition streams.
Stream: A single period of audio capture and speech recognition.
Attached stream: The stream in a session that is currently primary to the user's interaction. The attached status determines which stream's state (e.g., volume level, processing status) is reflected in the session's UI.
A state where a stream has stopped capturing audio (the microphone is off) but is still awaiting final data from the backend. This might include:
Multiple streams can be in this state simultaneously as they flush their remaining backend data.
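To make the relationship between sessions, streams, and the attached stream more concrete, here is a minimal C++ sketch. All names (`Session`, `Stream`, `StreamState`, `StartStream`, `StopCapture`, `OnFinalDataReceived`) are hypothetical and are not taken from the Chromium sources; `kFinishing` merely stands in for the "microphone off, still awaiting backend data" state described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical stream states; the real implementation may model these
// differently.
enum class StreamState {
  kCapturing,  // Microphone on; audio is being recognized.
  kFinishing,  // Microphone off; still awaiting final data from the backend.
  kDone,       // All backend data has been received.
};

struct Stream {
  StreamState state = StreamState::kCapturing;
  float volume = 0.0f;  // Latest input level; shown in the UI only if attached.
};

// Hypothetical session that coordinates one or more streams.
class Session {
 public:
  // Starts a new stream and makes it the attached stream.
  int StartStream() {
    int id = next_id_++;
    streams_[id] = Stream{};
    attached_id_ = id;
    return id;
  }

  // Turns the microphone off for |id|. The stream still waits for final
  // backend data, so it moves to kFinishing rather than kDone.
  void StopCapture(int id) {
    Stream& stream = streams_.at(id);
    assert(stream.state == StreamState::kCapturing);
    stream.state = StreamState::kFinishing;
  }

  // Called once the backend has flushed all remaining data for |id|.
  void OnFinalDataReceived(int id) {
    Stream& stream = streams_.at(id);
    assert(stream.state == StreamState::kFinishing);
    stream.state = StreamState::kDone;
  }

  void SetVolume(int id, float volume) { streams_.at(id).volume = volume; }

  // The UI reflects only the attached stream's state.
  std::optional<float> UiVolume() const {
    if (!attached_id_) return std::nullopt;
    return streams_.at(*attached_id_).volume;
  }

  // Several streams may be finishing at the same time.
  std::size_t FinishingCount() const {
    std::size_t count = 0;
    for (const auto& entry : streams_) {
      if (entry.second.state == StreamState::kFinishing) ++count;
    }
    return count;
  }

  std::optional<int> attached_id() const { return attached_id_; }

 private:
  int next_id_ = 1;
  std::map<int, Stream> streams_;
  std::optional<int> attached_id_;
};
```

In this sketch, starting a second stream moves the attached status to it, while the first stream keeps flushing its remaining backend data in the background.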
Target: An abstraction representing an editable field (e.g., an HTML `<input>` or `<textarea>`, an EditContext, or a contenteditable element).
Commit: The final operation in which the stable, transformed transcription is inserted into the target. A commit occurs only once the text is finalized (i.e., after all transcription and transformation steps are complete). For the MVP, a commit is a single atomic operation at the end of a stream.
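The commit rule can be sketched as follows: text accumulates per stream and is inserted into the target with a single call, only after recognition and all transformations have finished. `Target`, `PendingTranscription`, and the callback names are hypothetical illustrations, not the actual Chromium interfaces, and the nature of the transformations is left abstract.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical abstraction over an editable field such as <input>,
// <textarea>, an EditContext, or a contenteditable element.
class Target {
 public:
  virtual ~Target() = default;
  virtual void InsertText(const std::string& text) = 0;
};

// Hypothetical per-stream buffer that enforces the commit rule: text is
// inserted only after recognition and transformation are both complete.
class PendingTranscription {
 public:
  // Interim results may still change and are never committed directly.
  void OnInterimResult(std::string text) {
    assert(!recognition_done_);
    text_ = std::move(text);
  }

  // The backend's final, stable transcription.
  void OnFinalResult(std::string text) {
    text_ = std::move(text);
    recognition_done_ = true;
  }

  // Replaces the stable text with its transformed form (whatever
  // post-processing the feature applies; unspecified here).
  void OnTransformedResult(std::string text) {
    assert(recognition_done_);
    text_ = std::move(text);
    transformed_ = true;
  }

  bool IsFinal() const { return recognition_done_ && transformed_; }

  // Inserts the text with a single call and returns true, or returns
  // false if the text is not final yet or was already committed.
  bool CommitTo(Target& target) {
    if (!IsFinal() || committed_) return false;
    target.InsertText(text_);
    committed_ = true;
    return true;
  }

 private:
  std::string text_;
  bool recognition_done_ = false;
  bool transformed_ = false;
  bool committed_ = false;
};

// Test double that records what was inserted.
class FakeTarget : public Target {
 public:
  void InsertText(const std::string& text) override {
    contents += text;
    ++insert_calls;
  }
  std::string contents;
  int insert_calls = 0;
};
```

Because interim and merely recognized text is never passed to the target, the field sees exactly one insertion per stream, which matches the atomic MVP behavior described above.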
Speech recognition is provided by a cloud-hosted service. Chrome connects to this service through a component extension referred to as the "connector" (or "connector extension"). The core dictation logic in the browser process communicates with the connector extension via a private extension API.
The extension itself is built outside the Chromium source tree and distributed via the component updater.
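A rough sketch of the browser-to-connector round trip, modeling the private extension API as a plain C++ interface. The message fields and method names (`RecognitionResult`, `Connector`, `SendAudio`, `EndAudio`) are assumptions for illustration only; the real API surface is whatever the IDL in the tree defines. `FakeConnector` stands in for both the extension and the cloud service.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result message delivered from the connector to the browser.
struct RecognitionResult {
  int stream_id;
  std::string text;
  bool is_final;
};

using ResultCallback = std::function<void(const RecognitionResult&)>;

// Hypothetical browser-side view of the connector extension. In Chrome the
// boundary is a private extension API; here it is a plain interface.
class Connector {
 public:
  virtual ~Connector() = default;
  virtual void SendAudio(int stream_id, const std::vector<float>& samples) = 0;
  // Signals that the microphone is off; final results may still arrive.
  virtual void EndAudio(int stream_id) = 0;
};

// Stand-in for the extension and the cloud service. Instead of performing
// speech recognition, it reports how many samples it has received so far.
class FakeConnector : public Connector {
 public:
  explicit FakeConnector(ResultCallback on_result)
      : on_result_(std::move(on_result)) {
    assert(on_result_);  // A result sink is required.
  }

  void SendAudio(int stream_id, const std::vector<float>& samples) override {
    total_samples_ += samples.size();
    on_result_(RecognitionResult{stream_id, Describe(), /*is_final=*/false});
  }

  void EndAudio(int stream_id) override {
    on_result_(RecognitionResult{stream_id, Describe(), /*is_final=*/true});
  }

 private:
  std::string Describe() const {
    return "samples:" + std::to_string(total_samples_);
  }

  ResultCallback on_result_;
  std::size_t total_samples_ = 0;
};
```

Keeping the connector behind an interface like this would also let the core logic be exercised without the real component extension, which ships separately from the Chromium tree.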
Core dictation logic
C++ implementation of the private extension API used for communication between the core dictation implementation in the browser and the connector extension.
The IDL defining the API exposed to the connector extension.
(See https://crrev.com/c/7871142 for all files relevant to the extension API.)
Views UI components
dictation_menu_observer.h - Implements the context menu item that starts dictation.
render_view_context_menu.cc - Registers the dictation item in the context menu.