Talk Mode

Talk Mode provides a full voice conversation pipeline for the Eliza desktop app. It combines offline speech-to-text (Whisper.cpp), streaming text-to-speech (ElevenLabs), and voice activity detection into a seamless hands-free experience.

<Info> Talk Mode is a native desktop feature. It requires the Electrobun desktop app — it is not available in the web dashboard or mobile app. </Info>

How It Works

  1. You speak — the microphone captures audio and streams PCM samples to the main process
  2. Speech recognition — Whisper.cpp transcribes your speech to text offline
  3. Agent processes — the transcript is sent to the agent as a message
  4. Agent speaks — the response is converted to speech via ElevenLabs and played back

State Machine

Talk Mode cycles through four states:

| State | Description |
| --- | --- |
| idle | Talk Mode is off |
| listening | Microphone is active, waiting for speech |
| processing | Transcription complete, agent is generating a response |
| speaking | Agent response is being played back as audio |

After speaking completes, Talk Mode returns to listening for the next turn.
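
The cycle can be pictured as a small transition table. The following is a minimal sketch, not the app's actual implementation: the state names come from the table above, while the event names (`start`, `stop`, `transcriptFinal`, `responseReady`, `speakComplete`) and the rule that `stop` always returns to idle are assumptions made for illustration.

```typescript
// States are taken from the documentation; event names are hypothetical.
type TalkState = "idle" | "listening" | "processing" | "speaking";
type TalkEvent =
  | "start"
  | "stop"
  | "transcriptFinal"
  | "responseReady"
  | "speakComplete";

// Assumed transition table: each state lists only the events it reacts to.
const transitions: Record<TalkState, Partial<Record<TalkEvent, TalkState>>> = {
  idle: { start: "listening" },
  listening: { transcriptFinal: "processing", stop: "idle" },
  processing: { responseReady: "speaking", stop: "idle" },
  speaking: { speakComplete: "listening", stop: "idle" },
};

// Returns the next state, or the current state when the event does not apply.
function nextState(state: TalkState, event: TalkEvent): TalkState {
  return transitions[state][event] ?? state;
}

// Walk one full conversational turn, starting from idle.
const turn: TalkEvent[] = ["start", "transcriptFinal", "responseReady", "speakComplete"];
const finalState = turn.reduce(nextState, "idle" as TalkState);
console.log(finalState);
```

Keeping the transitions in a lookup table means an event that arrives in the wrong state (for example `speakComplete` while idle) is simply ignored rather than corrupting the state.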

Configuration

Talk Mode is configured through the TalkModeConfig interface:

Speech-to-Text (STT)

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| engine | string | "whisper" | "whisper" for offline Whisper.cpp, "web" for browser Web Speech API |
| modelSize | string | "base" | Whisper model size: "tiny", "base", "small", "medium", "large" |
| language | string | | Optional language code for transcription |

Larger Whisper models are more accurate but require more memory and processing time. If Whisper is unavailable, Talk Mode falls back to the Web Speech API automatically.
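
As an illustration of the defaults and the fallback rule, here is a hedged sketch. The field names and default values mirror the table above, but the `SttConfig` type, the `resolveStt` helper, and the `whisperAvailable` flag are hypothetical and do not describe the real `TalkModeConfig` shape.

```typescript
type SttEngine = "whisper" | "web";
type WhisperModelSize = "tiny" | "base" | "small" | "medium" | "large";

// Hypothetical shape; only the field names and defaults come from the docs.
interface SttConfig {
  engine: SttEngine;
  modelSize: WhisperModelSize;
  language?: string;
}

const DEFAULT_STT: SttConfig = { engine: "whisper", modelSize: "base" };

// Merge user overrides onto the defaults, then apply the documented fallback:
// if Whisper is requested but unavailable, use the Web Speech API instead.
function resolveStt(overrides: Partial<SttConfig>, whisperAvailable: boolean): SttConfig {
  const merged: SttConfig = { ...DEFAULT_STT, ...overrides };
  if (merged.engine === "whisper" && !whisperAvailable) {
    return { ...merged, engine: "web" };
  }
  return merged;
}

console.log(resolveStt({ modelSize: "small", language: "en" }, false));
```

In the real app the availability flag would presumably come from a check such as the `talkmode:isWhisperAvailable` command; how that result is wired in is not specified here.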

Text-to-Speech (TTS)

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| engine | string | "elevenlabs" | "elevenlabs" for ElevenLabs API, "system" for native OS TTS |
| apiKey | string | | ElevenLabs API key (configured in Settings > Secrets) |
| voiceId | string | | ElevenLabs voice ID |
| modelId | string | "eleven_v3" | ElevenLabs model |

If no ElevenLabs API key is configured, Talk Mode falls back to system TTS.
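
The API-key fallback could be expressed roughly as follows. This is a sketch under assumptions: the `TtsConfig` type and `effectiveTtsEngine` helper are hypothetical, and treating a whitespace-only key as missing is an illustrative choice rather than documented behavior.

```typescript
type TtsEngine = "elevenlabs" | "system";

// Hypothetical shape; field names and the modelId default come from the table.
interface TtsConfig {
  engine: TtsEngine;
  apiKey?: string;
  voiceId?: string;
  modelId: string;
}

// ElevenLabs is used only when it is selected AND a key is present;
// every other combination resolves to the native OS voice.
function effectiveTtsEngine(config: TtsConfig): TtsEngine {
  const hasKey = (config.apiKey ?? "").trim().length > 0;
  return config.engine === "elevenlabs" && hasKey ? "elevenlabs" : "system";
}

console.log(effectiveTtsEngine({ engine: "elevenlabs", modelId: "eleven_v3" }));
```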

Voice Activity Detection (VAD)

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable/disable voice activity detection |
| silenceThreshold | number | | Audio level below which silence is detected |
| silenceDuration | number | | Duration of silence (ms) before stopping capture |
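
To show how the two silence settings might interact, here is a minimal sketch of a silence detector. It assumes the audio level is the RMS of PCM samples normalized to [-1, 1], which this page does not specify; `VadConfig`, `rmsLevel`, and `createSilenceDetector` are hypothetical names, and the numeric values in the usage lines are illustrative, not defaults.

```typescript
// Hypothetical shape mirroring the settings table above.
interface VadConfig {
  enabled: boolean;
  silenceThreshold: number; // level below which a frame counts as silence
  silenceDuration: number; // ms of continuous silence before capture stops
}

// Root-mean-square level of one frame of samples in the range [-1, 1].
function rmsLevel(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumOfSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / frame.length);
}

// Returns a function that is fed one frame at a time and reports true once
// silence has lasted at least silenceDuration. Any loud frame resets the timer.
function createSilenceDetector(config: VadConfig) {
  let silentMs = 0;
  return (frame: Float32Array, frameMs: number): boolean => {
    if (!config.enabled) return false;
    silentMs = rmsLevel(frame) < config.silenceThreshold ? silentMs + frameMs : 0;
    return silentMs >= config.silenceDuration;
  };
}

// Illustrative values only: 20 ms frames, stop after 60 ms of silence.
const shouldStop = createSilenceDetector({
  enabled: true,
  silenceThreshold: 0.01,
  silenceDuration: 60,
});
const speech = new Float32Array(160).fill(0.5);
const silence = new Float32Array(160);
console.log(shouldStop(speech, 20), shouldStop(silence, 20));
```

Resetting the counter on every loud frame means short pauses between words do not end the capture; only an uninterrupted quiet stretch does.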

Permissions

Talk Mode requires the microphone permission. In the desktop app, you can grant this from Settings > Permissions.

IPC Events

Talk Mode communicates between the renderer and main process via IPC:

Commands (Renderer → Main)

| Channel | Description |
| --- | --- |
| talkmode:start | Start Talk Mode |
| talkmode:stop | Stop Talk Mode |
| talkmode:speak | Trigger TTS for text |
| talkmode:stopSpeaking | Interrupt current playback |
| talkmode:isSpeaking | Query speaking state |
| talkmode:getState | Query current state |
| talkmode:isEnabled | Check if Talk Mode is available |
| talkmode:updateConfig | Update configuration |
| talkmode:isWhisperAvailable | Check Whisper.cpp availability |
| talkmode:getWhisperInfo | Get Whisper model info |
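
A renderer could wrap these channels in a small typed client. The sketch below is only an illustration: the channel names come from the table above, but the `invoke` transport is injected because the real renderer bridge is not described here, and the payload shape `{ text }` for `talkmode:speak` is an assumption.

```typescript
// Channel names as documented in the commands table.
const COMMAND_CHANNELS = [
  "talkmode:start",
  "talkmode:stop",
  "talkmode:speak",
  "talkmode:stopSpeaking",
  "talkmode:isSpeaking",
  "talkmode:getState",
  "talkmode:isEnabled",
  "talkmode:updateConfig",
  "talkmode:isWhisperAvailable",
  "talkmode:getWhisperInfo",
] as const;

type TalkModeCommand = (typeof COMMAND_CHANNELS)[number];

// Hypothetical transport: whatever function the app exposes for
// renderer-to-main requests would be passed in here.
type Invoke = (channel: TalkModeCommand, payload?: unknown) => Promise<unknown>;

// Thin wrapper so callers cannot mistype a channel name.
function createTalkModeClient(invoke: Invoke) {
  return {
    start: () => invoke("talkmode:start"),
    stop: () => invoke("talkmode:stop"),
    speak: (text: string) => invoke("talkmode:speak", { text }), // payload shape assumed
    stopSpeaking: () => invoke("talkmode:stopSpeaking"),
    getState: () => invoke("talkmode:getState"),
  };
}

// Usage with a stand-in transport that only logs the call.
const demoClient = createTalkModeClient((channel, payload) => {
  console.log(channel, payload);
  return Promise.resolve(undefined);
});
void demoClient.speak("Hello");
```

Injecting the transport keeps the wrapper testable without a running desktop app.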

Events (Main → Renderer)

| Channel | Description |
| --- | --- |
| talkmode:transcript | Transcription result with isFinal flag |
| talkmode:speaking | Speaking state changed |
| talkmode:speakComplete | Playback finished |
| talkmode:audioChunk | Base64-encoded audio chunk for playback |
| talkmode:audioComplete | All audio chunks sent |
| talkmode:stateChange | State machine transition |
| talkmode:error | Error with diagnostic code |
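
The `talkmode:audioChunk` and `talkmode:audioComplete` events suggest a chunk-then-finish pattern on the renderer side. The sketch below shows one possible way to handle it, assuming each chunk arrives as a plain base64 string; `createAudioCollector` and its callbacks are hypothetical names, and how they are subscribed to the IPC channels is left out. It buffers the whole response before handing it over, which is a simplification: a real player would more likely start playback as chunks arrive.

```typescript
// Decode one base64 string into raw bytes using the standard atob global.
function decodeBase64(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Collects decoded chunks and emits one contiguous buffer on completion.
function createAudioCollector(onComplete: (audio: Uint8Array) => void) {
  let chunks: Uint8Array[] = [];
  return {
    // Call for each talkmode:audioChunk payload.
    onAudioChunk(b64: string): void {
      chunks.push(decodeBase64(b64));
    },
    // Call when talkmode:audioComplete arrives.
    onAudioComplete(): void {
      const total = chunks.reduce((n, c) => n + c.length, 0);
      const audio = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        audio.set(c, offset);
        offset += c.length;
      }
      chunks = []; // ready for the next response
      onComplete(audio);
    },
  };
}

const collector = createAudioCollector((audio) => console.log(audio.length));
collector.onAudioChunk("AQI=");
collector.onAudioChunk("Aw==");
collector.onAudioComplete();
```
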

Related

  • Desktop App: desktop-specific features and keyboard shortcuts
  • Native Modules: IPC reference for Talk Mode and other native features
  • Settings: TTS/STT provider configuration