docs/nodes/talk.md
Talk mode is a continuous voice conversation loop:
talk.speak)

The assistant may prefix its reply with a single JSON line to control voice:
{ "voice": "<voice-id>", "once": true }
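A minimal sketch of how a client might separate such a directive line from the spoken text and apply the `once` rule described in this section. The function names `split_directive` and `apply_voice` and the choice of `voice` and `model` as canonical key names are assumptions made for illustration, not part of the documented API.

```python
import json

# Map the documented key aliases onto one canonical name each.
# Picking "voice" and "model" as canonical is an assumption of this sketch.
CANONICAL = {
    "voice_id": "voice",
    "voiceId": "voice",
    "model_id": "model",
    "modelId": "model",
}


def split_directive(reply: str):
    """Return (directive, spoken_text).

    The directive is an empty dict when the first line is not a JSON object,
    in which case the reply is returned unchanged.
    """
    first, _, rest = reply.partition("\n")
    candidate = first.strip()
    if not (candidate.startswith("{") and candidate.endswith("}")):
        return {}, reply
    try:
        raw = json.loads(candidate)
    except json.JSONDecodeError:
        return {}, reply
    if not isinstance(raw, dict):
        return {}, reply
    directive = {CANONICAL.get(key, key): value for key, value in raw.items()}
    return directive, rest


def apply_voice(default_voice, directive):
    """Return (voice_for_this_reply, new_default_voice)."""
    voice = directive.get("voice", default_voice)
    if directive.get("once") is True:
        # once: true affects the current reply only.
        return voice, default_voice
    # Without once, the chosen voice becomes the new default.
    return voice, voice
```

For example, `split_directive('{ "voice": "abc", "once": true }\nHello there.')` yields the directive and the text `Hello there.`; passing that directive to `apply_voice("base", ...)` speaks with `abc` while leaving `base` as the default.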
Rules:
- once: true applies to the current reply only.
- Without once, the voice becomes the new default for Talk mode.

Supported keys:
- voice / voice_id / voiceId
- model / model_id / modelId
- speed, rate (WPM), stability, similarity, style, speakerBoost
- seed, normalize, lang, output_format, latency_tier
- once

Config (~/.openclaw/openclaw.json)

{
  talk: {
    provider: "elevenlabs",
    providers: {
      elevenlabs: {
        voiceId: "elevenlabs_voice_id",
        modelId: "eleven_v3",
        outputFormat: "mp3_44100_128",
        apiKey: "elevenlabs_api_key",
      },
      mlx: {
        modelId: "mlx-community/Soprano-80M-bf16",
      },
      system: {},
    },
    speechLocale: "ru-RU",
    silenceTimeoutMs: 1500,
    interruptOnSpeech: true,
  },
}
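To illustrate how the `provider` key relates to the `providers` map, here is a hedged sketch that picks the active provider's settings out of an already-parsed config. The dict mirrors the example config; the function name `active_provider` is hypothetical, and returning `(None, {})` when `provider` is unset is an assumption of this sketch, since the behavior for a missing `provider` is not stated here.

```python
# Parsed form of the example config, written as a Python dict.
config = {
    "talk": {
        "provider": "elevenlabs",
        "providers": {
            "elevenlabs": {
                "voiceId": "elevenlabs_voice_id",
                "modelId": "eleven_v3",
                "outputFormat": "mp3_44100_128",
                "apiKey": "elevenlabs_api_key",
            },
            "mlx": {"modelId": "mlx-community/Soprano-80M-bf16"},
            "system": {},
        },
        "speechLocale": "ru-RU",
        "silenceTimeoutMs": 1500,
        "interruptOnSpeech": True,
    }
}


def active_provider(cfg):
    """Return (provider_name, provider_settings) for the talk section."""
    talk = cfg.get("talk", {})
    name = talk.get("provider")
    # A provider with no entry under providers gets an empty settings dict.
    settings = talk.get("providers", {}).get(name, {})
    return name, settings
```

With the example config, `active_provider(config)` returns `"elevenlabs"` together with its four settings; changing `provider` to `"mlx"` would return only the `modelId` entry.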
Defaults:
- interruptOnSpeech: true
- silenceTimeoutMs: when unset, Talk keeps the platform default pause window before sending the transcript (700 ms on macOS and Android, 900 ms on iOS).
- provider: selects the active Talk provider. Use elevenlabs, mlx, or system for the macOS-local playback paths.
- providers.<provider>.voiceId: falls back to ELEVENLABS_VOICE_ID / SAG_VOICE_ID for ElevenLabs (or the first ElevenLabs voice when an API key is available).
- providers.elevenlabs.modelId: defaults to eleven_v3 when unset.
- providers.mlx.modelId: defaults to mlx-community/Soprano-80M-bf16 when unset.
- providers.elevenlabs.apiKey: falls back to ELEVENLABS_API_KEY (or the gateway shell profile if available).
- speechLocale: optional BCP 47 locale id for on-device Talk speech recognition on iOS/macOS. Leave unset to use the device default.
- outputFormat: defaults to pcm_44100 on macOS/iOS and pcm_24000 on Android (set mp3_* to force MP3 streaming).

chat.send against session key main.

talk.speak using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.

openclaw-mlx-tts helper when present, or an executable on PATH. Set OPENCLAW_MLX_TTS_BIN to point at a custom helper binary during development.

stability for eleven_v3 is validated to 0.0, 0.5, or 1.0; other models accept 0..1.

latency_tier is validated to 0..4 when set.

pcm_16000, pcm_22050, pcm_24000, and pcm_44100 output formats for low-latency AudioTrack streaming.
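The fallbacks and validation ranges above can be sketched as two small functions. This is an illustration under stated assumptions, not the actual implementation: the names `resolve_talk` and `validate_directive` are hypothetical, `ELEVENLABS_VOICE_ID` is assumed to take precedence over `SAG_VOICE_ID`, `outputFormat` is assumed to live under `providers.elevenlabs` as in the example config, and the "first ElevenLabs voice" and "gateway shell profile" fallbacks are omitted.

```python
import os

# Platform defaults as listed under Defaults.
PLATFORM_SILENCE_MS = {"macos": 700, "android": 700, "ios": 900}
PLATFORM_OUTPUT_FORMAT = {"macos": "pcm_44100", "ios": "pcm_44100", "android": "pcm_24000"}


def resolve_talk(talk, platform, env=None):
    """Fill unset talk settings with the documented defaults."""
    env = os.environ if env is None else env
    providers = talk.get("providers", {})
    eleven = providers.get("elevenlabs", {})
    mlx = providers.get("mlx", {})
    return {
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
        "silenceTimeoutMs": talk.get("silenceTimeoutMs", PLATFORM_SILENCE_MS[platform]),
        "elevenlabs": {
            # Env precedence (ELEVENLABS_VOICE_ID first) is an assumption.
            "voiceId": eleven.get("voiceId")
            or env.get("ELEVENLABS_VOICE_ID")
            or env.get("SAG_VOICE_ID"),
            "modelId": eleven.get("modelId", "eleven_v3"),
            "apiKey": eleven.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
            "outputFormat": eleven.get("outputFormat", PLATFORM_OUTPUT_FORMAT[platform]),
        },
        "mlx": {"modelId": mlx.get("modelId", "mlx-community/Soprano-80M-bf16")},
    }


def validate_directive(model_id, stability=None, latency_tier=None):
    """Return a list of error strings; an empty list means the values pass."""
    errors = []
    if stability is not None:
        if model_id == "eleven_v3":
            if stability not in (0.0, 0.5, 1.0):
                errors.append("stability must be 0.0, 0.5, or 1.0 for eleven_v3")
        elif not 0.0 <= stability <= 1.0:
            errors.append("stability must be within 0..1")
    if latency_tier is not None and not 0 <= latency_tier <= 4:
        errors.append("latency_tier must be within 0..4")
    return errors
```

Passing `env` explicitly keeps the sketch easy to exercise without touching the real environment; for instance, `resolve_talk({}, "ios", env={})` gives a 900 ms silence timeout and `pcm_44100` output.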