docs/usage/agent/tts-stt.mdx
LobeHub supports voice capabilities — listen to Agent responses hands-free, speak your messages instead of typing, and hold natural back-and-forth voice conversations. TTS converts text to speech; STT converts your voice to text.
Voice features in LobeHub provide:
TTS converts AI text responses into spoken audio, allowing you to listen instead of read.
To have AI read text aloud, simply highlight any content in the chat window and select Text-to-Speech. The AI will use a TTS model to convert the selected text into speech.
<Image alt={'TTS'} src={'/blog/assets907ea775d228958baca38e2dbb65939a.webp'} />
You can also configure an Agent to automatically read all responses as soon as each message completes — useful for hands-free workflows.
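The auto-read behavior can be pictured as a small check that runs each time a message changes state. The sketch below is illustrative only: `ChatMessage`, `AutoReadSettings`, and `textToAutoRead` are hypothetical names, not LobeHub's actual API, and it assumes that only completed Agent responses are spoken.

```typescript
// Hypothetical shapes for illustration; not LobeHub's real types.
type ChatMessage = {
  role: "user" | "assistant";
  status: "streaming" | "complete";
  text: string;
};

type AutoReadSettings = { autoRead: boolean };

// Returns the text to hand to a TTS engine, or null when nothing should be spoken.
function textToAutoRead(message: ChatMessage, settings: AutoReadSettings): string | null {
  if (!settings.autoRead) return null; // auto-read is off for this Agent
  if (message.role !== "assistant") return null; // only Agent responses are read
  if (message.status !== "complete") return null; // wait until the message completes
  const text = message.text.trim();
  return text.length > 0 ? text : null; // skip empty responses
}
```

Waiting for the `complete` status, rather than speaking partial output, matches the "as soon as each message completes" behavior described above.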
LobeHub supports two voice providers:
<Tabs>
<Tab title="OpenAI Voices">
Premium neural voices with natural prosody and intonation:

| Voice | Character |
| ------- | ------------------- |
| Alloy | Neutral, balanced |
| Echo | Clear, professional |
| Fable | Warm, friendly |
| Onyx | Deep, authoritative |
| Nova | Energetic, engaging |
| Shimmer | Soft, gentle |
**Best for**: Long-form content listening, professional use cases, content requiring natural flow.
**Best for**: Specific accent requirements, multi-language content, variety.
When audio is playing:
TTS audio is automatically cached — the first playback generates audio in real time, and subsequent playbacks are instant from cache.
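As a rough mental model of this caching, the sketch below memoizes generated audio by voice and text. It is a minimal illustration under stated assumptions: `Synthesize` and `createTtsCache` are hypothetical names, the synthesizer is synchronous for simplicity, and LobeHub's real cache may be keyed and stored differently.

```typescript
// Hypothetical synthesizer: turns text into audio bytes for a given voice.
type Synthesize = (text: string, voice: string) => Uint8Array;

// Wraps a synthesizer so repeated requests for the same voice and text reuse earlier audio.
function createTtsCache(synthesize: Synthesize) {
  const store = new Map<string, Uint8Array>();

  return {
    // First call per (voice, text) pair generates audio; later calls return the cached bytes.
    getAudio(text: string, voice: string): Uint8Array {
      const key = JSON.stringify([voice, text]); // unambiguous composite key
      const cached = store.get(key);
      if (cached !== undefined) return cached;
      const audio = synthesize(text, voice);
      store.set(key, audio);
      return audio;
    },
    // Drops all cached audio, for example to free local storage.
    clear(): void {
      store.clear();
    },
    // Number of cached (voice, text) entries.
    size(): number {
      return store.size;
    },
  };
}
```

Including the voice in the key means that switching voices for the same text triggers a fresh generation instead of replaying audio from the previous voice.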
You can customize the voice experience by selecting your preferred TTS and STT models in the settings.
<Image alt={'TTS Settings'} src={'/blog/assets89168f61edcb2ee92d2ad7064da218b2.webp'} />
Open the Settings panel and go to the Text-to-Speech section.

Each Agent can have its own voice. To configure a voice per Agent: open Agent settings → TTS section → select a voice provider → choose a voice → test with sample text → save.
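One way to think about per-Agent voices is as an optional override on top of a default. The sketch below assumes exactly that: `AgentTtsConfig` and `resolveVoice` are hypothetical names, the lowercase voice identifiers are an assumed convention based on the voice table above, and the fallback to a global default is an assumption rather than documented LobeHub behavior.

```typescript
// The six voices from the table above, as lowercase identifiers (an assumed convention).
const OPENAI_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type OpenAIVoice = (typeof OPENAI_VOICES)[number];

// Hypothetical per-Agent TTS settings; the field name is illustrative.
type AgentTtsConfig = { voice?: string };

// Picks the Agent's voice when it is a known voice, otherwise falls back to the default.
function resolveVoice(agent: AgentTtsConfig, globalDefault: OpenAIVoice): OpenAIVoice {
  const wanted = agent.voice?.toLowerCase();
  const match = OPENAI_VOICES.find((v) => v === wanted);
  return match ?? globalDefault;
}
```

For example, `resolveVoice({ voice: "Nova" }, "alloy")` resolves to `"nova"`, while an Agent with no voice set resolves to the default.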
STT converts your spoken words into text, enabling voice input for messages.
To input text using your voice, click the voice input option in the message box. LobeHub will convert your speech into text and insert it into the input field. Once you're done, you can send it directly to the AI.
<Image alt={'STT'} src={'/blog/assets34424062ad6ab98df7f56c9e61341be5.webp'} />
STT supports a wide range of languages including English (US, UK, AU, CA, IN), Spanish, French, German, Italian, Portuguese, Chinese (Mandarin), Japanese, Korean, and many more. Language is typically auto-detected or set based on your interface language.
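When the recognition language follows the interface language, the selection can be approximated as a lookup with fallbacks. The sketch below is a hedged illustration: `STT_LOCALES` is a hypothetical, partial list of BCP 47 tags (the region chosen for each non-English language is arbitrary), and `pickSttLocale` is not a LobeHub function.

```typescript
// Illustrative subset of recognition locales as BCP 47 tags; the real list is longer.
const STT_LOCALES = [
  "en-US", "en-GB", "en-AU", "en-CA", "en-IN",
  "es-ES", "fr-FR", "de-DE", "it-IT", "pt-PT",
  "zh-CN", "ja-JP", "ko-KR",
];

// Chooses a recognition locale from the interface locale, with a fallback.
function pickSttLocale(interfaceLocale: string, fallback: string = "en-US"): string {
  const wanted = interfaceLocale.toLowerCase();

  // 1. Exact match, ignoring case (for example "en-GB").
  const exact = STT_LOCALES.find((locale) => locale.toLowerCase() === wanted);
  if (exact !== undefined) return exact;

  // 2. Same base language (for example "fr-CA" maps to "fr-FR").
  const base = wanted.split("-")[0];
  const sameLanguage = STT_LOCALES.find(
    (locale) => locale.toLowerCase().split("-")[0] === base,
  );

  // 3. Nothing matched: use the fallback.
  return sameLanguage ?? fallback;
}
```

Under these assumptions, an interface set to `fr-CA` would transcribe with `fr-FR`, and an unlisted language would fall back to `en-US`.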
Combine TTS and STT for natural, continuous voice conversations:
<Callout type={'warning'}> Voice input is processed by AI services for transcription. Avoid speaking sensitive information unless you are using a private or local deployment. </Callout>
Voice data handling:
Best practices: review transcriptions before sending, don't speak passwords or sensitive data, and clear the local audio cache periodically if you are concerned about storage.
<Cards> <Card href={'/docs/usage/agent/translate'} title={'Conversation Translation'} /><Card href={'/docs/usage/getting-started/agent'} title={'Agent'} /> </Cards>