Supporting TTS & STT Voice Conversations

LobeHub now supports Text-to-Speech (TTS) and Speech-to-Text (STT), turning typed conversations into natural voice interactions. You can speak with your Agents and hear their responses, making the experience closer to talking with a real person.

Natural voice interaction

With TTS, your Agents can read responses aloud in clear, natural-sounding voices. With STT, you can dictate messages instead of typing. Together, they enable hands-free interaction, which is useful when you're multitasking, on the move, or simply prefer speaking to typing.

This is especially helpful for:

Auditory learners who process information better by hearing
Users who want to stay productive while commuting or away from a keyboard
Anyone who finds voice more accessible or convenient than text

Personalized voice selection

Different Agents can have different voices, so you can choose one that matches each Agent's personality or purpose. For example, a professional assistant might use a calm, measured tone, while a creative collaborator might sound more expressive.

We've curated high-quality voices from OpenAI Audio and Microsoft Edge Speech to serve users across regions and preferences. Select the voice that fits your usage style or scenario.

A complete communication loop

Voice support narrows the gap between how people naturally communicate and how they interact with AI. Speak naturally, hear responses aloud, and keep context across turns just as you would in a spoken conversation. LobeHub's other features, including plugins, multimodal support, and context management, continue to work alongside voice mode.
