# Use Voice Mode with Hermes
This guide is the practical companion to the Voice Mode feature reference.
If the feature page explains what voice mode can do, this guide shows how to actually use it well.
:::tip

Nous Portal bundles both the LLM and TTS through one OAuth, so voice mode works end-to-end with no extra credentials.

:::
Voice mode is especially useful when:
There are really three different voice experiences in Hermes.
| Mode | Best for | Platform |
|---|---|---|
| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
| Live voice channel bot | Group or personal live conversation in a VC | Discord voice channels |
A good path is:
Before touching voice mode, verify that:
```bash
hermes
```

Ask something simple:

```text
What tools do you have available?
```
If that is not solid yet, fix text mode first.
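```bash
# CLI microphone voice mode
pip install "hermes-agent[voice]"

# Telegram / Discord gateway
pip install "hermes-agent[messaging]"

# Premium TTS providers
pip install "hermes-agent[tts-premium]"

# Local NeuTTS (quoted so shells like zsh do not expand the brackets)
python -m pip install -U "neutts[all]"

# Everything
pip install "hermes-agent[all]"
```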
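```bash
# macOS
brew install portaudio ffmpeg opus
brew install espeak-ng   # only needed for NeuTTS
```

```bash
# Ubuntu / Debian
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng   # only needed for NeuTTS
```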
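Before moving on, it can help to confirm that these system packages are actually visible. The following is a minimal Python sketch, not a Hermes command: it uses `shutil.which` for the `ffmpeg` and `espeak-ng` binaries and `ctypes.util.find_library` for the PortAudio and Opus shared libraries. The helper name `check_voice_deps` is hypothetical, and `find_library` may miss libraries installed under non-standard prefixes (some Homebrew setups, for example), so treat a "missing" result as a hint rather than proof.

```python
import shutil
from ctypes.util import find_library
from typing import Dict


def check_voice_deps() -> Dict[str, bool]:
    """Report whether the voice-related system packages can be found.

    Hypothetical helper for illustration; Hermes does not ship it.
    """
    return {
        # Command-line tools are looked up on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "espeak-ng": shutil.which("espeak-ng") is not None,  # only needed for NeuTTS
        # Shared libraries are looked up through the platform's library search.
        "portaudio": find_library("portaudio") is not None,
        "opus": find_library("opus") is not None,
    }


if __name__ == "__main__":
    for name, found in check_voice_deps().items():
        print(f"{name:<10} {'ok' if found else 'missing'}")
```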
Why these matter:
- `portaudio` → microphone input / playback for CLI voice mode
- `ffmpeg` → audio conversion for TTS and messaging delivery
- `opus` → Discord voice codec support
- `espeak-ng` → phonemizer backend for NeuTTS

Hermes supports both local and cloud speech stacks.
Use local STT and free Edge TTS:
- STT: `local`
- TTS: `edge`

This is usually the best place to start.
Add to `~/.hermes/.env`:

```bash
# Cloud STT options (local needs no key)
GROQ_API_KEY=***
VOICE_TOOLS_OPENAI_KEY=***

# Premium TTS (optional)
ELEVENLABS_API_KEY=***
```
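If you are unsure which cloud providers your `.env` actually enables, a small script can read the file and report them. This is a hedged sketch: the mapping from variable name to provider is inferred from the comments in the example above (`GROQ_API_KEY` for Groq, `VOICE_TOOLS_OPENAI_KEY` for OpenAI, `ELEVENLABS_API_KEY` for ElevenLabs), providers not shown there (such as `mistral`) are left out, the parser only handles simple `KEY=value` lines, and the names `parse_env` and `configured_cloud_providers` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping, inferred from the .env comments; not read from Hermes itself.
PROVIDER_KEYS = {
    "GROQ_API_KEY": "groq",
    "VOICE_TOOLS_OPENAI_KEY": "openai",
    "ELEVENLABS_API_KEY": "elevenlabs",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def configured_cloud_providers(env: Dict[str, str]) -> List[str]:
    """Return providers whose key is present and non-empty.

    Keyless providers (local STT, edge and neutts TTS) never appear here.
    """
    return [name for key, name in PROVIDER_KEYS.items() if env.get(key)]


if __name__ == "__main__":
    env_file = Path.home() / ".hermes" / ".env"
    if env_file.exists():
        print(configured_cloud_providers(parse_env(env_file.read_text())))
    else:
        print(f"{env_file} not found; only keyless providers are available")
```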
STT providers:

- `local` → best default for privacy and zero-cost use
- `groq` → very fast cloud transcription
- `openai` → good paid fallback

TTS providers:

- `edge` → free and good enough for most users
- `neutts` → free local/on-device TTS
- `elevenlabs` → best quality
- `openai` → good middle ground
- `mistral` → multilingual, native Opus

To configure providers interactively, run the setup wizard:

```bash
hermes setup
```

If you choose NeuTTS in the setup wizard, Hermes checks whether `neutts` is already installed. If it is missing, the wizard explains that NeuTTS needs the Python package `neutts` and the system package `espeak-ng`, offers to install them for you, installs `espeak-ng` with your platform package manager, and then runs:
```bash
python -m pip install -U neutts[all]
```
If you skip that install or it fails, the wizard falls back to Edge TTS.
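The wizard's NeuTTS decision boils down to an availability check. The sketch below mirrors the behavior described above under stated assumptions: it treats NeuTTS as usable only when the `neutts` package is importable and `espeak-ng` is on `PATH`, and otherwise falls back to Edge TTS. `resolve_tts_provider` is a hypothetical name; this is not the wizard's actual code.

```python
import importlib.util
import shutil


def resolve_tts_provider(requested: str) -> str:
    """Return the TTS provider to use, falling back from NeuTTS to Edge.

    Hypothetical illustration of the setup-wizard behavior, not Hermes code.
    """
    if requested != "neutts":
        return requested  # other providers are passed through unchanged
    has_package = importlib.util.find_spec("neutts") is not None
    has_phonemizer = shutil.which("espeak-ng") is not None
    return "neutts" if (has_package and has_phonemizer) else "edge"


if __name__ == "__main__":
    print(resolve_tts_provider("neutts"))
```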
```yaml
voice:
  record_key: "ctrl+b"
  max_recording_seconds: 120
  auto_tts: false
  beep_enabled: true
  silence_threshold: 200
  silence_duration: 3.0

stt:
  provider: "local"
  local:
    model: "base"

tts:
  provider: "edge"
  edge:
    voice: "en-US-AriaNeural"
```
This is a good conservative default for most people.
If you want local TTS instead, switch the tts block to:
```yaml
tts:
  provider: "neutts"
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
```
Start Hermes:
```bash
hermes
```
Inside the CLI:
```text
/voice on
```
Default key: `Ctrl+B`

Workflow:

- `Ctrl+B`

```text
/voice
/voice on
/voice off
/voice tts
/voice status
```
Say:
```text
I keep getting a docker permission error. Help me debug it.
```
Then continue hands-free:
Great for:
If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.
If Hermes starts/stops too aggressively, tune:
```yaml
voice:
  silence_threshold: 250
```
Higher threshold = less sensitive.
If you pause a lot between sentences, increase:
```yaml
voice:
  silence_duration: 4.0
```
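To build intuition for how `silence_threshold` and `silence_duration` interact, here is a simplified endpoint detector. It is an assumption-laden sketch, not Hermes's recorder: it measures loudness as the RMS of 16-bit PCM samples per frame, treats a frame below the threshold as silence, ignores silence before the first loud frame, and stops once silence has lasted `silence_duration` seconds. Hermes may measure audio level differently, so use this only to reason about which direction each knob moves; `rms` and `stop_frame` are hypothetical names.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def stop_frame(
    frames: Iterable[Sequence[int]],
    frame_seconds: float,
    silence_threshold: float = 200,
    silence_duration: float = 3.0,
) -> Optional[int]:
    """Index of the frame where recording would stop, or None if it never does."""
    heard_speech = False
    silent_for = 0.0
    for index, frame in enumerate(frames):
        if rms(frame) >= silence_threshold:
            heard_speech = True
            silent_for = 0.0  # any loud frame resets the silence timer
        elif heard_speech:  # silence only counts after speech has started
            silent_for += frame_seconds
            if silent_for >= silence_duration:
                return index
    return None


if __name__ == "__main__":
    speech = [1000] * 160
    quiet_room = [220] * 160  # background noise just above the default threshold
    frames = [speech] + [quiet_room] * 8  # 0.5 s frames
    print(stop_frame(frames, 0.5))                         # None: noise counts as speech
    print(stop_frame(frames, 0.5, silence_threshold=250))  # 6: noise now counts as silence
```

In this model, raising the threshold lets steady background noise count as silence, and raising `silence_duration` tolerates longer pauses before the recording ends.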
If Ctrl+B conflicts with your terminal or tmux habits:
```yaml
voice:
  record_key: "ctrl+space"
```
This mode is simpler than full voice channels.
Hermes stays a normal chat bot, but can speak replies.
```bash
hermes gateway
```
Inside Telegram or Discord:
```text
/voice on
```

or

```text
/voice tts
```
| Mode | Meaning |
|---|---|
| `off` | text only |
| `voice_only` | speak only when the user sent voice |
| `all` | speak every reply |
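The three modes in the table reduce to one decision per reply. This sketch encodes that decision; `VoiceReplyMode` and `should_speak` are hypothetical names, and it assumes the text reply is sent either way, with the mode only controlling whether a spoken version is added.

```python
from enum import Enum


class VoiceReplyMode(str, Enum):
    OFF = "off"
    VOICE_ONLY = "voice_only"
    ALL = "all"


def should_speak(mode: VoiceReplyMode, user_sent_voice: bool) -> bool:
    """Decide whether a reply also gets a spoken version."""
    if mode is VoiceReplyMode.ALL:
        return True
    if mode is VoiceReplyMode.VOICE_ONLY:
        return user_sent_voice  # mirror the user's modality
    return False  # OFF: text only


if __name__ == "__main__":
    mode = VoiceReplyMode("voice_only")  # parse the string value from config
    print(should_speak(mode, user_sent_voice=True))
```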
- `/voice on` if you want spoken replies only for voice-originating messages
- `/voice tts` if you want a full spoken assistant all the time

Use when:
Useful when you want private interaction without server-channel mention behavior.
This is the most advanced mode.
Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.
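Conceptually, each spoken turn in a voice channel chains three stages: speech-to-text, the normal agent pipeline, and text-to-speech. The sketch below models that chain with injected callables so each stage can be swapped or faked in tests. `VoiceChannelTurn` and its fields are hypothetical and do not correspond to Hermes or Discord library APIs; skipping empty transcripts is an illustrative design choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceChannelTurn:
    transcribe: Callable[[bytes], str]  # STT stage
    run_agent: Callable[[str], str]     # normal agent pipeline
    synthesize: Callable[[str], bytes]  # TTS stage

    def handle(self, utterance: bytes) -> Optional[bytes]:
        """Turn one captured utterance into audio to play back, if any."""
        text = self.transcribe(utterance).strip()
        if not text:
            return None  # nothing intelligible was said; stay quiet
        reply = self.run_agent(text)
        return self.synthesize(reply)


if __name__ == "__main__":
    # Fake stages that treat "audio" as UTF-8 text, just to exercise the flow.
    turn = VoiceChannelTurn(
        transcribe=lambda audio: audio.decode(),
        run_agent=lambda text: f"You said: {text}",
        synthesize=lambda reply: reply.encode(),
    )
    print(turn.handle(b"hello"))
```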
In addition to the normal text-bot setup, make sure the bot has:
Also enable privileged intents in the Developer Portal:
In a Discord text channel where the bot is present:
```text
/voice join
/voice leave
/voice status
```
- `/voice join` was issued
- `DISCORD_ALLOWED_USERS` tight
- `large-v3` or Groq `whisper-large-v3`
- `base` or Groq

Install `portaudio`.

Check:

- `DISCORD_ALLOWED_USERS`

Check:

- `ffmpeg` install for Edge conversion paths

Try:

- `silence_threshold`

That is often caused by the mention policy.
By default, the bot needs an @mention in Discord server text channels unless configured otherwise.
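As a rough model of that default, the check below answers every direct message but, in server text channels, only messages that @mention the bot unless the requirement is switched off. `should_respond` and its `require_mention` flag are hypothetical names, not Hermes configuration keys; treating DMs as exempt follows the earlier note that DMs avoid server-channel mention behavior.

```python
def should_respond(in_server_channel: bool, mentioned: bool,
                   require_mention: bool = True) -> bool:
    """Hypothetical mention gate for incoming Discord text messages."""
    if not in_server_channel:
        return True  # DMs: assumed to need no @mention
    return mentioned or not require_mention


if __name__ == "__main__":
    print(should_respond(in_server_channel=True, mentioned=False))
```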
If you want the shortest path to success:
- `hermes-agent[voice]`
- `/voice on` in Telegram or Discord

That progression keeps the debugging surface small.