pkg/audio/tts/README.md
This package handles speech synthesis for PicoClaw.
If you are new to TTS setup, the simplest workflow is:

1. Add a TTS model entry to `model_list`.
2. Point `voice.tts_model_name` at that entry.
3. Store the API key for that entry in `.security.yml`.

For most users, these are the best starting points:

| Provider | Why start here |
|---|---|
| OpenAI | Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible `/audio/speech` API shape, and OpenAI is the safest default. |
| Xiaomi MiMo | A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack. |
PicoClaw does not keep TTS API keys inside `voice`. Instead:

- `voice.tts_model_name` selects a named entry from `model_list`.
- That `model_list` entry provides the provider, model ID, API base, and proxy settings.
- TTS models that need provider-specific request fields can use `model_list[].extra_body` to pass fields such as `voice` and `response_format`.
- `.security.yml` stores the API key for the same named model entry.

This is the recommended and supported configuration pattern.
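To make the split concrete, here is a minimal Go sketch that decodes a `config.json` shaped like the examples in this README and pairs the entry named by `voice.tts_model_name` with its key. The types `Config` and `ModelEntry`, the helper `selectTTS`, and the `securityKeys` map (standing in for a parsed `.security.yml`) are all hypothetical; PicoClaw's real config structs and loaders may look different, and proxy settings are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelEntry mirrors one model_list item as shown in this README.
// Hypothetical type: PicoClaw's real config structs may differ.
type ModelEntry struct {
	ModelName string         `json:"model_name"`
	Provider  string         `json:"provider,omitempty"`
	Model     string         `json:"model"`
	APIBase   string         `json:"api_base,omitempty"`
	ExtraBody map[string]any `json:"extra_body,omitempty"`
}

// Config holds only the fields this sketch needs.
type Config struct {
	Voice struct {
		TTSModelName string `json:"tts_model_name"`
	} `json:"voice"`
	ModelList []ModelEntry `json:"model_list"`
}

// selectTTS returns the model_list entry named by voice.tts_model_name and
// the first API key stored under that name. securityKeys is a hypothetical
// stand-in for the parsed .security.yml (model name -> api_keys).
func selectTTS(cfg Config, securityKeys map[string][]string) (ModelEntry, string, error) {
	name := cfg.Voice.TTSModelName
	for _, m := range cfg.ModelList {
		if m.ModelName != name {
			continue
		}
		keys := securityKeys[name]
		if len(keys) == 0 {
			return m, "", fmt.Errorf("no API key for %q in .security.yml", name)
		}
		return m, keys[0], nil
	}
	return ModelEntry{}, "", fmt.Errorf("voice.tts_model_name %q not found in model_list", name)
}

func main() {
	raw := []byte(`{
  "voice": {"tts_model_name": "openai-tts"},
  "model_list": [{"model_name": "openai-tts", "model": "openai/tts-1"}]
}`)
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	keys := map[string][]string{"openai-tts": {"sk-openai-your-key"}}
	entry, key, err := selectTTS(cfg, keys)
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.Model, key != "")
}
```

Keeping keys in a separate lookup means `config.json` can be shared or committed without secrets, which is the point of the `model_list` / `.security.yml` split.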
`config.json` (OpenAI):

```json
{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"
```
`config.json` (Xiaomi MiMo):

```json
{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"
```
If you use a custom MiMo endpoint, you can also set `api_base` explicitly. Otherwise, PicoClaw uses the provider default.
Some OpenAI-compatible TTS routes require provider-specific request fields.
OpenRouter's `microsoft/mai-voice-2` is one example: it needs a model-specific
`voice` name and works best with `"response_format": "mp3"`.
`config.json` (OpenRouter):

```json
{
  "voice": {
    "tts_model_name": "mai-voice-2"
  },
  "model_list": [
    {
      "model_name": "mai-voice-2",
      "provider": "openrouter",
      "model": "microsoft/mai-voice-2",
      "api_base": "https://openrouter.ai/api/v1",
      "extra_body": {
        "voice": "en-US-Harper:MAI-Voice-2",
        "response_format": "mp3"
      }
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  mai-voice-2:
    api_keys:
      - "sk-or-your-openrouter-key"
```
The current TTS runtime uses an OpenAI-compatible speech request with these defaults:

- Endpoint path: `/audio/speech`
- `response_format`: `opus`
- `voice`: `alloy`
- `model`: taken from the selected `model_list` entry

These defaults can be overridden per model through `model_list[].extra_body`.
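The defaults and per-model overrides can be pictured as a map merge. The Go sketch below assumes the request body is a flat JSON object and that every `extra_body` key simply overwrites the default of the same name; `buildSpeechBody` is a hypothetical helper, not PicoClaw's actual function. The `input` field is the text field of OpenAI's speech API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSpeechBody assembles an OpenAI-compatible /audio/speech body.
// Hypothetical helper: defaults are set first, then extra_body overrides them.
func buildSpeechBody(modelID, text string, extraBody map[string]any) map[string]any {
	body := map[string]any{
		"model":           modelID, // from the selected model_list entry
		"input":           text,    // text to synthesize
		"voice":           "alloy", // default voice
		"response_format": "opus",  // default output format
	}
	// Per-model overrides from model_list[].extra_body win over defaults.
	for k, v := range extraBody {
		body[k] = v
	}
	return body
}

func main() {
	extra := map[string]any{
		"voice":           "en-US-Harper:MAI-Voice-2",
		"response_format": "mp3",
	}
	body := buildSpeechBody("microsoft/mai-voice-2", "Hello from PicoClaw", extra)
	out, err := json.Marshal(body) // encoding/json sorts map keys
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With no `extra_body` (as in the OpenAI example), the body keeps `alloy` and `opus`; with the OpenRouter example, both fields are replaced.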
In practice:

- OpenAI models such as `openai/tts-1` work without extra settings.
- Other OpenAI-compatible providers can supply their own `voice` and `response_format` values.
- If a provider rejects `response_format`, PicoClaw retries once without that field.

`DetectTTS` resolves TTS in this order:

1. Match `voice.tts_model_name` against `model_list`.
2. If `voice.tts_model_name` is not set or cannot be resolved, PicoClaw scans `model_list` for the first entry whose `model` string contains `tts` and has an API key.

Fallback scanning exists for compatibility. New configs should set `voice.tts_model_name` explicitly.
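The two-step order can be sketched in Go as follows. The `entry` type, `detectTTS`, and the `hasKey` callback (standing in for a `.security.yml` lookup) are hypothetical; this follows the order described above and is not the actual `DetectTTS` source. Details such as case sensitivity of the `tts` match are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical, trimmed-down model_list item.
type entry struct {
	ModelName string
	Model     string
}

// detectTTS sketches the documented order:
//  1. look up voice.tts_model_name in model_list;
//  2. otherwise take the first entry whose model contains "tts"
//     and that has an API key.
// hasKey is a hypothetical stand-in for a .security.yml lookup by model_name.
func detectTTS(ttsModelName string, models []entry, hasKey func(name string) bool) (entry, bool) {
	if ttsModelName != "" {
		for _, m := range models {
			if m.ModelName == ttsModelName {
				return m, true
			}
		}
	}
	// Compatibility fallback: scan model_list in order.
	for _, m := range models {
		if strings.Contains(m.Model, "tts") && hasKey(m.ModelName) {
			return m, true
		}
	}
	return entry{}, false
}

func main() {
	models := []entry{
		{ModelName: "chat", Model: "example/chat-model"}, // hypothetical non-TTS entry
		{ModelName: "mai-voice-2", Model: "microsoft/mai-voice-2"},
		{ModelName: "mimo-tts", Model: "mimo/mimo-v2-tts"},
	}
	keys := map[string]bool{"mai-voice-2": true, "mimo-tts": true}
	hasKey := func(name string) bool { return keys[name] }

	// An explicit name wins.
	m, ok := detectTTS("mai-voice-2", models, hasKey)
	fmt.Println(m.ModelName, ok)

	// With no name set, the first "tts" entry that has a key is chosen.
	m, ok = detectTTS("", models, hasKey)
	fmt.Println(m.ModelName, ok)
}
```

Under the rule as described, the fallback would never pick an entry like `microsoft/mai-voice-2`, because its model string does not contain `tts`. That is another reason to set `voice.tts_model_name` explicitly.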
PicoClaw normalizes the configured base URL for TTS:

- `https://api.openai.com` or `https://api.openai.com/v1` becomes `https://api.openai.com/v1/audio/speech`.
- Other base URLs are resolved to their `/audio/speech` endpoint.
- If `api_base` is omitted, PicoClaw uses the provider default base when the model prefix is known.

Common mistakes:

- Setting `voice.tts_model_name` to a name that does not exist in `model_list`.
- Not adding the API key to `.security.yml`.
- Forgetting `model_list[].extra_body.voice` or `model_list[].extra_body.response_format` for TTS models that require them.
- Using a provider that does not support the OpenAI-compatible `/audio/speech` request format.

Before testing `send_tts`, make sure:

- `voice.tts_model_name` matches a `model_list[].model_name`.
- The matching `.security.yml` entry contains a valid API key.
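The base-URL normalization rules can be sketched as a small Go function. `speechURL` and the `defaultBases` table are hypothetical; only the OpenAI default (`https://api.openai.com/v1`) is filled in, and the handling of trailing slashes and of bases that already end in `/audio/speech` is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultBases is a hypothetical prefix -> base table. Only the OpenAI
// value is shown; other providers' defaults are not listed in this README.
var defaultBases = map[string]string{
	"openai": "https://api.openai.com/v1",
}

// speechURL sketches the normalization rules described above.
// Assumptions: trailing slashes are ignored, a base that already ends in
// /audio/speech is kept, and only the bare OpenAI host gains /v1.
func speechURL(apiBase, model string) (string, error) {
	base := strings.TrimRight(apiBase, "/")
	if base == "" {
		// No api_base: fall back to the provider default for the model prefix.
		prefix, _, _ := strings.Cut(model, "/")
		d, ok := defaultBases[prefix]
		if !ok {
			return "", fmt.Errorf("no api_base and no default base for %q", model)
		}
		base = d
	}
	if strings.HasSuffix(base, "/audio/speech") {
		return base, nil
	}
	if base == "https://api.openai.com" {
		base += "/v1"
	}
	return base + "/audio/speech", nil
}

func main() {
	// All three inputs resolve to https://api.openai.com/v1/audio/speech.
	for _, b := range []string{"https://api.openai.com", "https://api.openai.com/v1/", ""} {
		u, err := speechURL(b, "openai/tts-1")
		fmt.Println(u, err)
	}
}
```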