TTS (Text-to-Speech)

This package handles speech synthesis for PicoClaw.

If you are new to TTS setup, the simplest workflow is:

Add a TTS-capable entry to model_list.
Point voice.tts_model_name at that entry.
Put the API key in .security.yml.

Quick Recommendation

For most users, these are the best starting points:

Provider	Why start here
OpenAI	Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible `/audio/speech` API shape, and OpenAI is the safest default.
Xiaomi MiMo	A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack.

How TTS Configuration Works

PicoClaw does not keep TTS API keys inside voice.

Instead:

voice.tts_model_name selects a named entry from model_list.
That model_list entry provides the provider, model ID, API base, and proxy settings.
.security.yml stores the API key for the same named model entry.

This is the recommended and supported configuration pattern.

Recommended Setup

Option A: OpenAI

config.json

json

{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}

.security.yml

yaml

model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"

Option B: Xiaomi MiMo

config.json

json

{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}

.security.yml

yaml

model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"

If you use a custom MiMo endpoint, you can also set api_base explicitly. Otherwise PicoClaw will use the provider default.

What PicoClaw Sends Today

The current TTS runtime uses an OpenAI-compatible speech request with these defaults:

Endpoint: /audio/speech
Response format: opus
Voice: alloy
Model: taken from the selected model_list entry

That means:

openai/tts-1 works naturally.
Other OpenAI-compatible providers can work if they accept the same request format.
PicoClaw currently does not expose a user-facing config field for changing the TTS voice from alloy.

How PicoClaw Chooses a TTS Provider

DetectTTS resolves TTS in this order:

Preferred path: resolve voice.tts_model_name against model_list.
If a matching model entry exists and has an API key, PicoClaw creates an OpenAI-compatible TTS provider using that model's settings.
Fallback path: if voice.tts_model_name is not set or cannot be resolved, PicoClaw scans model_list for the first entry whose model string contains tts and has an API key.

Fallback scanning exists for compatibility. New configs should set voice.tts_model_name explicitly.

Notes About API Base Handling

PicoClaw normalizes the configured base URL for TTS:

For OpenAI, a base like https://api.openai.com or https://api.openai.com/v1 becomes https://api.openai.com/v1/audio/speech.
For other OpenAI-compatible providers, PicoClaw preserves the configured base path and ensures it ends with /audio/speech.
If api_base is omitted, PicoClaw uses the provider default base when the model prefix is known.

Common Mistakes

Setting voice.tts_model_name to a name that does not exist in model_list.
Adding a TTS model but forgetting to put its API key in .security.yml.
Assuming PicoClaw will automatically use provider-specific custom voices.
Using a provider endpoint that is not compatible with the OpenAI /audio/speech request format.

Minimal Checklist

Before testing send_tts, make sure:

voice.tts_model_name matches a model_list[].model_name.
The matching .security.yml entry contains a valid API key.
The chosen provider supports an OpenAI-compatible speech synthesis endpoint.
Your selected model is actually a TTS-capable model.