pkg/audio/asr/README.md
This package handles speech-to-text for PicoClaw voice input.
If you are new to ASR setup, the simplest mental model is: define the provider and model in `model_list`, point `voice.model_name` at the entry you want to use, and store the API key in `.security.yml`.

For most new users, start with one of these providers:
| Provider | Example model | Why start here |
|---|---|---|
| Groq | `groq/whisper-large-v3-turbo` | Fast Whisper-style transcription with a straightforward OpenAI-compatible API. Groq currently advertises a free tier of 2,000 requests per day. |
| ElevenLabs | `elevenlabs/scribe_v1` | Easy setup and strong speech-to-text quality. ElevenLabs currently advertises a free plan that includes speech-to-text usage. |
Pricing and free-plan limits can change, so check each provider's pricing page before depending on them in production.
PicoClaw does not keep ASR API keys inside the voice section.
Instead:
- `voice.model_name` chooses a named entry from `model_list`.
- Each `model_list` entry describes the actual provider and model.
- `.security.yml` stores the API key for that named model entry.

This is the recommended pattern because it is explicit, reusable, and consistent with the rest of PicoClaw's model configuration.
`config.json`:

```json
{
  "voice": {
    "model_name": "groq-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "groq-asr",
      "model": "groq/whisper-large-v3-turbo"
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  groq-asr:
    api_keys:
      - "gsk_your_groq_key"
```
Notes:
- You can omit `api_base`; PicoClaw will use Groq's default API base automatically.
- If you do set `api_base` manually for Groq Whisper, both of these forms work:
  - `https://api.groq.com/openai/v1`
  - `https://api.groq.com/openai/v1/audio/transcriptions`
- Any Groq model whose name contains `whisper` can use the Whisper transcription path, not only `whisper-large-v3-turbo`.

The ElevenLabs setup follows the same pattern.

`config.json`:
```json
{
  "voice": {
    "model_name": "elevenlabs-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "elevenlabs-asr",
      "model": "elevenlabs/scribe_v1"
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  elevenlabs-asr:
    api_keys:
      - "sk-elevenlabs-your-key"
```
The OpenAI Whisper setup is the same pattern again.

`config.json`:

```json
{
  "voice": {
    "model_name": "openai-asr"
  },
  "model_list": [
    {
      "model_name": "openai-asr",
      "model": "openai/whisper-1"
    }
  ]
}
```
`.security.yml`:

```yaml
model_list:
  openai-asr:
    api_keys:
      - "sk-openai-your-key"
```
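In all three examples the `model` field is a `provider/model` string. The split below is an illustrative sketch of that convention (PicoClaw's actual parsing lives in its Go code and may differ):

```python
def split_model(model: str) -> tuple[str, str]:
    """Split a "provider/model" string into its two parts.

    The provider is everything before the first "/"; the rest is the
    provider-specific model name (which may itself contain slashes).
    """
    provider, _, name = model.partition("/")
    return provider, name

print(split_model("groq/whisper-large-v3-turbo"))  # ('groq', 'whisper-large-v3-turbo')
print(split_model("elevenlabs/scribe_v1"))         # ('elevenlabs', 'scribe_v1')
```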
PicoClaw currently supports three main ASR routes:
| Route | Example models | Behavior |
|---|---|---|
| ElevenLabs ASR | `elevenlabs/scribe_v1` | Uses the ElevenLabs transcription API. |
| Whisper endpoint models | `openai/whisper-1`, `groq/whisper-large-v3` | Uses an OpenAI-compatible `/audio/transcriptions` endpoint. |
| Audio-capable chat models (under construction) | `openai/gpt-4o-audio-preview`, `gemini/gemini-2.5-flash` | Sends audio to a multimodal chat model and asks it to transcribe. |
If you are unsure which one to pick, choose Groq Whisper or ElevenLabs first.
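The routing in the table above can be approximated with a simple prefix and substring check. This is an illustrative sketch of the classification rule, not PicoClaw's actual implementation:

```python
def pick_route(model: str) -> str:
    """Classify a model string into one of the three ASR routes.

    Mirrors the routes table: ElevenLabs models use the ElevenLabs API,
    Whisper-named models use the /audio/transcriptions endpoint, and
    everything else falls through to an audio-capable chat model.
    """
    if model.startswith("elevenlabs/"):
        return "elevenlabs"
    if "whisper" in model:
        return "whisper-endpoint"
    return "audio-chat"

print(pick_route("groq/whisper-large-v3"))        # whisper-endpoint
print(pick_route("elevenlabs/scribe_v1"))         # elevenlabs
print(pick_route("openai/gpt-4o-audio-preview"))  # audio-chat
```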
`DetectTranscriber` resolves ASR in this order:

1. Match `voice.model_name` against `model_list`.
2. If the resolved model is an `elevenlabs/...` model, PicoClaw uses the ElevenLabs transcriber.
3. If the resolved model targets a Whisper-style endpoint, PicoClaw uses the OpenAI-compatible transcription path; other audio-capable chat models go through `AudioModelTranscriber`.
4. If `voice.model_name` is not set, PicoClaw performs a compatibility scan through `model_list` for legacy auto-detected ASR entries.

Fallback scanning exists only for backward compatibility; new configurations should set `voice.model_name` explicitly.
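The resolution order can be sketched as follows. The function name and the behavior on an explicit-but-unmatched name are assumptions for illustration; PicoClaw's real logic is in Go:

```python
def resolve_asr(voice: dict, model_list: list) -> "dict | None":
    """Resolve an ASR entry: explicit voice.model_name first,
    then a legacy fallback scan through model_list."""
    name = voice.get("model_name")
    if name:
        for entry in model_list:
            if entry.get("model_name") == name:
                return entry
        # Assumption: an explicit name that matches nothing is a
        # configuration error rather than a trigger for the fallback scan.
        return None
    # Legacy behavior: pick the first entry that looks like an ASR model.
    for entry in model_list:
        model = entry.get("model", "")
        if model.startswith("elevenlabs/") or "whisper" in model:
            return entry
    return None
```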
Common mistakes:
- Adding a model to `model_list` but forgetting to set `voice.model_name`.
- Putting the API key under `voice` instead of in `.security.yml`.
- Setting an `api_base` that points to the wrong provider endpoint.

Before testing voice input, make sure:
- `voice.model_name` matches a `model_list[].model_name`.
- The corresponding `.security.yml` entry contains a valid API key.