docs/providers/azure-speech.md
Azure Speech is an Azure AI Speech text-to-speech provider. In OpenClaw it synthesizes outbound reply audio as MP3 by default, as native Ogg/Opus for voice notes, and as 8 kHz mulaw audio for telephony channels such as Voice Call.
OpenClaw uses the Azure Speech REST API directly with SSML and sends the
provider-owned output format through X-Microsoft-OutputFormat.
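
To illustrate what an SSML request with an explicit output format looks like, here is a minimal Python sketch. It is not OpenClaw's implementation. The regional endpoint URL and the `Ocp-Apim-Subscription-Key` header follow Azure's public REST documentation; the function name `build_tts_request` and the `User-Agent` value are hypothetical, chosen only for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(text, key, region,
                      voice="en-US-JennyNeural",
                      lang="en-US",
                      output_format="audio-24khz-48kbitrate-mono-mp3"):
    """Return (url, headers, body) for an Azure Speech TTS REST call.

    Nothing is sent over the network; the caller decides how to POST it.
    """
    # SSML body: user text is XML-escaped, attribute values are quoted safely.
    ssml = (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
    # Regional endpoint as described in Azure's REST docs (assumed here,
    # not confirmed as the exact URL OpenClaw builds).
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # The audio format is selected through this header.
        "X-Microsoft-OutputFormat": output_format,
        # Hypothetical value; Azure expects some User-Agent to be present.
        "User-Agent": "openclaw-docs-example",
    }
    return url, headers, ssml.encode("utf-8")
```

The returned tuple can be passed to any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`; the response body is then the audio in the requested format.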
| Detail | Value |
|---|---|
| Website | Azure AI Speech |
| Docs | Speech REST text-to-speech |
| Auth | AZURE_SPEECH_KEY plus AZURE_SPEECH_REGION |
| Default voice | en-US-JennyNeural |
| Default file output | audio-24khz-48kbitrate-mono-mp3 |
| Default voice-note file | ogg-24khz-16bit-mono-opus |
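
The defaults in the table can be read as a mapping from delivery target to output format. The Python sketch below shows one way such a selection might look. The target names (`"file"`, `"voice-note"`, `"telephony"`) and the function `pick_output_format` are hypothetical, and the telephony format string is an assumption: this page only states "8 kHz mulaw", and `raw-8khz-8bit-mono-mulaw` is the Azure format name that matches that description.

```python
DEFAULT_FILE_FORMAT = "audio-24khz-48kbitrate-mono-mp3"
DEFAULT_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus"
# Assumption: Azure's headerless 8 kHz mu-law format; not stated on this page.
ASSUMED_TELEPHONY_FORMAT = "raw-8khz-8bit-mono-mulaw"


def pick_output_format(target, config=None):
    """Choose the output format string for a delivery target.

    `config` is a dict that may carry `outputFormat` and
    `voiceNoteOutputFormat` overrides; missing keys fall back to defaults.
    """
    config = config or {}
    if target == "file":
        return config.get("outputFormat") or DEFAULT_FILE_FORMAT
    if target == "voice-note":
        return config.get("voiceNoteOutputFormat") or DEFAULT_VOICE_NOTE_FORMAT
    if target == "telephony":
        return ASSUMED_TELEPHONY_FORMAT
    raise ValueError(f"unknown target: {target!r}")
```

For example, `pick_output_format("voice-note")` returns the Ogg/Opus default, while passing `{"outputFormat": "riff-24khz-16bit-mono-pcm"}` with target `"file"` returns the override.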
```
AZURE_SPEECH_KEY=<speech-resource-key>
AZURE_SPEECH_REGION=eastus
```
| Option | Path | Description |
|---|---|---|
| apiKey | messages.tts.providers.azure-speech.apiKey | Azure Speech resource key. Falls back to AZURE_SPEECH_KEY, AZURE_SPEECH_API_KEY, or SPEECH_KEY. |
| region | messages.tts.providers.azure-speech.region | Azure Speech resource region. Falls back to AZURE_SPEECH_REGION or SPEECH_REGION. |
| endpoint | messages.tts.providers.azure-speech.endpoint | Optional Azure Speech endpoint/base URL override. |
| baseUrl | messages.tts.providers.azure-speech.baseUrl | Optional Azure Speech base URL override. |
| voice | messages.tts.providers.azure-speech.voice | Azure voice ShortName (default en-US-JennyNeural). |
| lang | messages.tts.providers.azure-speech.lang | SSML language code (default en-US). |
| outputFormat | messages.tts.providers.azure-speech.outputFormat | Audio-file output format (default audio-24khz-48kbitrate-mono-mp3). |
| voiceNoteOutputFormat | messages.tts.providers.azure-speech.voiceNoteOutputFormat | Voice-note output format (default ogg-24khz-16bit-mono-opus). |
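
The fallback behavior described for `apiKey` and `region` can be sketched as follows. This is an illustration in Python, not OpenClaw's code: the function `resolve_azure_speech_settings` is hypothetical, and the precedence among the environment variables is assumed to follow the order in which the table lists them.

```python
import os


def first_set(*values):
    """Return the first non-empty value, or None if all are empty."""
    for value in values:
        if value:
            return value
    return None


def resolve_azure_speech_settings(provider_config, env=None):
    """Merge explicit config with environment fallbacks and defaults.

    `provider_config` stands for the object found at
    messages.tts.providers.azure-speech in the configuration.
    """
    env = os.environ if env is None else env
    return {
        # Explicit config wins; env order is an assumption (table order).
        "apiKey": first_set(
            provider_config.get("apiKey"),
            env.get("AZURE_SPEECH_KEY"),
            env.get("AZURE_SPEECH_API_KEY"),
            env.get("SPEECH_KEY"),
        ),
        "region": first_set(
            provider_config.get("region"),
            env.get("AZURE_SPEECH_REGION"),
            env.get("SPEECH_REGION"),
        ),
        # Documented defaults from the options table.
        "voice": provider_config.get("voice") or "en-US-JennyNeural",
        "lang": provider_config.get("lang") or "en-US",
    }
```

A usage example with an explicit environment mapping, so the result does not depend on the real process environment:

```python
config = {"messages": {"tts": {"providers": {"azure-speech": {"voice": "en-GB-SoniaNeural"}}}}}
provider = config["messages"]["tts"]["providers"]["azure-speech"]
settings = resolve_azure_speech_settings(
    provider, env={"SPEECH_KEY": "<speech-resource-key>", "AZURE_SPEECH_REGION": "eastus"}
)
```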