content/docs/03-ai-sdk-core/37-speech.mdx
<Note type="warning">Speech is an experimental feature.</Note>
The AI SDK provides the generateSpeech
function to generate speech from text using a speech model.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const audio = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
voice: 'alloy',
});
You can specify the language for speech generation (provider support varies):
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { lmnt } from '@ai-sdk/lmnt';
const audio = await generateSpeech({
model: lmnt.speech('aurora'),
text: 'Hola, mundo!',
language: 'es', // Spanish
});
To access the generated audio, read the audio property of the object returned by generateSpeech (named result here):
const audioData = result.audio.uint8Array; // audio data as Uint8Array
// or
const audioBase64 = result.audio.base64; // audio data as base64 string
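As a rough illustration of what you might do with these two fields, the sketch below builds a data URL (for example, for an audio element's src) and reports the payload size. The names AudioLike, toDataUrl, and describeAudio are hypothetical helpers, not part of the AI SDK, and the default media type of audio/mpeg is an assumption that may not match your provider's output format.

```typescript
// Structural type covering only the two fields shown above (hypothetical name).
interface AudioLike {
  uint8Array: Uint8Array;
  base64: string;
}

// Hypothetical helper: wrap the base64 payload in a data URL.
// Assumption: the audio is MP3 ('audio/mpeg'); pass another media type if not.
function toDataUrl(audio: AudioLike, mediaType: string = 'audio/mpeg'): string {
  return `data:${mediaType};base64,${audio.base64}`;
}

// Hypothetical helper: describe the size of the raw audio bytes.
function describeAudio(audio: AudioLike): string {
  const kib = (audio.uint8Array.byteLength / 1024).toFixed(1);
  return `${kib} KiB of audio`;
}
```

With the result object from above, this could be called as toDataUrl(result.audio) or describeAudio(result.audio).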
You can set model-specific settings with the providerOptions parameter.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const audio = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
providerOptions: {
openai: {
// ...
},
},
});
generateSpeech accepts an optional abortSignal parameter of
type AbortSignal
that you can use to abort the speech generation process or set a timeout.
import { openai } from '@ai-sdk/openai';
import { experimental_generateSpeech as generateSpeech } from 'ai';
const audio = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
abortSignal: AbortSignal.timeout(1000), // Abort after 1 second
});
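AbortSignal.timeout covers a fixed deadline. If you also want to cancel manually (for example, when a user presses a stop button), one possible approach is to drive the signal from an AbortController. The sketch below is a minimal version of that idea; AbortHandle and createAbortHandle are hypothetical names, not AI SDK APIs.

```typescript
// Hypothetical handle that bundles a signal with manual controls.
interface AbortHandle {
  signal: AbortSignal; // pass this as the abortSignal parameter
  cancel: () => void; // abort immediately
  dispose: () => void; // stop the timer without aborting (call after success)
}

// Hypothetical helper: abort after timeoutMs, or earlier when cancel() is called.
function createAbortHandle(timeoutMs: number): AbortHandle {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    signal: controller.signal,
    cancel: () => {
      clearTimeout(timer);
      controller.abort();
    },
    dispose: () => clearTimeout(timer),
  };
}
```

You would then pass handle.signal as abortSignal in the generateSpeech call, and call handle.dispose() once the call resolves so the pending timer does not keep the process alive.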
generateSpeech accepts an optional headers parameter of type Record<string, string>
that you can use to add custom headers to the speech generation request.
import { openai } from '@ai-sdk/openai';
import { experimental_generateSpeech as generateSpeech } from 'ai';
const audio = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
headers: { 'X-Custom-Header': 'custom-value' },
});
Warnings (e.g. unsupported parameters) are available on the warnings property.
import { openai } from '@ai-sdk/openai';
import { experimental_generateSpeech as generateSpeech } from 'ai';
const audio = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
});
const warnings = audio.warnings;
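If you want to surface warnings in logs, a small formatter can help. The sketch below deliberately treats each warning as an opaque value and serializes it with JSON.stringify, since the exact warning shape is not described here; formatWarnings is a hypothetical helper, not part of the AI SDK.

```typescript
// Hypothetical helper: turn a warnings array into log-friendly lines.
// Assumption: each warning is JSON-serializable; its fields are not inspected.
function formatWarnings(warnings: readonly unknown[] | undefined): string[] {
  return (warnings ?? []).map(
    (warning, index) => `speech warning ${index + 1}: ${JSON.stringify(warning)}`,
  );
}
```

For example, formatWarnings(audio.warnings).forEach(line => console.warn(line)) would print one line per warning and nothing when there are none.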
When generateSpeech cannot generate valid audio, it throws an AI_NoSpeechGeneratedError.
This error can arise for any of the following reasons:
The error preserves the following information to help you log the issue:
- responses: Metadata about the speech model responses, including timestamp, model, and headers.
- cause: The cause of the error. You can use this for more detailed error handling.
import {
experimental_generateSpeech as generateSpeech,
NoSpeechGeneratedError,
} from 'ai';
import { openai } from '@ai-sdk/openai';
try {
await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
});
} catch (error) {
if (NoSpeechGeneratedError.isInstance(error)) {
console.log('AI_NoSpeechGeneratedError');
console.log('Cause:', error.cause);
console.log('Responses:', error.responses);
}
}
| Provider | Model |
|---|---|
| OpenAI | tts-1 |
| OpenAI | tts-1-hd |
| OpenAI | gpt-4o-mini-tts |
| ElevenLabs | eleven_v3 |
| ElevenLabs | eleven_multilingual_v2 |
| ElevenLabs | eleven_flash_v2_5 |
| ElevenLabs | eleven_flash_v2 |
| ElevenLabs | eleven_turbo_v2_5 |
| ElevenLabs | eleven_turbo_v2 |
| LMNT | aurora |
| LMNT | blizzard |
| Hume | default |
The table above lists a small subset of the speech models supported by the AI SDK providers. For more, see the respective provider documentation.