# Transcription
<Note type="warning">Transcription is an experimental feature.</Note>
The AI SDK provides the `transcribe` function to transcribe audio using a transcription model.

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
});
```
The `audio` property can be a `Uint8Array`, `ArrayBuffer`, `Buffer`, `string` (base64-encoded audio data), or a `URL`.
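For illustration, here is how the same audio data can be converted between several of the accepted forms before being passed as `audio`. This is a minimal sketch: the byte values are fabricated stand-ins, not real audio data.

```typescript
// A minimal sketch: fabricated bytes stand in for real audio data.
const audioBytes = new Uint8Array([0x49, 0x44, 0x33]); // "ID3" marker bytes

// Base64-encoded string form of the same data:
const audioBase64 = Buffer.from(audioBytes).toString('base64');

// ArrayBuffer form:
const audioArrayBuffer = audioBytes.buffer;
```

Any of these values (`audioBytes`, `audioBase64`, `audioArrayBuffer`) could be passed as the `audio` option of `transcribe`.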
To access the generated transcript:

```ts
const text = transcript.text; // transcript text, e.g. "Hello, world!"
const segments = transcript.segments; // array of segments with start and end times, if available
const language = transcript.language; // language of the transcript, e.g. "en", if available
const durationInSeconds = transcript.durationInSeconds; // duration of the transcript in seconds, if available
```
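As an illustration of working with the segment data, the following sketch renders segments as timestamped lines. It assumes each segment has the shape `{ text, startSecond, endSecond }`; the sample segments are fabricated for demonstration.

```typescript
// A sketch: render transcript segments as timestamped lines.
// Assumes each segment has `text`, `startSecond`, and `endSecond`.
type Segment = { text: string; startSecond: number; endSecond: number };

// Format seconds as "MM:SS.s", e.g. 61.2 -> "01:01.2".
function formatTimestamp(seconds: number): string {
  const m = Math.floor(seconds / 60);
  const s = (seconds % 60).toFixed(1).padStart(4, '0');
  return `${String(m).padStart(2, '0')}:${s}`;
}

function formatSegments(segments: Segment[]): string {
  return segments
    .map(
      (seg) =>
        `[${formatTimestamp(seg.startSecond)} -> ${formatTimestamp(seg.endSecond)}] ${seg.text}`,
    )
    .join('\n');
}

// Example with fabricated segments:
const listing = formatSegments([
  { text: 'Hello, world!', startSecond: 0, endSecond: 1.5 },
  { text: 'Goodbye.', startSecond: 61.2, endSecond: 62 },
]);
```

In a real application, `transcript.segments` (when available) would be passed to `formatSegments` instead of the fabricated array.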
## Settings

Transcription models often have provider- or model-specific settings, which you can set using the `providerOptions` parameter.
```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
  providerOptions: {
    openai: {
      timestampGranularities: ['word'],
    },
  },
});
```
When `audio` is a `URL`, the SDK downloads the file with a default size limit of 2 GiB.
You can customize this limit using `createDownload`:
```ts
import { experimental_transcribe as transcribe, createDownload } from 'ai';
import { openai } from '@ai-sdk/openai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  download: createDownload({ maxBytes: 50 * 1024 * 1024 }), // 50 MB limit
});
```
You can also provide a fully custom download function:
```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  download: async ({ url }) => {
    // `myAuthenticatedFetch` stands in for your own fetch implementation
    const res = await myAuthenticatedFetch(url);
    return {
      data: new Uint8Array(await res.arrayBuffer()),
      mediaType: res.headers.get('content-type') ?? undefined,
    };
  },
});
```
If a download exceeds the size limit, a `DownloadError` is thrown:
```ts
import { experimental_transcribe as transcribe, DownloadError } from 'ai';
import { openai } from '@ai-sdk/openai';

try {
  await transcribe({
    model: openai.transcription('whisper-1'),
    audio: new URL('https://example.com/audio.mp3'),
  });
} catch (error) {
  if (DownloadError.isInstance(error)) {
    console.log('Download failed:', error.message);
  }
}
```
## Abort Signals and Timeouts

`transcribe` accepts an optional `abortSignal` parameter of type `AbortSignal` that you can use to abort the transcription process or set a timeout.
This is particularly useful when combined with URL downloads to prevent long-running requests:
```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  abortSignal: AbortSignal.timeout(5000), // Abort after 5 seconds
});
```
## Custom Headers

`transcribe` accepts an optional `headers` parameter of type `Record<string, string>` that you can use to add custom headers to the transcription request.
```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
  headers: { 'X-Custom-Header': 'custom-value' },
});
```
## Warnings

Warnings (e.g. unsupported parameters) are available on the `warnings` property.
```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
});

const warnings = transcript.warnings;
```
## Error Handling

When `transcribe` cannot generate a valid transcript, it throws an `AI_NoTranscriptGeneratedError`.
This error can arise for any of the following reasons:

- The model failed to generate a response
- The model generated a response that could not be parsed

The error preserves the following information to help you log the issue:
- `responses`: Metadata about the transcription model responses, including timestamp, model, and headers.
- `cause`: The cause of the error. You can use this for more detailed error handling.

```ts
import {
  experimental_transcribe as transcribe,
  NoTranscriptGeneratedError,
} from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

try {
  await transcribe({
    model: openai.transcription('whisper-1'),
    audio: await readFile('audio.mp3'),
  });
} catch (error) {
  if (NoTranscriptGeneratedError.isInstance(error)) {
    console.log('NoTranscriptGeneratedError');
    console.log('Cause:', error.cause);
    console.log('Responses:', error.responses);
  }
}
```
## Transcription Models

| Provider | Model |
| --- | --- |
| OpenAI | `whisper-1` |
| OpenAI | `gpt-4o-transcribe` |
| OpenAI | `gpt-4o-mini-transcribe` |
| ElevenLabs | `scribe_v1` |
| ElevenLabs | `scribe_v1_experimental` |
| Groq | `whisper-large-v3-turbo` |
| Groq | `whisper-large-v3` |
| Azure OpenAI | `whisper-1` |
| Azure OpenAI | `gpt-4o-transcribe` |
| Azure OpenAI | `gpt-4o-mini-transcribe` |
| Rev.ai | `machine` |
| Rev.ai | `low_cost` |
| Rev.ai | `fusion` |
| Deepgram | `base` (+ variants) |
| Deepgram | `enhanced` (+ variants) |
| Deepgram | `nova` (+ variants) |
| Deepgram | `nova-2` (+ variants) |
| Deepgram | `nova-3` (+ variants) |
| Gladia | `default` |
| AssemblyAI | `best` |
| AssemblyAI | `nano` |
| Fal | `whisper` |
| Fal | `wizper` |
The models above are a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.