Transcription

<Note type="warning">Transcription is an experimental feature.</Note>

The AI SDK provides the transcribe function to transcribe audio using a transcription model.

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
});
```

The audio property can be a Uint8Array, ArrayBuffer, Buffer, string (base64 encoded audio data), or a URL.
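For example, the same audio data can be passed as raw bytes, as a base64-encoded string, or as a URL. A minimal sketch of the accepted forms (the byte values and URL below are placeholders, not real audio):

```ts
// Any of these forms can be assigned to the audio property
// (values are placeholders for illustration only):
const asBytes: Uint8Array = new Uint8Array([0x49, 0x44, 0x33]); // raw bytes
const asBase64: string = Buffer.from(asBytes).toString('base64'); // base64 string
const asUrl = new URL('https://example.com/audio.mp3'); // remote file

// Each value could then be used in a call such as:
// const transcript = await transcribe({
//   model: openai.transcription('whisper-1'),
//   audio: asUrl, // or asBytes, or asBase64
// });
```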

To access the generated transcript:

```ts
const text = transcript.text; // transcript text e.g. "Hello, world!"
const segments = transcript.segments; // array of segments with start and end times, if available
const language = transcript.language; // language of the transcript e.g. "en", if available
const durationInSeconds = transcript.durationInSeconds; // duration of the transcript in seconds, if available
```
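When segments are available, each entry carries the segment text along with start and end times in seconds. A small helper along these lines can turn them into subtitle-style lines (the `TranscriptionSegment` shape below mirrors the SDK's segment type; treat the exact field names as an assumption for this sketch):

```ts
type TranscriptionSegment = {
  text: string;
  startSecond: number;
  endSecond: number;
};

// Format each segment as a "[start - end] text" line.
function formatSegments(segments: TranscriptionSegment[]): string[] {
  return segments.map(
    s => `[${s.startSecond.toFixed(1)}s - ${s.endSecond.toFixed(1)}s] ${s.text}`,
  );
}

// Usage with a transcribe() result:
// const lines = formatSegments(transcript.segments);
```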

Settings

Provider-Specific Settings

Transcription models often have provider or model-specific settings which you can set using the providerOptions parameter.

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
  providerOptions: {
    openai: {
      timestampGranularities: ['word'],
    },
  },
});
```

Download Size Limits

When audio is a URL, the SDK downloads the file with a default 2 GiB size limit. You can customize this using createDownload:

```ts
import { experimental_transcribe as transcribe, createDownload } from 'ai';
import { openai } from '@ai-sdk/openai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  download: createDownload({ maxBytes: 50 * 1024 * 1024 }), // 50 MB limit
});
```

You can also provide a fully custom download function:

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  download: async ({ url }) => {
    const res = await myAuthenticatedFetch(url);
    return {
      data: new Uint8Array(await res.arrayBuffer()),
      mediaType: res.headers.get('content-type') ?? undefined,
    };
  },
});
```

If a download exceeds the size limit, a DownloadError is thrown:

```ts
import { experimental_transcribe as transcribe, DownloadError } from 'ai';
import { openai } from '@ai-sdk/openai';

try {
  await transcribe({
    model: openai.transcription('whisper-1'),
    audio: new URL('https://example.com/audio.mp3'),
  });
} catch (error) {
  if (DownloadError.isInstance(error)) {
    console.log('Download failed:', error.message);
  }
}
```

Abort Signals and Timeouts

transcribe accepts an optional abortSignal parameter of type AbortSignal that you can use to abort the transcription process or set a timeout.

This is particularly useful when combined with URL downloads to prevent long-running requests:

```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new URL('https://example.com/audio.mp3'),
  abortSignal: AbortSignal.timeout(5000), // Abort after 5 seconds
});
```

Custom Headers

transcribe accepts an optional headers parameter of type Record<string, string> that you can use to add custom headers to the transcription request.

```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
  headers: { 'X-Custom-Header': 'custom-value' },
});
```

Warnings

Warnings (e.g. unsupported parameters) are available on the warnings property.

```ts
import { openai } from '@ai-sdk/openai';
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
});

const warnings = transcript.warnings;
```

Error Handling

When transcribe cannot generate a valid transcript, it throws an AI_NoTranscriptGeneratedError.

This error can arise for any of the following reasons:

  • The model failed to generate a response
  • The model generated a response that could not be parsed

The error preserves the following information to help you log the issue:

  • responses: Metadata about the transcription model responses, including timestamp, model, and headers.
  • cause: The cause of the error. You can use this for more detailed error handling.

```ts
import {
  experimental_transcribe as transcribe,
  NoTranscriptGeneratedError,
} from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

try {
  await transcribe({
    model: openai.transcription('whisper-1'),
    audio: await readFile('audio.mp3'),
  });
} catch (error) {
  if (NoTranscriptGeneratedError.isInstance(error)) {
    console.log('NoTranscriptGeneratedError');
    console.log('Cause:', error.cause);
    console.log('Responses:', error.responses);
  }
}
```

Transcription Models

| Provider | Model |
| --- | --- |
| OpenAI | whisper-1 |
| OpenAI | gpt-4o-transcribe |
| OpenAI | gpt-4o-mini-transcribe |
| ElevenLabs | scribe_v1 |
| ElevenLabs | scribe_v1_experimental |
| Groq | whisper-large-v3-turbo |
| Groq | whisper-large-v3 |
| Azure OpenAI | whisper-1 |
| Azure OpenAI | gpt-4o-transcribe |
| Azure OpenAI | gpt-4o-mini-transcribe |
| Rev.ai | machine |
| Rev.ai | low_cost |
| Rev.ai | fusion |
| Deepgram | base (+ variants) |
| Deepgram | enhanced (+ variants) |
| Deepgram | nova (+ variants) |
| Deepgram | nova-2 (+ variants) |
| Deepgram | nova-3 (+ variants) |
| Gladia | default |
| AssemblyAI | best |
| AssemblyAI | nano |
| Fal | whisper |
| Fal | wizper |

Above is a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.