Sarvam Provider

The Sarvam AI Provider is a library developed to integrate with the AI SDK. This library brings Speech to Text (STT) capabilities to your applications, allowing for seamless interaction with audio and text data.

Setup

The Sarvam provider is available in the sarvam-ai-provider module. You can install it with:

Provider Instance

First, get your Sarvam API Key from the Sarvam Dashboard.

Then initialize Sarvam in your application:

import { createSarvam } from 'sarvam-ai-provider';

const sarvam = createSarvam({
  headers: {
    'api-subscription-key': 'YOUR_API_KEY',
  },
});

<Note> The `api-subscription-key` needs to be passed in headers. Consider using `YOUR_API_KEY` as environment variables for security. </Note>

Transcribe speech to text

import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

await transcribe({
  model: sarvam.transcription('saarika:v2'),
  audio: await readFile('./src/transcript-test.mp3'),
  providerOptions: {
    sarvam: {
      language_code: 'en-IN',
    },
  },
});

Features

Changing parameters

Change language_code

providerOptions: {
    sarvam: {
      language_code: 'en-IN',
    },
  },

<Note> `language_code` specifies the language of the input audio and is required for accurate transcription. • It is mandatory for the `saarika:v1` model (this model does not support `unknown`). • It is optional for the `saarika:v2` model. • Use `unknown` when the language is not known; in that case, the API will auto‑detect it. Available options: `unknown`, `hi-IN`, `bn-IN`, `kn-IN`, `ml-IN`, `mr-IN`, `od-IN`, `pa-IN`, `ta-IN`, `te-IN`, `en-IN`, `gu-IN`. </Note>

with_timestamps?

providerOptions: {
  sarvam: {
    with_timestamps: true,
  },
},

<Note> `with_timestamps` specifies whether to include start/end timestamps for each word/token. • Type: boolean • When true, each word/token will include start/end timestamps. • Default: false </Note>

with_diarization?

providerOptions: {
  sarvam: {
    with_diarization: true,
  },
},

<Note> `with_diarization` enables speaker diarization (Beta). • Type: boolean • When true, enables speaker diarization. • Default: false </Note>

num_speakers?

providerOptions: {
  sarvam: {
    with_diarization: true,
    num_speakers: 2,
  },
},

<Note> `num_speakers` sets the number of distinct speakers to detect (only when `with_diarization` is true). • Type: number | null • Number of distinct speakers to detect. • Default: null </Note>

References

Sarvam API Docs