# MastraVoice
The MastraVoice class is an abstract base class that defines the core interface for voice services in Mastra. All voice provider implementations (like OpenAI, Deepgram, PlayAI, Speechify) extend this class to provide their specific functionality. The class now includes support for real-time speech-to-speech capabilities through WebSocket connections.
## Usage Example

```typescript
import { MastraVoice } from "@mastra/core/voice";

// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig,
    });
  }

  // Implement required abstract methods
  async speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }

  async listen(
    audioStream: NodeJS.ReadableStream,
    options?: unknown,
  ): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }

  async getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>> {
    // Return list of available voices
  }

  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }

  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }

  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }

  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }

  close(): void {
    // Close WebSocket connection
  }

  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }

  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}
```
## Constructor Parameters

<PropertiesTable content={[ { name: 'config', type: 'VoiceConfig', description: 'Configuration object for the voice service', isOptional: true, }, { name: 'config.speechModel', type: 'BuiltInModelConfig', description: 'Configuration for the text-to-speech model', isOptional: true, properties: [ { type: 'BuiltInModelConfig', parameters: [ { name: 'name', type: 'string', description: 'Name of the model to use', isOptional: false, }, { name: 'apiKey', type: 'string', description: 'API key for the model service', isOptional: true, }, ], }, ], }, { name: 'config.listeningModel', type: 'BuiltInModelConfig', description: 'Configuration for the speech-to-text model', isOptional: true, properties: [ { type: 'BuiltInModelConfig', parameters: [ { name: 'name', type: 'string', description: 'Name of the model to use', isOptional: false, }, { name: 'apiKey', type: 'string', description: 'API key for the model service', isOptional: true, }, ], }, ], }, { name: 'config.speaker', type: 'string', description: 'Default speaker/voice ID to use', isOptional: true, }, { name: 'config.name', type: 'string', description: 'Name for the voice provider instance', isOptional: true, }, { name: 'config.realtimeConfig', type: 'object', description: 'Configuration for real-time speech-to-speech capabilities', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'model', type: 'string', description: 'Model to use for real-time speech-to-speech capabilities', isOptional: true, }, { name: 'apiKey', type: 'string', description: 'API key for the real-time service', isOptional: true, }, { name: 'options', type: 'unknown', description: 'Provider-specific options for real-time capabilities', isOptional: true, }, ], }, ], }, ]} />
## Abstract Methods

These methods must be implemented by any class extending MastraVoice.
### speak()

Converts text to speech using the configured speech model.

```typescript
abstract speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown;
  },
): Promise<NodeJS.ReadableStream | void>
```
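In practice, `speak()` usually resolves to a Node readable stream of audio bytes. The sketch below shows one way a caller might consume that stream; `StubVoice` and `streamToBuffer` are illustrative stand-ins (the stub just echoes the input bytes back, where a real provider would call its TTS API):

```typescript
import { Readable } from "node:stream";

// Hypothetical stub standing in for a real MastraVoice subclass.
class StubVoice {
  async speak(
    input: string,
    options?: { speaker?: string },
  ): Promise<Readable> {
    // Fake "audio": the input text's own bytes. A real provider returns TTS audio.
    return Readable.from(Buffer.from(input, "utf-8"));
  }
}

// Collect a readable stream into a single Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk as Uint8Array));
  }
  return Buffer.concat(chunks);
}

async function main() {
  const voice = new StubVoice();
  const audioStream = await voice.speak("Hello world", { speaker: "default" });
  const audio = await streamToBuffer(audioStream);
  console.log(`received ${audio.length} bytes of audio`);
}

main();
```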
### listen()

Converts speech to text using the configured listening model.

```typescript
abstract listen(
  audioStream: NodeJS.ReadableStream,
  options?: {
    [key: string]: unknown;
  },
): Promise<string | NodeJS.ReadableStream | void>
```
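`listen()` goes the other way: it takes an audio stream and resolves to a transcript (or, for streaming providers, a text stream). A minimal sketch with a hypothetical stub that "transcribes" by decoding the incoming bytes as UTF-8, where a real provider would send the audio to its STT API:

```typescript
import { Readable } from "node:stream";

// Hypothetical stub standing in for a real MastraVoice subclass.
class StubListener {
  async listen(
    audioStream: Readable,
    options?: { [key: string]: unknown },
  ): Promise<string> {
    const chunks: Buffer[] = [];
    for await (const chunk of audioStream) {
      chunks.push(Buffer.from(chunk as Uint8Array));
    }
    // Fake "transcription": decode the collected bytes as text.
    return Buffer.concat(chunks).toString("utf-8");
  }
}

async function main() {
  const listener = new StubListener();
  const fakeAudio = Readable.from(Buffer.from("hello from audio"));
  const transcript = await listener.listen(fakeAudio);
  console.log(transcript);
}

main();
```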
### getSpeakers()

Returns a list of available voices supported by the provider.

```typescript
abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>
```
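Each entry is keyed by `voiceId`; any extra fields are provider-specific. A sketch of filtering the list, using made-up voice entries (real providers return their own metadata):

```typescript
// Hypothetical stub; the voice entries below are invented for illustration.
class StubCatalog {
  async getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>> {
    return [
      { voiceId: "voice-a", language: "en" },
      { voiceId: "voice-b", language: "en" },
      { voiceId: "voice-c", language: "fr" },
    ];
  }
}

async function main() {
  const catalog = new StubCatalog();
  const speakers = await catalog.getSpeakers();
  // Keep only English voices and project down to their IDs.
  const english = speakers
    .filter((s) => s.language === "en")
    .map((s) => s.voiceId);
  console.log(english);
}

main();
```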
## Optional Methods

These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.
### connect()

Establishes a WebSocket or WebRTC connection for communication.

```typescript
connect(config?: unknown): Promise<void>
```
### send()

Streams audio data in real-time to the voice provider.

```typescript
send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>
```
### answer()

Triggers the voice provider to generate a response.

```typescript
answer(): Promise<void>
```
### addTools()

Equips the voice provider with tools that can be used during conversations.

```typescript
addTools(tools: Array<Tool>): void
```
### close()

Disconnects from the WebSocket or WebRTC connection.

```typescript
close(): void
```
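For providers that support speech-to-speech, the usual lifecycle is connect, send audio, answer, react to events, then close. The sketch below fakes that round trip in memory with a hypothetical `StubRealtimeVoice` built on Node's `EventEmitter` (a real provider would hold a live WebSocket):

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stub: fakes the round trip; a real provider holds a WebSocket.
class StubRealtimeVoice extends EventEmitter {
  private connected = false;
  private buffered: Int16Array[] = [];

  async connect(): Promise<void> {
    this.connected = true;
  }

  async send(audioData: Int16Array): Promise<void> {
    if (!this.connected) throw new Error("not connected");
    this.buffered.push(audioData);
  }

  async answer(): Promise<void> {
    // Pretend the provider transcribed the audio and spoke a reply.
    this.emit("writing", { text: "transcribed input", role: "user" });
    this.emit("speaking", { text: "hello back", audio: new Int16Array(160) });
  }

  close(): void {
    this.connected = false;
  }
}

async function main() {
  const voice = new StubRealtimeVoice();
  voice.on("writing", (data) => console.log("transcript:", data));
  voice.on("speaking", (data) => console.log("audio reply:", data));

  await voice.connect();
  await voice.send(new Int16Array(320)); // e.g. one 20ms frame at 16kHz
  await voice.answer();
  voice.close();
}

main();
```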
### on()

Registers an event listener for voice events.

```typescript
on<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```
### off()

Removes an event listener.

```typescript
off<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```
## Events

The MastraVoice class includes an event system for real-time communication. Standard event types include:
<PropertiesTable content={[ { name: 'speaking', type: '{ text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array }', description: 'Emitted when the voice provider is speaking, contains audio data', }, { name: 'writing', type: '{ text: string, role: string }', description: 'Emitted when text is transcribed from speech', }, { name: 'error', type: '{ message: string; code?: string; details?: unknown }', description: 'Emitted when an error occurs', }, ]} />
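The point of the `VoiceEventMap` conditional in `on()`/`off()` is that each event name narrows its callback's payload type. A self-contained sketch of that pattern, using a hypothetical event map whose field shapes mirror the table above (not Mastra's actual implementation):

```typescript
// Hypothetical event map mirroring the table above.
interface VoiceEventMap {
  speaking: { text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array };
  writing: { text: string; role: string };
  error: { message: string; code?: string; details?: unknown };
}

type Listener<E extends keyof VoiceEventMap> = (data: VoiceEventMap[E]) => void;

// Minimal typed emitter so on()/off() callbacks are type-checked per event.
class TypedVoiceEvents {
  private listeners: { [E in keyof VoiceEventMap]?: Listener<E>[] } = {};

  on<E extends keyof VoiceEventMap>(event: E, cb: Listener<E>): void {
    (this.listeners[event] ??= []).push(cb);
  }

  off<E extends keyof VoiceEventMap>(event: E, cb: Listener<E>): void {
    this.listeners[event] = (this.listeners[event] ?? []).filter((l) => l !== cb);
  }

  emit<E extends keyof VoiceEventMap>(event: E, data: VoiceEventMap[E]): void {
    for (const cb of this.listeners[event] ?? []) cb(data);
  }
}

const events = new TypedVoiceEvents();
events.on("writing", (data) => {
  // data is narrowed to { text: string; role: string }
  console.log(`${data.role}: ${data.text}`);
});
events.emit("writing", { text: "hello", role: "assistant" });
```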
## Properties

<PropertiesTable content={[ { name: 'listeningModel', type: 'BuiltInModelConfig | undefined', description: 'Configuration for the speech-to-text model', isOptional: true, }, { name: 'speechModel', type: 'BuiltInModelConfig | undefined', description: 'Configuration for the text-to-speech model', isOptional: true, }, { name: 'speaker', type: 'string | undefined', description: 'Default speaker/voice ID', isOptional: true, }, { name: 'realtimeConfig', type: '{ model?: string; apiKey?: string; options?: unknown } | undefined', description: 'Configuration for real-time speech-to-speech capabilities', isOptional: true, }, ]} />
## Telemetry

MastraVoice includes built-in telemetry support through the `traced` method, which wraps method calls with performance tracking and error monitoring.
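The `traced` wrapper is internal to Mastra, but the general idea can be sketched as a higher-order function that times a call and reports failures. Everything below is illustrative, not Mastra's actual implementation:

```typescript
// Illustrative tracing wrapper; not Mastra's actual `traced` implementation.
function traced<Args extends unknown[], R>(
  name: string,
  fn: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    const start = performance.now();
    try {
      return await fn(...args);
    } catch (err) {
      // Error monitoring: surface which traced call failed.
      console.error(`[voice] ${name} failed:`, err);
      throw err;
    } finally {
      // Performance tracking: log elapsed time whether or not the call succeeded.
      const ms = performance.now() - start;
      console.log(`[voice] ${name} took ${ms.toFixed(1)}ms`);
    }
  };
}

const tracedDouble = traced("double", async (n: number) => n * 2);
tracedDouble(21).then((result) => console.log(result)); // logs timing, then 42
```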