# MastraVoice
The MastraVoice class is an abstract base class that defines the core interface for voice services in Mastra. All voice provider implementations (like OpenAI, Deepgram, PlayAI, Speechify) extend this class to provide their specific functionality. The class now includes support for real-time speech-to-speech capabilities through WebSocket connections.
## Usage Example

```typescript
import { MastraVoice } from "@mastra/core/voice";

// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig,
    });
  }

  // Implement required abstract methods
  async speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }

  async listen(
    audioStream: NodeJS.ReadableStream,
    options?: unknown,
  ): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }

  async getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>> {
    // Return list of available voices
  }

  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }

  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }

  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }

  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }

  close(): void {
    // Close WebSocket connection
  }

  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }

  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}
```
## Constructor Parameters

<PropertiesTable content={[ { name: 'config', type: 'VoiceConfig', description: 'Configuration object for the voice service', isOptional: true, }, { name: 'config.speechModel', type: 'BuiltInModelConfig', description: 'Configuration for the text-to-speech model', isOptional: true, properties: [ { type: 'BuiltInModelConfig', parameters: [ { name: 'name', type: 'string', description: 'Name of the model to use', isOptional: false, }, { name: 'apiKey', type: 'string', description: 'API key for the model service', isOptional: true, }, ], }, ], }, { name: 'config.listeningModel', type: 'BuiltInModelConfig', description: 'Configuration for the speech-to-text model', isOptional: true, properties: [ { type: 'BuiltInModelConfig', parameters: [ { name: 'name', type: 'string', description: 'Name of the model to use', isOptional: false, }, { name: 'apiKey', type: 'string', description: 'API key for the model service', isOptional: true, }, ], }, ], }, { name: 'config.speaker', type: 'string', description: 'Default speaker/voice ID to use', isOptional: true, }, { name: 'config.name', type: 'string', description: 'Name for the voice provider instance', isOptional: true, }, { name: 'config.realtimeConfig', type: 'object', description: 'Configuration for real-time speech-to-speech capabilities', isOptional: true, properties: [ { type: 'object', parameters: [ { name: 'model', type: 'string', description: 'Model to use for real-time speech-to-speech capabilities', isOptional: true, }, { name: 'apiKey', type: 'string', description: 'API key for the real-time service', isOptional: true, }, { name: 'options', type: 'unknown', description: 'Provider-specific options for real-time capabilities', isOptional: true, }, ], }, ], }, ]} />
## Abstract Methods

These methods must be implemented by any class extending MastraVoice.
### speak()

Converts text to speech using the configured speech model.

```typescript
abstract speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown;
  },
): Promise<NodeJS.ReadableStream | void>
```
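In practice, `speak()` usually resolves to a Node readable stream of audio bytes. The sketch below shows one way a caller might consume that stream; `StubVoice` and `streamToBuffer` are illustrative stand-ins (the stub just echoes the input bytes back, where a real provider would call its TTS API):

```typescript
import { Readable } from "node:stream";

// Hypothetical stub standing in for a real MastraVoice subclass.
class StubVoice {
  async speak(
    input: string,
    options?: { speaker?: string },
  ): Promise<Readable> {
    // Fake "audio": the input text's own bytes. A real provider returns TTS audio.
    return Readable.from(Buffer.from(input, "utf-8"));
  }
}

// Collect a readable stream into a single Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk as Uint8Array));
  }
  return Buffer.concat(chunks);
}

async function main() {
  const voice = new StubVoice();
  const audioStream = await voice.speak("Hello world", { speaker: "default" });
  const audio = await streamToBuffer(audioStream);
  console.log(`received ${audio.length} bytes of audio`);
}

main();
```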
### listen()

Converts speech to text using the configured listening model.

```typescript
abstract listen(
  audioStream: NodeJS.ReadableStream,
  options?: {
    [key: string]: unknown;
  },
): Promise<string | NodeJS.ReadableStream | void>
```
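`listen()` goes the other way: it takes an audio stream and resolves to a transcript (or, for streaming providers, a text stream). A minimal sketch with a hypothetical stub that "transcribes" by decoding the incoming bytes as UTF-8, where a real provider would send the audio to its STT API:

```typescript
import { Readable } from "node:stream";

// Hypothetical stub standing in for a real MastraVoice subclass.
class StubListener {
  async listen(
    audioStream: Readable,
    options?: { [key: string]: unknown },
  ): Promise<string> {
    const chunks: Buffer[] = [];
    for await (const chunk of audioStream) {
      chunks.push(Buffer.from(chunk as Uint8Array));
    }
    // Fake "transcription": decode the collected bytes as text.
    return Buffer.concat(chunks).toString("utf-8");
  }
}

async function main() {
  const listener = new StubListener();
  const fakeAudio = Readable.from(Buffer.from("hello from audio"));
  const transcript = await listener.listen(fakeAudio);
  console.log(transcript);
}

main();
```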
### getSpeakers()

Returns a list of available voices supported by the provider.

```typescript
abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>
```
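Each entry is keyed by `voiceId`; any extra fields are provider-specific. A sketch of filtering the list, using made-up voice entries (real providers return their own metadata):

```typescript
// Hypothetical stub; the voice entries below are invented for illustration.
class StubCatalog {
  async getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>> {
    return [
      { voiceId: "voice-a", language: "en" },
      { voiceId: "voice-b", language: "en" },
      { voiceId: "voice-c", language: "fr" },
    ];
  }
}

async function main() {
  const catalog = new StubCatalog();
  const speakers = await catalog.getSpeakers();
  // Keep only English voices and project down to their IDs.
  const english = speakers
    .filter((s) => s.language === "en")
    .map((s) => s.voiceId);
  console.log(english);
}

main();
```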
## Optional Methods

These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.
### connect()

Establishes a WebSocket or WebRTC connection for communication.

```typescript
connect(config?: unknown): Promise<void>
```
### send()

Streams audio data in real-time to the voice provider.

```typescript
send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>
```
### answer()

Triggers the voice provider to generate a response.

```typescript
answer(): Promise<void>
```
### addTools()

Equips the voice provider with tools that can be used during conversations.

```typescript
addTools(tools: Array<Tool>): void
```
### close()

Disconnects from the WebSocket or WebRTC connection.

```typescript
close(): void
```
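For providers that support speech-to-speech, the usual lifecycle is connect, send audio, answer, react to events, then close. The sketch below fakes that round trip in memory with a hypothetical `StubRealtimeVoice` built on Node's `EventEmitter` (a real provider would hold a live WebSocket):

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stub: fakes the round trip; a real provider holds a WebSocket.
class StubRealtimeVoice extends EventEmitter {
  private connected = false;
  private buffered: Int16Array[] = [];

  async connect(): Promise<void> {
    this.connected = true;
  }

  async send(audioData: Int16Array): Promise<void> {
    if (!this.connected) throw new Error("not connected");
    this.buffered.push(audioData);
  }

  async answer(): Promise<void> {
    // Pretend the provider transcribed the audio and spoke a reply.
    this.emit("writing", { text: "transcribed input", role: "user" });
    this.emit("speaking", { text: "hello back", audio: new Int16Array(160) });
  }

  close(): void {
    this.connected = false;
  }
}

async function main() {
  const voice = new StubRealtimeVoice();
  voice.on("writing", (data) => console.log("transcript:", data));
  voice.on("speaking", (data) => console.log("audio reply:", data));

  await voice.connect();
  await voice.send(new Int16Array(320)); // e.g. one 20ms frame at 16kHz
  await voice.answer();
  voice.close();
}

main();
```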
### on()

Registers an event listener for voice events.

```typescript
on<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```
### off()

Removes an event listener.

```typescript
off<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```
## Events

The MastraVoice class includes an event system for real-time communication. Standard event types include:
<PropertiesTable content={[ { name: 'speaking', type: '{ text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array }', description: 'Emitted when the voice provider is speaking, contains audio data', }, { name: 'writing', type: '{ text: string, role: string }', description: 'Emitted when text is transcribed from speech', }, { name: 'error', type: '{ message: string; code?: string; details?: unknown }', description: 'Emitted when an error occurs', }, ]} />
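The point of the `VoiceEventMap` conditional in `on()`/`off()` is that each event name narrows its callback's payload type. A self-contained sketch of that pattern, using a hypothetical event map whose field shapes mirror the table above (not Mastra's actual implementation):

```typescript
// Hypothetical event map mirroring the table above.
interface VoiceEventMap {
  speaking: { text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array };
  writing: { text: string; role: string };
  error: { message: string; code?: string; details?: unknown };
}

type Listener<E extends keyof VoiceEventMap> = (data: VoiceEventMap[E]) => void;

// Minimal typed emitter so on()/off() callbacks are type-checked per event.
class TypedVoiceEvents {
  private listeners: { [E in keyof VoiceEventMap]?: Listener<E>[] } = {};

  on<E extends keyof VoiceEventMap>(event: E, cb: Listener<E>): void {
    (this.listeners[event] ??= []).push(cb);
  }

  off<E extends keyof VoiceEventMap>(event: E, cb: Listener<E>): void {
    this.listeners[event] = (this.listeners[event] ?? []).filter((l) => l !== cb);
  }

  emit<E extends keyof VoiceEventMap>(event: E, data: VoiceEventMap[E]): void {
    for (const cb of this.listeners[event] ?? []) cb(data);
  }
}

const events = new TypedVoiceEvents();
events.on("writing", (data) => {
  // data is narrowed to { text: string; role: string }
  console.log(`${data.role}: ${data.text}`);
});
events.emit("writing", { text: "hello", role: "assistant" });
```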
## Properties

<PropertiesTable content={[ { name: 'listeningModel', type: 'BuiltInModelConfig | undefined', description: 'Configuration for the speech-to-text model', isOptional: true, }, { name: 'speechModel', type: 'BuiltInModelConfig | undefined', description: 'Configuration for the text-to-speech model', isOptional: true, }, { name: 'speaker', type: 'string | undefined', description: 'Default speaker/voice ID', isOptional: true, }, { name: 'realtimeConfig', type: '{ model?: string; apiKey?: string; options?: unknown } | undefined', description: 'Configuration for real-time speech-to-speech capabilities', isOptional: true, }, ]} />
## Telemetry

MastraVoice includes built-in telemetry support through the `traced` method, which wraps method calls with performance tracking and error monitoring.
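The `traced` wrapper is internal to Mastra, but the general idea can be sketched as a higher-order function that times a call and reports failures. Everything below is illustrative, not Mastra's actual implementation:

```typescript
// Illustrative tracing wrapper; not Mastra's actual `traced` implementation.
function traced<Args extends unknown[], R>(
  name: string,
  fn: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    const start = performance.now();
    try {
      return await fn(...args);
    } catch (err) {
      // Error monitoring: surface which traced call failed.
      console.error(`[voice] ${name} failed:`, err);
      throw err;
    } finally {
      // Performance tracking: log elapsed time whether or not the call succeeded.
      const ms = performance.now() - start;
      console.log(`[voice] ${name} took ${ms.toFixed(1)}ms`);
    }
  };
}

const tracedDouble = traced("double", async (n: number) => n * 2);
tracedDouble(21).then((result) => console.log(result)); // logs timing, then 42
```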