Using Speech-To-Text (STT) with Fabric

Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.

Overview

The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.

Requirements

OpenAI API key configured in Fabric
For files larger than 25MB: ffmpeg installed on your system
Supported audio/video formats: .mp3, .mp4, .mpeg, .mpga, .m4a, .wav, .webm
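
A quick way to catch an unsupported file before calling Fabric is to check the extension against the list above. This is a minimal sketch; the function name `is_supported_media` is hypothetical and is not part of Fabric.

```bash
# Return 0 if the file name ends in one of the extensions listed above.
# The match is literal and lowercase-only; how Fabric itself treats
# uppercase extensions is not covered by this sketch.
is_supported_media() {
  case "$1" in
    *.mp3|*.mp4|*.mpeg|*.mpga|*.m4a|*.wav|*.webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_media talk.mp3 && echo ok` prints `ok`, while a `.txt` file makes the function return a non-zero status.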

Basic Usage

Simple Transcription

To transcribe an audio file and send the result to a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
```

Transcription Only

To just transcribe a file without applying a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
```

Command Line Flags

Required Flags

--transcribe-file: Path to the audio or video file to transcribe
--transcribe-model: Model to use for transcription (required whenever --transcribe-file is used)

Optional Flags

--split-media-file: Automatically split files larger than 25MB into chunks using ffmpeg
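
If you call these flags often, a small wrapper can enforce the two required flags and add the optional ones only when asked. The sketch below uses only the flags documented above. The function name `fabric_stt` and the `FABRIC_BIN` variable are hypothetical; `FABRIC_BIN` exists only so the wrapper can be dry-run against `echo`.

```bash
# Usage: fabric_stt FILE MODEL [PATTERN] [split]
# Runs fabric with the required transcription flags, plus --pattern and
# --split-media-file when the optional arguments are given.
fabric_stt() {
  local file="${1:-}" model="${2:-}" pattern="${3:-}" split="${4:-}"
  if [ -z "$file" ] || [ -z "$model" ]; then
    echo "usage: fabric_stt FILE MODEL [PATTERN] [split]" >&2
    return 2
  fi
  # Rebuild the positional parameters as the fabric argument list.
  set -- --transcribe-file "$file" --transcribe-model "$model"
  if [ -n "$pattern" ]; then set -- "$@" --pattern "$pattern"; fi
  if [ "$split" = "split" ]; then set -- "$@" --split-media-file; fi
  # FABRIC_BIN is a hypothetical override, handy for dry runs.
  "${FABRIC_BIN:-fabric}" "$@"
}
```

Setting `FABRIC_BIN=echo` before calling `fabric_stt meeting.mp4 gpt-4o-transcribe summarize split` prints the argument list instead of running Fabric.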

Available Models

You can list all available transcription models with:

```bash
fabric --list-transcription-models
```

Currently supported models:

whisper-1: OpenAI's Whisper model
gpt-4o-mini-transcribe: GPT-4o Mini transcription model
gpt-4o-transcribe: GPT-4o transcription model

File Size Handling

Files Under 25MB

Files under the 25MB limit are processed directly without any special handling.

Files Over 25MB

For files exceeding OpenAI's 25MB limit, you have two options:

Manual handling: Without --split-media-file, the command fails with an error message that suggests using the flag, leaving you to shrink or split the file yourself
Automatic splitting: Use the --split-media-file flag to automatically split the file into chunks

```bash
fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
```
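
To decide up front whether a file needs the flag, you can compare its size to the limit. This is a sketch under one stated assumption: that "25MB" means 25 * 1024 * 1024 bytes. The exact threshold Fabric applies is not given here, and the names `STT_LIMIT_BYTES`, `exceeds_stt_limit`, and `split_flag_for` are hypothetical.

```bash
# 25MB in bytes (assumed to mean 25 * 1024 * 1024).
STT_LIMIT_BYTES=$((25 * 1024 * 1024))

# Succeed when a size in bytes is strictly over the limit.
exceeds_stt_limit() {
  [ "$1" -gt "$STT_LIMIT_BYTES" ]
}

# Print --split-media-file only when the file is over the limit.
split_flag_for() {
  local size
  size=$(wc -c < "$1") || return 1
  # $((size)) strips the padding some wc implementations emit.
  if exceeds_stt_limit "$((size))"; then
    echo "--split-media-file"
  fi
}
```

The result can be spliced into a command line with an unquoted substitution, for example `fabric --transcribe-file talk.mp4 --transcribe-model whisper-1 $(split_flag_for talk.mp4) --pattern summarize`.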

When splitting is enabled:

Fabric first uses ffmpeg to split the file into 10-minute segments
If any segment is still too large, it halves the segment duration, repeating until the segments fit
All segments are transcribed and the results are concatenated
Temporary files are automatically cleaned up after processing
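
The halving rule above can be simulated without running ffmpeg. The sketch below estimates a segment's size as duration times an average byte rate, which is a simplification of checking the size of real segments on disk. The function name `segment_seconds` and the default limit of 25 * 1024 * 1024 bytes are assumptions.

```bash
# Pick a segment length by starting at 600 seconds (10 minutes) and
# halving until one segment is estimated to fit under the limit.
# $1 = average bytes per second of the media (bitrate / 8)
# $2 = size limit in bytes (optional, assumed default 25 * 1024 * 1024)
segment_seconds() {
  local rate="$1" limit="${2:-26214400}" seconds=600
  while [ $((seconds * rate)) -gt "$limit" ]; do
    if [ "$seconds" -le 1 ]; then
      echo "even a 1-second segment would exceed the limit" >&2
      return 1
    fi
    seconds=$((seconds / 2))
  done
  echo "$seconds"
}
```

A 128 kbps audio file (16000 bytes per second) fits at the initial length, so `segment_seconds 16000` prints `600`. At 100000 bytes per second the estimate is halved twice, and `segment_seconds 100000` prints `150`.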

Integration with Patterns

The transcribed text enters Fabric's normal workflow in three steps:

The file is transcribed using the specified model
The transcribed text becomes the input message
The text is sent to the specified pattern or chat session

Example Workflows

Meeting transcription and summarization:

```bash
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
```

Interview analysis:

```bash
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
```

Large video file processing:

```bash
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
```
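
The same workflows can be applied to a batch of recordings with a loop. This is a sketch only: the function name `summarize_recordings` and the `FABRIC_BIN` override are hypothetical, and the loop runs Fabric once per file using the flags shown above.

```bash
# Usage: summarize_recordings MODEL PATTERN FILE...
# Runs one fabric command per supported file and skips the rest.
summarize_recordings() {
  [ "$#" -ge 2 ] || return 2
  local model="$1" pattern="$2" f
  shift 2
  for f in "$@"; do
    case "$f" in
      *.mp3|*.mp4|*.mpeg|*.mpga|*.m4a|*.wav|*.webm) ;;
      *) echo "skipping unsupported file: $f" >&2; continue ;;
    esac
    # FABRIC_BIN is a hypothetical override, handy for dry runs.
    "${FABRIC_BIN:-fabric}" --transcribe-file "$f" \
      --transcribe-model "$model" --pattern "$pattern"
  done
}
```

A call such as `summarize_recordings whisper-1 summarize recordings/*` processes every supported file in the directory in turn.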

Error Handling

Common error scenarios:

Unsupported format: Only the listed audio/video formats are supported
File too large: Use --split-media-file for files over 25MB
Missing ffmpeg: Install ffmpeg for automatic file splitting
Invalid model: Use --list-transcription-models to see available models
Missing model: The --transcribe-model flag is required when using --transcribe-file
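
Several of these errors can be caught before Fabric is invoked. The preflight sketch below checks for a missing file, a missing model, and a missing ffmpeg binary when splitting is requested. The function name `stt_preflight`, the `FFMPEG_BIN` override, and the messages are all hypothetical; they are not Fabric's own error output.

```bash
# Usage: stt_preflight FILE MODEL [split]
# Reports every problem it finds and returns non-zero if there was one.
stt_preflight() {
  local file="${1:-}" model="${2:-}" split="${3:-}" status=0
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    status=1
  fi
  if [ -z "$model" ]; then
    echo "missing model: pass --transcribe-model" >&2
    status=1
  fi
  # ffmpeg is only needed when splitting is requested.
  if [ "$split" = "split" ] &&
     ! command -v "${FFMPEG_BIN:-ffmpeg}" >/dev/null 2>&1; then
    echo "ffmpeg not found: install it to use --split-media-file" >&2
    status=1
  fi
  return "$status"
}
```

Running `stt_preflight talk.mp4 whisper-1 split` before the real command turns a failed upload or a failed split into an immediate local error.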

Technical Details

Implementation

Transcription is handled in internal/cli/transcribe.go:14
OpenAI-specific implementation in internal/plugins/ai/openai/openai_audio.go:41
File splitting uses ffmpeg with configurable segment duration
Any vendor that implements the transcriber interface can be added

Processing Pipeline

The CLI validates the file format and size
If the file is larger than 25MB and splitting is enabled, it is split using ffmpeg
Each file/segment is sent to OpenAI's transcription API
Results are concatenated with spaces between segments
Transcribed text is passed as input to the main Fabric pipeline
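
The concatenation step is simple enough to show directly. This sketch joins per-segment transcripts with a single space, in the order given; the function name `join_transcripts` is hypothetical, and Fabric's actual implementation is in Go rather than shell.

```bash
# Join the arguments with a single space and print the result.
join_transcripts() {
  # "$*" joins the positional parameters with the first character of IFS.
  local IFS=' '
  printf '%s\n' "$*"
}
```

For example, `join_transcripts "Hello there." "Next item."` prints `Hello there. Next item.` on one line.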

Vendor Support

Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.