docs/Using-Speech-To-Text.md
Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.
The speech-to-text (STT) feature uses OpenAI's Whisper and GPT-4o transcription models. The resulting transcript is passed automatically as input to your chosen pattern or chat session.
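Requirements: `ffmpeg` installed on your system (the `--split-media-file` option uses it to split large files).

Supported file formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`

To transcribe an audio file and send the result to a pattern: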
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
To just transcribe a file without applying a pattern:
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
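A script that feeds many recordings to Fabric may want to skip unsupported files before calling `--transcribe-file`. The Go sketch below checks a path against the supported extensions listed on this page. The helper `isSupportedMedia` is hypothetical, and matching extensions case-insensitively is an assumption of this sketch, not documented Fabric behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExtensions mirrors the formats listed on this page.
// It is a local sketch, not Fabric's own lookup table.
var supportedExtensions = map[string]bool{
	".mp3": true, ".mp4": true, ".mpeg": true, ".mpga": true,
	".m4a": true, ".wav": true, ".webm": true,
}

// isSupportedMedia reports whether path ends in one of the supported
// extensions. Lower-casing first (assumption) accepts names like "Talk.M4A".
func isSupportedMedia(path string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	return supportedExtensions[ext]
}

func main() {
	for _, p := range []string{"/path/to/audio.mp3", "notes.txt", "Talk.M4A"} {
		fmt.Printf("%s supported: %v\n", p, isSupportedMedia(p))
	}
}
```

A file with no extension yields an empty string from `filepath.Ext`, so it is treated as unsupported.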
Command-line flags:

- `--transcribe-file`: path to the audio or video file to transcribe
- `--transcribe-model`: model to use for transcription (required whenever `--transcribe-file` is used)
- `--split-media-file`: automatically split files larger than 25MB into chunks using `ffmpeg`

You can list all available transcription models with:
fabric --list-transcription-models
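A wrapper that assembles its own Fabric invocations could enforce the flag rules before calling Fabric. The Go sketch below checks that `--transcribe-model` accompanies `--transcribe-file` and that the model is one of the three models this page lists as currently supported (`whisper-1`, `gpt-4o-mini-transcribe`, `gpt-4o-transcribe`). The struct `transcribeOptions`, its `validate` method, and the error wording are hypothetical; they are not Fabric's code or messages.

```go
package main

import (
	"errors"
	"fmt"
)

// transcribeOptions is a hypothetical struct holding the three STT flags.
type transcribeOptions struct {
	File      string // --transcribe-file
	Model     string // --transcribe-model
	SplitFile bool   // --split-media-file
}

// knownModels lists the models this page names as currently supported.
var knownModels = map[string]bool{
	"whisper-1":              true,
	"gpt-4o-mini-transcribe": true,
	"gpt-4o-transcribe":      true,
}

// validate enforces the documented rule that --transcribe-model is required
// whenever --transcribe-file is set, and rejects names outside knownModels.
func (o transcribeOptions) validate() error {
	if o.File == "" {
		return nil // no transcription requested, nothing to check
	}
	if o.Model == "" {
		return errors.New("--transcribe-model is required when using --transcribe-file")
	}
	if !knownModels[o.Model] {
		return fmt.Errorf("unknown transcription model %q; run --list-transcription-models", o.Model)
	}
	return nil
}

func main() {
	opts := transcribeOptions{File: "/path/to/audio.mp3"}
	fmt.Println(opts.validate()) // fails: no model given
	opts.Model = "whisper-1"
	fmt.Println(opts.validate()) // passes
}
```

Hard-coding the model list keeps the sketch self-contained; a real wrapper would be better served by the output of `fabric --list-transcription-models`, since the supported set can change.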
Currently supported models:
- `whisper-1`: OpenAI's Whisper model
- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
- `gpt-4o-transcribe`: GPT-4o transcription model

Files under the 25MB limit are processed directly without any special handling.
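To decide ahead of time whether a recording needs `--split-media-file`, a script can compare the file size with the 25MB limit. This Go sketch takes 25MB to mean `25 * 1024 * 1024` bytes, which is an assumption; the exact byte threshold OpenAI enforces may differ. `needsSplitting` and `checkFile` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
)

// maxUploadBytes is the 25MB limit, taken here as 25 * 1024 * 1024 bytes.
// Binary vs. decimal megabytes is an assumption of this sketch.
const maxUploadBytes int64 = 25 * 1024 * 1024

// needsSplitting reports whether a file of the given size exceeds the limit.
func needsSplitting(size int64) bool {
	return size > maxUploadBytes
}

// checkFile stats path and describes how the file would be handled.
func checkFile(path string) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	if needsSplitting(info.Size()) {
		return fmt.Sprintf("%s is %d bytes: use --split-media-file", path, info.Size()), nil
	}
	return fmt.Sprintf("%s is %d bytes: processed directly", path, info.Size()), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checksize FILE")
		return
	}
	msg, err := checkFile(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```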
For files exceeding OpenAI's 25MB limit, you have two options:
- Use the `--split-media-file` flag to automatically split the file into chunks:

fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
When splitting is enabled:
- Fabric uses `ffmpeg` to split the file into 10-minute segments initially

The transcribed text then enters Fabric's normal workflow as the input to the chosen pattern.
Meeting transcription and summarization:
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
Interview analysis:
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
Large video file processing:
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
Common error scenarios:
- Use `--split-media-file` for files over 25MB.
- Use `--list-transcription-models` to see the available models.
- The `--transcribe-model` flag is required when using `--transcribe-file`.

Transcription is implemented in `internal/cli/transcribe.go:14` and `internal/plugins/ai/openai/openai_audio.go:41`, behind a transcriber interface. Currently, only OpenAI is supported for transcription, but the interface allows other vendors that provide transcription capabilities to be added in the future.