examples/audio_to_text/README.md
A plain table you can query, join, or feed into an embedding pipeline — in plain async Python.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/audio-to-text/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>

A folder of voice memos, meeting recordings, and podcast clips is dead weight until it's text. CocoIndex walks the directory, sends every file to a LiteLLM transcription model, and writes the result to Postgres as one row per file, keyed by filename.

You declare the transformation in native Python and your own types: target_state = transformation(source_state). The heavy lifting (incremental processing, change tracking, managed targets) runs in a Rust engine underneath, so only new or changed files get re-transcribed and removed files have their rows cleaned up automatically.
The indexing path is the shortest there is: no chunking, one row per file.

1. Walk a directory of audio files (.mp3, .wav, .m4a, .flac, .ogg, .webm, .aac, .aiff).
2. Transcribe each file with a LiteLLM model (whisper-1 by default).
3. Declare one AudioTranscription row per file in Postgres, keyed by filename.

process_file runs once per file: read the audio, transcribe it, declare a single target row. Read it in main.py:
_transcriber = LiteLLMTranscriber("whisper-1")


@dataclass
class AudioTranscription:
    filename: str
    text: str


@coco.fn(memo=True)  # an unchanged file is never re-transcribed
async def process_file(
    file: localfs.File,
    table: postgres.TableTarget[AudioTranscription],
) -> None:
    transcript = await _transcriber.transcribe(file)
    table.declare_row(
        row=AudioTranscription(filename=str(file.file_path.path), text=transcript),
    )
mount_table_target creates and manages the Postgres table for you with primary_key=["filename"] — so each file maps to exactly one row, the table doubles as an index of what's been transcribed, and re-runs upsert only what changed.
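To make the "one row per file, upsert only what changed" idea concrete, here is a minimal pure-Python sketch of how a set of declared rows could be reconciled against a table keyed by filename. It does not use CocoIndex or Postgres, and it is not how the engine is implemented: plan_sync and the in-memory existing_rows dict are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioTranscription:
    filename: str
    text: str


def plan_sync(
    declared: list[AudioTranscription],
    existing: dict[str, str],
) -> tuple[list[AudioTranscription], list[str]]:
    """Return (rows to upsert, filenames to delete) for a table keyed by filename.

    Hypothetical helper: `existing` maps filename -> stored text.
    """
    declared_by_key = {row.filename: row for row in declared}
    # Upsert rows that are new, or whose text differs from what is stored.
    upserts = [
        row
        for key, row in declared_by_key.items()
        if existing.get(key) != row.text
    ]
    # Stored rows that are no longer declared are orphans and get removed.
    deletes = sorted(key for key in existing if key not in declared_by_key)
    return upserts, deletes


existing_rows = {"a.mp3": "hello", "b.wav": "old text", "gone.ogg": "bye"}
declared_rows = [
    AudioTranscription("a.mp3", "hello"),     # unchanged, so no write
    AudioTranscription("b.wav", "new text"),  # changed, so upsert
    AudioTranscription("c.flac", "fresh"),    # added, so upsert
]
upserts, deletes = plan_sync(declared_rows, existing_rows)
print([row.filename for row in upserts])  # ['b.wav', 'c.flac']
print(deletes)                            # ['gone.ogg']
```

Because the key is the filename, running the same plan twice against an up-to-date table yields no writes, which is what makes the re-run idempotent in this sketch.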
Step-by-step walkthrough with the row schema, the LiteLLM transcriber, the managed Postgres target, and exactly what happens on each kind of change.
</p>

- LiteLLMTranscriber("whisper-1") wraps LiteLLM's transcription API: swap that string (and the matching credential) for elevenlabs/scribe_v1, a self-hosted endpoint, or any other LiteLLM-supported model.
- filename is the primary key, so the output table doubles as a record of which files have been transcribed, with no separate bookkeeping.
- @coco.fn(memo=True) skips a file when its content and the function's code are both unchanged, so you never pay for the same transcription twice.
- mount_table_target handles schema, idempotent upserts, and orphan cleanup: delete a file and its row is removed automatically.

Needs a running Postgres and LiteLLM credentials for the transcription model (the default whisper-1 uses OPENAI_API_KEY).
1. Start Postgres — a ready compose file ships in the repo:
docker compose -f ../../dev/postgres.yaml up -d
2. Configure & install:
cp .env.example .env # set OPENAI_API_KEY; POSTGRES_URL defaults to the local container
pip install -e .
3. Build the table — drop a few audio files into audio_files/, then:
cocoindex update main.py
This writes to coco_examples.audio_transcriptions, with filename as the primary key and text as the transcript.
4. Check the results with plain SQL:
psql "$POSTGRES_URL" -c \
'SELECT filename, left(text, 200) AS preview FROM coco_examples.audio_transcriptions ORDER BY filename;'
Re-running cocoindex update main.py incrementally processes only added, changed, and removed files.
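The skip-on-rerun behavior can be pictured as memoization keyed on both the file content and the function's code. The sketch below is a stdlib-only illustration under that assumption, not CocoIndex's actual mechanism: fake_transcribe, fingerprint, memoized, and the in-memory cache are hypothetical names, and hashing the bytecode plus constants is just one simple stand-in for "the function's code".

```python
import hashlib
from typing import Callable

calls: list[str] = []  # records which inputs actually reached the "model"


def fake_transcribe(content: bytes) -> str:
    """Hypothetical stand-in for a paid transcription call."""
    calls.append(content.decode())
    return content.decode().upper()


def fingerprint(fn: Callable[[bytes], str], content: bytes) -> str:
    """Hash the function's bytecode and constants together with the input bytes."""
    code = fn.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode())
    digest.update(content)
    return digest.hexdigest()


def memoized(fn: Callable[[bytes], str], content: bytes, cache: dict[str, str]) -> str:
    """Call fn only when this (code, content) pair has not been seen before."""
    key = fingerprint(fn, content)
    if key not in cache:
        cache[key] = fn(content)
    return cache[key]


cache: dict[str, str] = {}
first = memoized(fake_transcribe, b"hello", cache)
second = memoized(fake_transcribe, b"hello", cache)       # same content: cache hit
third = memoized(fake_transcribe, b"hello again", cache)  # changed content: new call
print(calls)  # ['hello', 'hello again']
```

Editing the body of fake_transcribe would change its bytecode or constants and therefore the key, so the next run would call it again even for unchanged content, mirroring the "content and code both unchanged" condition described above.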
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/audio-to-text/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>