docs/examples/index.md
Multimodal Structured Outputs: Evaluating Image Understanding
Leverage image ablation to analyze textual bias in image understanding datasets.
Voice AI Analytics with Faster-Whisper and embed_text
Transcribe audio files into segments with timestamps and embed content.
MinHash Deduplication on Common Crawl
Clean web text at scale with MinHash, LSH Banding, and Connected Components.
Getting Started with Common Crawl
Daft provides a simple, performant, and responsible way to access Common Crawl data.
Audio Transcription with Whisper
Effortlessly transcribe audio to text at scale.
Build a 100% GPU Utilization Text Embedding Pipeline featuring spaCy and Turbopuffer
Generate and store millions of text embeddings in vector databases using distributed GPU processing and state-of-the-art models.
Generate Images with Stable Diffusion
Run an open-source image generation model on your own GPUs using Daft UDFs.
Daft's Four UDF Pattern Tutorial
One notebook, four UDF patterns, one dataset. Row-wise, generator, async, and stateful -- learn when to use each.
Window Functions: The Great Chocolate Race
Explore how window functions can reduce complex joins and group-bys to just a few simple operations.
Running LLMs on the Red Pajamas Dataset
Perform similarity search on Stack Exchange questions using language models and embeddings.
!!! tip "More Examples"
    For more examples, check out our new daft-examples repository!