AI Benchmarks

This repository contains performance benchmarks comparing different data processing engines (Daft, Ray Data, and Spark) across various AI and multimodal data processing workloads.

Overview

The benchmarks cover four workload types, each exercising a different aspect of multimodal data processing:

  1. Audio Transcription - Transcribing audio files
  2. Document Embedding - Generating embeddings for PDF documents
  3. Image Classification - Classifying images
  4. Video Object Detection - Detecting objects in videos

Performance Results Summary

| Workload | Data Size | Daft | Ray Data | Spark |
|---|---|---|---|---|
| Audio Transcription | 113,800 audio files | 6m 22s | 29m 20s | 25m 46s |
| Document Embedding | 10,000 PDFs | 1m 54s | 14m 32s | 8m 4s |
| Image Classification | 803,580 images | 4m 23s | 23m 30s | 45m 7s |
| Video Object Detection | 1,000 videos | 11m 46s | 25m 54s | 3h 36m |
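
The runtimes above are easier to compare as ratios. The following is a minimal sketch, not part of the benchmark code, that converts the reported durations to seconds and computes how long Ray Data and Spark took relative to Daft. The helper names `to_seconds` and `relative_runtimes` are hypothetical and introduced only for illustration; the durations are copied from the summary above.

```python
import re

# Seconds per unit for durations written like "6m 22s" or "3h 36m".
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '3h 36m' to a number of seconds."""
    parts = re.findall(r"(\d+)([hms])", duration)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


# Reported runtimes as (Daft, Ray Data, Spark), copied from the summary.
RESULTS = {
    "Audio Transcription": ("6m 22s", "29m 20s", "25m 46s"),
    "Document Embedding": ("1m 54s", "14m 32s", "8m 4s"),
    "Image Classification": ("4m 23s", "23m 30s", "45m 7s"),
    "Video Object Detection": ("11m 46s", "25m 54s", "3h 36m"),
}


def relative_runtimes(results):
    """Return {workload: (ray_over_daft, spark_over_daft)} runtime ratios."""
    ratios = {}
    for workload, (daft, ray, spark) in results.items():
        baseline = to_seconds(daft)
        ratios[workload] = (
            to_seconds(ray) / baseline,
            to_seconds(spark) / baseline,
        )
    return ratios


if __name__ == "__main__":
    for workload, (ray, spark) in relative_runtimes(RESULTS).items():
        print(f"{workload}: Ray Data {ray:.2f}x, Spark {spark:.2f}x Daft's runtime")
```

A ratio above 1 means the engine took longer than Daft on that workload. These ratios are derived only from the wall-clock times reported here and say nothing about hardware or configuration differences between the runs.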