There are two main ways to work with videos in Daft:
- `daft.read_video_frames` to read frames of a video into a DataFrame.
- `daft.VideoFile` class to work with video files and metadata.

`daft.VideoFile` is a subclass of `daft.File` that provides a specialized interface for video-specific operations.

This example shows reading a video's frames into a DataFrame using the `daft.read_video_frames` function.
=== "🐍 Python"
```python
import daft

df = daft.read_video_frames(
    path="s3://daft-oss-public-data/videos/zoo.mp4",
    image_height=480,
    image_width=640,
    is_key_frame=True,  # select only the key frames
)
df.show()
```
You can also downsample frames on the source side by specifying `sample_interval_seconds`.
=== "🐍 Python"
```python
import daft

# Sample approximately one frame per second based on frame_time
df = daft.read_video_frames(
    path="s3://daft-oss-public-data/videos/zoo.mp4",
    image_height=480,
    image_width=640,
    sample_interval_seconds=1.0,
)
df.show()
```
The sample_interval_seconds parameter enables time-based frame sampling, which is particularly useful for:
The sampling algorithm:
Constant Frame Rate (CFR) Videos:
Variable Frame Rate (VFR) Videos:
Frame Timestamp Precision:
Example 1: Uniform CFR Video
Frame timestamps: [0.0, 0.033, 0.067, 0.100, 0.133, 0.167, 0.200, ...]
sample_interval_seconds=0.1
Sampled frames: [0.0, 0.100, 0.200, ...] # Exact matches
Example 2: Non-uniform Timestamps
Frame timestamps: [0.0, 0.95, 1.05, 2.0, 2.95, 3.05]
sample_interval_seconds=1.0
Sampled frames: [0.0, 1.05, 2.0, 3.05] # First frame >= target time
Example 3: Large Frame Interval
Frame timestamps: [0.0, 2.5, 5.0]
sample_interval_seconds=1.0
Sampled frames: [0.0, 2.5, 5.0] # Closest available frames
Example 4: VFR Video
Frame timestamps: [0.0, 0.033, 0.100, 0.133, 0.233, 0.267, 1.0, 1.033]
sample_interval_seconds=1.0
Sampled frames: [0.0, 1.0] # Frames at 0.0s and 1.0s
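The four examples above are consistent with a simple rule: target times fall on multiples of `sample_interval_seconds`, the first frame at or after each target is kept, and any targets that frame already covers are skipped. The sketch below encodes that rule in plain Python so the examples can be checked by hand. It is inferred from the examples only and is not taken from Daft's implementation; the function name `sample_timestamps` and the `eps` tolerance are hypothetical.

```python
from typing import List


def sample_timestamps(timestamps: List[float], interval: float, eps: float = 1e-9) -> List[float]:
    """Keep the first frame at or after each multiple of `interval`.

    Assumes `timestamps` is sorted in ascending order. `eps` is an assumed
    tolerance that absorbs floating-point rounding in the target times.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    selected = []
    k = 0  # the next target time is k * interval
    for t in timestamps:
        if t + eps >= k * interval:
            selected.append(t)
            # Skip every target this frame already covers so that a single
            # frame is never emitted twice.
            while k * interval <= t + eps:
                k += 1
    return selected


# Example 2 from the text: non-uniform timestamps, 1-second interval
print(sample_timestamps([0.0, 0.95, 1.05, 2.0, 2.95, 3.05], 1.0))
# [0.0, 1.05, 2.0, 3.05]
```

Computing each target as `k * interval`, rather than adding the interval to the last selected timestamp, keeps the targets on a fixed grid, which is what Example 2 requires: the frame at 2.0 is selected even though it is less than one second after 1.05.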
You can combine time-based sampling with key frame filtering:
=== "🐍 Python"
```python
import daft

# Sample key frames at 1-second intervals
df = daft.read_video_frames(
    path="s3://daft-oss-public-data/videos/zoo.mp4",
    image_height=480,
    image_width=640,
    is_key_frame=True,
    sample_interval_seconds=1.0,
)
df.show()
```
This is useful for:
```
╭────────────────────────────────────┬─────────────┬────────────────────┬─────────────────┬───────────┬───────────┬────────────────┬──────────────┬───────────────────────╮
│ path                               ┆ frame_index ┆ frame_time         ┆ frame_time_base ┆ frame_pts ┆ frame_dts ┆ frame_duration ┆ is_key_frame ┆ data                  │
│ ---                                ┆ ---         ┆ ---                ┆ ---             ┆ ---       ┆ ---       ┆ ---            ┆ ---          ┆ ---                   │
│ Utf8                               ┆ Int64       ┆ Float64            ┆ Utf8            ┆ Int64     ┆ Int64     ┆ Int64          ┆ Boolean      ┆ Image[RGB; 480 x 640] │
╞════════════════════════════════════╪═════════════╪════════════════════╪═════════════════╪═══════════╪═══════════╪════════════════╪══════════════╪═══════════════════════╡
│ s3://daft-oss-public-data/videos/… ┆ 0           ┆ 0                  ┆ 1/15360         ┆ 0         ┆ 0         ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 1           ┆ 4                  ┆ 1/15360         ┆ 61440     ┆ 61440     ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 2           ┆ 5.333333333333333  ┆ 1/15360         ┆ 81920     ┆ 81920     ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 3           ┆ 9.333333333333334  ┆ 1/15360         ┆ 143360    ┆ 143360    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 4           ┆ 10.666666666666666 ┆ 1/15360         ┆ 163840    ┆ 163840    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 5           ┆ 14.666666666666666 ┆ 1/15360         ┆ 225280    ┆ 225280    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 6           ┆ 16                 ┆ 1/15360         ┆ 245760    ┆ 245760    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
╰────────────────────────────────────┴─────────────┴────────────────────┴─────────────────┴───────────┴───────────┴────────────────┴──────────────┴───────────────────────╯

(Showing first 7 of 7 rows)
```
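In this output the timing columns are related: `frame_time` equals `frame_pts` multiplied by `frame_time_base`, which gives the presentation time in seconds. The short check below reproduces two of the values shown, using only the Python standard library. The helper name `pts_to_seconds` is hypothetical and is not part of Daft.

```python
from fractions import Fraction


def pts_to_seconds(frame_pts: int, frame_time_base: str) -> float:
    """Convert a presentation timestamp to seconds.

    `frame_time_base` is a "numerator/denominator" string such as "1/15360",
    matching the Utf8 column shown in the output.
    """
    return float(frame_pts * Fraction(frame_time_base))


# Values copied from rows 1 and 2 of the output
print(pts_to_seconds(61440, "1/15360"))  # 4.0
print(pts_to_seconds(81920, "1/15360"))  # 5.333333333333333
```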
!!! note "Note"
You can specify multiple paths and use globs like `daft.read_video_frames("/path/to/file.mp4")` and `daft.read_video_frames("/path/to/files-*.mp4")`
This example reads the key frames of several YouTube videos by passing a list of video URLs.
=== "🐍 Python"
```python
import daft

df = daft.read_video_frames(
    path=[
        "https://www.youtube.com/watch?v=jNQXAC9IVRw",
        "https://www.youtube.com/watch?v=N2rZxCrb7iU",
        "https://www.youtube.com/watch?v=TF6cnLnEARo",
    ],
    image_height=480,
    image_width=640,
    is_key_frame=True,
)
df.show()
```
```
╭────────────────────────────────┬─────────────┬───────────────────┬─────────────────┬───────────┬───────────┬────────────────┬──────────────┬───────────────────────╮
│ path                           ┆ frame_index ┆ frame_time        ┆ frame_time_base ┆ frame_pts ┆ frame_dts ┆ frame_duration ┆ is_key_frame ┆ data                  │
│ ---                            ┆ ---         ┆ ---               ┆ ---             ┆ ---       ┆ ---       ┆ ---            ┆ ---          ┆ ---                   │
│ Utf8                           ┆ Int64       ┆ Float64           ┆ Utf8            ┆ Int64     ┆ Int64     ┆ Int64          ┆ Boolean      ┆ Image[RGB; 480 x 640] │
╞════════════════════════════════╪═════════════╪═══════════════════╪═════════════════╪═══════════╪═══════════╪════════════════╪══════════════╪═══════════════════════╡
│ https://www.youtube.com/watch… ┆ 0           ┆ 0                 ┆ 1/90000         ┆ 0         ┆ 0         ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 1           ┆ 6.8068            ┆ 1/90000         ┆ 612612    ┆ 612612    ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 2           ┆ 13.2132           ┆ 1/90000         ┆ 1189188   ┆ 1189188   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 3           ┆ 18.018            ┆ 1/90000         ┆ 1621620   ┆ 1621620   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 4           ┆ 24.8248           ┆ 1/90000         ┆ 2234232   ┆ 2234232   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 5           ┆ 30.03             ┆ 1/90000         ┆ 2702700   ┆ 2702700   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 6           ┆ 36.36966666666667 ┆ 1/90000         ┆ 3273270   ┆ 3273270   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 7           ┆ 43.27656666666667 ┆ 1/90000         ┆ 3894891   ┆ 3894891   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
╰────────────────────────────────┴─────────────┴───────────────────┴─────────────────┴───────────┴───────────┴────────────────┴──────────────┴───────────────────────╯

(Showing first 8 rows)
```
The following example demonstrates how to use `daft.VideoFile` to read a video file and extract metadata.
```python
import daft
from daft.functions import video_file, video_metadata, video_keyframes

# Wrap each matched path as a VideoFile, then extract its metadata and keyframes
df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/videos/*.mp4")
    .with_column("file", video_file(daft.col("path")))
    .with_column("metadata", video_metadata(daft.col("file")))
    .with_column("keyframes", video_keyframes(daft.col("file")))
    .select("path", "file", "size", "metadata", "keyframes")
)

df.show(3)
```
You can also decode frames from a `VideoFile` column with `video_frames`. This keeps one row per input video and returns the decoded frames as a list of structs. Use `.explode("frames")` if you want one row per frame.
```python
import daft
from daft.functions import video_file, video_frames

# Decode the frames that fall within the start_time/end_time window of each video
df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/videos/*.mp4")
    .with_column("video", video_file(daft.col("path"), verify=True))
    .with_column(
        "frames",
        video_frames(
            daft.col("video"),
            start_time=0.0,
            end_time=0.2,
        ),
    )
    .select("path", "frames")
)

df.show(3)
```