Object Tracking

In some cases, it's important to track objects across multiple frames of a video. For example, we may need to figure out the direction a vehicle is moving, or count unique objects as they move through a scene. Some Supervision annotators and tools, such as LineZone, require tracking to be set up. In this cookbook, we'll cover how to get a tracker up and running for use in your computer vision applications.

What is a Tracker?

A tracker is a piece of code that identifies the same object across frames and assigns it a unique tracker_id. At the time of writing, popular trackers include ByteTrack and BoT-SORT. Supervision makes using trackers a breeze and comes with ByteTrack built in.
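To make "assigns a unique tracker_id" concrete, here is a deliberately simplified sketch. It is not ByteTrack. It matches each new detection to the previous frame's box with the highest intersection-over-union (IoU), and gives unmatched detections fresh IDs. ByteTrack goes further: it predicts motion with a Kalman filter and runs a second matching pass for low-confidence detections. `GreedyIoUTracker` and `iou` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Toy tracker: match each detection to last frame's box with the highest IoU."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # tracker_id -> last seen box
        self.next_id = 1

    def update(self, boxes: List[Box]) -> List[int]:
        # Score every (track, detection) pair and consider the best overlaps first.
        pairs = sorted(
            ((iou(prev, det), tid, i)
             for tid, prev in self.tracks.items()
             for i, det in enumerate(boxes)),
            reverse=True,
        )
        ids: List[int] = [-1] * len(boxes)
        used_tracks = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if tid in used_tracks or ids[i] != -1:
                continue  # each track and each detection is matched at most once
            ids[i] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks with fresh ids.
        for i in range(len(boxes)):
            if ids[i] == -1:
                ids[i] = self.next_id
                self.next_id += 1
        # Keep only tracks seen in this frame (this toy has no memory of lost tracks).
        self.tracks = {tid: boxes[i] for i, tid in enumerate(ids)}
        return ids


tracker = GreedyIoUTracker()
print(tracker.update([(0, 0, 10, 10)]))                    # [1]
print(tracker.update([(1, 0, 11, 10), (50, 50, 60, 60)]))  # [1, 2]
print(tracker.update([(50, 50, 60, 60)]))                  # [2]
```

The slightly shifted box keeps ID 1 because it overlaps its previous position. The new box gets ID 2. Because this toy forgets a track as soon as it is missed for one frame, a single missed detection would cause an ID switch. Production trackers such as ByteTrack keep lost tracks alive for a while to avoid this.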

Before you start

Let's make sure that we have access to a GPU. We can use the nvidia-smi command to do that. If you run into any problems, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to GPU, and then click Save.

python

!nvidia-smi
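The `!` prefix is IPython/Jupyter syntax, so `!nvidia-smi` only works inside a notebook. If you want the same check from a plain Python script, here is a minimal sketch that uses only the standard library. It assumes the NVIDIA driver puts `nvidia-smi` on the PATH. `gpu_visible` is a hypothetical helper, not part of Supervision or Inference.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools on the PATH
    # `nvidia-smi -L` prints one "GPU <index>: <name> ..." line per device.
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_visible())
```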

Install Dependencies

python

!pip install -q inference-gpu "supervision"

Download a Video Asset

Now that our environment is set up, let's download a video that we can detect objects in. Supervision comes with a handy utility to help us hit the ground running. The snippet below saves a video asset to our local directory. download_assets returns the path to the saved file, which we store in the path_to_video variable for use in later application logic.

python

from supervision.assets import download_assets, VideoAssets

# Download a supervision video asset
path_to_video = download_assets(VideoAssets.PEOPLE_WALKING)

Tracking Objects in a Frame

Now that we have our video downloaded, let's get to work on tracking objects. We'll first pull in a model from Roboflow Inference to detect people in our video. Then we'll create a byte_tracker object and pass our detections to it, which assigns each detection a tracker_id. Finally, we'll use a LabelAnnotator to draw each tracker_id on the frame.

python

import supervision as sv
from inference.models.utils import get_roboflow_model

# Load a pre-trained YOLOv8 nano model from Roboflow Inference.
model = get_roboflow_model('yolov8n-640')

# Create a video info object from the video path.
video_info = sv.VideoInfo.from_video_path(path_to_video)

# Create a label annotator for labeling detections with our tracker_id.
label = sv.LabelAnnotator()

# Create a ByteTrack object to track detections.
byte_tracker = sv.ByteTrack(frame_rate=video_info.fps)

# Create a frame generator from video path for iteration of frames.
frame_generator = sv.get_video_frames_generator(path_to_video)

# Grab a frame from the frame_generator.
frame = next(frame_generator)

# Run inference on the frame by passing it to our model.
result = model.infer(frame)[0]

# Convert model results to a supervision detection object.
detections = sv.Detections.from_inference(result)

# Update detections with tracker ids from byte_tracker.
tracked_detections = byte_tracker.update_with_detections(detections)

# Create labels with tracker_id for label annotator.
labels = [f"{tracker_id}" for tracker_id in tracked_detections.tracker_id]

# Apply label annotator to frame.
annotated_frame = label.annotate(scene=frame.copy(), detections=tracked_detections, labels=labels)

# Display the frame.
sv.plot_image(annotated_frame)
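Why pass `frame_rate=video_info.fps` to ByteTrack? In the reference ByteTrack implementation, and as far as we know in Supervision's port of it, the number of frames a lost track is kept alive equals a lost-track buffer rescaled from a 30 fps baseline: `int(frame_rate / 30 * buffer)`. The sketch below reproduces only that arithmetic. The helper name is hypothetical, and the buffer default of 30 is an assumption, so check the ByteTrack docs for your Supervision version. It shows that the buffer works out to roughly the same amount of *time* at any frame rate.

```python
def frames_before_track_is_dropped(frame_rate: int, lost_track_buffer: int = 30) -> int:
    """Frames a lost track survives in ByteTrack-style trackers (assumed formula).

    The buffer is expressed relative to 30 fps, so it is rescaled by the real frame rate.
    """
    return int(frame_rate / 30.0 * lost_track_buffer)


for fps in (15, 30, 60):
    frames = frames_before_track_is_dropped(fps)
    print(f"{fps} fps -> keep lost tracks for {frames} frames ({frames / fps:.1f} s)")
# 15 fps -> keep lost tracks for 15 frames (1.0 s)
# 30 fps -> keep lost tracks for 30 frames (1.0 s)
# 60 fps -> keep lost tracks for 60 frames (1.0 s)
```

If the tracker were left at a 30 fps default while processing a 60 fps video, a lost track would be dropped after about half a second instead of one second. That would make ID switches more likely when a person is briefly occluded.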

Tracking Objects in a Video

Finally, we'll run the same detect, track, and annotate steps on every frame of the video, and use a utility called VideoSink to save the annotated frames to a new video file. Let's dive into the code.

python

from tqdm import tqdm

# Load a pre-trained YOLOv8 nano model from Roboflow Inference.
model = get_roboflow_model('yolov8n-640')

# Create a video info object from the video path.
video_info = sv.VideoInfo.from_video_path(path_to_video)

# Create a label annotator for labeling detections with our tracker_id.
label = sv.LabelAnnotator()

# Create a ByteTrack object to track detections.
byte_tracker = sv.ByteTrack(frame_rate=video_info.fps)

# Create a frame generator from video path for iteration of frames.
frame_generator = sv.get_video_frames_generator(path_to_video)

# Create a video sink context manager to save resulting video.
with sv.VideoSink(target_path="output.mp4", video_info=video_info) as sink:

    # Iterate through frames yielded from the frame_generator.
    for frame in tqdm(frame_generator, total=video_info.total_frames):

        # Run inference on the frame by passing it to our model.
        result = model.infer(frame)[0]

        # Convert model results to a supervision detection object.
        detections = sv.Detections.from_inference(result)

        # Update detections with tracker ids from byte_tracker.
        tracked_detections = byte_tracker.update_with_detections(detections)

        # Create labels with tracker_id for label annotator.
        labels = [f"{tracker_id}" for tracker_id in tracked_detections.tracker_id]

        # Apply label annotator to frame.
        annotated_frame = label.annotate(scene=frame.copy(), detections=tracked_detections, labels=labels)

        # Save the annotated frame to an output video.
        sink.write_frame(frame=annotated_frame)
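The labels above show only the bare tracker_id. If you also want the class name, a small helper like the hypothetical `format_labels` below can replace the list comprehension. For example, call it as `format_labels(tracked_detections.tracker_id, tracked_detections.data.get("class_name"))`. That call assumes `sv.Detections.from_inference` stores class names under `data["class_name"]`, as recent Supervision releases do, so verify this for your version.

```python
from typing import Iterable, List, Optional


def format_labels(
    tracker_ids: Iterable[int], class_names: Optional[Iterable[str]] = None
) -> List[str]:
    """Build one label per detection, e.g. '#3 person', or '#3' without class names."""
    ids = list(tracker_ids)
    if class_names is None:
        return [f"#{tid}" for tid in ids]
    names = list(class_names)
    if len(names) != len(ids):
        raise ValueError("tracker_ids and class_names must have the same length")
    return [f"#{tid} {name}" for tid, name in zip(ids, names)]


print(format_labels([1, 2], ["person", "person"]))  # ['#1 person', '#2 person']
print(format_labels([7]))                           # ['#7']
```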

Let's take a look at the resulting video, which is saved in your current directory as output.mp4. Notice how, even with a little flicker, we can see the tracker_id on each person walking in the video. With trackers under your belt, you can now tackle a wide variety of use cases. Happy building!
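As a starting point for the use cases mentioned at the top of this cookbook, counting unique objects and estimating their direction of travel, here is a minimal sketch that works on tracker output alone. It assumes you collect `(tracker_id, (x1, y1, x2, y2))` pairs for each frame inside the loop, for example with `list(zip(tracked_detections.tracker_id, tracked_detections.xyxy))`. `summarize_tracks` is a hypothetical helper, not a Supervision API. For line-crossing counts, Supervision's own LineZone is the more robust choice.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def summarize_tracks(
    frames: Sequence[Sequence[Tuple[int, Box]]]
) -> Tuple[int, Dict[int, str]]:
    """Count unique tracker_ids and estimate each track's horizontal direction."""
    # Record the horizontal center of every box, grouped by tracker_id.
    centers: Dict[int, List[float]] = defaultdict(list)
    for detections in frames:
        for tracker_id, (x1, _, x2, _) in detections:
            centers[tracker_id].append((x1 + x2) / 2)

    directions: Dict[int, str] = {}
    for tracker_id, xs in centers.items():
        dx = xs[-1] - xs[0]  # net horizontal displacement (image x grows to the right)
        directions[tracker_id] = "right" if dx > 0 else "left" if dx < 0 else "stationary"
    return len(centers), directions


frames = [
    [(1, (0, 0, 10, 10)), (2, (100, 0, 110, 10))],
    [(1, (5, 0, 15, 10)), (2, (90, 0, 100, 10))],
    [(1, (10, 0, 20, 10))],
]
print(summarize_tracks(frames))  # (2, {1: 'right', 2: 'left'})
```

This relies on IDs staying stable. Every ID switch, such as the flicker visible in the output video, adds a spurious "unique" object, so treat such counts as an upper bound.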