# Count In Zone

With supervision, you can count the number of objects in a zone in an image or video. In this guide, we will show how to count the number of cars in a traffic video.

View the notebook that accompanies this tutorial.

To make it easier to follow our tutorial, download the video we will use as an example. You can do this using the supervision.assets module:

```python
from supervision.assets import download_assets, VideoAssets

download_assets(VideoAssets.VEHICLES_2)
```

## Initialize a Model and Load Video

First, we need to initialize a model. Let's use a YOLOv8 model with the default COCO checkpoint. We also need to load a video on which to run inference.

Create a YOLO model instance and load the source video using supervision's VideoInfo helper. The model will process each frame during inference, while VideoInfo extracts resolution and frame-rate metadata needed by the polygon zone annotator. A shared color palette ensures consistent zone coloring throughout the output video.

```python
import numpy as np
import supervision as sv
import cv2

from ultralytics import YOLO

model = YOLO("yolov8s.pt")

VIDEO = VideoAssets.VEHICLES_2.value

colors = sv.ColorPalette.default()
video_info = sv.VideoInfo.from_video_path(VIDEO)
```

## Calculate Coordinates

To count objects in a zone, you need to know the coordinates where you want to draw the zone.

You can calculate coordinates using the PolygonZone web utility.

To use the PolygonZone website, you will need to upload an image or frame from a video. You can retrieve a frame using this code:

```python
generator = sv.get_video_frames_generator(VIDEO)
iterator = iter(generator)

frame = next(iterator)

cv2.imwrite("first_frame.png", frame)
```

PolygonZone will give you NumPy arrays that you can use with supervision to count objects in zones.

<video width="100%" loop muted autoplay> <source src="https://media.roboflow.com/polygonzone.mp4" type="video/mp4"> </video>

Save the coordinates in an array:

```python
polygons = [
    np.array([[718, 595], [927, 592], [851, 1062], [42, 1059]]),
    np.array([[987, 595], [1199, 595], [1893, 1056], [1015, 1062]]),
]
```

## Define Zones

With the coordinates of the zones to draw ready, we can set up our zones:

Instantiate a PolygonZone for each polygon array, pairing it with a PolygonZoneAnnotator for visual overlay and a BoxAnnotator for drawing detection boxes. Each zone will later trigger on incoming detections to determine which objects fall inside its boundaries, enabling per-zone counting in the inference callback.

```python
zones = [
    sv.PolygonZone(polygon=polygon, frame_resolution_wh=video_info.resolution_wh)
    for polygon in polygons
]
zone_annotators = [
    sv.PolygonZoneAnnotator(
        zone=zone,
        color=colors.by_idx(index),
        thickness=4,
        text_thickness=8,
        text_scale=4,
    )
    for index, zone in enumerate(zones)
]
box_annotators = [
    sv.BoxAnnotator(
        color=colors.by_idx(index),
        thickness=4,
        text_thickness=4,
        text_scale=2,
    )
    for index in range(len(polygons))
]
```

## Run Inference

We can run inference on a video using the sv.process_video function. This function accepts a callback that runs inference on each frame and compiles the results into a video.

Below, we call our YOLOv8 model, annotate the predictions and zones, and save the results to a file called result.mp4.

```python
def process_frame(frame: np.ndarray, i: int) -> np.ndarray:
    results = model(frame, imgsz=1280, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(results)

    for zone, zone_annotator, box_annotator in zip(
        zones, zone_annotators, box_annotators
    ):
        mask = zone.trigger(detections=detections)
        detections_filtered = detections[mask]
        frame = box_annotator.annotate(
            scene=frame, detections=detections_filtered, skip_label=True
        )
        frame = zone_annotator.annotate(scene=frame)

    return frame


sv.process_video(source_path=VIDEO, target_path="result.mp4", callback=process_frame)
```

Here is an example of inference run on the video:

<video width="100%" loop muted autoplay> <source src="https://blog.roboflow.com/content/media/2023/03/trim-counting.mp4" type="video/mp4"> </video>

## Frequently Asked Questions

### How do I count objects in a zone with supervision?

Create sv.PolygonZone with a polygon defining your region. Call zone.trigger(detections) on each frame — it returns a mask of detections inside the zone.

### Can I count objects crossing a line instead of entering a zone?

Yes. Use sv.LineZone — define a start and end point. zone.trigger(detections) returns a tuple of two boolean arrays, (crossed_in, crossed_out), indicating which detections crossed the line in each direction. LineZone requires detections.tracker_id; run a tracker first so the same object can be matched across frames.

### Can I combine zone counting with tracking?

Yes. You can pass tracker IDs from sv.ByteTrack alongside your detections, but sv.PolygonZone still evaluates the zone on each frame and reports which objects are currently inside it. If you want to count each object only once when it first enters the zone, maintain a set of seen tracker_id values after filtering detections with zone.trigger(detections), or use a dedicated entry/crossing counting tool such as sv.LineZone when it better matches your use case.
