Image Bounding Boxes

Detect objects in an image and return their bounding boxes. The model emits normalized coordinates in [0, 1], so the result is resolution-independent.

Files

  • basic.py — detect one labeled object with a bounding box.
  • with_confidence.py — adds per-box confidence.
  • multi_object.py — detect multiple objects of multiple classes.

When to use

  • Pre-labeling for an object detection training set (human-in-the-loop refinement on top).
  • Crop suggestions for product imagery.
  • Coarse spatial routing (counting people, vehicles, defects).
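
For the counting use case, a minimal sketch of tallying detections per class might look like the following. The dictionary keys (`label`, `confidence`), the `count_by_label` helper, and the 0.5 threshold are assumptions made for illustration; they are not taken from the cookbook scripts, whose actual output schema may differ.

```python
from collections import Counter

# Hypothetical detections; the keys are illustrative, not the cookbook's schema.
detections = [
    {"label": "person", "confidence": 0.92},
    {"label": "person", "confidence": 0.41},
    {"label": "vehicle", "confidence": 0.88},
]


def count_by_label(items, min_confidence=0.5):
    """Count detections per label, ignoring boxes below the threshold."""
    return Counter(
        item["label"] for item in items if item["confidence"] >= min_confidence
    )


counts = count_by_label(detections)
print(counts["person"], counts["vehicle"])
```

With the sample data above, the low-confidence `person` box is dropped, so each class is counted once. The threshold is a tuning knob, and a suitable value depends on how the confidences are calibrated.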

For pixel-accurate masks, use a segmentation model instead of this primitive. To check whether X is in the image without coordinates, use _06_image_classification/ with multilabel.

Coordinate convention

Coordinates are normalized to the image dimensions:

  • x, y = top-left corner, in [0, 1]
  • width, height = box size, in [0, 1]

Multiply by the actual image width/height to get pixel coordinates.
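
The conversion described above can be sketched as follows. `NormalizedBox` and `to_pixels` are hypothetical names chosen for this example (the cookbook scripts may structure their output differently); the field names follow the convention listed above. Rounding to the nearest pixel and clamping to the image edge are design choices of this sketch, not something the README specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Hypothetical container; field names mirror the convention above."""

    x: float
    y: float
    width: float
    height: float

    def __post_init__(self) -> None:
        # Reject values outside the normalized range early.
        for name in ("x", "y", "width", "height"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")

    def to_pixels(self, image_width: int, image_height: int) -> tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixels, clamped to the image."""
        left = round(self.x * image_width)
        top = round(self.y * image_height)
        right = min(round((self.x + self.width) * image_width), image_width)
        bottom = min(round((self.y + self.height) * image_height), image_height)
        return left, top, right, bottom


box = NormalizedBox(x=0.25, y=0.5, width=0.5, height=0.25)
print(box.to_pixels(640, 480))  # (160, 240, 480, 360)
```

The (left, top, right, bottom) order is used here because many imaging libraries accept crop rectangles in that form. Clamping matters because a box with `x + width > 1` would otherwise extend past the image edge.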

Run

```bash
python cookbook/data_labeling/_08_image_bounding_boxes/basic.py
python cookbook/data_labeling/_08_image_bounding_boxes/with_confidence.py
python cookbook/data_labeling/_08_image_bounding_boxes/multi_object.py
```

Requires OPENAI_API_KEY.