docs/content/features/object-detection.md
+++
disableToc = false
title = "Object Detection"
weight = 13
url = "/features/object-detection/"
+++
LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include RF-DETR (Python) and rf-detr.cpp (native C++/ggml) for object detection and segmentation, and sam3.cpp for image segmentation (SAM 3/2/EdgeTAM).
For detecting faces specifically, see the dedicated
Face Recognition feature — its
/v1/detection support is tuned for face bounding boxes and ships
with commercially-safe model options.
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
Key Features:
- /v1/detection endpoint

LocalAI provides a dedicated /v1/detection endpoint for object detection tasks. It returns structured detection results with bounding boxes and confidence scores.
To perform object detection, send a POST request to the /v1/detection endpoint:
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://media.roboflow.com/dog.jpeg"
}'
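The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes a LocalAI instance listening on http://localhost:8080 as in the curl example, and the helper names `build_detection_request` and `detect` are illustrative, not part of LocalAI.

```python
import json
import urllib.request


def build_detection_request(model, image, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /v1/detection endpoint."""
    payload = {"model": model, "image": image}
    return urllib.request.Request(
        url=f"{base_url}/v1/detection",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(model, image, base_url="http://localhost:8080", timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_detection_request(model, image, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `detect("rfdetr-base", "https://media.roboflow.com/dog.jpeg")` against a running server should return a dictionary with a `detections` key. Building the request separately from sending it makes the payload easy to inspect without a server.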
The request body should contain:
- model: The name of the object detection model (e.g., "rfdetr-base")
- image: The image to analyze, which can be a URL or a base64-encoded data URI
- prompt (optional): Text prompt for text-prompted segmentation (SAM 3 only)
- points (optional): Point coordinates as [x, y, label, ...] triples (label: 1=positive, 0=negative)
- boxes (optional): Box coordinates as [x1, y1, x2, y2, ...] quads
- threshold (optional): Detection confidence threshold (default: 0.5)

The API returns a JSON response with detected objects:
{
"detections": [
{
"x": 100.5,
"y": 150.2,
"width": 200.0,
"height": 300.0,
"confidence": 0.95,
"class_name": "dog"
},
{
"x": 400.0,
"y": 200.0,
"width": 150.0,
"height": 250.0,
"confidence": 0.87,
"class_name": "person"
}
]
}
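A response of this shape can be turned into typed objects on the client side. The sketch below is one possible approach, not LocalAI code: the `Detection` dataclass and `parse_detections` helper are hypothetical names, and the corner conversion assumes `x`, `y` denote the top-left corner of the box.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float
    y: float
    width: float
    height: float
    confidence: float
    class_name: str

    def corners(self):
        """Return the box as (x1, y1, x2, y2), treating x, y as the top-left corner."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_detections(response, min_confidence=0.0):
    """Convert a decoded /v1/detection response into Detection objects.

    Detections below min_confidence are dropped; extra fields such as
    an optional mask are ignored here.
    """
    results = []
    for item in response.get("detections") or []:
        if item["confidence"] < min_confidence:
            continue
        results.append(Detection(
            x=item["x"],
            y=item["y"],
            width=item["width"],
            height=item["height"],
            confidence=item["confidence"],
            class_name=item["class_name"],
        ))
    return results
```

Filtering on the client with `min_confidence` is independent of the request-side `threshold` parameter and can be useful when the same response feeds several consumers with different cut-offs.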
Each detection includes:
- x, y: Coordinates of the bounding box top-left corner
- width, height: Dimensions of the bounding box
- confidence: Detection confidence score (0.0 to 1.0)
- class_name: The detected object class
- mask (optional): Base64-encoded PNG binary segmentation mask (segmentation models only, i.e. the sam3-cpp backend and the rfdetr-cpp-seg-* variants)

The RF-DETR backend is implemented as a Python-based gRPC service integrated with LocalAI. It provides object detection capabilities using the RF-DETR model architecture and supports multiple hardware configurations:
Using the Model Gallery (Recommended)
The easiest way to get started is using the model gallery. The rfdetr-base model is available in the official LocalAI gallery:
# Install and run the rfdetr-base model
local-ai run rfdetr-base
You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".
Manual Configuration
Create a model configuration file in your models directory:
name: rfdetr
backend: rfdetr
parameters:
model: rfdetr-base
Currently, the following model is available in the [Model Gallery]({{%relref "features/model-gallery" %}}):
- rfdetr-base
You can browse and install this model through the LocalAI web interface or using the command line.
The rfdetr-cpp backend is a native C++/ggml implementation of RF-DETR
inference based on rf-detr.cpp. It
runs as a Go gRPC service that dlopens a per-CPU-variant shared library, so
there is no Python runtime on the inference path — startup is fast and the
binary is self-contained.
Compared to the Python rfdetr backend, the native backend:
- Detection.mask

Install the backend
local-ai backends install rfdetr-cpp
Using the Model Gallery (Recommended)
The gallery ships ready-to-run entries for every published variant:
# Detection variants
local-ai run rfdetr-cpp-nano
local-ai run rfdetr-cpp-small
local-ai run rfdetr-cpp-base
local-ai run rfdetr-cpp-medium
local-ai run rfdetr-cpp-large
# Segmentation variants (return per-instance PNG masks)
local-ai run rfdetr-cpp-seg-nano
local-ai run rfdetr-cpp-seg-small
local-ai run rfdetr-cpp-seg-medium
local-ai run rfdetr-cpp-seg-large
local-ai run rfdetr-cpp-seg-xlarge
local-ai run rfdetr-cpp-seg-2xlarge
Manual Configuration
name: rfdetr-cpp-seg-nano
backend: rfdetr-cpp
parameters:
model: rfdetr-seg-nano-f16.gguf
threads: 4
known_usecases:
- detection
Pre-quantized GGUFs are published under
mudler/rfdetr-cpp-*
on Hugging Face. Each repo carries the F32/F16/Q8_0/Q4_K quants — F16 is
the recommended default (matches F32 accuracy, ~1.86x smaller).
When running a segmentation model (any rfdetr-cpp-seg-* variant), each
Detection in the response carries a mask field with a base64-encoded
PNG of the per-instance binary mask. The mask is sized to the original
image resolution and aligns with the corresponding bounding box.
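To illustrate how a client might consume the mask field, the following sketch decodes the base64 payload and reads the PNG dimensions from its header, so they can be compared against the original image size. It uses only the Python standard library; `decode_mask` and `png_size` are hypothetical helper names, and the sketch assumes the field holds a plain base64 string without a data URI prefix.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(mask_b64):
    """Decode the base64 mask field into raw PNG bytes."""
    png_bytes = base64.b64decode(mask_b64)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("mask is not a PNG image")
    return png_bytes


def png_size(png_bytes):
    """Read (width, height) from the IHDR chunk that follows the PNG signature."""
    if len(png_bytes) < 24 or png_bytes[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height
```

The decoded bytes can be written straight to disk (for example with `open("mask.png", "wb")`) or handed to an image library of your choice for compositing over the source image.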
The sam3-cpp backend provides image segmentation using sam3.cpp, a portable C++ implementation of Meta's Segment Anything Model. It supports multiple model architectures:
Manual Configuration
Create a model configuration file in your models directory:
name: sam3
backend: sam3-cpp
parameters:
model: edgetam_q4_0.ggml
threads: 4
known_usecases:
- detection
Download the model from Hugging Face.
Point-prompted segmentation (all models):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"points": [256.0, 256.0, 1.0],
"threshold": 0.5
}'
Box-prompted segmentation (all models):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"boxes": [100.0, 100.0, 400.0, 400.0],
"threshold": 0.5
}'
Text-prompted segmentation (SAM 3 full model only):
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"prompt": "cat",
"threshold": 0.5
}'
The response includes segmentation masks as base64-encoded PNGs in the mask field of each detection.
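The points and boxes fields are flat arrays, which are easy to get wrong by hand. The sketch below shows one way to build them from structured prompts in Python; the helper names are hypothetical, and it assumes that multiple prompts are simply concatenated, as the "[x, y, label, ...]" notation in the request description suggests.

```python
def flatten_points(points):
    """Flatten (x, y, label) triples into the flat list used by the points field."""
    flat = []
    for x, y, label in points:
        if label not in (0, 1):
            raise ValueError("label must be 1 (positive) or 0 (negative)")
        flat.extend([float(x), float(y), float(label)])
    return flat


def flatten_boxes(boxes):
    """Flatten (x1, y1, x2, y2) quads into the flat list used by the boxes field."""
    flat = []
    for x1, y1, x2, y2 in boxes:
        flat.extend([float(x1), float(y1), float(x2), float(y2)])
    return flat


def build_segmentation_payload(model, image, points=(), boxes=(),
                               prompt=None, threshold=0.5):
    """Assemble a /v1/detection request body, including only the prompts given."""
    payload = {"model": model, "image": image, "threshold": threshold}
    if points:
        payload["points"] = flatten_points(points)
    if boxes:
        payload["boxes"] = flatten_boxes(boxes)
    if prompt is not None:
        payload["prompt"] = prompt  # text prompts apply to the SAM 3 full model only
    return payload
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a POST request to /v1/detection.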
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://example.com/image.jpg"
}'
base64_image=$(base64 -w 0 image.jpg)
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d "{
\"model\": \"rfdetr-base\",
\"image\": \"data:image/jpeg;base64,$base64_image\"
}"
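The shell snippet above relies on `base64 -w 0`, which is specific to GNU coreutils. As a portable alternative, the data URI can be built in Python. This is a minimal sketch with hypothetical helper names; it guesses the MIME type from the file extension rather than inspecting the file contents.

```python
import base64
import mimetypes


def to_data_uri(data, mime_type):
    """Encode raw image bytes as a data URI suitable for the image field."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_uri(path):
    """Read an image file and return it as a data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as f:
        return to_data_uri(f.read(), mime_type)
```

For example, `image_to_data_uri("image.jpg")` produces a string beginning with `data:image/jpeg;base64,` that can be passed as the `image` value in the request body.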
Model Loading Errors
Low Detection Accuracy
Slow Performance
Enable debug logging for troubleshooting:
local-ai run --debug rfdetr-base
LocalAI includes a dedicated object-detection category for models and backends that specialize in identifying and locating objects within images. This category currently includes:
Additional object detection models and backends will be added to this category in the future. You can filter models by the object-detection tag in the model gallery to find all available object detection models.