Ultralytics Platform provides an inference API for testing trained models. Use the browser-based Predict tab for quick validation or the REST API for programmatic access.
Every model includes a Predict tab for browser-based inference:
The predict panel supports multiple input methods:
| Method | Description |
|---|---|
| Image upload | Drag and drop or click to upload an image |
| Example images | Click built-in examples (dataset images or defaults) |
| Webcam capture | Live camera feed with single-frame capture |
```mermaid
graph LR
    A[Upload Image] --> D[Auto-Inference]
    B[Example Image] --> D
    C[Webcam Capture] --> D
    D --> E[Results + Overlays]
    style D fill:#2196F3,color:#fff
    style E fill:#4CAF50,color:#fff
```
Drag and drop or click to upload:
!!! info "Auto-Inference"

    The predict panel runs inference automatically when you upload an image, select an example, or capture a webcam frame. No button click is needed.
The predict panel shows example images from your model's linked dataset. If no dataset is linked, default examples are used:
| Image | Content |
|---|---|
| `bus.jpg` | Street scene with vehicles |
| `zidane.jpg` | Sports scene with people |
For OBB models, aerial images of boats and airports are shown instead.
!!! tip "Preloaded Images"

    Example images are preloaded when the page loads, so clicking an example triggers near-instant inference with no download wait.
Click the webcam card to start a live camera feed:
After inference completes, the results panel shows:
| Field | Description |
|---|---|
| Detections list | Each detection with class name and confidence |
| Speed stats | Preprocess, inference, postprocess, network (ms) |
| JSON response | Raw API response in a code block |
Adjust detection behavior with parameters in the collapsible Parameters section:
| Parameter | Range | Default | Description |
|---|---|---|---|
| Confidence | 0.01 – 1.0 | 0.25 | Minimum confidence threshold |
| IoU | 0.0 – 0.95 | 0.7 | NMS IoU threshold |
| Image Size | 320, 640, 1280 (UI toggle) | 640 | Input resize dimension (API accepts any value 32 – 1280) |
!!! note "Auto-Rerun"

    Changing any parameter automatically re-runs inference on the current image with a 500ms debounce. No need to re-upload.
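The ranges in the parameter table can also be enforced client-side before sending an API request, so invalid values fail fast instead of round-tripping to the server. A minimal sketch, assuming a hypothetical `build_predict_params` helper (not part of any Ultralytics SDK):

```python
def build_predict_params(conf=0.25, iou=0.7, imgsz=640):
    """Validate inference parameters against the documented ranges.

    Defaults match the Parameters table: conf 0.25, iou 0.7, imgsz 640.
    The API accepts any imgsz from 32 to 1280, not only the UI toggle values.
    """
    if not 0.01 <= conf <= 1.0:
        raise ValueError(f"conf must be in [0.01, 1.0], got {conf}")
    if not 0.0 <= iou <= 0.95:
        raise ValueError(f"iou must be in [0.0, 0.95], got {iou}")
    if not 32 <= imgsz <= 1280:
        raise ValueError(f"imgsz must be in [32, 1280], got {imgsz}")
    return {"conf": conf, "iou": iou, "imgsz": imgsz}


# Defaults mirror the table above
params = build_predict_params()
print(params)  # {'conf': 0.25, 'iou': 0.7, 'imgsz': 640}
```

The returned dict can be passed directly as the `data` argument of the REST calls shown below.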
Filter predictions by confidence:
Control Non-Maximum Suppression:
Each running dedicated endpoint includes a Predict tab directly on its deployment card. This uses the deployment's own inference service rather than the shared predict service, letting you test your deployed endpoint from the browser.
Access inference programmatically:
Include your API key in requests:
```
Authorization: Bearer YOUR_API_KEY
```
!!! warning "API Key Required"

    To run inference from your own scripts, notebooks, or apps, include an API key. Generate one in [`Settings > API Keys`](../account/api-keys.md).
```
POST https://platform.ultralytics.com/api/models/{modelId}/predict
```
=== "Python"

    ```python
    import requests

    url = "https://platform.ultralytics.com/api/models/MODEL_ID/predict"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    data = {"conf": 0.25, "iou": 0.7, "imgsz": 640}

    # Use a context manager so the file handle is closed after the request
    with open("image.jpg", "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f}, data=data)
    print(response.json())
    ```
=== "cURL"

    ```bash
    curl -X POST \
      "https://platform.ultralytics.com/api/models/MODEL_ID/predict" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F "[email protected]" \
      -F "conf=0.25" \
      -F "iou=0.7" \
      -F "imgsz=640"
    ```
=== "JavaScript"

    ```javascript
    const formData = new FormData();
    formData.append("file", fileInput.files[0]);
    formData.append("conf", "0.25");
    formData.append("iou", "0.7");
    formData.append("imgsz", "640");

    const response = await fetch("https://platform.ultralytics.com/api/models/MODEL_ID/predict", {
      method: "POST",
      headers: { Authorization: "Bearer YOUR_API_KEY" },
      body: formData,
    });
    const result = await response.json();
    console.log(result);
    ```
Example response:

```json
{
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {
                    "class": 0,
                    "name": "person",
                    "confidence": 0.92,
                    "box": { "x1": 100, "y1": 50, "x2": 300, "y2": 400 }
                },
                {
                    "class": 2,
                    "name": "car",
                    "confidence": 0.87,
                    "box": { "x1": 400, "y1": 200, "x2": 600, "y2": 350 }
                }
            ],
            "speed": {
                "preprocess": 1.2,
                "inference": 12.5,
                "postprocess": 2.3
            }
        }
    ],
    "metadata": {
        "imageCount": 1,
        "functionTimeCall": 0.018,
        "model": "model.pt",
        "version": {
            "ultralytics": "8.x.x",
            "torch": "2.6.0",
            "torchvision": "0.21.0",
            "python": "3.13.0"
        }
    }
}
```
| Field | Type | Description |
|---|---|---|
| `images` | array | List of processed images |
| `images[].shape` | array | Image dimensions `[height, width]` |
| `images[].results` | array | List of detections |
| `images[].results[].name` | string | Class name |
| `images[].results[].confidence` | float | Detection confidence (0–1) |
| `images[].results[].box` | object | Bounding box coordinates |
| `images[].speed` | object | Processing times in milliseconds |
| `metadata` | object | Request metadata and version info |
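The fields above can be consumed directly from the parsed JSON. A small sketch that flattens detections across images and filters by confidence, using the sample response shape shown earlier (the `top_detections` helper is illustrative, not part of any SDK):

```python
def top_detections(response, min_conf=0.5):
    """Flatten detections across all images, keeping those above min_conf,
    sorted by descending confidence."""
    hits = []
    for image in response.get("images", []):
        for det in image.get("results", []):
            if det["confidence"] >= min_conf:
                hits.append((det["name"], det["confidence"], det.get("box")))
    return sorted(hits, key=lambda t: t[1], reverse=True)


# Sample mirrors the example response above
sample = {
    "images": [
        {
            "shape": [1080, 1920],
            "results": [
                {"class": 0, "name": "person", "confidence": 0.92,
                 "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}},
                {"class": 2, "name": "car", "confidence": 0.87,
                 "box": {"x1": 400, "y1": 200, "x2": 600, "y2": 350}},
            ],
        }
    ]
}
print(top_detections(sample, min_conf=0.9))  # keeps only the 0.92 'person' hit
```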
Response format varies by task:
=== "Detection"

    ```json
    {
        "class": 0,
        "name": "person",
        "confidence": 0.92,
        "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}
    }
    ```

=== "Segmentation"

    ```json
    {
        "class": 0,
        "name": "person",
        "confidence": 0.92,
        "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
        "segments": [[100, 50], [150, 60], ...]
    }
    ```

=== "Pose"

    ```json
    {
        "class": 0,
        "name": "person",
        "confidence": 0.92,
        "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
        "keypoints": [
            {"x": 200, "y": 75, "conf": 0.95},
            ...
        ]
    }
    ```

=== "Classification"

    ```json
    {
        "results": [
            {"class": 0, "name": "cat", "confidence": 0.95},
            {"class": 1, "name": "dog", "confidence": 0.03}
        ]
    }
    ```

=== "OBB"

    ```json
    {
        "class": 0,
        "name": "ship",
        "confidence": 0.89,
        "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400},
        "obb": {"x1": 105, "y1": 48, "x2": 295, "y2": 55, "x3": 290, "y3": 395, "x4": 110, "y4": 402}
    }
    ```
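Because the per-detection fields vary by task, a client that serves multiple model types can branch on which optional keys are present. A hedged sketch (the `detection_task` helper and its key-priority order are assumptions, not a documented API):

```python
def detection_task(det):
    """Guess the task type of a single detection dict from its optional keys.

    Priority order is an assumption: obb > pose > segmentation > detection,
    falling back to classification when no box is present.
    """
    if "obb" in det:
        return "obb"
    if "keypoints" in det:
        return "pose"
    if "segments" in det:
        return "segment"
    if "box" in det:
        return "detect"
    return "classify"


det = {"class": 0, "name": "person", "confidence": 0.92,
       "box": {"x1": 100, "y1": 50, "x2": 300, "y2": 400}}
print(detection_task(det))  # detect
```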
Shared inference (the Predict tab and /api/models/{id}/predict endpoint) is included at no additional cost on all plans. There are no per-request charges for shared inference.
For production workloads requiring higher throughput, deploy a dedicated endpoint.
Shared inference is rate-limited to 20 requests/min per API key. When throttled, the API returns 429 with a Retry-After header. See the full rate limit reference for all endpoint categories.
!!! tip "Need More Throughput?"

    Deploy a [dedicated endpoint](endpoints.md) for **unlimited** inference with no rate limits, predictable throughput, and consistent low-latency responses. For local inference, see the [Predict mode guide](../../modes/predict.md).
Common error responses:
| Code | Message | Solution |
|---|---|---|
| 400 | Invalid image | Check file format |
| 401 | Unauthorized | Verify API key |
| 404 | Model not found | Check model ID |
| 429 | Rate limited | Wait and retry, or use a dedicated endpoint for unlimited throughput |
| 500 | Server error | Retry request |
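Since 429 responses include a `Retry-After` header, a client can recover from throttling (and transient 500s) by waiting and retrying. A minimal sketch where `send` is any callable returning a response-like mapping; this wrapper is illustrative, not part of the API:

```python
import time


def request_with_retry(send, max_retries=3, default_wait=1.0):
    """Call send() and retry on 429/500, honoring Retry-After when present.

    send() should return a mapping with "status" and "headers" keys;
    adapt the accessors when using a real requests.Response.
    """
    for attempt in range(max_retries + 1):
        resp = send()
        if resp["status"] not in (429, 500):
            return resp
        if attempt == max_retries:
            break  # retries exhausted; return the last error response
        wait = float(resp.get("headers", {}).get("Retry-After", default_wait))
        time.sleep(wait)
    return resp


# Simulated exchange: one throttled response, then success
replies = iter([
    {"status": 429, "headers": {"Retry-After": "0"}},
    {"status": 200, "headers": {}},
])
print(request_with_retry(lambda: next(replies))["status"])  # 200
```

With the real API, `send` would wrap the `requests.post(...)` call shown earlier, reading `resp.status_code` and `resp.headers` instead of dict keys.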
Both inference methods accept video files. The shared endpoint (`/api/models/{id}/predict`) uses the same predict service and accepts the same video formats. However, the browser Predict tab only uploads images — use the REST API directly or a dedicated endpoint for video workflows. The shared endpoint is also rate-limited to 20 req/min, so dedicated endpoints are the better choice for heavy video workloads.

The API returns JSON predictions. To visualize locally, use the Ultralytics Results `plot()` or `save()` methods:

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model("image.jpg")
results[0].save("annotated.jpg")  # save the image with predictions overlaid
```

See the [Predict mode documentation](../../modes/predict.md) for the full results API and visualization options.
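If you want to overlay the API's JSON predictions with your own drawing code, the pixel `box` coordinates often need converting to normalized center form first. A small conversion sketch (the `box_to_xywhn` helper name is an assumption; `shape` is `[height, width]` as reported in the response):

```python
def box_to_xywhn(box, shape):
    """Convert an {x1,y1,x2,y2} pixel box to normalized (cx, cy, w, h)."""
    h, w = shape  # API reports shape as [height, width]
    cx = (box["x1"] + box["x2"]) / 2 / w
    cy = (box["y1"] + box["y2"]) / 2 / h
    bw = (box["x2"] - box["x1"]) / w
    bh = (box["y2"] - box["y1"]) / h
    return round(cx, 4), round(cy, 4), round(bw, 4), round(bh, 4)


box = {"x1": 100, "y1": 50, "x2": 300, "y2": 400}
print(box_to_xywhn(box, [1080, 1920]))  # (0.1042, 0.2083, 0.1042, 0.3241)
```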
The Image Size parameter controls the inference input size. Large images are automatically resized while preserving aspect ratio.
The current API processes one image per request. For batch:
!!! example "Batch Inference with Python"

    ```python
    import concurrent.futures

    import requests

    url = "https://predict-abc123.run.app/predict"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}
    images = ["img1.jpg", "img2.jpg", "img3.jpg"]


    def predict(image_path):
        with open(image_path, "rb") as f:
            return requests.post(url, headers=headers, files={"file": f}).json()


    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(predict, images))
    ```
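When batching against the shared endpoint rather than a dedicated one, the 20 requests/min limit applies, so pacing calls avoids 429s entirely. A minimal pacing sketch (the `RatePacer` class is illustrative; injectable `clock` and `sleep` make it testable):

```python
import time


class RatePacer:
    """Space calls so that no more than `limit` happen per `period` seconds."""

    def __init__(self, limit=20, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / limit  # minimum spacing between calls
        self.clock = clock
        self.sleep = sleep
        self.next_at = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
            now = self.next_at
        self.next_at = now + self.interval


# Usage: call pacer.wait() before each requests.post(...) in the batch loop
pacer = RatePacer(limit=20, period=60.0)  # shared-inference limit
```

A dedicated endpoint has no rate limit, so the pacer can simply be dropped when you switch URLs.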