Ultralytics Platform provides several deployment options for putting your YOLO models into production. Test models with browser-based inference, deploy to dedicated endpoints across 43 global regions, and monitor performance in real time.
<p align="center">
  <iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/JjgQYPetX8w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
  <br>
  <strong>Watch:</strong> Get Started with Ultralytics Platform - Deploy
</p>

The Deployment section helps you:
Ultralytics Platform offers multiple deployment paths:
| Option | Description | Best For |
|---|---|---|
| Predict Tab | Browser-based inference with image, webcam, and examples | Development, validation |
| Shared Inference | Multi-tenant service across 3 regions | Light usage, testing |
| Dedicated Endpoints | Single-tenant services across 43 regions | Production, low latency |
```mermaid
graph LR
    A[✅ Test] --> B[⚙️ Configure]
    B --> C[🌐 Deploy]
    C --> D[📊 Monitor]

    style A fill:#4CAF50,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#9C27B0,color:#fff
```
| Stage | Description |
|---|---|
| Test | Validate model with the Predict tab |
| Configure | Select region and deployment name (deployments use fixed default resources) |
| Deploy | Create a dedicated endpoint from the Deploy tab |
| Monitor | Track requests, latency, errors, and logs in Monitoring |
The shared inference service runs in 3 key regions, automatically routing requests based on your data region:
```mermaid
graph TB
    User[User Request] --> API[Platform API]
    API --> Router{Region Router}
    Router -->|US users| US["US Predict Service<br/>Iowa"]
    Router -->|EU users| EU["EU Predict Service<br/>Belgium"]
    Router -->|AP users| AP["AP Predict Service<br/>Taiwan"]

    style User fill:#f5f5f5,color:#333
    style API fill:#2196F3,color:#fff
    style Router fill:#FF9800,color:#fff
    style US fill:#4CAF50,color:#fff
    style EU fill:#4CAF50,color:#fff
    style AP fill:#4CAF50,color:#fff
```
| Region | Location |
|---|---|
| US | Iowa, USA |
| EU | Belgium, Europe |
| AP | Taiwan, Asia-Pacific |
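
The routing described above can be pictured as a simple lookup from data region to service location. This is a minimal sketch and not the Platform's actual router: `PREDICT_REGIONS` and `route_request` are hypothetical names, and only the three region codes and locations come from the table.

```python
# Hypothetical illustration of region-based routing; these names are not Platform APIs.
PREDICT_REGIONS = {
    "US": "Iowa, USA",
    "EU": "Belgium, Europe",
    "AP": "Taiwan, Asia-Pacific",
}


def route_request(data_region: str) -> str:
    """Return the shared predict service location for a data region code."""
    key = data_region.strip().upper()
    if key not in PREDICT_REGIONS:
        raise ValueError(f"Unknown data region: {data_region!r}")
    return PREDICT_REGIONS[key]


print(route_request("eu"))
```

The lookup is case-insensitive and rejects unknown codes explicitly, so a misconfigured data region fails loudly instead of silently falling back to a default.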
Deploy to 43 regions worldwide on Ultralytics Cloud:
Each endpoint is a single-tenant service with:
1 CPU, 2 GiB memory, `minInstances=0`, `maxInstances=1`

Access the global deployments page from the sidebar under Deploy. This page shows:
!!! info "Automatic Polling"

    The page polls every 15 seconds normally. When any deployment is in a transitional state (`creating`, `deploying`, or `stopping`), polling speeds up to every 3 seconds for faster feedback.
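
The polling rule can be sketched as a small function. This is an illustrative sketch, not the page's actual implementation: `poll_interval_seconds` is a hypothetical name, and the `running` status used in the example is assumed. Only the three transitional states and the two intervals come from the text above.

```python
# Transitional states and intervals are taken from the note above.
TRANSITIONAL_STATES = {"creating", "deploying", "stopping"}
NORMAL_INTERVAL_S = 15
FAST_INTERVAL_S = 3


def poll_interval_seconds(statuses):
    """Return the fast interval if any deployment is transitional, else the normal one."""
    if any(status in TRANSITIONAL_STATES for status in statuses):
        return FAST_INTERVAL_S
    return NORMAL_INTERVAL_S


# "running" is an assumed name for a settled state.
print(poll_interval_seconds(["running", "deploying"]))
```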
Deploy close to your users with 43 regions covering:
Endpoints currently behave as follows:
`maxInstances` is currently capped at 1 on all plans

!!! tip "Cost Savings"

    Scale-to-zero is enabled by default (min instances = 0). You only pay for active inference time.
Dedicated endpoints provide:
Each running deployment includes an automatic health check with:
Deploy a model in under 2 minutes:
!!! example "Quick Deploy"

    ```
    Model → Deploy tab → Select region → Click Deploy → Endpoint URL ready
    ```
Once deployed, use the endpoint URL with your API key to send inference requests from any application.
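
As a rough illustration of calling an endpoint from application code, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The URL, the bearer-token `Authorization` header, and the raw-bytes payload are all assumptions made for this example. Check the endpoint details shown in the Platform for the actual URL, header, and payload format.

```python
import urllib.request

# Placeholders: substitute the endpoint URL and API key from your own deployment.
ENDPOINT_URL = "https://example.invalid/predict"  # hypothetical URL
API_KEY = "YOUR_API_KEY"


def build_inference_request(image_bytes, url=ENDPOINT_URL, api_key=API_KEY):
    """Build a POST request carrying raw image bytes.

    Assumes bearer-token auth and a raw binary body; the real endpoint
    may expect a different header or a multipart upload.
    """
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


request = build_inference_request(b"\xff\xd8\xff")  # stand-in bytes, not a real image
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=30)
```

Keeping request construction separate from sending makes it easy to inspect headers before any network call is made.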
| Feature | Shared | Dedicated |
|---|---|---|
| Latency | Variable | Consistent |
| Cost | Free (included) | Free (basic), usage-based (advanced) |
| Scale | Limited | Scale-to-zero, single instance |
| Regions | 3 | 43 |
| URL | Generic | Custom |
| Rate limit | 20 req/min | Unlimited |
Dedicated endpoint deployment typically takes 1-2 minutes:
Yes, each model can have multiple endpoints in different regions. Deployment counts are limited by plan: Free 3, Pro 10, Enterprise unlimited.
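
The plan limits can be expressed as a simple pre-flight check. This is a sketch under stated assumptions: `DEPLOYMENT_LIMITS` and `can_create_deployment` are hypothetical names, and only the three limits come from the text above.

```python
import math

# Limits from the text above; "unlimited" is modelled as infinity.
DEPLOYMENT_LIMITS = {"free": 3, "pro": 10, "enterprise": math.inf}


def can_create_deployment(plan, current_count):
    """Return True if one more deployment fits within the plan's limit."""
    return current_count < DEPLOYMENT_LIMITS[plan.lower()]


print(can_create_deployment("Free", 2))
```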
With scale-to-zero enabled:
First requests after an idle period trigger a cold start.
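
One common way for a client to tolerate cold starts is to retry the first request with a short backoff. The helper below is a generic sketch, not part of any Ultralytics SDK: `call_with_cold_start_retry` is a hypothetical name, and it assumes a cold start surfaces as a `TimeoutError` raised by your own `send` callable. In practice the first request may simply take longer without failing.

```python
import time


def call_with_cold_start_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on TimeoutError with exponential backoff.

    send: zero-argument callable that performs the inference request.
    sleep: injectable so the backoff can be skipped in tests.
    """
    for attempt in range(attempts):
        try:
            return send()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...


# Demo with a fake sender that times out once, then succeeds.
calls = {"n": 0}


def fake_send():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("cold start")
    return "ok"


print(call_with_cold_start_retry(fake_send, sleep=lambda s: None))
```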