Ultralytics Platform provides deployment options for putting your YOLO models into production. Test models with browser-based inference, deploy to dedicated endpoints across 43 global regions, and monitor performance in real time.
<p align="center"> <iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/JjgQYPetX8w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen> </iframe><strong>Watch:</strong> Get Started with Ultralytics Platform - Deploy
</p>

The Deployment section helps you:

Ultralytics Platform offers multiple deployment paths:
| Option | Description | Best For |
|---|---|---|
| Predict Tab | Browser-based inference with image, webcam, and examples | Development, validation |
| Shared Inference | Multi-tenant service across 3 regions | Light usage, testing |
| Dedicated Endpoints | Single-tenant services across 43 regions | Production, low latency |
```mermaid
graph LR
    A[✅ Test]:::start --> B[⚙️ Configure]:::proc
    B --> C[🌐 Deploy]:::proc
    C --> D[📊 Monitor]:::out

    classDef start fill:#4CAF50,color:#fff
    classDef proc fill:#2196F3,color:#fff
    classDef out fill:#9C27B0,color:#fff
```
| Stage | Description |
|---|---|
| Test | Validate model with the Predict tab |
| Configure | Select region and deployment name (deployments use fixed default resources) |
| Deploy | Create a dedicated endpoint from the Deploy tab |
| Monitor | Track requests, latency, errors, and logs in Monitoring |
The shared inference service runs in 3 key regions, automatically routing requests based on your data region:
```mermaid
graph TB
    User[User Request]:::start --> API[Platform API]:::proc
    API --> Router{Region Router}:::decide
    Router -->|US users| US["US Predict Service<br/>Iowa"]:::out
    Router -->|EU users| EU["EU Predict Service<br/>Belgium"]:::out
    Router -->|AP users| AP["AP Predict Service<br/>Taiwan"]:::out

    classDef start fill:#4CAF50,color:#fff
    classDef proc fill:#2196F3,color:#fff
    classDef decide fill:#FF9800,color:#fff
    classDef out fill:#9C27B0,color:#fff
```
| Region | Location |
|---|---|
| US | Iowa, USA |
| EU | Belgium, Europe |
| AP | Taiwan, Asia-Pacific |
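The routing rule in the table above can be sketched as a simple lookup. This is only an illustration of the mapping: `SHARED_REGIONS` and `route_shared_inference` are hypothetical names, and the Platform performs this routing for you on the server side.

```python
# Shared inference regions, copied from the table above.
SHARED_REGIONS = {
    "US": "Iowa, USA",
    "EU": "Belgium, Europe",
    "AP": "Taiwan, Asia-Pacific",
}


def route_shared_inference(data_region: str) -> str:
    """Return the shared-service location for a data region code.

    Hypothetical helper: it mirrors the documented mapping and is not
    part of any Ultralytics API.
    """
    key = data_region.strip().upper()
    if key not in SHARED_REGIONS:
        raise ValueError(f"Unknown data region: {data_region!r}")
    return SHARED_REGIONS[key]


print(route_shared_inference("eu"))  # Belgium, Europe
```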
Deploy to 43 regions worldwide on Ultralytics Cloud:
Each endpoint is a single-tenant service with:
1 CPU, 2 GiB memory, `minInstances=0`, `maxInstances=1`

Access the global deployments page from the sidebar under Deploy. This page shows:
!!! info "Automatic Polling"

    The page polls every 15 seconds normally. When deployments are in a transitional state (`creating`, `deploying`, or `stopping`), polling increases to every 3 seconds for faster feedback.
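If you build your own dashboard or script around deployment status, the same adaptive polling idea can be sketched as below. The function name is hypothetical, and `"running"` is used only as an example of a non-transitional state; the intervals and transitional state names come from the note above.

```python
# Transitional states named in the polling note above.
TRANSITIONAL_STATES = {"creating", "deploying", "stopping"}

NORMAL_INTERVAL_S = 15  # default polling interval
FAST_INTERVAL_S = 3     # interval while any deployment is changing state


def polling_interval_seconds(statuses) -> int:
    """Pick a polling interval from the current deployment statuses.

    Hypothetical helper: polls faster while at least one deployment is
    in a transitional state, otherwise falls back to the normal interval.
    """
    if any(status in TRANSITIONAL_STATES for status in statuses):
        return FAST_INTERVAL_S
    return NORMAL_INTERVAL_S


print(polling_interval_seconds(["running", "deploying"]))  # 3
print(polling_interval_seconds(["running"]))               # 15
```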
Deploy close to your users with 43 regions covering:
Endpoints currently behave as follows:
`maxInstances` is currently capped at 1 on all plans.

!!! tip "Cost Savings"

    Scale-to-zero is enabled by default (min instances = 0). You only pay for active inference time.
Dedicated endpoints provide:
Each running deployment includes an automatic health check with:
Deploy a model in under 2 minutes:
!!! example "Quick Deploy"

    ```
    Model → Deploy tab → Select region → Click Deploy → Endpoint URL ready
    ```
Once deployed, use the endpoint URL with your API key to send inference requests from any application.
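A minimal sketch of such a request, using only the Python standard library. The Bearer-token header, the `file` form field, and the JSON response body are assumptions made for illustration, and both function names are hypothetical; check the endpoint details shown for your deployment before relying on any of them.

```python
import json
import urllib.request
import uuid


def build_predict_request(endpoint_url: str, api_key: str, image_bytes: bytes,
                          filename: str = "image.jpg") -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one image.

    ASSUMPTIONS: Bearer-token auth and a form field named "file" are
    illustrative and not confirmed by this page.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


def predict(endpoint_url: str, api_key: str, image_path: str, timeout: float = 60.0):
    """Send one image and decode the response, assuming it is JSON."""
    with open(image_path, "rb") as f:
        request = build_predict_request(endpoint_url, api_key, f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Call `predict(...)` with the endpoint URL copied from your deployment and your own API key. Keeping request construction separate from sending makes it easy to inspect headers before any network traffic happens.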
| Feature | Shared | Dedicated |
|---|---|---|
| Latency | Variable | Consistent |
| Cost | Free (included) | Free (basic), usage-based (advanced) |
| Scale | Limited | Scale-to-zero, single instance |
| Regions | 3 | 43 |
| URL | Generic | Custom |
| Rate limit | 20 req/min | 20 req/min via Platform; unlimited on direct endpoint URL |
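To stay under the 20 req/min limit from a client, one option is a small sliding-window throttle like the sketch below. This is a client-side illustration only: the class is hypothetical, and this page does not state whether the Platform counts requests over a sliding or a fixed window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests per window_seconds."""

    def __init__(self, max_requests=20, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sent = deque()     # timestamps of recorded requests, oldest first

    def seconds_until_allowed(self) -> float:
        """Return how long to wait before the next request fits the window."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self._sent.append(self._clock())

    def wait(self, sleep=time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        delay = self.seconds_until_allowed()
        if delay > 0:
            sleep(delay)
        self.record()
```

Call `limiter.wait()` immediately before each request sent through the Platform.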
Dedicated endpoint deployment typically takes 1-2 minutes:
Yes, each model can have multiple endpoints in different regions. Deployment counts are limited by plan: Free 3, Pro 10, Enterprise unlimited.
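The plan limits above can be expressed as a small pre-flight check, for example in a script that creates deployments in bulk. The names below are hypothetical, and the sketch assumes the limit applies to your total deployment count; the Platform enforces the real limit itself.

```python
# Deployment limits per plan, from the answer above. None means unlimited.
PLAN_DEPLOYMENT_LIMITS = {"free": 3, "pro": 10, "enterprise": None}


def can_create_deployment(plan: str, current_count: int) -> bool:
    """Return True if one more deployment fits within the plan's limit.

    Hypothetical helper; raises KeyError for an unknown plan name.
    """
    limit = PLAN_DEPLOYMENT_LIMITS[plan.strip().lower()]
    return limit is None or current_count < limit


print(can_create_deployment("Free", 2))  # True
print(can_create_deployment("Free", 3))  # False
```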
With scale-to-zero enabled:
First requests after an idle period trigger a cold start.
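A cold start mostly shows up as a slower first response, but with a short client timeout it can surface as a timeout error. One way to absorb that is a retry with exponential backoff, sketched below. The helper is hypothetical, and which exception types your HTTP client raises is an assumption you should adapt.

```python
import time


def call_with_cold_start_retry(send, attempts=3, first_delay=1.0, sleep=time.sleep):
    """Call send() and retry on timeout or connection errors.

    Hypothetical helper: `send` is any zero-argument callable that performs
    one inference request. The delay doubles after each failed attempt, and
    the last error is re-raised once all attempts are used.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

For example, wrap your request function as `call_with_cold_start_retry(lambda: my_request())`, where `my_request` stands for whatever function sends one request in your application.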