Ultralytics Platform Cloud Training offers single-click training on cloud GPUs, making model training accessible without complex setup. Train YOLO models with real-time metrics streaming and automatic checkpoint saving.
```mermaid
graph LR
A[Configure] --> B[Start Training]
B --> C[Provision GPU]
C --> D[Download Dataset]
D --> E[Train]
E --> F[Stream Metrics]
F --> G[Save Checkpoints]
G --> H[Complete]
style A fill:#2196F3,color:#fff
style B fill:#FF9800,color:#fff
style E fill:#9C27B0,color:#fff
style H fill:#4CAF50,color:#fff
```
Start training from the platform UI by clicking New Model on any project page (or Train from a dataset page). The training dialog has two tabs: Cloud Training and Local Training.
Choose from official YOLO26 models or your own trained models:
| Category | Description |
|---|---|
| Official | All 25 YOLO26 models (5 sizes x 5 tasks) |
| Your Models | Your completed models for fine-tuning |
Official models are organized by task type (Detect, Segment, Pose, OBB, Classify) with sizes from nano to xlarge.
Choose a dataset to train on (see Datasets):
| Option | Description |
|---|---|
| Official | Curated datasets from Ultralytics |
| Your Datasets | Datasets you've uploaded |
!!! note "Dataset Requirements"
Datasets must be in `ready` status with at least 1 image in the train split, 1 image in the validation or test split, and at least 1 labeled image.
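As a sketch, the readiness rule above can be expressed as a simple check. `dataset_ready` is a hypothetical helper for illustration, not a Platform API:

```python
def dataset_ready(status: str, train: int, val: int, test: int, labeled: int) -> bool:
    """Mirror the Platform's dataset requirements: `ready` status,
    >=1 train image, >=1 val-or-test image, >=1 labeled image."""
    return (
        status == "ready"
        and train >= 1
        and (val >= 1 or test >= 1)
        and labeled >= 1
    )


print(dataset_ready("ready", train=100, val=20, test=0, labeled=100))  # True
print(dataset_ready("ready", train=100, val=0, test=0, labeled=100))  # False: no val/test split
```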
!!! warning "Task Mismatch"
A task mismatch warning appears if the model task (e.g., detect) doesn't match the dataset task (e.g., segment). Training will fail if you proceed with mismatched tasks. Ensure both model and dataset use the same task type, as described in the [task guides](../../tasks/index.md).
Set core training parameters:
| Parameter | Description | Default |
|---|---|---|
| Epochs | Number of training iterations | 100 |
| Batch Size | Samples per iteration | -1 (auto) |
| Image Size | Input resolution (320/416/512/640/1280 dropdown, any multiple of 32 from 32-4096 in YAML editor) | 640 |
| Run Name | Optional name for the training run | auto |
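The image-size rule above (any multiple of 32 between 32 and 4096 in the YAML editor) can be sketched as a quick validation. `valid_imgsz` is a hypothetical helper, not a Platform function:

```python
def valid_imgsz(imgsz: int) -> bool:
    # The YAML editor accepts any multiple of 32 in [32, 4096]
    return 32 <= imgsz <= 4096 and imgsz % 32 == 0


print([s for s in (320, 416, 512, 640, 1280) if valid_imgsz(s)])  # all dropdown sizes pass
print(valid_imgsz(650))  # False: not a multiple of 32
```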
Expand Advanced Settings to access the full YAML-based parameter editor with 40+ training parameters organized by group (see configuration reference):
| Group | Parameters |
|---|---|
| Learning Rate | lr0, lrf, momentum, weight_decay, warmup_epochs, warmup_momentum, warmup_bias_lr |
| Optimizer | auto (default), SGD, MuSGD, Adam, AdamW, NAdam, RAdam, RMSProp, Adamax |
| Loss Weights | box, cls, dfl, pose, kobj, label_smoothing |
| Color Augmentation | hsv_h, hsv_s, hsv_v |
| Geometric Augment. | degrees, translate, scale, shear, perspective |
| Flip & Mix Augment. | flipud, fliplr, mosaic, mixup, copy_paste |
| Training Control | patience, seed, deterministic, amp, cos_lr, close_mosaic, save_period |
| Dataset | fraction, freeze, single_cls, rect, multi_scale, resume |
Parameters are task-aware (e.g., copy_paste only shows for segment tasks, pose/kobj only for pose tasks). A Modified badge appears when values differ from defaults, and you can reset all to defaults with the reset button.
??? example "Example: Tuning Augmentation for Small Datasets"
For small datasets (<1000 images), increase augmentation to reduce overfitting:
```yaml
mosaic: 1.0 # Keep mosaic on
mixup: 0.3 # Add mixup blending
copy_paste: 0.3 # Add copy-paste (segment only)
fliplr: 0.5 # Horizontal flip
degrees: 10.0 # Slight rotation
scale: 0.9 # Aggressive scaling
```
Choose your GPU from Ultralytics Cloud:
{% include "macros/platform-gpu-table.md" %}
!!! tip "GPU Selection"
- **RTX PRO 6000**: 96 GB Blackwell, recommended default for most jobs
- **A100 SXM**: 80 GB HBM2e — strong choice for large batch sizes or bigger models
- **H100 PCIe / H100 SXM / H100 NVL**: 80–94 GB Hopper for time-sensitive training (available on all plans)
- **H200 NVL / H200 SXM**: 141–143 GB Hopper — requires [Pro or Enterprise](../account/billing.md#plans)
- **B200**: 180 GB NVIDIA Blackwell for cutting-edge workloads — requires [Pro or Enterprise](../account/billing.md#plans)
The dialog shows your current balance and a Top Up button. An estimated cost and duration are calculated based on your configuration (model size, dataset images, epochs, GPU speed).
Click Start Training to launch your job. Training jobs progress through the following statuses:
| Status | Description |
|---|---|
| Pending | Job submitted, waiting for GPU allocation |
| Starting | GPU provisioned, downloading dataset and model |
| Running | Training in progress, metrics streaming in real-time |
| Completed | Training finished successfully |
| Failed | Training failed (see console logs for details) |
| Cancelled | Training was cancelled by the user |
!!! success "Free Credits"
New accounts receive signup credits — $5 for personal emails and $25 for company emails. [Check your balance](../account/billing.md) in Settings > Billing.
View real-time training progress on the model page's Train tab:
| Metric | Description |
|---|---|
| Loss | Training and validation loss |
| mAP | Mean Average Precision |
| Precision | Correct positive predictions |
| Recall | Detected ground truths |
Live console output with ANSI color support, progress bars, and error detection.
Real-time GPU utilization, memory, temperature, CPU, and disk usage.
After training completes, the best model (best.pt, the highest-mAP checkpoint) is uploaded to the platform and made available for download, export, and deployment.
Click Cancel Training on the model page to stop a running job; you are charged only for GPU time used up to cancellation.
```mermaid
graph LR
A[Local GPU] --> B[Train]
B --> C[ultralytics Package]
C --> D[Stream Metrics]
D --> E[Platform Dashboard]
style A fill:#FF9800,color:#fff
style C fill:#2196F3,color:#fff
style E fill:#4CAF50,color:#fff
```
Train on your own hardware while streaming metrics to the platform.
!!! warning "Package Version Requirement"
Platform integration requires **ultralytics>=8.4.35**. Lower versions will NOT work with Platform.
```bash
pip install -U ultralytics
```
Get your API key from Settings > API Keys, then set it as an environment variable:

```bash
export ULTRALYTICS_API_KEY="YOUR_API_KEY"
```
Use the project and name parameters to stream metrics:
=== "CLI"
```bash
yolo train model=yolo26n.pt data=coco.yaml epochs=100 \
project=username/my-project name=experiment-1
```
=== "Python"
```python
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
model.train(
data="coco.yaml",
epochs=100,
project="username/my-project",
name="experiment-1",
)
```
The Local Training tab in the training dialog shows a pre-configured command with your API key, selected parameters, and advanced arguments included.
Train with datasets stored on the platform using the ul:// URI format:
=== "CLI"
```bash
yolo train model=yolo26n.pt data=ul://username/datasets/my-dataset epochs=100 \
project=username/my-project name=exp1
```
=== "Python"
```python
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
model.train(
data="ul://username/datasets/my-dataset",
epochs=100,
project="username/my-project",
name="exp1",
)
```
The ul:// URI format automatically downloads and configures your dataset. The model is automatically linked to the dataset on the platform (see Using Platform Datasets).
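As a sketch, a `ul://` URI of the form shown above splits into an owner, a resource type, and a name. `parse_ul_uri` is a hypothetical illustration; the real resolution happens inside the ultralytics package:

```python
from urllib.parse import urlparse


def parse_ul_uri(uri: str) -> dict:
    """Split ul://<owner>/datasets/<name> into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "ul":
        raise ValueError(f"not a ul:// URI: {uri}")
    kind, name = parsed.path.strip("/").split("/", 1)
    return {"owner": parsed.netloc, "kind": kind, "name": name}


print(parse_ul_uri("ul://username/datasets/my-dataset"))
# {'owner': 'username', 'kind': 'datasets', 'name': 'my-dataset'}
```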
Training costs are based on GPU usage. Before training starts, the platform estimates the total cost from your model size, dataset, epoch count, and GPU selection. Factors affecting cost:
| Factor | Impact |
|---|---|
| Dataset Size | More images = longer training time (baseline: ~2.8s compute per 1000 images on RTX 4090) |
| Model Size | Larger models (m, l, x) train slower than smaller ones (n, s) |
| Number of Epochs | Direct multiplier on training time |
| Image Size | Larger imgsz increases computation: 320px=0.25x, 640px=1.0x (baseline), 1280px=4.0x |
| Batch Size | Larger batches are more efficient (batch 32 = ~0.85x time, batch 8 = ~1.2x time vs batch 16 baseline) |
| GPU Speed | Faster GPUs reduce training time (e.g., H100 SXM = ~3.4x faster than RTX 4090) |
| Startup Overhead | Up to 5 minutes for instance initialization, data download, and warmup (scales with dataset size) |
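The multipliers in the table above can be combined into a back-of-the-envelope time estimate. This sketch assumes the ~2.8 s-per-1000-images baseline is per epoch at imgsz=640 and batch 16 on an RTX 4090; the function name, the quadratic imgsz scaling, and the flat 5-minute startup term are illustrative assumptions, not the Platform's actual formula:

```python
def estimate_hours(images: int, epochs: int, imgsz: int = 640,
                   batch_factor: float = 1.0, gpu_speedup: float = 1.0,
                   startup_s: float = 300.0) -> float:
    """Rough cloud-training duration from the cost factors above."""
    # Baseline compute per epoch, scaled by image count and resolution
    per_epoch_s = 2.8 * (images / 1000) * (imgsz / 640) ** 2 * batch_factor
    # Faster GPUs divide compute time; startup overhead is added once
    return (per_epoch_s * epochs / gpu_speedup + startup_s) / 3600


# 1000 images, 100 epochs on H100 SXM (~3.4x faster than RTX 4090)
print(round(estimate_hours(1000, 100, gpu_speedup=3.4), 3))
```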
!!! note "Estimates"
Cost estimates are approximate and depend on many factors. The training dialog shows a real-time estimate before you start training.
| Scenario | GPU | Estimated Cost |
|---|---|---|
| 500 images, YOLO26n, 50 epochs | RTX 4090 | ~$0.50 |
| 1000 images, YOLO26n, 100 epochs | RTX PRO 6000 | ~$5 |
| 5000 images, YOLO26s, 100 epochs | H100 SXM | ~$23 |
Cloud training billing flow:

```mermaid
graph LR
A[Estimate Cost] --> B[Balance Check]
B --> C[Train]
C --> D[Charge Actual Runtime]
style A fill:#2196F3,color:#fff
style B fill:#FF9800,color:#fff
style C fill:#9C27B0,color:#fff
style D fill:#4CAF50,color:#fff
```
!!! success "Consumer Protection"
Billing tracks actual compute usage, including partial runs that are cancelled. **You are never charged for failed training runs.**
| Status | Charged? |
|---|---|
| Completed | Yes — actual GPU time used |
| Cancelled | Yes — GPU time from start to cancellation |
| Failed | No — failed runs are not charged |
| Stuck | Partial — only actual training time charged |
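The per-status billing rules in the table above reduce to a small decision function. This is a sketch for illustration, not the billing service:

```python
def charged_seconds(status: str, gpu_seconds: float, active_seconds: float = 0.0) -> float:
    """Apply the billing rules: completed/cancelled runs pay for GPU time
    used, failed runs pay nothing, stuck runs pay only for active
    training time (not idle time before termination)."""
    if status in ("completed", "cancelled"):
        return gpu_seconds
    if status == "failed":
        return 0.0
    if status == "stuck":
        return active_seconds
    raise ValueError(f"unknown status: {status}")


print(charged_seconds("failed", 3600))  # 0.0
print(charged_seconds("stuck", 7200, active_seconds=1800))  # 1800
```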
!!! tip "No Charge for Errors"
If a training run fails due to a configuration error, out-of-memory issue, or any other failure, you are **not charged**. Only successful compute time is billed. Stuck jobs (no activity for 4+ hours) are automatically terminated and charged only for the time the GPU was actively training, not the idle time.
| Method | Description |
|---|---|
| Account Balance | Pre-loaded credits |
| Pay Per Job | Charge at job completion |
!!! note "Minimum Balance"
Training start requires a positive available balance and enough credits for the estimated job cost.
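The minimum-balance rule above amounts to two conditions. `can_start_training` is a hypothetical helper illustrating the check, not a Platform API:

```python
def can_start_training(balance: float, estimated_cost: float) -> bool:
    # Requires a positive available balance that covers the job estimate
    return balance > 0 and balance >= estimated_cost


print(can_start_training(balance=10.0, estimated_cost=5.0))  # True
print(can_start_training(balance=3.0, estimated_cost=5.0))  # False: below estimate
```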
After training, view detailed costs in the Billing tab.
| Model | Parameters | Best For |
|---|---|---|
| YOLO26n | 2.4M | Real-time, edge devices |
| YOLO26s | 9.5M | Balanced speed/accuracy |
| YOLO26m | 20.4M | Higher accuracy |
| YOLO26l | 24.8M | Production accuracy |
| YOLO26x | 55.7M | Maximum accuracy |
!!! tip "Cost-Saving Strategies"
1. **Start small**: Test with 10-20 epochs on a budget GPU to verify your dataset and config work
2. **Use appropriate GPU**: RTX PRO 6000 handles most workloads well
3. **Validate dataset**: Fix labeling issues before spending on training
4. **Monitor early**: Cancel training if loss plateaus — you only pay for compute time used
| Issue | Solution |
|---|---|
| Training stuck at 0% | Check dataset format, retry |
| Out of memory | Reduce batch size or use larger GPU |
| Poor accuracy | Increase epochs, check data quality |
| Training slow | Consider faster GPU |
| Task mismatch error | Ensure model and dataset tasks match |
Training time depends on dataset size, model size, number of epochs, and GPU speed. Typical times (1000 images, 100 epochs):
| Model | RTX PRO 6000 | A100 SXM |
|---|---|---|
| YOLO26n | ~20 min | ~15 min |
| YOLO26m | ~40 min | ~30 min |
| YOLO26x | ~80 min | ~60 min |
!!! note "Approximate Times"
Training times are approximate and vary with dataset complexity, augmentation settings, and batch size. Use the training dialog's cost estimate for more accurate predictions.
Yes, training continues until completion. You'll receive a notification when training finishes. Make sure your account has sufficient balance for the full run.
If your credit balance reaches zero during a training run, training continues to completion and your balance goes negative. This ensures your training job is never interrupted mid-run.
After training completes, you'll need to add credits to bring your balance back to positive before starting new training jobs. Your completed model, checkpoints, and all training artifacts are fully preserved regardless of balance.
!!! note "Negative Balance"
A negative balance only prevents starting **new** training jobs. Existing deployments and other platform features continue to work normally. Add credits via [Settings > Billing](../account/billing.md) or enable [auto top-up](../account/billing.md#auto-top-up) to avoid interruptions.
Cost estimates are approximate — actual training time may vary due to factors like data loading speed, GPU warmup, and model convergence behavior. If the actual cost exceeds the estimate, your balance may go negative (see above). The platform does not stop training based on the estimate.
To manage costs, follow the cost-saving strategies above: start with short test runs, and cancel early if metrics plateau.
Yes, expand the Advanced Settings section in the training dialog to access a YAML editor with 40+ configurable parameters. Non-default values are included in both cloud and local training commands.
The YAML editor also supports importing configurations from previous training runs: drag and drop a `.yaml` or `.json` file directly into the editor to import its parameters. This makes it easy to reproduce or iterate on previous training configurations without manually re-entering each parameter.
Yes, the Train button on dataset pages opens the training dialog with the dataset preselected and locked. You then select a project and model to begin training.
=== "Core"
| Parameter | Type | Default | Range | Description |
| -------------- | ---- | --------- | --------- | ------------------------------------------------ |
| `epochs` | int | 100 | 1-10000 | Number of training epochs |
| `batch` | int | -1 (auto) | -1 to 512 | Batch size (`-1` = auto-fit to available VRAM) |
| `imgsz` | int | 640 | 32-4096 | Input image size |
| `patience` | int | 100 | 1-1000 | Early stopping patience |
| `seed` | int | 0 | 0-2147483647 | Random seed for reproducibility |
| `deterministic`| bool | True | - | Deterministic training mode |
| `amp` | bool | True | - | Automatic mixed precision |
| `close_mosaic` | int | 10 | 0-50 | Disable mosaic in final N epochs |
| `save_period` | int | -1 | -1 to 100 | Save checkpoint every N epochs |
| `workers` | int | 8 | 0-64 | Dataloader workers |
| `cache` | select | false | ram/disk/false | Cache images |
=== "Learning Rate"
| Parameter | Type | Default | Range | Description |
| --------------- | ----- | ------- | --------- | --------------------- |
| `lr0` | float | 0.01 | 0.0001-0.1 | Initial learning rate |
| `lrf` | float | 0.01 | 0.01-1.0 | Final LR factor |
| `momentum` | float | 0.937 | 0.6-0.98 | SGD momentum |
| `weight_decay` | float | 0.0005 | 0.0-0.001 | L2 regularization |
| `warmup_epochs` | float | 3.0 | 0-5 | Warmup epochs |
| `warmup_momentum` | float | 0.8 | 0.5-0.95 | Warmup momentum |
| `warmup_bias_lr` | float | 0.1 | 0.0-0.2 | Warmup bias LR |
| `cos_lr` | bool | False | - | Cosine LR scheduler |
=== "Augmentation"
| Parameter | Type | Default | Range | Description |
| ------------ | ----- | ------- | ------- | -------------------- |
| `hsv_h` | float | 0.015 | 0.0-0.1 | HSV hue augmentation |
| `hsv_s` | float | 0.7 | 0.0-1.0 | HSV saturation |
| `hsv_v` | float | 0.4 | 0.0-1.0 | HSV value |
| `degrees` | float | 0.0 | -45 to 45 | Rotation degrees |
| `translate` | float | 0.1 | 0.0-1.0 | Translation fraction |
| `scale` | float | 0.5 | 0.0-1.0 | Scale factor |
| `shear` | float | 0.0 | -10 to 10 | Shear degrees |
| `perspective` | float | 0.0 | 0.0-0.001 | Perspective transform |
| `fliplr` | float | 0.5 | 0.0-1.0 | Horizontal flip prob |
| `flipud` | float | 0.0 | 0.0-1.0 | Vertical flip prob |
| `mosaic` | float | 1.0 | 0.0-1.0 | Mosaic augmentation |
| `mixup` | float | 0.0 | 0.0-1.0 | Mixup augmentation |
| `copy_paste` | float | 0.0 | 0.0-1.0 | Copy-paste (segment) |
=== "Dataset"
| Parameter | Type | Default | Range | Description |
| ------------- | ----- | ------- | ------- | ------------------------------------ |
| `fraction` | float | 1.0 | 0.1-1.0 | Fraction of dataset to use |
| `freeze` | int | null | 0-100 | Number of layers to freeze |
| `single_cls` | bool | False | - | Treat all classes as one class |
| `rect` | bool | False | - | Rectangular training |
| `multi_scale` | float | 0.0 | 0.0-1.0 | Multi-scale training range |
| `val` | bool | True | - | Run validation during training |
| `resume` | bool | False | - | Resume training from checkpoint |
=== "Optimizer"
| Value | Description |
| --------- | ----------------------------- |
| `auto` | Automatic selection (default) |
| `SGD` | Stochastic Gradient Descent |
| `MuSGD` | Muon SGD optimizer |
| `Adam` | Adam optimizer |
| `AdamW` | Adam with weight decay |
| `NAdam` | NAdam optimizer |
| `RAdam` | RAdam optimizer |
| `RMSProp` | RMSProp optimizer |
| `Adamax` | Adamax optimizer |
=== "Loss Weights"
| Parameter | Type | Default | Range | Description |
| ---------------- | ----- | ------- | --------- | --------------------------- |
| `box` | float | 7.5 | 1-50 | Box loss weight |
| `cls` | float | 0.5 | 0.2-4 | Classification loss weight |
| `dfl` | float | 1.5 | 0.4-6 | Distribution focal loss |
| `pose` | float | 12.0 | 1-50 | Pose loss weight (pose only)|
| `kobj` | float | 1.0 | 0.5-10 | Keypoint objectness (pose) |
| `label_smoothing`| float | 0.0 | 0.0-0.1 | Label smoothing factor |
!!! tip "Task-Specific Parameters"
Some parameters only apply to specific tasks:
- **Detection tasks only** (detect, segment, pose, OBB — not classify): `box`, `dfl`, `degrees`, `translate`, `shear`, `perspective`, `mosaic`, `mixup`, `close_mosaic`
- **Segment only**: `copy_paste`
- **Pose only**: `pose` (loss weight), `kobj` (keypoint objectness)
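The task-aware behaviour described above (the editor hiding parameters that don't apply) can be sketched as a filter over an overrides dict. The sets and function below are illustrative, not the Platform implementation:

```python
# Which advanced parameters apply to a given task, per the note above
DETECTION_ONLY = {"box", "dfl", "degrees", "translate", "shear",
                  "perspective", "mosaic", "mixup", "close_mosaic"}
SEGMENT_ONLY = {"copy_paste"}
POSE_ONLY = {"pose", "kobj"}


def filter_overrides(task: str, overrides: dict) -> dict:
    """Drop parameters that do not apply to `task`."""
    allowed = dict(overrides)
    if task == "classify":
        # Classification uses none of the detection-family parameters
        for k in DETECTION_ONLY | SEGMENT_ONLY | POSE_ONLY:
            allowed.pop(k, None)
    else:
        if task != "segment":
            for k in SEGMENT_ONLY:
                allowed.pop(k, None)
        if task != "pose":
            for k in POSE_ONLY:
                allowed.pop(k, None)
    return allowed


print(filter_overrides("detect", {"box": 7.5, "copy_paste": 0.3, "lr0": 0.01}))
# {'box': 7.5, 'lr0': 0.01}
```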