Data preparation is the foundation of successful computer vision models. Ultralytics Platform provides comprehensive tools for managing your training data, from upload through annotation to analysis.
<p align="center"> <iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/kA09zsjZGdA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen> </iframe><strong>Watch:</strong> Get Started with Ultralytics Platform - Data
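</p>

The Data section of Ultralytics Platform helps you upload images, videos, and dataset files (ZIP, TAR including .tar.gz/.tgz, NDJSON), then annotate, analyze, and export your data.

graph LR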
A[Upload] --> B[Annotate]
B --> C[Analyze]
C --> D[Train]
style A fill:#4CAF50,color:#fff
style B fill:#2196F3,color:#fff
style C fill:#FF9800,color:#fff
style D fill:#9C27B0,color:#fff
| Stage | Description |
|---|---|
| Upload | Import images, videos, or archives with automatic processing |
| Annotate | Label data with manual tools for all 5 task types, or use SAM annotation for detect, segment, and OBB |
| Analyze | View class distributions, spatial heatmaps, and dimension statistics |
| Export | Download in NDJSON format for offline use |
Ultralytics Platform supports all 5 YOLO task types:
| Task | Description | Annotation Tool |
|---|---|---|
| Detect | Object detection with bounding boxes | Rectangle tool |
| Segment | Instance segmentation with pixel masks | Polygon tool |
| Pose | Keypoint estimation with built-in and custom skeleton templates | Keypoint tool |
| OBB | Oriented bounding boxes for rotated objects | Oriented box tool |
| Classify | Image-level classification | Class selector |
!!! info "Task Type Selection"

    The task type is set when creating a dataset and determines which annotation tools are available. You can change it later from the dataset header task selector, but incompatible annotations won't be displayed after switching.
Ultralytics Platform uses Content-Addressable Storage (CAS) for efficient data management.
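As a rough illustration of the general idea behind content-addressable storage (not the Platform's actual implementation, which this page does not describe), each file is stored under a key derived from a hash of its bytes, so identical content is kept only once. The `ContentStore` class and the choice of SHA-256 below are assumptions made for this sketch.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        self._blobs = {}  # maps content hash -> raw bytes

    def put(self, data: bytes) -> str:
        """Store data under the hex SHA-256 of its content and return that key."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the bytes previously stored under key."""
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"same image bytes")
second = store.put(b"same image bytes")  # duplicate upload, no new blob is created
other = store.put(b"different image bytes")
print(first == second, len(store))
```

Because the key depends only on the bytes, two uploads of the same file resolve to the same entry regardless of filename.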
Reference datasets using the `ul://` URI format (see Using Platform Datasets):

`yolo train data=ul://username/datasets/my-dataset`

This allows training on Platform datasets from any machine with your API key configured.
!!! example "Use Platform Data from Python"

    ```python
    from ultralytics import YOLO

    # Load a pretrained model, then train on a Platform dataset referenced by its ul:// URI
    model = YOLO("yolo26n.pt")
    model.train(data="ul://username/datasets/my-dataset", epochs=100)
    ```
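The dataset references on this page follow the shape `ul://username/datasets/my-dataset`. A small helper like the one below could check that shape before a run is launched. This is a sketch only: `parse_dataset_uri` and `DatasetRef` are hypothetical names, not part of the `ultralytics` package, and the pattern is inferred solely from the examples shown here.

```python
from typing import NamedTuple


class DatasetRef(NamedTuple):
    username: str
    dataset: str


def parse_dataset_uri(uri: str) -> DatasetRef:
    """Split a ul://<username>/datasets/<dataset> string into its parts."""
    prefix = "ul://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}, got {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Expect exactly three segments: <username>, the literal "datasets", <dataset>
    if len(parts) != 3 or parts[1] != "datasets" or not parts[0] or not parts[2]:
        raise ValueError(f"expected ul://<username>/datasets/<dataset>, got {uri!r}")
    return DatasetRef(username=parts[0], dataset=parts[2])


ref = parse_dataset_uri("ul://username/datasets/my-dataset")
print(ref.username, ref.dataset)
```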
Create immutable NDJSON snapshots of your dataset for reproducible training. Each version captures image counts, class counts, and annotation counts at the time of creation. See Versions Tab for details.
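NDJSON stores one JSON object per line, so a downloaded snapshot can be inspected with the standard library alone. The sketch below assumes only that generic layout. The record fields in the sample (`type`, `name`, `file`) are made up for illustration and are not the Platform's actual schema.

```python
import io
import json


def read_ndjson(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {line_number}: {err}") from err


# Hypothetical sample content; a real snapshot would be opened with
# open(path, encoding="utf-8") and passed in the same way.
sample = io.StringIO(
    '{"type": "dataset", "name": "my-dataset"}\n'
    '{"type": "image", "file": "a.jpg"}\n'
    "\n"
    '{"type": "image", "file": "b.jpg"}\n'
)
records = list(read_ndjson(sample))
image_count = sum(1 for record in records if record.get("type") == "image")
print(len(records), image_count)
```

Reading line by line keeps memory use flat even for large snapshots, since no more than one record needs to be parsed at a time.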
Dataset pages can show up to six tabs, depending on the dataset state and your permissions:
| Tab | Description |
|---|---|
| Images | Browse images in grid, compact, or table view with annotation overlays |
| Classes | View and edit class names, colors, and label counts per class |
| Charts | Automatic statistics: split distribution, class counts, heatmaps |
| Models | Models trained on this dataset with metrics and status |
| Versions | Create and download immutable NDJSON snapshots for reproducible training |
| Errors | Images that failed processing with error details and fix guidance |
Classes and Charts appear when the dataset has images. Errors appears only when processing failures exist. Versions appears for owners, or for non-owners when versions already exist.
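The visibility rules above can be restated as a small function. This is not Platform source code: `visible_tabs` and its parameters are hypothetical names, and showing Images and Models unconditionally is an assumption, since this page states no conditions for them.

```python
def visible_tabs(has_images: bool, has_errors: bool, is_owner: bool, has_versions: bool) -> list[str]:
    """Return the dataset tabs that would be shown under the rules described above."""
    tabs = ["Images"]  # assumption: always visible
    if has_images:
        tabs += ["Classes", "Charts"]  # shown once the dataset has images
    tabs.append("Models")  # assumption: always visible
    if is_owner or has_versions:
        tabs.append("Versions")  # owners always; non-owners only if versions exist
    if has_errors:
        tabs.append("Errors")  # only when processing failures exist
    return tabs


owner_view = visible_tabs(has_images=True, has_errors=True, is_owner=True, has_versions=False)
viewer_view = visible_tabs(has_images=False, has_errors=False, is_owner=False, has_versions=False)
print(owner_view)
print(viewer_view)
```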
Explore your dataset as an interactive 2D scatter plot where visually similar images sit close together. This is useful for surfacing clusters, duplicates, and outliers, and for inspecting how splits or classes are distributed across your data. Lasso a region of the plot to filter the gallery to those images. See Clustering for details.
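The Charts tab provides automatic analysis, including split distribution, class counts, spatial heatmaps, and dimension statistics.

Reference datasets with `ul://` URIs to train from anywhere.

Ultralytics Platform supports: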
- Images: JPEG, PNG, WebP, BMP, TIFF, HEIC, AVIF, JP2, DNG, MPO (max 50 MB each)
- Videos: MP4, WebM, MOV, AVI, MKV, M4V (max 1 GB, frames extracted at 1 FPS, max 100 frames)
- Dataset files: ZIP or TAR archives including .tar.gz and .tgz (max 10 GB on Free, 20 GB on Pro, 50 GB on Enterprise) containing images with optional YOLO-format labels, plus NDJSON exports
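To get a feel for the video limits, the upper bound on extracted frames can be estimated from the clip duration. This is a sketch under stated assumptions: `max_extracted_frames` is a hypothetical helper, and the page does not say how partial seconds are handled or which part of a long video is sampled once the 100-frame cap is reached.

```python
import math


def max_extracted_frames(duration_seconds: float, fps: float = 1.0, frame_cap: int = 100) -> int:
    """Estimate an upper bound on frames extracted from a video of the given duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Assumption: one frame per whole sampling interval, limited by the documented cap.
    return min(frame_cap, math.floor(duration_seconds * fps))


short_clip = max_extracted_frames(30)    # 30-second clip
long_clip = max_extracted_frames(600)    # 10-minute clip, limited by the cap
print(short_clip, long_clip)
```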
Storage limits depend on your plan:
| Plan | Storage Limit |
|---|---|
| Free | 100 GB |
| Pro | 500 GB |
| Enterprise | Unlimited |
Individual file limits: images 50 MB, videos 1 GB, dataset files 10 GB on Free, 20 GB on Pro, and 50 GB on Enterprise.
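The per-file limits above can be checked locally before an upload is attempted. The following is a minimal sketch: `max_upload_bytes` and `fits_limit` are hypothetical helpers, and treating MB and GB as decimal units (10^6 and 10^9 bytes) is an assumption, since the page does not specify it.

```python
MB = 1000**2  # assumption: decimal megabytes
GB = 1000**3  # assumption: decimal gigabytes

IMAGE_LIMIT = 50 * MB
VIDEO_LIMIT = 1 * GB
DATASET_LIMITS = {"free": 10 * GB, "pro": 20 * GB, "enterprise": 50 * GB}


def max_upload_bytes(kind: str, plan: str = "free") -> int:
    """Return the per-file size limit in bytes for kind in {"image", "video", "dataset"}."""
    if kind == "image":
        return IMAGE_LIMIT
    if kind == "video":
        return VIDEO_LIMIT
    if kind == "dataset":
        try:
            return DATASET_LIMITS[plan.lower()]
        except KeyError:
            raise ValueError(f"unknown plan: {plan!r}") from None
    raise ValueError(f"unknown file kind: {kind!r}")


def fits_limit(size_bytes: int, kind: str, plan: str = "free") -> bool:
    """Return True if a file of size_bytes is within the limit for its kind and plan."""
    return size_bytes <= max_upload_bytes(kind, plan)


print(fits_limit(15 * GB, "dataset", "free"), fits_limit(15 * GB, "dataset", "pro"))
```

Only dataset files vary by plan in the limits listed above, which is why the plan argument is ignored for images and videos.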
Yes! Use the dataset URI format to train locally:
=== "CLI"

    ```bash
    # Make your API key available to the CLI
    export ULTRALYTICS_API_KEY="YOUR_API_KEY"
    yolo train model=yolo26n.pt data=ul://username/datasets/my-dataset epochs=100
    ```

=== "Python"

    ```python
    import os

    # Set the API key before training so the ul:// dataset can be accessed
    os.environ["ULTRALYTICS_API_KEY"] = "YOUR_API_KEY"

    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    model.train(data="ul://username/datasets/my-dataset", epochs=100)
    ```
Or export your dataset in NDJSON format for fully offline training.