Roboflow 100 Dataset

Roboflow 100, sponsored by Intel, is a groundbreaking object detection benchmark dataset. It includes 100 diverse datasets. This benchmark is specifically designed to test the adaptability of computer vision models, like Ultralytics YOLO models, to various domains, including healthcare, aerial imagery, and video games.

!!! question "Licensing"

Ultralytics offers two licensing options to accommodate different use cases:

- **AGPL-3.0 License**: This [OSI-approved](https://opensource.org/license) open-source license is ideal for students and enthusiasts, promoting open collaboration and knowledge sharing. See the [LICENSE](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) file for more details and visit our [AGPL-3.0 License page](https://www.ultralytics.com/legal/agpl-3-0-software-license).
- **Enterprise License**: For development and production use, this license enables seamless integration of Ultralytics software and AI models into business products and services, including internal tools, automated workflows, and production deployments, bypassing the open-source requirements of AGPL-3.0. To get started, please contact us via [Ultralytics Licensing](https://www.ultralytics.com/license).

Key Features

  • Diverse Domains: Includes 100 datasets across seven distinct domains: Aerial, Video games, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
  • Scale: The benchmark comprises 224,714 images across 805 classes, representing over 11,170 hours of data labeling effort.
  • Standardization: All images are preprocessed and resized to 640x640 pixels for consistent evaluation.
  • Clean Evaluation: Focuses on eliminating class ambiguity and filters out underrepresented classes to ensure cleaner model evaluation.
  • Annotations: Includes bounding boxes for objects, suitable for training and evaluating object detection models using metrics like mAP.

Dataset Structure

The Roboflow 100 dataset is organized into seven categories, each containing a unique collection of datasets, images, and classes:

  • Aerial: 7 datasets, 9,683 images, 24 classes.
  • Video Games: 7 datasets, 11,579 images, 88 classes.
  • Microscopic: 11 datasets, 13,378 images, 28 classes.
  • Underwater: 5 datasets, 18,003 images, 39 classes.
  • Documents: 8 datasets, 24,813 images, 90 classes.
  • Electromagnetic: 12 datasets, 36,381 images, 41 classes.
  • Real World: 50 datasets, 110,615 images, 495 classes.

This structure provides a diverse and extensive testing ground for object detection models, reflecting a wide array of real-world application scenarios found in various Ultralytics Solutions.
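
The per-domain figures listed above can be tallied with a short script. This is only a minimal sketch: the `DOMAINS` list is a hand-copied transcription of the bullets above, and the variable names are illustrative rather than part of any Ultralytics or Roboflow API.

```python
# (domain, datasets, images, classes), transcribed from the list above
DOMAINS = [
    ("Aerial", 7, 9_683, 24),
    ("Video Games", 7, 11_579, 88),
    ("Microscopic", 11, 13_378, 28),
    ("Underwater", 5, 18_003, 39),
    ("Documents", 8, 24_813, 90),
    ("Electromagnetic", 12, 36_381, 41),
    ("Real World", 50, 110_615, 495),
]

# Column totals across all seven domains
total_datasets = sum(n_datasets for _, n_datasets, _, _ in DOMAINS)
total_images = sum(n_images for _, _, n_images, _ in DOMAINS)
total_classes = sum(n_classes for _, _, _, n_classes in DOMAINS)

# Share of the listed images contributed by each domain
for name, n_datasets, n_images, n_classes in DOMAINS:
    share = n_images / total_images
    print(f"{name:<16}{n_datasets:>3} datasets  {n_classes:>4} classes  {share:6.1%} of images")

print(f"Totals: {total_datasets} datasets, {total_images:,} images, {total_classes} classes")
```

With the figures as listed, the dataset and class counts come to 100 and 805, matching the headline numbers, while the per-domain image counts sum to 224,452, slightly below the 224,714 quoted under Key Features.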

Benchmarking

Dataset benchmarking involves evaluating the performance of machine learning models on specific datasets using standardized metrics. Common metrics include accuracy, mean Average Precision (mAP), and F1-score. You can learn more about these in our YOLO Performance Metrics guide.
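
To make one of these metrics concrete, the sketch below derives precision, recall, and F1-score from counts of true positives, false positives, and false negatives. The function name and the example counts are illustrative assumptions, not an Ultralytics API; in practice these values are produced by the validation step rather than computed by hand.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from detection counts.

    tp: predictions matched to a ground-truth box
    fp: predictions with no matching ground-truth box
    fn: ground-truth boxes that no prediction matched
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a single class on a single dataset
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The zero checks keep the function defined for a class with no predictions or no ground-truth boxes, which can happen on small datasets.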

!!! tip "Benchmarking Results"

Every output is grouped under a single `runs/<task>/multitrain/` directory: each dataset is fine-tuned in its own subdirectory (with its own `results.png`), and the per-dataset and mean metrics are written to `multitrain_results.json` alongside a `multitrain_results.png` bar chart. The `model.train()` call also returns a `{dataset: metrics}` dictionary for programmatic access.

!!! example "Benchmarking Example"

The script below downloads the Roboflow 100 datasets listed in `datasets_links.txt` from Roboflow, then fine-tunes a single base model (e.g., YOLO26n) across the whole collection in one `model.train()` call. Passing a list of datasets fine-tunes the base model on each one in series and automatically visualizes the cross-dataset results. A free [Roboflow API key](https://docs.roboflow.com/api-reference/authentication) is required to download the datasets.

=== "Python"

    ```python
    import re
    from pathlib import Path

    from ultralytics import YOLO
    from ultralytics.utils import ASSETS_URL, YAML
    from ultralytics.utils.checks import check_requirements
    from ultralytics.utils.downloads import safe_download

    # Download the RF100 datasets from Roboflow (requires a free Roboflow API key)
    check_requirements("roboflow")
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
    safe_download(f"{ASSETS_URL}/datasets_links.txt")  # list of RF100 dataset links

    datasets = []
    for line in Path("datasets_links.txt").read_text().splitlines():
        try:
            _, _url, workspace, project, version = re.split("/+", line.strip())
            location = f"rf-100/{project}-{version}"
            rf.workspace(workspace).project(project).version(version).download("yolov8", location=location)
            yaml = Path(location) / "data.yaml"
            cfg = YAML.load(yaml)  # point train/val at the downloaded image folders
            cfg["train"], cfg["val"] = "train/images", "valid/images"
            YAML.save(yaml, cfg)
            datasets.append(str(yaml))
        except Exception as e:  # skip malformed lines or failed downloads, but report them
            print(f"Skipping {line.strip()!r}: {e}")

    # Fine-tune one base model across all RF100 datasets and visualize the cross-dataset results
    model = YOLO("yolo26n.pt")
    results = model.train(data=datasets, epochs=100, imgsz=640)  # {dataset: metrics}

    # Per-dataset runs, multitrain_results.json (per-dataset + mean), and multitrain_results.png are saved
    # together under runs/detect/multitrain. Read results in-memory or from the JSON for custom post-processing.
    for dataset, metrics in results.items():
        if metrics:  # None if that dataset failed to train
            print(f"{dataset}: mAP50-95 = {metrics['metrics/mAP50-95(B)']:.4f}")
    ```
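
For custom post-processing, the returned dictionary can be summarized without touching the files on disk. The sketch below assumes the `{dataset: metrics}` shape and the `metrics/mAP50-95(B)` key used in the example above, with `None` for datasets that failed to train. The `summarize` helper and the sample values are hypothetical and stand in for a real training run.

```python
from statistics import mean

MAP_KEY = "metrics/mAP50-95(B)"  # metric key used in the example above


def summarize(results, key=MAP_KEY):
    """Rank datasets by a metric, list failed runs, and compute the mean score."""
    scores = {name: m[key] for name, m in results.items() if m}
    failed = [name for name, m in results.items() if not m]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    mean_score = mean(scores.values()) if scores else None
    return ranked, failed, mean_score


# Hypothetical stand-in for the dictionary returned by model.train(data=datasets, ...)
example_results = {
    "rf-100/dataset-a-1/data.yaml": {MAP_KEY: 0.50},
    "rf-100/dataset-b-2/data.yaml": {MAP_KEY: 0.25},
    "rf-100/dataset-c-1/data.yaml": None,  # this run failed
}

ranked, failed, mean_score = summarize(example_results)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
print(f"mean over {len(ranked)} datasets: {mean_score:.4f}; failed: {failed}")
```

Excluding failed runs from the mean, rather than counting them as zero, is a design choice made here for illustration; it may differ from how the mean in `multitrain_results.json` is computed.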

Applications

Roboflow 100 is invaluable for various applications related to computer vision and deep learning. Researchers and engineers can leverage this benchmark to:

  • Evaluate the performance of object detection models in a multi-domain context.
  • Test the adaptability and robustness of models to real-world scenarios beyond common benchmark datasets like COCO or PASCAL VOC.
  • Benchmark the capabilities of object detection models across diverse datasets, including specialized areas like healthcare, aerial imagery, and video games.
  • Compare model performance across different neural network architectures and optimization techniques.
  • Identify domain-specific challenges that may require specialized model training tips or fine-tuning approaches like transfer learning.

For more ideas and inspiration on real-world applications, explore our guides on practical projects or check out Ultralytics Platform for streamlined model training and deployment.

Usage

The Roboflow 100 dataset, including metadata and download links, is available on the official Roboflow 100 GitHub repository. You can access and utilize the dataset directly from there for your benchmarking needs. Once the datasets are downloaded and prepared as shown above, Ultralytics models can be fine-tuned across the entire collection in a single `model.train()` call by passing the list of dataset YAMLs.
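
If the datasets are already on disk, the list of dataset YAMLs can be rebuilt without downloading anything again. This is a minimal sketch that assumes the `rf-100/<project>-<version>/data.yaml` layout produced by the benchmarking example; the `collect_dataset_yamls` helper is hypothetical and not part of the Ultralytics package.

```python
from pathlib import Path


def collect_dataset_yamls(root="rf-100"):
    """Return the sorted data.yaml paths found one level below root."""
    return sorted(str(path) for path in Path(root).glob("*/data.yaml"))


datasets = collect_dataset_yamls()
print(f"Found {len(datasets)} dataset configs")
# The resulting list could then be passed as data=datasets to model.train()
```

Folders without a `data.yaml` file are skipped, so a partially downloaded collection still yields a usable list.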

Sample Data and Annotations

Roboflow 100 consists of datasets with diverse images captured from various angles and domains. Below are examples of annotated images included in the RF100 benchmark, showcasing the variety of objects and scenes. Techniques like data augmentation can further enhance the diversity during training.

<p align="center">[Image: sample annotated images from the Roboflow 100 benchmark]</p>

The diversity of the Roboflow 100 benchmark is a significant advance over traditional benchmarks, which often focus on optimizing a single metric within a limited domain. This broader approach aids in developing more robust and versatile computer vision models that perform well across many different scenarios.

Citations and Acknowledgments

If you use the Roboflow 100 dataset in your research or development work, please cite the original paper:

!!! quote ""

=== "BibTeX"

    ```bibtex
    @misc{rf100benchmark,
        Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
        Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
        Year = {2022},
        Eprint = {arXiv:2211.13523},
        url = {https://arxiv.org/abs/2211.13523}
    }
    ```

We extend our gratitude to the Roboflow team and all contributors for their significant efforts in creating and maintaining the Roboflow 100 dataset as a valuable resource for the computer vision community.

If you are interested in exploring more datasets to enhance your object detection and machine learning projects, feel free to visit our comprehensive dataset collection, which includes a variety of other detection datasets.

FAQ

What is the Roboflow 100 dataset, and why is it significant for object detection?

The Roboflow 100 dataset is a benchmark for object detection models. It comprises 100 diverse datasets covering domains like healthcare, aerial imagery, and video games. Its significance lies in providing a standardized way to test model adaptability and robustness across a wide range of real-world scenarios, moving beyond traditional, often domain-limited, benchmarks.

Which domains are covered by the Roboflow 100 dataset?

The Roboflow 100 dataset spans seven diverse domains, offering unique challenges for object detection models:

  1. Aerial: 7 datasets (e.g., satellite imagery, drone views).
  2. Video Games: 7 datasets (e.g., objects from various game environments).
  3. Microscopic: 11 datasets (e.g., cells, particles).
  4. Underwater: 5 datasets (e.g., marine life, submerged objects).
  5. Documents: 8 datasets (e.g., text regions, form elements).
  6. Electromagnetic: 12 datasets (e.g., radar signatures, spectral data visualizations).
  7. Real World: 50 datasets (a broad category including everyday objects, scenes, retail, etc.).

This variety makes RF100 an excellent resource for assessing the generalizability of computer vision models.

What should I include when citing the Roboflow 100 dataset in my research?

When using the Roboflow 100 dataset, please cite the original paper to give credit to the creators. Here is the recommended BibTeX citation:

!!! quote ""

=== "BibTeX"

    ```bibtex
    @misc{rf100benchmark,
        Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
        Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
        Year = {2022},
        Eprint = {arXiv:2211.13523},
        url = {https://arxiv.org/abs/2211.13523}
    }
    ```

For further exploration, consider visiting our comprehensive dataset collection or browsing other detection datasets compatible with Ultralytics models.