The Data Layer for Physical AI

Rerun covers the whole journey from raw recordings to training, on a single unified data layer for multi-rate, multimodal robotics data.

It consists of two parts: the Rerun SDK, an open source library and set of tools for logging, storing, querying, visualizing, and training on multi-rate, multimodal data; and Rerun Hub, a data catalog and backend for large-scale storage, access, and streaming of robotics data from object storage.

The problem

Building intelligent physical systems requires rapid iteration on both data and models. But teams often get stuck because:

  • Data from sensors arrives at different rates and in different formats
  • Understanding what went wrong requires visualizing multimodal data (images, point clouds, sensor readings) together in time
  • Extracting, cleaning, and preparing data for training involves too many manual steps
  • Switching between different tools for each step slows everything down

The best robotics teams minimize their time from new data to training. Rerun gives you the unified infrastructure to make that happen.

Who is Rerun for?

Rerun is built for teams developing intelligent physical systems:

  • Robotics engineers debugging perception, controls, and planning
  • Perception teams analyzing sensor data and model outputs
  • ML engineers preparing datasets and understanding model behavior
  • Autonomy teams developing and testing decision-making systems

If you're working with robots, drones, autonomous vehicles, spatial AI, or any system with data that evolves over time, Rerun helps you move faster.

How do you use it?

Log and ingest

Use the logging API to log multimodal data from your code, or use the chunk processing API to convert your existing data to the .rrd file format for later visualization or querying.
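
As a rough illustration of the logging path, the sketch below writes two streams at different rates to an .rrd file with the Python SDK. It assumes the `rerun-sdk` package is installed and that the SDK version is recent enough to provide `rr.set_time` and `rr.Scalars`. The application id, entity paths, timeline name, and the `spiral_points` and `log_spiral` helpers are made up for this example.

```python
import math


def spiral_points(step, n=50):
    """Hypothetical helper: n points on a spiral that rotates with `step`."""
    points = []
    for i in range(n):
        angle = 0.2 * i + 0.1 * step
        radius = 0.05 * i
        points.append([radius * math.cos(angle), radius * math.sin(angle), 0.02 * i])
    return points


def log_spiral(path, steps=20):
    """Log a fast point-cloud stream and a slower scalar stream to `path`."""
    # Imported here so the helper above stays usable without the SDK installed.
    import rerun as rr  # assumption: installed via `pip install rerun-sdk`

    rr.init("rerun_example_spiral")  # application id is an arbitrary example name
    rr.save(path)  # write everything logged below into an .rrd file

    for step in range(steps):
        rr.set_time("step", sequence=step)  # shared timeline for both streams
        rr.log("world/points", rr.Points3D(spiral_points(step)))  # every step
        if step % 5 == 0:
            rr.log("metrics/progress", rr.Scalars(step / steps))  # every 5th step
```

Calling `log_spiral("spiral.rrd")` should produce a recording that can be opened in the viewer, for example with `rerun spiral.rrd`. Both streams share the `step` timeline even though they are logged at different rates.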

Visualize

Rerun provides an open source pre-built viewer that is adjustable and extensible. You can log directly to the viewer, open a range of file formats to get data into the viewer, or even connect the viewer to a Rerun catalog.

Query and transform

The Rerun file format supports both high performance visualization and querying over the same data source.

You can use the open source catalog server to run laptop-scale examples locally. We also offer Rerun Hub, a scalable catalog for robotics data, for teams that need collaborative dataset management, version control, and cloud storage (reach out to learn more). The two are API-compatible, so the only difference between our examples and Rerun Hub is that you connect to an existing server instead of launching your own.

Prepare catalog

Before recordings can be queried or viewed on the catalog, they have to be registered. Recordings are grouped into datasets. Since Rerun indexes existing data in place, registration needs the paths of the RRDs to index: in object storage for Rerun Hub, or on disk for a local catalog server.

Use catalog

At this point, a viewer can connect to the prepared catalog, or you can query it directly. The basic steps of a query are: specify the dataset to query, get access to a lazily loaded dataframe, specify the query, and retrieve the results. Queries can be written in SQL or with dataframe APIs, giving you the flexibility to investigate anything about your data.
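
The four query steps can be illustrated with a toy model of lazy evaluation. This is not the Rerun catalog API: `LazyQuery`, its methods, and the sample rows are all hypothetical, and the sketch only shows the general idea that specifying a query does no work until results are requested.

```python
class LazyQuery:
    """Toy lazy query: records operations and runs them only on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Nothing is evaluated here; the predicate is only remembered.
        return LazyQuery(self._rows, self._steps + (("filter", predicate),))

    def select(self, *columns):
        # Likewise, the column selection is only recorded.
        return LazyQuery(self._rows, self._steps + (("select", columns),))

    def collect(self):
        # Apply the recorded steps in order, one row at a time.
        results = []
        for row in self._rows:
            keep = True
            for kind, arg in self._steps:
                if kind == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:  # "select"
                    row = {name: row[name] for name in arg}
            if keep:
                results.append(row)
        return results


# Step 1: hypothetical rows standing in for one dataset in the catalog.
dataset = [
    {"recording": "run_a", "frame": 0, "speed": 0.4},
    {"recording": "run_a", "frame": 1, "speed": 1.7},
    {"recording": "run_b", "frame": 0, "speed": 2.1},
]

# Steps 2 and 3: get a lazy handle and describe the query.
query = LazyQuery(dataset).filter(lambda r: r["speed"] > 1.0).select("recording", "frame")

# Step 4: retrieve the results; this is the only point where rows are scanned.
fast_frames = query.collect()
```

In this toy, filters must come before any `select` that drops the columns they read. A real query engine would plan around that.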

Train

Use the catalog as a data source for training: a dataloader runs a query against the catalog and yields training batches.

```d2
direction: right

query: "query"

dataloader: "dataloader"

catalog: "Rerun Catalog" {
  shape: cylinder
  height: 130
}

batches: "batches"

query -> dataloader
dataloader -> catalog
batches <- catalog
```

Get started

Ready to speed up your iteration cycle?
