The Data Layer for Physical AI - Rerun

Rerun covers the whole journey from raw recordings to training, on a single unified data layer for multi-rate, multimodal robotics data.

It consists of two parts. Rerun SDK is an open source library and set of tools for logging, storing, querying, visualizing, and training on multi-rate, multimodal data. Rerun Hub is a data catalog and backend for large-scale storage, access, and streaming of robotics data from object storage.

The problem

Building intelligent physical systems requires rapid iteration on both data and models. But teams often get stuck because:

Data from sensors arrives at different rates and in different formats
Understanding what went wrong requires visualizing multimodal data (images, point clouds, sensor readings) together in time
Extracting, cleaning, and preparing data for training involves too many manual steps
Switching between different tools for each step slows everything down
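To make the multi-rate problem concrete, here is a minimal sketch of one common way to line up two streams in time: for each sample of the faster stream, take the most recent sample of the slower one. It uses only the Python standard library and is not Rerun's API. The stream names, rates, and values are hypothetical.

```python
from bisect import bisect_right


def latest_at(samples, t):
    """Return the most recent value logged at or before time t, or None."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else None


# Hypothetical streams as (timestamp_seconds, value) pairs, sorted by time.
camera = [(0.00, "frame0"), (0.10, "frame1"), (0.20, "frame2")]          # ~10 Hz
imu = [(0.00, 0.1), (0.05, 0.2), (0.10, 0.3), (0.15, 0.4), (0.20, 0.5)]  # ~20 Hz

# For every IMU reading, attach the camera frame that was current at that time.
aligned = [(t, latest_at(camera, t), reading) for t, reading in imu]
```

Each IMU reading is paired with whichever camera frame was current at that instant, so the two streams can be inspected side by side even though they were sampled at different rates.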

The best robotics teams minimize their time from new data to training. Rerun gives you the unified infrastructure to make that happen.

Who is Rerun for?

Rerun is built for teams developing intelligent physical systems:

Robotics engineers debugging perception, controls, and planning
Perception teams analyzing sensor data and model outputs
ML engineers preparing datasets and understanding model behavior
Autonomy teams developing and testing decision-making systems

If you're working with robots, drones, autonomous vehicles, spatial AI, or any system with data that evolves over time, Rerun helps you move faster.

How do you use it?

Log and ingest

Use the logging API to log multimodal data from your code, or use the chunk processing API to convert existing data to the .rrd file format so that you can visualize or query it later.

Visualize

Rerun provides an open source, pre-built viewer that is configurable and extensible. You can log directly to the viewer, open files in a range of formats, or connect the viewer to a Rerun catalog.

Query and transform

The Rerun file format supports both high performance visualization and querying over the same data source.

You can use the open source catalog server to run laptop-scale examples locally. We also offer Rerun Hub, a scalable catalog for robotics data, for teams that need collaborative dataset management, version control, and cloud storage (reach out to learn more). The two are API-compatible, so the only difference between our examples and Rerun Hub is that you connect to an existing server instead of launching your own.

Prepare catalog

Before querying or viewing recordings in the catalog, we have to register them. We group recordings into datasets. Because Rerun indexes existing data in place, registration needs the paths of the RRD files to index: in object storage for Rerun Hub, or on disk for the local catalog server.

Use catalog

At this point, a viewer can connect to the prepared catalog, or you can run queries against it. A query has four basic steps: specify the dataset to query, get a lazily loaded dataframe, specify the query, and retrieve the results. Queries can be written in SQL or with dataframe APIs, which gives you the flexibility to investigate anything about your data.
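The four query steps can be sketched with a toy lazy dataframe. This is a standard-library stand-in meant only to show the shape of the workflow. It is not Rerun's API, and the `LazyFrame` class, the dataset name, and the column names are all hypothetical.

```python
class LazyFrame:
    """Toy stand-in for a lazily loaded dataframe (not Rerun's API)."""

    def __init__(self, rows, filters=(), columns=None):
        self._rows = rows
        self._filters = tuple(filters)
        self._columns = columns

    def filter(self, predicate):
        # Record the filter; nothing is evaluated yet.
        return LazyFrame(self._rows, self._filters + (predicate,), self._columns)

    def select(self, *columns):
        # Record the projection; nothing is evaluated yet.
        return LazyFrame(self._rows, self._filters, columns)

    def collect(self):
        # Only now are rows scanned, filtered, and projected.
        out = []
        for row in self._rows:
            if all(p(row) for p in self._filters):
                cols = self._columns or tuple(row)
                out.append({c: row[c] for c in cols})
        return out


# Hypothetical catalog mapping a dataset name to its rows.
catalog = {
    "pick_and_place": [
        {"recording": "run_a", "t": 0.0, "gripper_force": 1.2},
        {"recording": "run_a", "t": 0.1, "gripper_force": 4.8},
        {"recording": "run_b", "t": 0.0, "gripper_force": 5.1},
    ]
}

frame = LazyFrame(catalog["pick_and_place"])                # 1-2: dataset -> lazy frame
query = frame.filter(lambda r: r["gripper_force"] > 4.0)    # 3: specify the query
query = query.select("recording", "t")
results = query.collect()                                   # 4: retrieve the results
```

Building the query only records what to do. The rows are read when `collect()` is called, which is the property that lets a lazy dataframe avoid loading data the query never touches.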

Train

Use the catalog as a data source for training: a dataloader runs a query against the catalog and yields training batches.

Diagram: query -> dataloader -> Rerun Catalog -> batches
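The training flow can be sketched as a generator that pulls rows from a query and groups them into batches. This is a minimal illustration using only the standard library. `run_query`, the in-memory `catalog` dictionary, and the row fields are hypothetical stand-ins, not Rerun's catalog API.

```python
from itertools import islice


def run_query(catalog, dataset, min_frame):
    """Hypothetical stand-in for a catalog query: yields matching rows."""
    for row in catalog[dataset]:
        if row["frame"] >= min_frame:
            yield row


def dataloader(rows, batch_size):
    """Group an iterator of rows into lists of at most batch_size rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


# Hypothetical in-memory catalog with a single dataset of seven rows.
catalog = {"demo": [{"frame": i, "action": i} for i in range(7)]}

batches = list(dataloader(run_query(catalog, "demo", min_frame=2), batch_size=2))
```

Because both the query and the dataloader are generators, rows are pulled only as batches are requested, and the final batch is allowed to be smaller than `batch_size`.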

Get started

Ready to speed up your iteration cycle?

Quick start guide - Get up and running in minutes
Examples - See Rerun in action with real data
Concepts - Learn how Rerun works under the hood

Can't find what you're looking for?

Join us in the Rerun Community Discord
Submit an issue in the Rerun GitHub project