Convert Rerun data into the training dataset format of your choice. This guide demonstrates how to use the OSS Rerun server to query recordings, align multi-rate sensor data to a common timeline, and export the result as a LeRobot dataset.
This example requires the `rerun_export` package from the Rerun repository:

```sh
pip install -e examples/python/rerun_export
```

This installs the necessary dependencies, including LeRobot, DataFusion, and PyArrow.
By default, the export uses the frame rate specified in the config to create evenly spaced samples (a LeRobot requirement).
For more details on time alignment, see Time-align data.
Start a local server and load your recordings. Each recording becomes a segment in the dataset, and each unique segment id becomes one LeRobot episode.
snippet: howto/lerobot_export[setup]
See Catalog object model for how recordings are represented on the Data Platform.
Robot recordings often contain more data than needed for training. Filter the dataset to include only the relevant entity paths and components that will map to LeRobot’s standardized format.
For example, you might include joint position commands as actions, joint states and end-effector pose as observations, RGB camera streams as video inputs, and a language instruction as the task description. Other signals such as debug visualizations, intermediate computations, or unused sensors can be excluded.
snippet: howto/lerobot_export[filter_data]
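As a concrete sketch of this filtering step, the selection can be thought of as a mapping from entity paths to LeRobot roles. The entity paths below mirror the CLI example at the end of this guide; the role names and the dict-based mapping are illustrative, not the package's API:

```python
# Hypothetical mapping from recorded entity paths to LeRobot roles.
ROLE_MAP = {
    "/action/joint_positions": "action",
    "/observation/joint_positions": "observation.state",
    "/camera/wrist": "observation.images.wrist",
    "/language_instruction": "task",
}

recorded_paths = [
    "/action/joint_positions",
    "/observation/joint_positions",
    "/camera/wrist",
    "/debug/trajectory_overlay",  # visualization only: excluded
    "/language_instruction",
]

# Keep only the paths that map to a LeRobot role.
selected = {p: ROLE_MAP[p] for p in recorded_paths if p in ROLE_MAP}
print(sorted(selected))
```

Anything not in the mapping, such as the debug overlay above, is simply dropped before export.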
Define how to map the data to LeRobot's standardized format:
snippet: howto/lerobot_export[configure_export]
LeRobot uses a schema called "features" to describe dataset structure. The `infer_features` function automatically creates this schema by inspecting your data.
snippet: howto/lerobot_export[infer_features]
Feature inference examines the underlying data to determine the shape and data type of each feature.
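As an illustration of what feature inference produces, here is a minimal stand-in. The `infer_feature` helper and the exact schema keys are assumptions modeled on LeRobot's features format, not the package's actual code:

```python
import numpy as np

def infer_feature(name, value):
    """Illustrative stand-in for feature inference: derive dtype and
    shape from a sample value. Not the package's implementation."""
    arr = np.asarray(value)
    if name.startswith("observation.images"):
        # Video streams are described by their (height, width, channel) shape.
        return {"dtype": "video", "shape": arr.shape, "names": ["height", "width", "channel"]}
    return {"dtype": str(arr.dtype), "shape": arr.shape, "names": None}

features = {
    "action": infer_feature("action", np.zeros(7, dtype=np.float32)),
    "observation.images.wrist": infer_feature(
        "observation.images.wrist", np.zeros((480, 640, 3), dtype=np.uint8)
    ),
}
print(features["action"]["shape"])  # (7,)
```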
Create the LeRobot dataset instance using the LeRobot dataset API:
snippet: howto/lerobot_export[create_dataset]
Note that `root` is where the dataset files will be written; LeRobot requires it to be an empty or non-existent directory.
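A minimal sketch of that constraint on `root`, using only the standard library (`check_dataset_root` is a hypothetical helper, not part of the package):

```python
import tempfile
from pathlib import Path

def check_dataset_root(root):
    """Sketch of the constraint LeRobot places on the dataset root:
    it must either not exist yet, or be an empty directory."""
    root = Path(root)
    if root.exists() and any(root.iterdir()):
        raise FileExistsError(f"{root} is not empty; choose a fresh directory")

with tempfile.TemporaryDirectory() as tmp:
    check_dataset_root(Path(tmp) / "lerobot_dataset")  # non-existent: OK
    check_dataset_root(tmp)                            # exists but empty: OK
```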
Convert the filtered data into a LeRobot episode. This is the core transformation step.
snippet: howto/lerobot_export[export_episode]
The `convert_dataframe_to_episode` function performs time alignment and resamples the dataframe to the target frame rate. It generates a sequence of evenly spaced timestamps at the target frame rate and treats these as the canonical timesteps for the episode.
For each timestep, it queries the most recent available value of every selected component using Rerun’s latest-at semantics. If a stream has no sample exactly at that time, its last observed value is forward-filled.
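The resampling logic can be sketched in plain Python. This is a simplified stand-in for what `convert_dataframe_to_episode` does per stream, assuming samples sorted by time, not the actual implementation:

```python
from bisect import bisect_right

def resample_latest_at(samples, fps, duration_s):
    """Resample an irregular stream to evenly spaced timesteps.

    `samples` is a list of (timestamp_seconds, value) pairs sorted by time.
    Each target timestep takes the most recent sample at or before that
    time (latest-at / forward-fill); timesteps before the first sample
    yield None.
    """
    times = [t for t, _ in samples]
    out = []
    for i in range(int(duration_s * fps)):
        t = i / fps
        idx = bisect_right(times, t) - 1
        out.append(samples[idx][1] if idx >= 0 else None)
    return out

# A 10 Hz target grid over a stream sampled irregularly:
stream = [(0.0, "a"), (0.09, "b"), (0.2, "c")]
print(resample_latest_at(stream, fps=10, duration_s=0.4))  # ['a', 'b', 'c', 'c']
```

Note how the value "c" is forward-filled onto the 0.3 s timestep, where the stream has no sample of its own.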
The `finalize()` call completes the dataset by writing metadata and closing all files.
To export multiple recordings as separate episodes, iterate over segment IDs:
snippet: howto/lerobot_export[multi_episode_export]
The exported LeRobot dataset can be used directly with LeRobot's training scripts:
snippet: howto/lerobot_export[use_dataset]
Or push it to the Hugging Face Hub for sharing:
```python
dataset.push_to_hub(repo_id="your-username/your-dataset-name")
```
The `rerun_export` package includes a CLI that implements this workflow for batch processing:

```sh
rerun_export \
    --rrd-dir ./tests/assets/rrd/sample_5 \
    --output ./lerobot_dataset \
    --dataset-name rerun-example-droid \
    --fps 15 \
    --action /action/joint_positions:Scalars:scalars \
    --state /observation/joint_positions:Scalars:scalars \
    --task /language_instruction:TextDocument:text \
    --video ext1:/camera/ext1 \
    --video ext2:/camera/ext2 \
    --video wrist:/camera/wrist
```
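The `--action`, `--state`, and `--task` flags take `entity_path:Component:field` values, while `--video` takes `name:entity_path`. A sketch of how such specs decompose (the helper names here are hypothetical, not part of the CLI):

```python
def parse_stream_spec(spec):
    """Parse an `entity_path:Component:field` value, as used by the
    --action/--state/--task flags above."""
    path, component, field = spec.split(":")
    return {"entity_path": path, "component": component, "field": field}

def parse_video_spec(spec):
    """Parse a `name:entity_path` value, as used by --video."""
    name, path = spec.split(":", 1)
    return {"name": name, "entity_path": path}

print(parse_stream_spec("/action/joint_positions:Scalars:scalars"))
print(parse_video_spec("wrist:/camera/wrist"))
```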
See the rerun_export example for the complete implementation.