π₀ is a Vision-Language-Action (VLA) model for general robot control from Physical Intelligence. The LeRobot implementation is adapted from their open-source OpenPI repository.
π₀ represents a breakthrough in robotics as the first general-purpose robot foundation model developed by Physical Intelligence. Unlike traditional robot programs that are narrow specialists programmed for repetitive motions, π₀ is designed to be a generalist policy that can understand visual inputs, interpret natural language instructions, and control a variety of different robots across diverse tasks.
As described by Physical Intelligence, while AI has achieved remarkable success in digital domains, from chess-playing to drug discovery, human intelligence still dramatically outpaces AI in the physical world. To paraphrase Moravec's paradox, winning a game of chess represents an "easy" problem for AI, but folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived. π₀ represents a first step toward developing artificial physical intelligence that enables users to simply ask robots to perform any task they want, just like they can with large language models.
π₀ combines several key innovations:

- **Flow matching action generation**: produces smooth, high-frequency continuous action chunks rather than discrete action tokens
- **Pre-trained vision-language backbone**: builds on a VLM (PaliGemma) to inherit semantic knowledge from internet-scale pre-training
- **Cross-embodiment training**: learns from data collected across multiple robot platforms
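π₀ generates its action chunks with flow matching: sampling starts from Gaussian noise and is refined by integrating a velocity field from t=0 to t=1. The sketch below is purely illustrative — the closed-form velocity field is a stand-in for π₀'s learned action expert, and the function names are hypothetical:

```python
import random

def toy_velocity(x, t, target):
    # Stand-in for the learned velocity network. For the straight-line
    # path x_t = (1 - t) * noise + t * target, the conditional velocity
    # pointing toward the target is (target - x_t) / (1 - t).
    return [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]

def sample_action_chunk(target, num_steps=10, seed=0):
    # Start from Gaussian noise and Euler-integrate the flow to t = 1.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# With this idealized field, integration lands (numerically) on the target.
actions = sample_action_chunk([0.5, -0.2, 1.0])
```

In the real model, the velocity field is predicted by the action expert conditioned on images, language, and robot state; only the integration loop is analogous.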
Install LeRobot by following our Installation Guide.

Install π₀ dependencies by running:

```bash
pip install -e ".[pi]"
```
π₀ is trained on the largest robot interaction dataset to date, combining three key data sources.
To use π₀ in LeRobot, specify the policy type as:

```bash
policy.type=pi0
```
For training π₀, you can use the standard LeRobot training script with the appropriate configuration:

```bash
lerobot-train \
--dataset.repo_id=your_dataset \
--policy.type=pi0 \
--output_dir=./outputs/pi0_training \
--job_name=pi0_training \
--policy.pretrained_path=lerobot/pi0_base \
--policy.repo_id=your_repo_id \
--policy.compile_model=true \
--policy.gradient_checkpointing=true \
--policy.dtype=bfloat16 \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--steps=3000 \
--policy.device=cuda \
--batch_size=32
```
- `--policy.compile_model=true`: Enables model compilation for faster training
- `--policy.gradient_checkpointing=true`: Significantly reduces memory usage during training
- `--policy.dtype=bfloat16`: Uses mixed-precision training for efficiency
- `--batch_size=32`: Batch size for training; adapt this based on your GPU memory
- `--policy.pretrained_path=lerobot/pi0_base`: The base π₀ model you want to finetune, options are:
| Parameter | Default | Description |
|---|---|---|
| `freeze_vision_encoder` | `false` | Do not freeze the vision encoder |
| `train_expert_only` | `false` | Do not freeze the VLM; train all parameters |
💡 Tip: Setting `train_expert_only=true` freezes the VLM and trains only the action expert and projections, allowing finetuning with reduced memory usage.
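To make the trade-off concrete, here is a toy parameter-group table showing what each freezing option leaves trainable. The group names and sizes are illustrative stand-ins, not LeRobot's actual module layout or π₀'s exact parameter counts:

```python
# Hypothetical parameter groups standing in for the model's components.
param_counts = {
    "vision_encoder": 400_000_000,
    "language_model": 2_600_000_000,
    "action_expert": 300_000_000,
    "projections": 10_000_000,
}

def trainable_params(freeze_vision_encoder=False, train_expert_only=False):
    """Count trainable parameters under each freezing option (toy sketch)."""
    trainable = dict(param_counts)
    if train_expert_only:
        # Freeze the whole VLM; keep only the action expert and projections.
        trainable.pop("vision_encoder")
        trainable.pop("language_model")
    elif freeze_vision_encoder:
        trainable.pop("vision_encoder")
    return sum(trainable.values())
```

With these (made-up) sizes, `train_expert_only=true` cuts the trainable count by roughly an order of magnitude, which is where the memory savings come from.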
By default, π₀ predicts absolute actions. You can enable relative actions so the model predicts offsets relative to the current robot state. This can improve training stability for certain setups.
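Conceptually, the relative-action transform replaces each absolute target in a chunk with its offset from the current robot state, while excluded joints (e.g. the gripper) stay absolute. A minimal sketch — the joint names, ordering, and helper function are hypothetical, not LeRobot's internals:

```python
def to_relative(action_chunk, current_state, joint_names, exclude=("gripper",)):
    """Convert absolute action targets to offsets from the current state.

    Joints listed in `exclude` (e.g. gripper open/close commands) remain
    absolute, since an offset is meaningless for a binary-style command.
    """
    relative_chunk = []
    for action in action_chunk:
        step = [
            a if name in exclude else a - s
            for a, s, name in zip(action, current_state, joint_names)
        ]
        relative_chunk.append(step)
    return relative_chunk

joints = ["shoulder", "elbow", "gripper"]
state = [0.10, -0.20, 1.0]                      # current robot state
chunk = [[0.15, -0.10, 1.0], [0.20, 0.00, 0.0]]  # absolute targets
rel = to_relative(chunk, state, joints)
```

Here the arm joints become small offsets from the current pose while the gripper column passes through unchanged.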
To use relative actions, first recompute your dataset stats in relative space via the CLI:

```bash
lerobot-edit-dataset \
--repo_id your_dataset \
--operation.type recompute_stats \
--operation.relative_action true \
--operation.chunk_size 50 \
--operation.relative_exclude_joints "['gripper']" \
--push_to_hub true
```
Or equivalently in Python:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.dataset_tools import recompute_stats

dataset = LeRobotDataset("your_dataset")
recompute_stats(dataset, relative_action=True, chunk_size=50, relative_exclude_joints=["gripper"])
dataset.push_to_hub()
```
The `chunk_size` should match your policy's `chunk_size` (default 50 for π₀). `relative_exclude_joints` lists joint names that should remain in absolute space (e.g. gripper commands). Use `--push_to_hub true` to upload the updated stats to the Hub.
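Why the stats must be recomputed can be sketched conceptually: the policy's normalizer needs per-dimension mean/std computed over the relative deltas, not the raw absolute values. The toy version below illustrates the idea only — it is not LeRobot's implementation, and the argument names are made up:

```python
import statistics

def relative_stats(actions, states, exclude_idx=()):
    """Per-dimension mean/std of actions expressed relative to the state.

    Dimensions in `exclude_idx` (e.g. a gripper index) stay absolute,
    mirroring `relative_exclude_joints`. Toy sketch, not LeRobot code.
    """
    dims = len(states[0])
    columns = [[] for _ in range(dims)]
    for action, state in zip(actions, states):
        for d in range(dims):
            value = action[d] if d in exclude_idx else action[d] - state[d]
            columns[d].append(value)
    mean = [statistics.fmean(c) for c in columns]
    std = [statistics.pstdev(c) for c in columns]
    return mean, std

mean, std = relative_stats(
    actions=[[1.0, 1.0], [2.0, 0.0]],
    states=[[0.5, 0.0], [1.0, 0.0]],
    exclude_idx={1},  # keep the second (gripper-like) dimension absolute
)
```

If you trained on stats computed in absolute space while the policy predicts offsets, normalization would be badly scaled — hence the recompute step before training.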
Then train with relative actions enabled:

```bash
lerobot-train \
--dataset.repo_id=your_dataset \
--policy.type=pi0 \
--policy.use_relative_actions=true \
--policy.relative_exclude_joints='["gripper"]' \
...
```
This model is released under the Apache 2.0 License, consistent with the original OpenPI repository.