Train RL in Simulation

This guide explains how to use the gym_hil simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

gym_hil is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

  • Train policies in simulation to test the RL stack before training on real robots

  • Collect demonstrations in sim using external devices like gamepads or keyboards

  • Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.
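The human-in-the-loop idea can be sketched in a few lines: at every step the human may override the policy's action, and the override is what gets executed and logged. Below is a minimal illustrative sketch; `read_gamepad`, `policy_action`, and `hil_step` are hypothetical stand-ins, not gym_hil's actual API.

```python
# Minimal sketch of a human-in-the-loop control step.
# read_gamepad and policy_action are hypothetical stubs standing in for
# gym_hil's real gamepad interface and a learned policy.

def read_gamepad():
    """Return (intervening, action) from the human device (stubbed)."""
    return True, [0.025, 0.0, 0.0]  # human nudges the arm along +x

def policy_action(obs):
    """Return the current policy's action (stubbed)."""
    return [0.0, 0.0, 0.0]

def hil_step(obs):
    intervening, human_action = read_gamepad()
    # When the human intervenes, their action takes precedence over the policy.
    action = human_action if intervening else policy_action(obs)
    return action, intervening

action, intervened = hil_step(obs={})
```

The intervention flag is what makes these transitions useful for learning: it tells the learner which actions came from the human rather than the policy.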

Installation

First, install the gym_hil package within the LeRobot environment:

```bash
pip install -e ".[hilserl]"
```

What do I need?

  • A gamepad or keyboard to control the robot
  • An NVIDIA GPU

Configuration

To use gym_hil with LeRobot, you need to create a configuration file. An example is provided here. Key configuration sections include:

Environment Type and Task

```json
{
  "env": {
    "type": "gym_manipulator",
    "name": "gym_hil",
    "task": "PandaPickCubeGamepad-v0",
    "fps": 10
  },
  "device": "cuda"
}
```

Available tasks:

  • PandaPickCubeBase-v0: Basic environment
  • PandaPickCubeGamepad-v0: With gamepad control
  • PandaPickCubeKeyboard-v0: With keyboard control
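Since the configuration is plain JSON, it can be sanity-checked before launching a run. The sketch below uses the field names from the example above; the validation logic itself is illustrative and not part of LeRobot.

```python
import json

# The example environment configuration from above, embedded as a string.
config_text = """
{
  "env": {
    "type": "gym_manipulator",
    "name": "gym_hil",
    "task": "PandaPickCubeGamepad-v0",
    "fps": 10
  },
  "device": "cuda"
}
"""

# The task names listed in this guide.
KNOWN_TASKS = {
    "PandaPickCubeBase-v0",
    "PandaPickCubeGamepad-v0",
    "PandaPickCubeKeyboard-v0",
}

config = json.loads(config_text)
task = config["env"]["task"]
assert task in KNOWN_TASKS, f"unknown task: {task}"
print(config["env"]["name"], task, config["env"]["fps"])
```

Catching a typo in the task name here is cheaper than waiting for the environment to fail at startup.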

Processor Configuration

```json
{
  "env": {
    "processor": {
      "control_mode": "gamepad",
      "gripper": {
        "use_gripper": true,
        "gripper_penalty": -0.02
      },
      "reset": {
        "control_time_s": 15.0,
        "fixed_reset_joint_positions": [
          0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785
        ]
      },
      "inverse_kinematics": {
        "end_effector_step_sizes": {
          "x": 0.025,
          "y": 0.025,
          "z": 0.025
        }
      }
    }
  }
}
```

Important parameters:

  • gripper.gripper_penalty: Penalty for excessive gripper movement
  • gripper.use_gripper: Whether to enable gripper control
  • inverse_kinematics.end_effector_step_sizes: Step size along the x, y, and z axes of the end-effector
  • control_mode: Set to "gamepad" to use a gamepad controller

Running with LeRobot's HIL RL

Basic Usage

To run the environment interactively, set mode to null in the config:

```bash
python -m lerobot.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```

Recording a Dataset

To collect a dataset, set the mode to record and define the repo_id and the number of episodes to record:

```json
{
  "env": {
    "type": "gym_manipulator",
    "name": "gym_hil",
    "task": "PandaPickCubeGamepad-v0"
  },
  "dataset": {
    "repo_id": "username/sim_dataset",
    "root": null,
    "task": "pick_cube",
    "num_episodes_to_record": 10,
    "replay_episode": null,
    "push_to_hub": true
  },
  "mode": "record"
}
```

```bash
python -m lerobot.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```

Training a Policy

To train a policy, check out the configuration example available here, then run the actor server:

```bash
python -m lerobot.rl.actor --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python -m lerobot.rl.learner --config_path path/to/train_gym_hil_env.json
```

The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Congrats 🎉, you have finished this tutorial!

[!TIP] If you have any questions or need help, please reach out on Discord.

Paper citation:

```bibtex
@article{luo2024precise,
  title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
  author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
  journal={arXiv preprint arXiv:2410.21845},
  year={2024}
}
```