doc/source/rllib/index.rst
.. include:: /_includes/rllib/we_are_hiring.rst
.. sphinx_rllib_readme_begin
.. _rllib-index:
.. include:: /_includes/rllib/new_api_stack.rst
.. image:: images/rllib-logo.png
    :align: center
.. sphinx_rllib_readme_end
.. todo (sven): redo toctree: suggestion: getting-started key-concepts rllib-env (single-agent) ... <- multi-agent ... <- external ... <- hierarchical algorithm-configs rllib-algorithms (overview of all available algos) dev-guide (replaces user-guides) debugging scaling-guide fault-tolerance checkpoints callbacks metrics-logger rllib-advanced-api algorithm (general description of how algos work) rl-modules rllib-offline single-agent-episode multi-agent-episode connector-v2 rllib-learner env-runners rllib-examples new-api-stack-migration-guide package_ref/index
.. toctree::
    :hidden:

    getting-started
    key-concepts
    rllib-env
    algorithm-config
    rllib-algorithms
    user-guides
    rllib-examples
    new-api-stack-migration-guide
    package_ref/index
.. sphinx_rllib_readme_2_begin
RLlib is an open source library for reinforcement learning (RL), offering support for production-level, highly scalable, and fault-tolerant RL workloads, while maintaining simple and unified APIs for a large variety of industry applications.
Whether you're training policies in a multi-agent setup, from historical offline data, or with externally connected simulators, RLlib offers simple solutions for each of these autonomous decision-making needs and lets you start running experiments within hours.
Industry leaders use RLlib in production in many different verticals, such as
`gaming <https://www.anyscale.com/events/2021/06/22/using-reinforcement-learning-to-optimize-iap-offer-recommendations-in-mobile-games>`__,
`robotics <https://www.anyscale.com/events/2021/06/23/introducing-amazon-sagemaker-kubeflow-reinforcement-learning-pipelines-for>`__,
`finance <https://www.anyscale.com/events/2021/06/22/a-24x-speedup-for-reinforcement-learning-with-rllib-+-ray>`__,
`climate- and industrial control <https://www.anyscale.com/events/2021/06/23/applying-ray-and-rllib-to-real-life-industrial-use-cases>`__,
`manufacturing and logistics <https://www.anyscale.com/events/2022/03/29/alphadow-leveraging-rays-ecosystem-to-train-and-deploy-an-rl-industrial>`__,
`automobile <https://www.anyscale.com/events/2021/06/23/using-rllib-in-an-enterprise-scale-reinforcement-learning-solution>`__,
and
`boat design <https://www.youtube.com/watch?v=cLCK13ryTpw>`__.
.. figure:: images/rllib-index-header.svg
It only takes a few steps to get your first RLlib workload up and running on your laptop.
Install RLlib and `PyTorch <https://pytorch.org>`__, as shown below:
.. code-block:: bash

    pip install "ray[rllib]" torch
.. note::

    For installation on computers running Apple Silicon, such as M1,
    `follow the instructions here <https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support>`_.
.. note::

    To be able to run the Atari or MuJoCo examples, you also need to run:

    .. code-block:: bash

        pip install "gymnasium[atari,accept-rom-license,mujoco]"
That's all you need. You can now start coding against RLlib. Here is an example of running the :ref:`PPO algorithm <ppo>` on the
`Taxi domain <https://gymnasium.farama.org/environments/toy_text/taxi/>`__.
You first create a config for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>`
and any other needed settings and parameters.
.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.connectors.env_to_module import FlattenObservations

    # Configure the algorithm.
    config = (
        PPOConfig()
        .environment("Taxi-v3")
        .env_runners(
            num_env_runners=2,
            # Observations are discrete (ints) -> We need to flatten (one-hot) them.
            env_to_module_connector=lambda env: FlattenObservations(),
        )
        .evaluation(evaluation_num_env_runners=1)
    )
Next, build the algorithm and train it for a total of five iterations.
One training iteration includes parallel, distributed sample collection by the
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors, followed by loss calculation
on the collected data, and a model update step.
.. testcode::

    from pprint import pprint

    # Build the algorithm.
    algo = config.build_algo()

    # Train it for 5 iterations ...
    for _ in range(5):
        pprint(algo.train())
At the end of your script, you evaluate the trained Algorithm and release all its resources:
.. testcode::

    # ... and evaluate it.
    pprint(algo.evaluate())

    # Release the algo's resources (remote actors, like EnvRunners and Learners).
    algo.stop()
You can use any `Farama-Foundation Gymnasium <https://github.com/Farama-Foundation/Gymnasium>`__ registered environment
with the ``env`` argument.
In ``config.env_runners()`` you can specify - amongst many other things - the number of parallel
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors to collect samples from the environment.
You can also tweak the NN architecture through RLlib's :py:class:`~ray.rllib.core.rl_module.default_model_config.DefaultModelConfig`,
as well as set up a separate config for the evaluation
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors through the ``config.evaluation()`` method.
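For example, a minimal sketch of these tweaks might look like the following. The layer sizes and evaluation
settings are illustrative assumptions, not recommendations:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        # Scale sample collection to 4 parallel EnvRunner actors.
        .env_runners(num_env_runners=4)
        # Tweak the default model: two hidden layers of 64 units each (illustrative values).
        .rl_module(model_config=DefaultModelConfig(fcnet_hiddens=[64, 64]))
        # Use one dedicated evaluation EnvRunner and evaluate every training iteration.
        .evaluation(evaluation_num_env_runners=1, evaluation_interval=1)
    )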
See :ref:`here <rllib-python-api>` if you want to learn more about the RLlib training APIs.
Also, `see here <https://github.com/ray-project/ray/blob/master/rllib/examples/inference/policy_inference_after_training.py>`__
for a simple example of how to write an action inference loop after training.
If you want to get a quick preview of which algorithms and environments RLlib supports, click the dropdowns below:
.. dropdown:: RLlib Algorithms
    :animate: fade-in-slide-down
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **On-Policy** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`PPO (Proximal Policy Optimization) <ppo>` | |single_agent| | |multi_agent| | |discr_act| | |cont_act| | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **Off-Policy** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`SAC (Soft Actor Critic) <sac>` | |single_agent| | |multi_agent| | | |cont_act| | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`DQN/Rainbow (Deep Q Networks) <dqn>` | |single_agent| | |multi_agent| | |discr_act| | | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **High-throughput Architectures** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`APPO (Asynchronous Proximal Policy Optimization) <appo>` | |single_agent| | |multi_agent| | |discr_act| | |cont_act| | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`IMPALA (Importance Weighted Actor-Learner Architecture) <impala>` | |single_agent| | |multi_agent| | |discr_act| | | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **Model-based RL** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`DreamerV3 <dreamerv3>` | |single_agent| | | |discr_act| | |cont_act| | |multi_gpu| | |multi_node_multi_gpu| |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **Offline RL and Imitation Learning** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`BC (Behavior Cloning) <bc>` | |single_agent| | | |discr_act| | |cont_act| | | |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`CQL (Conservative Q-Learning) <cql>` | |single_agent| | | | |cont_act| | | |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`MARWIL (Advantage Re-Weighted Imitation Learning) <marwil>` | |single_agent| | | |discr_act| | |cont_act| | | |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
.. dropdown:: RLlib Environments
    :animate: fade-in-slide-down
+-------------------------------------------------------------------------------------------+
| **Farama-Foundation Environments** |
+-------------------------------------------------------------------------------------------+
| `gymnasium <https://gymnasium.farama.org/index.html>`__ |single_agent| |
| |
| .. code-block:: bash |
| |
| pip install "gymnasium[atari,accept-rom-license,mujoco]"`` |
| |
| .. code-block:: python |
| |
| config.environment("CartPole-v1") # Classic Control |
| config.environment("ale_py:ALE/Pong-v5") # Atari |
| config.environment("Hopper-v5") # MuJoCo |
+-------------------------------------------------------------------------------------------+
| `PettingZoo <https://pettingzoo.farama.org/index.html>`__ |multi_agent| |
| |
| .. code-block:: bash |
| |
| pip install "pettingzoo[all]" |
| |
| .. code-block:: python |
| |
| from ray.tune.registry import register_env |
| from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv |
| from pettingzoo.sisl import waterworld_v4 |
| register_env("env", lambda _: PettingZooEnv(waterworld_v4.env())) |
| config.environment("env") |
+-------------------------------------------------------------------------------------------+
| **RLlib Multi-Agent** |
+-------------------------------------------------------------------------------------------+
| `RLlib's MultiAgentEnv API <rllib-env.html#multi-agent-and-hierarchical>`__ |multi_agent| |
| |
| .. code-block:: python |
| |
| from ray.rllib.examples.envs.classes.multi_agent import MultiAgentCartPole |
| from ray import tune |
| tune.register_env("env", lambda cfg: MultiAgentCartPole(cfg)) |
| config.environment("env", env_config={"num_agents": 2}) |
| config.multi_agent( |
| policies={"p0", "p1"}, |
| policy_mapping_fn=lambda aid, *a, **kw: f"p{aid}", |
| ) |
+-------------------------------------------------------------------------------------------+
.. dropdown:: Scalable and Fault-Tolerant
    :animate: fade-in-slide-down
RLlib workloads scale along various axes:
- The number of :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors to use.
  This is configurable through ``config.env_runners(num_env_runners=...)`` and
  allows you to scale the speed of your (simulator) data collection step.
  This ``EnvRunner`` axis is fully **fault tolerant**, meaning you can train against
  custom environments that are unstable or frequently stall execution, and even place all
  your ``EnvRunner`` actors on spot machines.
- The number of :py:class:`~ray.rllib.core.learner.Learner` actors to use for **multi-GPU training**.
  This is configurable through ``config.learners(num_learners=...)``, which you normally
  set to the number of GPUs available (make sure you then also set
  ``config.learners(num_gpus_per_learner=1)``). If you don't have GPUs, you can
  use this setting for **DDP-style learning on CPUs** instead, as shown in the sketch after this list.
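The following minimal sketch shows such a scaling setup, assuming a cluster with two GPUs available for
training. The numbers are illustrative, not recommendations:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        # Scale the (simulator) data collection step to 8 parallel EnvRunner actors.
        .env_runners(num_env_runners=8)
        # Use 2 Learner actors with one GPU each for multi-GPU training.
        # Set num_gpus_per_learner=0 to do DDP-style learning on CPUs instead.
        .learners(num_learners=2, num_gpus_per_learner=1)
    )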
.. dropdown:: Multi-Agent Reinforcement Learning (MARL)
    :animate: fade-in-slide-down
RLlib natively supports multi-agent reinforcement learning (MARL), allowing you to run arbitrarily
complex setups of agents and policies:
- **Independent** multi-agent learning (the default): Every agent collects data for updating its own
  policy network, interpreting the other agents as part of the environment.
- **Collaborative** training: Train a team of agents that either all share the same policy (shared parameters)
  or in which some agents have their own policy network(s); see the sketch after this list. You can also share
  value functions between all members of the team, or only some of them, as you see fit, thus allowing for
  global vs. local objectives to be optimized.
- **Adversarial** training: Have agents play against other agents in competitive environments. Use self-play
  or league-based self-play to train your agents through various stages of ever-increasing difficulty.
- **Any combination of the above!** For example, you can train teams of arbitrarily many agents playing against
  other teams, where the agents in each team may have individual sub-objectives, alongside groups
  of neutral agents that don't participate in any competition.
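For instance, here is a minimal sketch of a collaborative two-agent setup in which both agents share a single
policy. The environment, module name, and mapping function are illustrative assumptions:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.examples.envs.classes.multi_agent import MultiAgentCartPole
    from ray.tune.registry import register_env

    # Register a simple 2-agent example environment.
    register_env("multi_cartpole", lambda cfg: MultiAgentCartPole({"num_agents": 2}))

    config = (
        PPOConfig()
        .environment("multi_cartpole")
        .multi_agent(
            # One shared policy ("shared_policy") ...
            policies={"shared_policy"},
            # ... to which both agent IDs map.
            policy_mapping_fn=lambda agent_id, episode, **kwargs: "shared_policy",
        )
    )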
.. dropdown:: Offline RL and Behavior Cloning
    :animate: fade-in-slide-down
**Ray Data** is integrated into RLlib, enabling **large-scale data ingestion** for offline RL and behavior
cloning (BC) workloads.
See here for a basic `tuned example for the behavior cloning algo <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/bc/cartpole_bc.py>`__
and here for how to `pre-train a policy with BC, then fine-tune it with online PPO <https://github.com/ray-project/ray/blob/master/rllib/examples/offline_rl/train_w_bc_finetune_w_ppo.py>`__.
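As a rough sketch, an offline BC setup points the config at previously recorded episode data. The path below
is a placeholder and the settings are illustrative assumptions, not a complete recipe:

.. code-block:: python

    from ray.rllib.algorithms.bc import BCConfig

    config = (
        BCConfig()
        .environment("CartPole-v1")
        # Point RLlib (and Ray Data underneath) to your recorded offline data.
        # "/tmp/cartpole_offline_data" is a placeholder path.
        .offline_data(input_="local:///tmp/cartpole_offline_data")
    )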
.. dropdown:: Support for External Env Clients
    :animate: fade-in-slide-down
**Support for externally connecting RL environments** is achieved through customizing the :py:class:`~ray.rllib.env.env_runner.EnvRunner` logic
from RLlib-owned, internal gymnasium envs to external, TCP-connected Envs that act independently and may even perform their own
action inference, e.g. through ONNX.
See here for an example of `RLlib acting as a server with connecting external env TCP-clients <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/env_connecting_to_rllib_w_tcp_client.py>`__.
.. grid:: 1 2 3 3
    :gutter: 1
    :class-container: container pb-4
.. grid-item-card::
**RLlib Key Concepts**
^^^
Learn more about the core concepts of RLlib, such as Algorithms, environments,
models, and learners.
+++
.. button-ref:: rllib-key-concepts
:color: primary
:outline:
:expand:
Key Concepts
.. grid-item-card::
**RL Environments**
^^^
Get started with environments supported by RLlib, such as Farama Foundation's Gymnasium, PettingZoo,
and many custom formats for vectorized and multi-agent environments.
+++
.. button-ref:: rllib-environments-doc
:color: primary
:outline:
:expand:
Environments
.. grid-item-card::
**Models (RLModule)**
^^^
Learn how to configure RLlib's default models and implement your own
custom models through the RLModule APIs, which support arbitrary architectures
with PyTorch, complex multi-model setups, and multi-agent models with components
shared between agents.
+++
.. button-ref:: rlmodule-guide
:color: primary
:outline:
:expand:
Models (RLModule)
.. grid-item-card::
**Algorithms**
^^^
See the many available RL algorithms of RLlib for on-policy and off-policy training,
offline- and model-based RL, multi-agent RL, and more.
+++
.. button-ref:: rllib-algorithms-doc
:color: primary
:outline:
:expand:
Algorithms
RLlib provides powerful, yet easy-to-use APIs for customizing all aspects of your experimental and
production training workflows.
For example, you may code your own `environments <rllib-env.html#configuring-environments>`__
in Python using Farama Foundation's `gymnasium <https://farama.org>`__ or DeepMind's OpenSpiel,
provide custom `PyTorch models <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/custom_cnn_rl_module.py>`__,
write your own `optimizer setups and loss definitions <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/ppo_with_custom_loss_fn.py>`__,
or define custom `exploratory behavior <https://github.com/ray-project/ray/blob/master/rllib/examples/curiosity/count_based_curiosity.py>`__.
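For instance, here's a minimal sketch of a custom gymnasium environment plugged into an RLlib config.
The environment and its class name are made up purely for illustration:

.. code-block:: python

    import gymnasium as gym
    import numpy as np

    from ray.rllib.algorithms.ppo import PPOConfig


    class SimpleCorridor(gym.Env):
        """Toy sketch env: walk right from position 0 to the end of a 1D corridor."""

        def __init__(self, config=None):
            config = config or {}
            self.end_pos = config.get("corridor_length", 10)
            self.cur_pos = 0
            self.observation_space = gym.spaces.Box(0.0, self.end_pos, (1,), np.float32)
            self.action_space = gym.spaces.Discrete(2)  # 0=left, 1=right

        def reset(self, *, seed=None, options=None):
            self.cur_pos = 0
            return np.array([self.cur_pos], np.float32), {}

        def step(self, action):
            if action == 0 and self.cur_pos > 0:
                self.cur_pos -= 1
            elif action == 1:
                self.cur_pos += 1
            terminated = self.cur_pos >= self.end_pos
            reward = 1.0 if terminated else -0.1
            return np.array([self.cur_pos], np.float32), reward, terminated, False, {}


    # Pass the env class (plus an env_config dict) directly to the config.
    config = PPOConfig().environment(SimpleCorridor, env_config={"corridor_length": 20})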
.. figure:: images/rllib-new-api-stack-simple.svg
    :align: left
    :width: 850
**RLlib's API stack:** Built on top of Ray, RLlib offers off-the-shelf, distributed and fault-tolerant
algorithms and loss functions, PyTorch default models, multi-GPU training, and multi-agent support.
Users customize their experiments by subclassing the existing abstractions.
.. sphinx_rllib_readme_2_end
.. sphinx_rllib_readme_3_begin
If RLlib helps with your academic research, the Ray RLlib team encourages you to cite these papers:
.. code-block::
@inproceedings{liang2021rllib,
title={{RLlib} Flow: Distributed Reinforcement Learning is a Dataflow Problem},
author={
Wu, Zhanghao and
Liang, Eric and
Luo, Michael and
Mika, Sven and
Gonzalez, Joseph E. and
Stoica, Ion
},
booktitle={Conference on Neural Information Processing Systems ({NeurIPS})},
year={2021},
url={https://proceedings.neurips.cc/paper/2021/file/2bce32ed409f5ebcee2a7b417ad9beed-Paper.pdf}
}
@inproceedings{liang2018rllib,
title={{RLlib}: Abstractions for Distributed Reinforcement Learning},
author={
Eric Liang and
Richard Liaw and
Robert Nishihara and
Philipp Moritz and
Roy Fox and
Ken Goldberg and
Joseph E. Gonzalez and
Michael I. Jordan and
Ion Stoica
},
booktitle = {International Conference on Machine Learning ({ICML})},
year={2018},
url={https://arxiv.org/pdf/1712.09381}
}
.. sphinx_rllib_readme_3_end
.. sigils used on this page
.. |single_agent| image:: /rllib/images/sigils/single-agent.svg
    :class: inline-figure
    :width: 72
.. |multi_agent| image:: /rllib/images/sigils/multi-agent.svg
    :class: inline-figure
    :width: 72
.. |discr_act| image:: /rllib/images/sigils/discr-actions.svg
    :class: inline-figure
    :width: 72
.. |cont_act| image:: /rllib/images/sigils/cont-actions.svg
    :class: inline-figure
    :width: 72
.. |multi_gpu| image:: /rllib/images/sigils/multi-gpu.svg
    :class: inline-figure
    :width: 72
.. |multi_node_multi_gpu| image:: /rllib/images/sigils/multi-node-multi-gpu.svg
    :class: inline-figure
    :alt: Only on the Anyscale Platform!
    :width: 72