doc/source/ray-references/glossary.rst
.. _ray_glossary:
On this page you find a list of important terminology used throughout the Ray documentation, sorted alphabetically.
.. glossary::
Action space
Property of an RL environment. The shape(s) and datatype(s) that actions within
an RL environment are allowed to have.
Examples: An RL environment, in which an agent can move up, down, left,
or right might have an action space of ``Discrete(4)`` (integer values
of 0, 1, 2, or 3).
An RL environment, in which an agent can apply a torque between
-1.0 and 1.0 to a joint, the action space might be
``Box(-1.0, 1.0, (1,), float32)`` (single float values between -1.0 and 1.0).
Actor
A Ray actor is a remote instance of a class, which is
essentially a stateful service. :ref:`Learn more about Ray actors<actor-guide>`.
Actor task
An invocation of a Ray actor method. Sometimes we just call it a task.
Ray Agent
Daemon process running on each Ray node. It has several functionalities like
collecting metrics on the local node and installing runtime environments.
Agent
An acting entity inside an RL environment. One RL environment might contain
one (single-agent RL) or more (multi-agent RL) acting agents. Different agents
within the same environment might have different observation- and action-spaces,
different reward functions, and act at different time-steps.
Algorithm
A class that holds the who/when/where/how for training one or more RL agent(s).
The user interacts with an Algorithm instance directly to train their agents
(it is the top-most user facing API of RLlib).
Asynchronous execution
An execution model where a later task can begin executing in parallel,
without waiting for an earlier task to finish.
Ray tasks and actor tasks are all executed asynchronously.
Asynchronous sampling
Sampling is the process of rolling out (playing) episodes within an RL
environment and thereby collecting the training data (observations, actions
and rewards). In an asynchronous sampling setup, Ray actors run sampling in the
background and send collected samples back to a main driver script. The driver
checks for such “ready” data frequently and then triggers central model
learning updates. Hence, sampling and learning happen at the same time.
Note that because of this, the policy/ies used for creating the samples
(action computations) might be slightly behind the centrally learned policy
model(s), even in an on-policy Algorithm.
Autoscaler
A Ray component that scales up and down the Ray cluster by adding and removing
Ray nodes according to the resources requested by applications running on
the cluster.
Autoscaling
The process of scaling up and down the Ray cluster automatically.
Backend
A class containing the initialization and teardown logic for a specific deep
learning framework (e.g., Torch, TensorFlow), used to set up distributed
data-parallel training for :ref:`Ray Train’s built-in trainers<train-api>`.
Batch format
The way Ray Data represents batches of data. The batch format is independent
from how Ray Data stores the underlying blocks, so you can use any batch format
regardless of the internal block representation.
Set ``batch_format`` in methods like
:meth:`Dataset.iter_batches() <ray.data.Dataset.iter_batches>` and
:meth:`Dataset.map_batches() <ray.data.Dataset.map_batches>` to specify the
batch type.
.. doctest::
>>> import ray
>>> dataset = ray.data.range(15)
>>> next(iter(dataset.iter_batches(batch_format="numpy", batch_size=5)))
{'id': array([0, 1, 2, 3, 4])}
>>> next(iter(dataset.iter_batches(batch_format="pandas", batch_size=5)))
id
0 0
1 1
2 2
3 3
4 4
>>> next(iter(dataset.iter_batches(batch_format="pyarrow", batch_size=5)))
pyarrow.Table
id: int64
----
id: [[0],[1],...,[3],[4]]
To learn more about batch formats, read
:ref:`Configuring batch formats <configure_batch_format>`.
Batch size
A batch size in the context of model training is the number of data points used
to compute and apply one gradient update to the model weights.
Block
A processing unit of data. A :class:`~ray.data.Dataset` consists of a
collection of blocks.
Under the hood, Ray Data partitions rows into a set of distributed data blocks.
This allows it to perform operations in parallel.
Unlike a batch, which is a user-facing object, a block is an internal abstraction.
Placement Group Bundle
A collection of resources that must be reserved on a single Ray node.
:ref:`Learn more<ray-placement-group-doc-ref>`.
Checkpoint
A Ray Train Checkpoint is a common interface for accessing data and models across
different Ray components and libraries. A Checkpoint can have its data
represented as a directory on local (on-disk) storage, as a directory on an
external storage (e.g., cloud storage), and as an in-memory dictionary.
:class:`Learn more <ray.train.Checkpoint>`.
.. TODO: How does this relate to RLlib checkpoints etc.? Be clear here
Ray Client
The Ray Client is an API that connects a Python script to a remote Ray cluster.
Effectively, it allows you to leverage a remote Ray cluster just like you would
with Ray running on your local machine.
:ref:`Learn more<ray-client-ref>`.
Ray Cluster
A Ray cluster is a set of worker nodes connected to a common Ray head node.
Ray clusters can be fixed-size, or they can autoscale up and down according to
the resources requested by applications running on the cluster.
.. TODO: Add "Concurrency" here, or try to avoid this in docs.
Connector
A connector performs transformations on data that comes out of a dataset or an
RL environment and is about to be passed to a model. Connectors are flexible
components and can be swapped out such that models are easily reusable and do
not have to be retrained for different data transformations.
Tune Config
This is the set of hyperparameters corresponding to a Tune trial.
Sampling from a hyperparameter search space will produce a config.
.. TODO: DAG
Ray Dashboard
Ray’s built-in dashboard is a web interface that provides metrics, charts,
and other features that help Ray users to understand and debug Ray applications.
.. TODO: Data Shuffling
Dataset (object)
A class that produces a sequence of distributed data blocks.
:class:`~ray.data.Dataset` exposes methods to read, transform, and consume data at scale.
To learn more about Datasets and the operations they support, read the :ref:`Datasets API Reference <data-api>`.
Deployment
A deployment is a group of actors that can handle traffic in Ray Serve.
Deployments are defined as a single class with a number of options, including
the number of “replicas” of the deployment, each of which will map to a Ray
actor at runtime. Requests to a deployment are load balanced across its replicas.
Ingress Deployment
In Ray Serve, the “ingress” deployment is the one that receives and responds to
inbound user traffic. It handles HTTP parsing and response formatting. In the case
of model composition, it would also fan out requests to other deployments to do
things like preprocessing and a forward pass of an ML model.
Driver
"Driver" is the name of the process running the main script that starts all
other processes. For Python, this is usually the script you start with
``python ...``.
Tune Driver
The Tune driver is the main event loop that’s happening on the node that
launched the Tune experiment. This event loop schedules trials given the
cluster resources, executes training on remote Trainable actors, and processes
results and checkpoints from those actors.
Distributed Data-Parallel
A distributed data-parallel (DDP) training job scales machine learning training
to happen on multiple nodes, where each node processes one shard of the full
dataset. Every worker holds a copy of the model weights, and a common strategy
for updating weights is a “mirrored strategy”, where each worker will hold the
exact same weights at all times, and computed gradients are averaged then
applied across all workers.
With N worker nodes and a dataset of size D, each worker is responsible for
only ``D / N`` datapoints. If each worker node computes the gradient on a batch
of size ``B``, then the effective batch size of the DDP training is ``N * B``.
.. TODO: Entrypoint
Environment
The world or simulation, in which one or more reinforcement learning agents
have to learn to behave optimally with respect to a given reward function. An
environment consists of an observation space, a reward function, an action
space, a state transition function, and a distribution over initial states
(after a reset).
Episodes consisting of one or more time-steps are played through an
environment in order to generate and collect samples for learning.
These samples contain one 4-tuple of
``[observation, action, reward, next observation]`` per timestep.
Episode
A series of subsequent RL environment timesteps, each of which is a
4-tuple: ``[observation, action, reward, next observation]``.
Episodes can end with the terminated- or truncated-flags being True.
An episode generally spans multiple time-steps for one or more agents.
The Episode is an important concept in RL as "optimal agent behavior" is
defined as choosing actions that maximize the sum of individual rewards
over the course of an episode.
Trial Executor
An internal :ref:`Ray Tune component<raytrialexecutor-docstring>` that manages
the resource management and execution of each trial’s corresponding remote
Trainable actor. The trial executor’s responsibilities include launching
training, checkpointing, and restoring remote tasks.
Experiment
A Ray Tune or Ray Train experiment is a collection of one or more training jobs
that may correspond to different hyperparameter configurations. These
experiments are launched via the
:ref:`Tuner API<tune-run-ref>` and the :ref:`Trainer API<train-api>`.
.. TODO: Event
Fault tolerance
Fault tolerance in Ray Train and Tune consists of experiment-level and trial-level
restoration. Experiment-level restoration refers to resuming all trials,
in the event that an experiment is interrupted in the middle of training due
to a cluster-level failure. Trial-level restoration refers to resuming
individual trials, in the event that a trial encounters a runtime
error such as OOM.
.. TODO: more on fault tolerance in Core
Framework
The deep-learning framework used for the model(s), loss(es), and optimizer(s)
inside an RLlib Algorithm. RLlib currently supports PyTorch and TensorFlow.
GCS / Global Control Service
Centralized metadata server for a Ray cluster. It runs on the Ray head node
and has functions like managing node membership and actor directory.
It’s also known as the Global Control Store.
Head node
A node that runs extra cluster-level processes like GCS and API server in
addition to those processes running on a worker node. A Ray cluster only has
one head node.
HPO
Hyperparameter optimization (HPO) is the process of choosing a set of optimal
hyperparameters for a learning algorithm. A hyperparameter can be a parameter
whose value is used to control the learning process (e.g., learning rate),
define the model architecture (e.g, number of hidden layers), or influence data
pre-processing. In the case of Ray Train, hyperparameters can also include
compute processing scale-out parameters such as the number of distributed
training workers.
.. TODO: Inference
Job
A Ray job is a packaged Ray application that can be executed on a
(remote) Ray cluster. :ref:`Learn more<jobs-overview>`.
Lineage
For Ray objects, this is the set of tasks that was originally executed to
produce the object. If an object’s value is lost due to node failure,
Ray may attempt to recover the value by re-executing the object’s lineage.
.. TODO: Logs
.. TODO: Metrics
Model
A function approximator with trainable parameters (e.g. a neural network) that
can be trained by an algorithm on available data or collected data from an RL
environment. The parameters are usually initialized at random (unlearned state).
During the training process, checkpoints of the model can be created such that -
after the learning process is shut down or crashes - training can resume from
the latest weights rather than having to re-learn from scratch.
After the training process is completed, models can be deployed into production
for inference using Ray Serve.
Multi-agent
Denotes an RL environment setup, in which several (more than one) agents act
in the same environment and learn either the same or different optimal
behaviors. The relationship between the different agents in a multi-agent setup
might be adversarial (playing against each other), cooperative (trying to reach
a common goal) or neutral (the agents don’t really care about other agents’
actions). The NN model architectures that can be used for multi-agent training
range from "independent" (each agent trains its own separate model), over
"partially shared" (i.e. some agents might share their value function, because
they have a common goal), to "identical" (all agents train on the same model).
Namespace
A namespace is a logical grouping of jobs and named actors. When an actor is
named, its name must be unique within the namespace.
When a namespace is not specified, Ray will place your job in an anonymous
namespace.
Node
A Ray node is a physical or virtual machine that is part of a Ray cluster.
See also :term:`Head node`.
Object
An application value. These are values that are returned by a task or
created through ``ray.put``.
Object ownership
Ownership is the concept used to decide where metadata for a certain
``ObjectRef`` (and the task that creates the value) should be stored.
If a worker calls ``foo.remote()`` or ``ray.put()``, it owns the metadata for
the returned ``ObjectRef``, e.g., ref count and location information. If an
object’s owner dies and another worker tries to get the value,
it will receive an ``OwnerDiedError`` exception.
Object reference
A pointer to an application value, which can be stored anywhere in the cluster.
Can be created by calling ``foo.remote()`` or ``ray.put()``.
If using ``foo.remote()``, then the returned ``ObjectRef`` is also a future.
Object store
A distributed in-memory data store for storing Ray objects.
Object spilling
Objects in the object store are spilled to external storage once the capacity
of the object store is used up. This enables out-of-core data processing for
memory-intensive distributed applications. It comes with a performance penalty
since data needs to be written to disk.
.. TODO: Observability
Observation
The full or partial state of an RL environment, which an agent sees
(has access to) at each timestep. A fully observable environment produces
observations that contain all the information to sufficiently infer the current
underlying state of the environment. Such states are also called “Markovian”.
Examples for environments with Markovian observations are chess or 2D games,
in which the player can see with each frame the entirety of the game’s state.
A partially observable (or non-Markovian) environment produces observations
that do not contain sufficient information to infer the exact underlying state.
An example here would be a robot with a camera on its head facing forward.
The robot walks around in a maze, but from a single camera frame might not know
what’s currently behind it.
Offline data
Data collected in an RL environment up-front and stored in some data format
(e.g. JSON). Offline data can be used to train an RL agent. The data might have
been generated by a non-RL/ML system, such as a simple decision making script.
Also, when training from offline data, the RL algorithm will not be able to
explore new actions in new situations as all interactions with the environment
already happened in the past (were recorded prior to training).
Offline RL
A sub-field of reinforcement learning (RL), in which specialized offline
RL Algorithms learn how to compute optimal actions for an agent inside an
environment without the ability to interact live with that environment.
Instead, the data used for training has already been collected up-front
(maybe even by a non-RL/ML system). This is very similar to a supervised
learning setup. Examples for offline RL algorithms are MARWIL, CQL, and CRR.
Off-Policy
A type of RL Algorithm. In an off-policy algorithm, the policy used to compute
the actions inside an RL environment (to generate the training data) might be
different from the one that is being optimized. Examples for off-policy
Algorithms are DQN, SAC, and DDPG.
On-Policy
A type of RL Algorithm. In an on-policy algorithm, the policy used to compute
the actions inside an RL environment (to generate the training data) must be the
exact same (matching NN weights at all times) as the one that's being
optimized. Examples for on-policy Algorithms are PPO, APPO, and IMPALA.
OOM (Out of Memory)
Ray may run out of memory if the application is using too much memory on a
single node. In this case the :ref:`Ray OOM killer<oom-questions>` will kick
in and kill worker processes to free up memory.
Placement group
Placement groups allow users to atomically reserve groups of resources across
multiple nodes (i.e., gang scheduling). They can be then used to schedule Ray
tasks and actors packed as close as possible for locality (PACK), or spread
apart (SPREAD). Placement groups are generally used for gang-scheduling actors,
but also support tasks.
:ref:`Learn more<ray-placement-group-doc-ref>`.
Policy
A (neural network) model that maps an RL environment observation of some agent
to its next action inside an RL environment.
.. TODO: Policy evaluation
Predictor
:class:`An interface for performing inference<ray.train.predictor.Predictor>` (prediction)
on input data with a trained model.
Preprocessor
:ref:`An interface used to preprocess a Dataset<preprocessor-ref>` for
training and inference (prediction). Preprocessors
can be stateful, as they can be fitted on the training dataset before being
used to transform the training and evaluation datasets.
.. TODO: Process
Ray application
A collection of Ray tasks, actors, and objects that originate from the
same script.
.. TODO: Ray Timeline
Raylet
A system process that runs on each Ray node. It’s responsible for scheduling
and object management.
Replica
A replica is a single actor that handles requests to a given Serve deployment.
A deployment may consist of many replicas, either statically-configured via
``num_replicas`` or dynamically configured using auto-scaling.
Resource (logical and physical)
Ray resources are logical resources (e.g. CPU, GPU) used by tasks and actors.
It doesn't necessarily map 1-to-1 to physical resources of machines on which
Ray cluster runs. :ref:`Learn more<core-resources>`.
Reward
A single floating point value that each agent within an RL environment receives
after each action taken. An agent is defined to be acting optimally inside the
RL environment when the sum over all received rewards within an episode is
maximized.
Note that rewards might be delayed (not immediately telling the agent, whether
an action was good or bad) or sparse (often have a value of zero) making it
harder for the agent to learn.
Rollout
The process of advancing through an episode in an RL environment (with one or
more RL agents) by taking sequential actions. During rollouts, the algorithm
should collect the environment produced 4-tuples [observations, actions,
rewards, next observations] in order to (later or simultaneously) learn how to
behave more optimally from this data.
Rollout Worker
Component within a RLlib Algorithm responsible for advancing and collecting
observations and rewards in an RL environment. Actions for the different
agent(s) within the environment are computed by the Algorithms’ policy models.
A distributed algorithm might have several replicas of Rollout Workers running
as Ray actors in order to scale the data collection process for faster RL
training.
.. START ROLLOUT WORKER
RolloutWorkers are used as ``@ray.remote`` actors to collect and return samples
from environments or offline files in parallel. An RLlib
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` usually has
``num_workers`` :py:class:`~ray.rllib.env.env_runner.EnvRunner` instances plus a
single "local" :py:class:`~ray.rllib.env.env_runner.EnvRunner` (not ``@ray.remote``) in
its :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` under ``self.workers``.
Depending on its evaluation config settings, an additional
:py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` with
:py:class:`~ray.rllib.env.env_runner.EnvRunner` instances for evaluation may be present in the
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
under ``self.evaluation_workers``.
.. END ROLLOUT WORKER
.. TODO: Runtime
Runtime environment
A runtime environment defines dependencies such as files, packages, environment
variables needed for a Python script to run. It is installed dynamically on the
cluster at runtime, and can be specified for a Ray job, or for specific actors
and tasks. :ref:`Learn more<handling_dependencies>`.
Remote Function
See :term:`Task`.
Remote Class
See :term:`Actor`.
(Ray) Scheduler
A Ray component that assigns execution units (Task/Actor) to Ray nodes.
Search Space
The definition of the possible values for hyperparameters. Can be composed out
of constants, discrete values, distributions of functions. This is also
referred to as the “parameter space” (``param_space`` in the ``Tuner``).
Search algorithm
Search algorithms suggest new hyperparameter configurations to be evaluated
by Tune. The default search algorithm is random search, where each new
configuration is independent from the previous one. More sophisticated search
algorithms such as ones using Bayesian optimization will fit a model to predict
the hyperparameter configuration that will produce the best model, while also
exploring the space of possible hyperparameters. Many popular search algorithms
are built into Tune, most of which are integrations with other libraries.
Serve application
An application is the unit of upgrade in a Serve cluster.
An application consists of one or more deployments. One of these deployments
is considered the “ingress” deployment, which is where all inbound
traffic is handled.
Applications can be called via HTTP at their configured ``route_prefix``.
DeploymentHandle
DeploymentHandle is the Python API for making requests to Serve deployments. A
handle is defined by passing one bound Serve deployment to the constructor of
another. Then at runtime that reference can be used to make requests. This is
used to combine multiple deployments for model composition.
Session
- A Ray Train/Tune session: Tune session at the experiment execution layer
and Train session at the Data Parallel training layer
if running data-parallel distributed training with Ray Train.
The session allows access to metadata, such as which trial is being run,
information about the total number of workers, as well as the rank of the
current worker. The session is also the interface through which an individual
Trainable can interact with the Tune experiment as a whole. This includes uses
such as reporting an individual trial’s metrics, saving/loading checkpoints,
and retrieving the corresponding dataset shards for each Train worker.
- A Ray cluster: in some cases the session also means a :term:`Ray Cluster`.
For example, logs of a Ray cluster are stored under ``session_xxx/logs/``.
Spillback
A task caller schedules a task by first sending a resource request to the
preferred raylet for that request. If the preferred raylet chooses not to grant
the resources locally, it may also “Spillback” and respond to the caller with
the address of a remote raylet at which the caller should retry the resource
request.
State
State of the environment an RL agent interacts with.
Synchronous execution
Two tasks A and B are executed synchronously if A must finish before B can
start. For example, if you call ``ray.get`` immediately after launching a remote
task with ``task.remote()``, you’ll be running with synchronous execution,
since this will wait for the task to finish before the program continues.
Synchronous sampling
Sampling workers work in synchronous steps. All of them must finish collecting
a new batch of samples before training can proceed to the next iteration.
Task
A remote function invocation. This is a single function invocation that
executes on a process different from the caller, and potentially on a different
machine. A task can be stateless (a ``@ray.remote`` function) or stateful (a
method of a ``@ray.remote`` class - see Actor below). A task is executed
asynchronously with the caller: the ``.remote()`` call immediately returns
one or more ``ObjectRefs`` (futures) that can be used to retrieve the
return value(s). See :term:`Actor task`.
Trainable
A :ref:`Trainable<trainable-docs>` is the interface that Ray Tune uses to
perform custom training
logic. User-defined Trainables take in a configuration as an input and can
run user-defined training code as well as custom metric reporting and
checkpointing.
There are many types of trainables. Most commonly used is the function
trainable API, which is simply a Python function that contains model training
logic and metric reporting. Tune also exposes a class trainable API, which
allows you to implement training, checkpointing, and restoring as different
methods.
Ray Tune associates each trial with its own Trainable – the Trainable is the
one actually doing training. The Trainable is a remote actor that can be placed
on any node in a Ray cluster.
Trainer
A Trainer is the top-level API to configure a single distributed training job.
:ref:`There are built-in Trainers for different frameworks<train-api>`,
like PyTorch, Tensorflow, and XGBoost. Each trainer shares a common interface
and otherwise defines framework-specific configurations and entrypoints. The
main job of a trainer is to coordinate N distributed training workers and set
up the communication backends necessary for these workers to communicate
(e.g., for sharing computed gradients).
Trainer configuration
A Trainer can be configured in various ways. Some
configurations are shared across all trainers, like the RunConfig, which
configures things like the experiment storage, and ScalingConfig, which
configures the number of training workers as well as resources needed per
worker. Other configurations are specific to the trainer framework.
Training iteration
A partial training pass of input data up to pre-defined yield point
(e.g., time or data consumed) for checkpointing of long running training jobs.
A full training epoch can consist of multiple training iterations.
.. TODO: RLlib
Training epoch
A full training pass of the input dataset. Typically, model training iterates
through the full dataset in batches of size B, where gradients are calculated
on each batch and then applied as an update to the model weights. Training
jobs can consist of multiple epochs by training through the same dataset
multiple times.
Training step
An RLlib-specific method of the Algorithm class which includes the core logic
of an RL algorithm. Commonly includes gathering of experiences (either through
sampling or from offline data), optimization steps, redistribution of learnt
model weights. The particularities of this method are specific to algorithms
and configurations.
Transition
A tuple of (observation, action, reward, next observation). A transition
represents one step of an agent in an environment.
Trial
One training run within a Ray Tune experiment. If you run multiple trials,
each trial usually corresponds to a different config (a set of hyperparameters).
Trial scheduler
When running a Ray Tune job, the scheduler will decide how to allocate
resources to trials. In the most common case, this resource is time - the trial
scheduler decides which trials to run at what time. Certain built-in schedulers
like Asynchronous Hyperband (ASHA) perform early stopping of under-performing
trials, while others like Population Based Training (PBT) will make
under-performing trials copy the hyperparameter config and model weights of
top performing trials and continue training.
Tuner
The Tuner is the top level Ray Tune API used to configure and run an
experiment with many trials.
.. TODO: Tunable
.. TODO: (Ray) Workflow
.. TODO: WorkerGroup
.. TODO: Worker heap
.. TODO: Worker node / worker node pod
Worker process / worker
The process that runs user defined tasks and actors.