.. _vms-large-cluster:

Best practices for deploying large clusters
===========================================

This section aims to document best practices for deploying Ray clusters at
large scale.
Networking configuration
^^^^^^^^^^^^^^^^^^^^^^^^
End users should only need to directly interact with the head node of the
cluster. In particular, there are 2 services which should be exposed to users:

1. The dashboard
2. The Ray client server
.. note::

   While users only need 2 ports to connect to a cluster, the nodes within a
   cluster require a much wider range of ports to communicate.

   See :ref:`Ray port configuration <ray-ports>` for a comprehensive list.

   Applications (such as :ref:`Ray Serve <rayserve>`) may also require
   additional ports to work properly.
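By default, the dashboard listens on port 8265 of the head node and the Ray
client server on port 10001; both can be overridden at deployment time. As a
minimal sketch (the helper name is made up for illustration), the two
user-facing endpoints can be derived from the head node's address:

```python
def user_endpoints(head_ip, dashboard_port=8265, client_port=10001):
    """Build the two user-facing URLs for a Ray cluster head node.

    The port defaults match Ray's defaults; adjust them if your
    deployment overrides ``--dashboard-port`` or ``--ray-client-server-port``.
    """
    return {
        "dashboard": f"http://{head_ip}:{dashboard_port}",
        "ray_client": f"ray://{head_ip}:{client_port}",
    }
```

For example, ``user_endpoints("10.0.0.1")["ray_client"]`` yields the address
you would pass to ``ray.init()`` when connecting from outside the cluster.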
System configuration
^^^^^^^^^^^^^^^^^^^^
There are a few system level configurations that should be set when using Ray at a large scale.
1. Make sure ``ulimit -n`` is set to at least 65535. Ray opens many direct
   connections between worker processes to avoid bottlenecks, so it can quickly
   use a large number of file descriptors.
2. Make sure ``/dev/shm`` is sufficiently large. Most ML/RL applications rely
   heavily on the plasma store. By default, Ray will try to use ``/dev/shm``
   for the object store, but if it is not large enough (i.e.
   ``--object-store-memory`` > size of ``/dev/shm``), Ray will write the
   plasma store to disk instead, which may cause significant performance
   problems.
3. Make sure :ref:`object spilling <object-spilling>` is enabled. Ray will
   spill objects to disk if necessary. This is most commonly needed for data
   processing workloads.
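The first two checks can be scripted as part of node setup. A minimal sketch,
assuming a Linux node (the function name and the 1 GiB threshold are
illustrative, not Ray requirements):

```python
import resource
import shutil


def check_system_limits(min_fds=65535, min_shm_bytes=1 << 30):
    """Return (fds_ok, shm_ok) for the two system settings above."""
    # Soft limit on open file descriptors, i.e. what `ulimit -n` reports.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fds_ok = soft >= min_fds
    try:
        # Total size of the tmpfs backing the default object store location.
        shm_ok = shutil.disk_usage("/dev/shm").total >= min_shm_bytes
    except FileNotFoundError:
        shm_ok = False  # No /dev/shm on this platform.
    return fds_ok, shm_ok
```

Running such a check in your cluster's setup commands lets you fail fast
before Ray starts, instead of debugging performance problems later.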
.. _vms-large-cluster-configure-head-node:

Configuring the head node
^^^^^^^^^^^^^^^^^^^^^^^^^
In addition to the above changes, when deploying a large cluster, the head
node is under extra stress because Ray's architecture places additional system
processes, such as the GCS, on it.
Due to the heavy networking load (and the GCS and dashboard processes), we
recommend setting the quantity of logical CPU resources to 0 on the head node
to avoid scheduling additional tasks on it: set ``resources: {"CPU": 0}`` on
the head node. (For Ray clusters deployed using KubeRay, set
``rayStartParams: {"num-cpus": "0"}`` instead. See the
:ref:`configuration guide for KubeRay clusters <kuberay-num-cpus>`.)

For long-running clusters, head node memory usage can steadily increase over
time. See :ref:`head-node-memory-management` for detailed information on
causes and mitigation strategies.
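In a cluster YAML, this looks roughly like the following excerpt (the node
type name and instance type are illustrative; the ``resources`` override is
the relevant part):

```yaml
# Illustrative excerpt from a cluster config: the head node advertises
# zero logical CPUs, so Ray schedules no tasks or actors on it.
available_node_types:
  head_node_type:
    node_config:
      InstanceType: m5.2xlarge   # example; size for GCS/dashboard load
    resources: {"CPU": 0}
head_node_type: head_node_type
```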
Configuring the autoscaler
^^^^^^^^^^^^^^^^^^^^^^^^^^
For large, long running clusters, there are a few parameters that can be tuned.

1. For long running clusters, set the ``AUTOSCALER_MAX_NUM_FAILURES``
   environment variable to a large number (or ``inf``) to avoid unexpected
   autoscaler crashes. The variable can be set by prepending
   ``export AUTOSCALER_MAX_NUM_FAILURES=inf;`` to the head node's Ray start
   command. (Note: you may want a separate mechanism to detect if the
   autoscaler errors too often.)
2. For large clusters, consider tuning ``upscaling_speed`` for faster
   autoscaling.

Picking nodes
^^^^^^^^^^^^^
Here are some tips for how to set your ``available_node_types`` for a cluster,
using AWS instance types as a concrete example.
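For instance, a minimal ``available_node_types`` sketch with two AWS worker
types might look like this (the type names, instance types, and worker limits
are illustrative placeholders):

```yaml
available_node_types:
  cpu_workers:
    node_config:
      InstanceType: m5.2xlarge   # general-purpose CPU nodes
    min_workers: 0
    max_workers: 10
  gpu_workers:
    node_config:
      InstanceType: g4dn.xlarge  # current-gen GPU instance
    min_workers: 0
    max_workers: 4
```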
General recommendations with AWS instance types:
**When to use GPUs**

If your application uses a framework that can leverage GPUs (such as
TensorFlow, PyTorch, or JAX), GPU instances are usually worth considering.

**What type of GPU?**

The latest gen GPU is almost always the best bang for your buck (p3 > p2,
g4 > g3); for most well designed applications the performance outweighs the
price. (The instance price may be higher, but you use the instance for less
time.)

**What type of CPU?**

**How many CPUs/GPUs?**
.. note::

   If you're using RLlib, check out :ref:`the RLlib scaling guide <rllib-scaling-guide>`
   for RLlib specific recommendations.