.. _head-node-memory-management:

Head Node Memory Management
===========================

When running Ray clusters for extended periods, the head node's memory usage can steadily increase over time, potentially leading to out-of-memory (OOM) errors that can make the entire cluster unusable. This guide explains the causes of head node memory growth and provides mitigation strategies.

.. contents::
    :local:

The ``RAY_DASHBOARD_MAX_EVENTS_TO_CACHE`` environment variable controls the cache size. For implementation details, see the `event caching code <https://github.com/ray-project/ray/blob/814768317813afca2f0af740f58d024b059ae7d7/python/ray/dashboard/modules/event/event_head.py#L35>`_.
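For example, you could cap the cache size by setting the variable before starting Ray; the value ``1000`` here is illustrative, not a recommended default:

```python
import os

# Must be set before the dashboard starts (that is, before calling
# `ray.init()` or running `ray start --head`) so the dashboard process
# inherits it. 1000 is an illustrative cap, not a recommended default.
os.environ["RAY_DASHBOARD_MAX_EVENTS_TO_CACHE"] = "1000"

# import ray
# ray.init()
```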

Avoid Scheduling on the Head Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running tasks or actors on the head node isn't recommended because it hosts critical system components. Preventing scheduling on the head node helps reduce contention and memory pressure.
See :ref:`vms-large-cluster-configure-head-node` for head-node best practices.
Disable the Dashboard
~~~~~~~~~~~~~~~~~~~~~
If you don't need the dashboard, disabling it removes event caching and related memory overhead. However, this reduces observability into the system, so it isn't recommended for production clusters.
**Python API:**

.. code-block:: python

    import ray

    ray.init(include_dashboard=False)
**CLI:**

.. code-block:: bash

    ray start --head --include-dashboard=False
**Kubernetes:**
Set ``spec.headGroupSpec.rayStartParams.include-dashboard`` to ``"false"`` in your RayCluster configuration.
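A minimal sketch of the relevant fragment of a RayCluster manifest (metadata and other required fields elided; field names follow the KubeRay CRD):

```yaml
spec:
  headGroupSpec:
    rayStartParams:
      # Quoted string, as rayStartParams values are strings.
      include-dashboard: "false"
```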

.. warning::

    Disabling the dashboard prevents KubeRay's ``RayJob`` and ``RayService`` features from working properly.
Kubernetes Configuration
------------------------
Head Pod Memory Settings
~~~~~~~~~~~~~~~~~~~~~~~~
When deploying on Kubernetes, configure appropriate memory requests and limits for the head pod.
**Important:** Set memory and CPU resource requests equal to their limits. KubeRay uses the container's resource **limits** to configure Ray's logical resource capacities and ignores memory and CPU **requests**.
Example configuration:

.. code-block:: yaml

    headGroupSpec:
      template:
        spec:
          containers:
          - name: ray-head
            resources:
              requests:
                memory: "8Gi"
                cpu: "4"
              limits:
                memory: "8Gi"
                cpu: "4"

Recommended Head Node Specifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For large clusters, start from a generously sized head node; the actual requirements depend on your workload and cluster size.
Additionally, consider preventing Ray from scheduling tasks on the head node by setting ``num-cpus: "0"`` in ``rayStartParams``.
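In a RayCluster manifest, that looks like the following fragment (surrounding fields elided):

```yaml
headGroupSpec:
  rayStartParams:
    # Advertise zero logical CPUs so the scheduler places no
    # CPU tasks or actors on the head node.
    num-cpus: "0"
```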

.. note::

    You can disable the dashboard, but doing so severely limits observability and isn't recommended for production. If you choose to disable it, see the preceding Disable the Dashboard section.

If your head node experiences OOM issues, inspect object store usage with the ``ray memory`` command. See :ref:`debug-with-ray-memory`. For more information on OOM prevention, see :ref:`ray-oom-prevention`.
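When diagnosing, it also helps to track memory growth over time. The following hypothetical helper (standard library only, not part of Ray) reports a process's peak resident set size, which you could log periodically from a process on the head node:

```python
import resource
import sys

def max_rss_mib() -> float:
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

print(f"peak RSS: {max_rss_mib():.1f} MiB")
```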