(kuberay-storage)=
# Best practices for storage and dependencies
This document contains recommendations for setting up storage and handling application dependencies for your Ray deployment on Kubernetes.
When you set up Ray on Kubernetes, the KubeRay documentation provides an overview of how to configure the operator to execute and manage the Ray cluster lifecycle. However, as an administrator you may still have questions about actual user workflows. For example: how do users ship or run code on the cluster, where should they store artifacts, and how should they manage package dependencies?
The answers to these questions vary between development and production. This table summarizes the recommended setup for each situation:
| | Interactive Development | Production |
|---|---|---|
| Cluster Configuration | KubeRay YAML | KubeRay YAML |
| Code | Run driver or Jupyter notebook on head node | Bake code into Docker image |
| Artifact Storage | Set up an EFS <br/> or <br/> Cloud Storage (S3, GS) | Set up an EFS <br/> or <br/> Cloud Storage (S3, GS) |
| Package Dependencies | Install onto NFS <br/> or <br/> Use runtime environments | Bake into Docker image |
Table 1: Table comparing recommended setup for development and production.
To provide an interactive development environment for data scientists and ML practitioners, we recommend setting up the code, storage, and dependencies in a way that reduces context switches for developers and shortens iteration times.
.. image:: ../images/interactive-dev.png
:align: center
..
Find the source document here (https://whimsical.com/clusters-P5Y6R23riCuNb6xwXVXN72)
Use one of these two standard solutions for artifact and log storage during the development process, depending on your use case: a network filesystem such as EFS, or cloud storage such as S3 or GS.
Ray's AI libraries such as Ray Data, Ray Train, and Ray Tune come with out-of-the-box capabilities to read and write from cloud storage and local or networked storage.
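As a rough sketch of what this looks like in practice, switching between networked and cloud storage is usually just a path change. The bucket name and mount path below are hypothetical placeholders, and the library calls are shown as comments because they assume a running Ray cluster:

```python
# Sketch: Ray Data, Train, and Tune accept local paths, NFS mount paths,
# and cloud URIs interchangeably, so the storage backend is a path choice.
# Both roots below are hypothetical placeholders.
artifact_root = "s3://my-bucket/experiments"  # cloud storage
# artifact_root = "/mnt/cluster_storage/experiments"  # or a shared NFS mount

# With a running Ray cluster, the same root plugs into the libraries, e.g.:
#   ray.data.read_parquet(f"{artifact_root}/input/")
#   ray.train.RunConfig(storage_path=artifact_root)
```

Because only the path changes, the same training or data-processing script can move from a development cluster using NFS to a production cluster using cloud storage without code changes.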
Run the main, or driver, script on the head node of the cluster. Ray Core and library programs often assume that the driver is on the head node and take advantage of the local storage. For example, Ray Tune generates log files on the head node by default.
A typical workflow: connect to the head node, for example through a Jupyter notebook, develop and run the driver script there, and write artifacts and logs to the shared network filesystem or cloud storage so they persist beyond the life of the cluster.
For local dependencies, for example if you're working in a mono-repo, or external dependencies, like a pip package, use one of the following options: install them onto the shared network filesystem, or specify them with Ray runtime environments.
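If you take the runtime environment route, a minimal sketch looks like the following; the package names are placeholders:

```python
# Sketch of a Ray runtime environment for interactive development.
# "working_dir" uploads your local project (e.g., a mono-repo checkout)
# to the cluster; "pip" installs external packages on the workers.
# Package names here are placeholders.
runtime_env = {
    "working_dir": ".",             # ship the current project directory
    "pip": ["requests", "pandas"],  # external pip dependencies
}

# On the driver you would then connect with:
#   import ray
#   ray.init(runtime_env=runtime_env)
```

Ray uploads the `working_dir` and installs the `pip` packages on each node that runs your tasks, so you can iterate on local code without rebuilding images.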
The recommendations for production align with standard Kubernetes best practices. See the configuration in the following image:
.. image:: ../images/production.png
:align: center
..
Find the source document here (https://whimsical.com/clusters-P5Y6R23riCuNb6xwXVXN72)
The choice of storage system remains the same across development and production.
Bake your code and its remote and local dependencies into a published Docker image for all nodes in the cluster. This approach is the most common way to deploy applications onto Kubernetes. See Custom Docker Images.
Using cloud storage with runtime environments is a less-preferred method because it may not be as reproducible as the container path, but it's still viable. In this case, use the runtime environment option to download zip files containing code and other private modules from cloud storage, in addition to specifying the pip packages needed to run your application.
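A hedged sketch of that setup follows; the bucket, zip path, and package names are hypothetical placeholders:

```python
# Sketch: a runtime environment that downloads application code from
# cloud storage instead of baking it into the image. The S3 path and
# package names are hypothetical placeholders.
runtime_env = {
    # Zip archive containing your code and private modules:
    "working_dir": "s3://my-bucket/releases/app.zip",
    # Plus the pip packages the application needs:
    "pip": ["scipy", "requests"],
}

# e.g. ray.init(runtime_env=runtime_env) on the driver,
# or set it per job submission.
```

To keep this path reproducible, pin the zip archive to an immutable, versioned object rather than overwriting it in place.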