Start an Aliyun ACK cluster with GPUs for KubeRay

doc/source/cluster/kubernetes/user-guides/ack-gpu-cluster.md

(kuberay-ack-gpu-cluster-setup)=

This guide provides step-by-step instructions for creating an ACK cluster with GPU nodes specifically configured for KubeRay. The configuration outlined here can be applied to most KubeRay examples found in the documentation.

Step 1: Create a Kubernetes cluster on Aliyun ACK

See Create a cluster to create an Aliyun ACK cluster, and see Connect to clusters to configure your computer to communicate with the cluster.

Step 2: Create node pools for the Aliyun ACK cluster

See Create a node pool to create node pools.

Manage node labels and taints

To set labels or taints on nodes, see Create and manage node labels and Create and manage node taints. For example, you can add a taint to GPU node pools so that Ray head pods aren't scheduled on these nodes.
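If you taint the GPU node pool, the GPU worker pods need a matching toleration, or Kubernetes won't schedule them there either. The following is a minimal sketch of the relevant pod spec fragment, built as a Python dictionary and printed as JSON. The taint key `example.com/gpu-pool` and the container name `ray-worker` are hypothetical; substitute the values you chose when creating the node pool.

```python
import json

# Hypothetical taint for the GPU node pool; the key and value are
# illustrative assumptions, not names defined by ACK or KubeRay.
GPU_TAINT = {"key": "example.com/gpu-pool", "value": "true", "effect": "NoSchedule"}


def toleration_for(taint):
    """Return a Kubernetes toleration that matches the given taint exactly."""
    return {
        "key": taint["key"],
        "operator": "Equal",
        "value": taint["value"],
        "effect": taint["effect"],
    }


def worker_pod_spec_fragment(taint, gpus_per_worker=1):
    """Build a pod spec fragment for a GPU worker that tolerates the taint."""
    return {
        "tolerations": [toleration_for(taint)],
        "containers": [
            {
                # Hypothetical container name; the image is omitted on purpose.
                "name": "ray-worker",
                "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
            }
        ],
    }


print(json.dumps(worker_pod_spec_fragment(GPU_TAINT), indent=2))
```

The head pod carries no such toleration, so the taint keeps it off the GPU nodes, while the workers that request `nvidia.com/gpu` can still land there.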

Upgrade drivers on the nodes

To upgrade the NVIDIA drivers on the nodes, see Step 2: Create a node pool and specify an NVIDIA driver version.
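Once the node pools are up, it can help to confirm that the GPU nodes actually advertise GPUs to Kubernetes. The sketch below assumes the NVIDIA device plugin is running on the GPU nodes, which exposes GPUs as the `nvidia.com/gpu` resource, and it parses data shaped like the output of `kubectl get nodes -o json`. The sample payload and node names are made up for illustration.

```python
import json

# Illustrative payload shaped like `kubectl get nodes -o json` output;
# the node names and resource counts are made up for this sketch.
SAMPLE_NODE_LIST = {
    "items": [
        {
            "metadata": {"name": "cpu-node-1"},
            "status": {"allocatable": {"cpu": "8"}},
        },
        {
            "metadata": {"name": "gpu-node-1"},
            "status": {"allocatable": {"cpu": "16", "nvidia.com/gpu": "1"}},
        },
    ]
}


def gpu_counts(node_list):
    """Map node name to allocatable GPU count, skipping nodes without GPUs."""
    counts = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings in JSON output.
        gpus = int(allocatable.get("nvidia.com/gpu", "0"))
        if gpus > 0:
            counts[node["metadata"]["name"]] = gpus
    return counts


print(json.dumps(gpu_counts(SAMPLE_NODE_LIST)))
```

To check a real cluster, save the output of `kubectl get nodes -o json` to a file, load it with `json.load`, and pass the result to `gpu_counts`. An empty result suggests that the GPU nodes aren't ready or that the driver or device plugin isn't working.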

Step 3: Install the KubeRay add-on in the cluster

See Step 2: Install KubeRay-Operator to deploy KubeRay in ACK.