Back to Ray

Profiling with py-spy

doc/source/cluster/kubernetes/k8s-ecosystem/pyspy.md

1.13.13.1 KB
Original Source

(kuberay-pyspy-integration)=

Profiling with py-spy

Stack trace and CPU profiling

py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. This section describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph on Ray Dashboard.

Prerequisite

py-spy requires the SYS_PTRACE capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the template.spec.containers for both the head and worker Pods.

bash
securityContext:
  capabilities:
    add:
    - SYS_PTRACE

Notes:

  • The baseline and restricted Pod Security Standards forbid adding SYS_PTRACE. See Pod Security Standards for more details.

Check CPU flame graph and stack trace on Ray Dashboard

Step 1: Create a Kind cluster

bash
kind create cluster

Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator using Helm repository.

Step 3: Create a RayCluster with SYS_PTRACE capability

bash
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.py-spy.yaml

Step 4: Forward the dashboard port

bash
kubectl port-forward svc/raycluster-py-spy-head-svc 8265:8265

Step 5: Run a sample job within the head Pod

bash
# Log in to the head Pod
kubectl exec -it ${YOUR_HEAD_POD} -- bash

# (Head Pod) Run a sample job in the Pod
# `long_running_task` includes a `while True` loop to ensure the task remains actively running indefinitely.
# This allows you ample time to view the Stack Trace and CPU Flame Graph via Ray Dashboard.
python3 samples/long_running_task.py

Notes:

  • If you're running your own examples and encounter the error Failed to write flamegraph: I/O error: No stack counts found when viewing CPU Flame Graph, it might be due to the process being idle. Notably, using the sleep function can lead to this state. In such situations, py-spy filters out the idle stack traces. Refer to this issue for more information.

Step 6: Profile using Ray Dashboard

Step 7: Clean up

bash
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.py-spy.yaml
helm uninstall kuberay-operator