.. _ray-core-internal-profiling:

Profiling for Ray Developers
============================

This guide helps contributors to the Ray project analyze Ray performance.
Getting a stack trace of Ray C++ processes
------------------------------------------

You can use the following GDB command to view the current stack trace of any
running Ray process (for example, the raylet). This can be useful for debugging
100% CPU utilization or infinite loops: run the command a few times to see
where the process is stuck.

.. code-block:: shell

  sudo gdb -batch -ex "thread apply all bt" -p <pid>

Note that you can find the PID of the raylet with ``pgrep raylet``.
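For convenience, you can wrap the command above in a small helper that resolves
PIDs with ``pgrep``. This is only a sketch, not part of Ray; the ``dump_stacks``
function is hypothetical and assumes ``gdb`` is installed and ``sudo`` works:

.. code-block:: bash

  # Hypothetical helper: print stack traces for every process whose name matches $1.
  dump_stacks() {
    local name="$1"
    for pid in $(pgrep "$name"); do
      echo "=== stacks for $name (pid $pid) ==="
      sudo gdb -batch -ex "thread apply all bt" -p "$pid"
    done
  }

  dump_stacks raylet

Running it several times in a row shows whether the process is stuck in the
same frames each time.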
Installation
------------

These instructions are for Ubuntu only. Attempts to get ``pprof`` to correctly
symbolize on macOS have failed.

.. code-block:: bash

  sudo apt-get install google-perftools libgoogle-perftools-dev

You may need to install ``graphviz`` for ``pprof`` to generate flame graphs.

.. code-block:: bash

  sudo apt-get install graphviz
Running the CPU profiler
------------------------

To launch Ray in profiling mode and profile the raylet, define the following variables:

.. code-block:: bash

  export PERFTOOLS_PATH=/usr/lib/x86_64-linux-gnu/libprofiler.so
  export PERFTOOLS_LOGFILE=/tmp/pprof.out
  export RAY_RAYLET_PERFTOOLS_PROFILER=1

The file ``/tmp/pprof.out`` remains empty until you let the binary run the
target workload for a while and then kill it via ``ray stop`` or by letting
the driver exit.

Note: Enabling ``RAY_RAYLET_PERFTOOLS_PROFILER`` profiles the raylet component.
To profile other modules, use ``RAY_{MODULE}_PERFTOOLS_PROFILER``,
where ``MODULE`` is the uppercase form of the process type, such as ``GCS_SERVER``.
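The variable name is derived mechanically from the process type in
``ray_constants.py``: uppercase it and splice it into the template. A quick
sketch of that naming rule:

.. code-block:: bash

  # Derive the profiler env var name from a lowercase process type.
  MODULE=gcs_server
  VAR="RAY_$(echo "$MODULE" | tr '[:lower:]' '[:upper:]')_PERFTOOLS_PROFILER"
  echo "$VAR"   # RAY_GCS_SERVER_PERFTOOLS_PROFILER

Exporting ``RAY_GCS_SERVER_PERFTOOLS_PROFILER=1`` (together with
``PERFTOOLS_PATH`` and ``PERFTOOLS_LOGFILE``) then profiles the GCS server
instead of the raylet.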
Visualizing the CPU profile
---------------------------

You can visualize the output of ``pprof`` in different ways. Below, the output is a
zoomable ``.svg`` image displaying the call graph annotated with hot paths.

.. code-block:: bash

  # Use the appropriate path.
  RAYLET=ray/python/ray/core/src/ray/raylet/raylet

  google-pprof -svg $RAYLET /tmp/pprof.out > /tmp/pprof.svg

  # Then open the .svg file with Chrome.

  # If the call graph is too large, use -focus=<some function> to zoom
  # into subtrees.
  google-pprof -focus=epoll_wait -svg $RAYLET /tmp/pprof.out > /tmp/pprof.svg

Below is a snapshot of an example SVG output, from the official documentation:

.. image:: http://goog-perftools.sourceforge.net/doc/pprof-test-big.gif

Memory profiling
----------------

To run memory profiling on Ray core components, use `jemalloc <https://github.com/jemalloc/jemalloc>`_.
Ray supports environment variables that override ``LD_PRELOAD`` on core components.

You can find the component name in ``ray_constants.py``. For example, to profile the GCS server,
search for ``PROCESS_TYPE_GCS_SERVER`` in ``ray_constants.py``; its value is ``gcs_server``.

Provide the following four environment variables for memory profiling:

* ``RAY_JEMALLOC_LIB_PATH``: The path to the jemalloc shared library ``libjemalloc.so``.
* ``RAY_JEMALLOC_CONF``: The ``MALLOC_CONF`` configuration for jemalloc, as comma-separated values. Read the `jemalloc docs <http://jemalloc.net/jemalloc.3.html>`_ for more details.
* ``RAY_JEMALLOC_PROFILE``: Comma-separated Ray components to run with the jemalloc ``.so``, for example ``raylet,gcs_server``. Note that the components must match the lowercase process types in ``ray_constants.py``, so ``RAYLET,GCS_SERVER`` won't work.
* ``RAY_LD_PRELOAD_ON_WORKERS``: Defaults to ``0``, which means Ray doesn't preload jemalloc for workers, in case a worker library is incompatible with jemalloc. Set it to ``1`` to preload jemalloc for workers using the values of ``RAY_JEMALLOC_LIB_PATH`` and ``RAY_JEMALLOC_PROFILE``.
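For reference, the ``lg_*`` options in ``MALLOC_CONF`` are log-base-2 byte
counts (see the jemalloc docs). The values used in the example below decode as
follows:

.. code-block:: bash

  # lg_prof_interval:33 -> dump a profile about every 2^33 bytes of allocation activity.
  echo $((1 << 33))   # 8589934592 bytes (~8 GiB)

  # lg_prof_sample:17 -> sample allocations about once per 2^17 bytes.
  echo $((1 << 17))   # 131072 bytes (128 KiB)

Smaller values dump and sample more often, at the cost of higher overhead.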
.. code-block:: bash

  # Install jemalloc
  wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2
  tar -xf jemalloc-5.2.1.tar.bz2
  cd jemalloc-5.2.1
  export JEMALLOC_DIR=$PWD
  ./configure --enable-prof --enable-prof-libunwind
  make
  sudo make install

  # Verify jeprof is installed.
  which jeprof

  # Start a Ray head node with jemalloc enabled.
  # (1) `prof_prefix` defines the output directory and the file-name prefix of the profile files.
  # (2) This example only profiles the GCS server component.
  RAY_JEMALLOC_CONF=prof:true,lg_prof_interval:33,lg_prof_sample:17,prof_final:true,prof_leak:true,prof_prefix:$PATH_TO_OUTPUT_DIR/jeprof.out \
  RAY_JEMALLOC_LIB_PATH=$JEMALLOC_DIR/lib/libjemalloc.so \
  RAY_JEMALLOC_PROFILE=gcs_server \
  ray start --head

  # Check the output files. You should see files named like "jeprof.out.<pid>.<seq>.f.heap".
  # Example: jeprof.out.1904189.0.f.heap
  ls $PATH_TO_OUTPUT_DIR/

  # If you don't see any output files, stop the Ray cluster to force it to flush the
  # profile data, since `prof_final:true` is set.
  ray stop

  # Use jeprof to view the profile data. The first argument is the GCS server binary.
  # You can also pass `--pdf` or `--svg` to generate other output formats.
  jeprof --text $YOUR_RAY_SRC_DIR/python/ray/core/src/ray/gcs/gcs_server $PATH_TO_OUTPUT_DIR/jeprof.out.1904189.0.f.heap

  # [Example output]
  Using local file ../ray/core/src/ray/gcs/gcs_server.
  Using local file jeprof.out.1904189.0.f.heap.
  addr2line: DWARF error: section .debug_info is larger than its filesize! (0x93f189 vs 0x530e70)
  Total: 1.0 MB
       0.3  25.9%  25.9%      0.3  25.9% absl::lts_20230802::container_internal::InitializeSlots
       0.1  12.9%  38.7%      0.1  12.9% google::protobuf::DescriptorPool::Tables::CreateFlatAlloc
       0.1  12.4%  51.1%      0.1  12.4% ::do_tcp_client_global_init
       0.1  12.3%  63.4%      0.1  12.3% grpc_core::Server::Start
       0.1  12.2%  75.6%      0.1  12.2% std::__cxx11::basic_string::_M_assign
       0.1  12.2%  87.8%      0.1  12.2% std::__cxx11::basic_string::_M_mutate
       0.1  12.2% 100.0%      0.1  12.2% std::__cxx11::basic_string::reserve
       0.0   0.0% 100.0%      0.8  75.4% EventTracker::RecordExecution
  ...

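With ``lg_prof_interval`` set, a long-running process can emit many numbered
dumps, and you usually want the newest one. A small sketch for picking it by
modification time (the directory and heap files here are simulated, and the
commented ``jeprof`` call reuses the paths from the example above):

.. code-block:: bash

  # Simulate a dump directory containing two interval heap profiles.
  dir=$(mktemp -d)
  touch "$dir/jeprof.out.1904189.0.i0.heap"
  sleep 1
  touch "$dir/jeprof.out.1904189.1.i1.heap"

  # Sorting by mtime puts the newest dump first.
  latest=$(ls -t "$dir"/jeprof.out.*.heap | head -n 1)
  echo "$latest"

  # Then inspect it, e.g.:
  # jeprof --text $YOUR_RAY_SRC_DIR/python/ray/core/src/ray/gcs/gcs_server "$latest"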
Running microbenchmarks
-----------------------

To run a set of single-node Ray microbenchmarks, use:

.. code-block:: bash

  ray microbenchmark

You can find the microbenchmark results for Ray releases in the `GitHub release logs <https://github.com/ray-project/ray/tree/master/release/release_logs>`__.
References
----------

- The `pprof documentation <http://goog-perftools.sourceforge.net/doc/cpu_profiler.html>`_.
- A `Go version of pprof <https://github.com/google/pprof>`_.
- The `gperftools <https://github.com/gperftools/gperftools>`_ suite, including libprofiler, tcmalloc, and other useful tools.