doc/source/data/contributing/contributing-guide.md
If you want your changes to be reviewed and merged quickly, following a few key practices makes a big difference. Clear, focused, and well-structured contributions help reviewers understand your intent and ensure your improvements land smoothly.
:::{seealso}
This guide covers contributing to Ray Data in specific. For information on contributing
to the Ray project in general, see the {ref}general Ray contributing guide<getting-involved>.
:::
Start by solving a problem you encounter, like fixing a bug or adding a missing feature. If you're unsure where to start:
If you’re adding a new public API or making a substantial refactor, share your plan early. Discussing changes before you invest a lot of work can save time and align your work with the project’s direction.
You can open a draft PR, discuss on an Issue, or post in Slack for early feedback. It won’t affect acceptance and often improves the final design.
Most changes to Ray Data require tests. For tips on how to write good tests, see
{ref}How to write tests <how-to-write-tests>.
Ray Data values readable, maintainable, and extendable code over clever tricks. For guidance on how to write code that aligns with Ray Data's design taste, see A Philosophy of Software Design.
To test your changes locally, build Ray from source. For Ray Data development, you typically only need the Python environment—you can skip the C++ build unless you’re also contributing to Ray Core.
Before submitting a PR, run pre-commit to lint your changes and pytest to execute your tests.
Note that the full Ray Data test suite can be heavy to run locally, start with tests directly related to your changes. For example, if you modified map, from python/ray/data/tests run: pytest test_map.py.
Explain why the change exists and what it achieves. Clear descriptions reduce back-and-forth and speed up reviews.
Here's an example of a PR with a good description: [Data] Refactor PhysicalOperator.completed to fix side effects .
Review difficulty scales non-linearly with PR size.
For fast reviews, do the following:
Here's an example of a PR that keeps its scope small: [Data] Support Non-String Items for ApproximateTopK Aggregator. While the broader effort focuses on optimizing preprocessors, this change was deliberately split out as a small, incremental PR, which made it much easier to review.
Ray's CI runs lint and a small set of tests first in the buildkite/microcheck check.
Start by making that pass.
Once it’s green, tag your reviewer. They can add the go label to trigger the full test suite.