Development and contributions


Build from source {#build-from-source}

Run tests {#run-tests}

{% note warning %}

{% include ya-make-to-cmake-switch %}

{% endnote %}

CMake-based build tests

  • C/C++ libraries.

    C/C++ libraries keep their tests in ut subdirectories of the source tree. For a library in x/y/z, the test code is in x/y/z/ut and the target name is x-y-z-ut. To run the tests, run CMake and then build the corresponding x-y-z-ut target. Building this target produces the executable ${CMAKE_BUILD_DIR}/x/y/z/x-y-z-ut. Run this executable to execute all the tests.
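
The naming rule above can be sketched in Python. This is only an illustration: the helper names `ut_target_name` and `ut_executable_path` are hypothetical and do not exist in the repository, and the build directory is assumed to be whatever directory CMake was run in.

```python
from pathlib import PurePosixPath


def ut_target_name(library_dir: str) -> str:
    """Map a library directory such as 'x/y/z' to its test target 'x-y-z-ut'."""
    parts = PurePosixPath(library_dir).parts
    return "-".join(parts) + "-ut"


def ut_executable_path(build_dir: str, library_dir: str) -> PurePosixPath:
    """Return the test executable location described above:
    <build_dir>/x/y/z/x-y-z-ut (hypothetical helper, POSIX-style paths)."""
    return PurePosixPath(build_dir) / library_dir / ut_target_name(library_dir)
```

With a name computed this way, the target can typically be built with `cmake --build <build_dir> --target x-y-z-ut` before running the resulting executable.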

  • {{ r-package }}

    1. Install additional R packages that are required to run tests:

      • caret
      • dplyr
      • jsonlite
      • testthat
    2. Open the R-package directory from the local copy of the {{ product }} repository.

    3. Run the following command:

      R CMD check .
      

    To run tests using the devtools package:

    1. Install devtools.

    2. Run the following command from the R session:

      devtools::test()
      
  • CLI

    1. Install the pytest, pandas and catboost packages for the Python interpreter you intend to use (catboost is used for reading column description files with catboost.utils.read_cd). Optionally install pytest-xdist and pytest-randomly to run tests in parallel, which is faster.

      {% cut "Previous requirements" %}

      Before revision 37a15c4, it was necessary to additionally install the testpath package.

      {% endcut %}

    2. Build the CLI binary (target catboost for Ninja or another build tool) and a supplementary tool that compares the output generated by tests with the canonical results (target limited_precision_dsv_diff for Ninja or another build tool).

    3. Set the following environment variables:

      • CMAKE_SOURCE_DIR to the root of the local copy of the {{ product }} repository.
      • CMAKE_BINARY_DIR to the root of the build directory that was generated by CMake and where the aforementioned targets were built.
      • TEST_OUTPUT_DIR to the root of the directory where temporary test data will be generated.
      • PORT_SYNC_PATH to the path of the directory that will be used to synchronize network port allocation. The directory will be created if it does not exist.
      • HAVE_CUDA to 1 if you want to run tests on GPU with CUDA, or to 0 otherwise.
    4. Open the catboost/pytest directory from the local copy of the {{ product }} repository.

    5. Run python -m pytest or (if you use pytest-xdist) python -m pytest -n <parallel_worker_count> or python -m pytest -n auto (in the auto case the number of parallel workers will be equal to the total count of detected CPU cores).
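
Steps 3 to 5 can be sketched as a small Python launcher. This is a minimal sketch, not a script shipped with the repository: the function names `cli_test_env` and `pytest_command` and all example paths are assumptions made for illustration. Only the environment variable names and the pytest options come from the steps above.

```python
import os
import sys


def cli_test_env(source_dir, binary_dir, output_dir, port_sync_dir,
                 have_cuda=False, base_env=None):
    """Build the environment for the CLI tests (hypothetical helper).

    The variable names are the ones listed in step 3; the values are
    whatever directories you use locally.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "CMAKE_SOURCE_DIR": str(source_dir),
        "CMAKE_BINARY_DIR": str(binary_dir),
        "TEST_OUTPUT_DIR": str(output_dir),
        "PORT_SYNC_PATH": str(port_sync_dir),
        "HAVE_CUDA": "1" if have_cuda else "0",
    })
    return env


def pytest_command(workers=None):
    """Return the pytest invocation from step 5.

    workers=None runs sequentially; an integer or "auto" adds the
    pytest-xdist option -n.
    """
    cmd = [sys.executable, "-m", "pytest"]
    if workers is not None:
        cmd += ["-n", str(workers)]
    return cmd
```

One possible use is `subprocess.run(pytest_command("auto"), cwd=<source_dir>/catboost/pytest, env=cli_test_env(...), check=True)`, which keeps the variables scoped to the test process instead of exporting them in the shell.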

  • Python package

    The tests check the catboost module of the Python interpreter you run them with. If you want to test a catboost Python package built from source, build and install it first.

    1. Install the pytest, pandas, ipywidgets, scikit-learn and polars packages for the Python interpreter you intend to use. Optionally install pytest-xdist and pytest-randomly to run tests in parallel, which is faster.

      {% cut "Previous requirements" %}

      Before revision 9017641, the polars package was not used.

      Before revision 34606a6, the supported scikit-learn versions were < 1.8.x.

      Before revision 37a15c4, it was necessary to additionally install the testpath package.

      {% endcut %}

    2. Build the supplementary tools that compare the output generated by tests with the canonical results (targets limited_precision_dsv_diff, limited_precision_json_diff, model_comparator for Ninja or another build tool).

    3. Set the following environment variables:

      • CMAKE_SOURCE_DIR to the root of the local copy of the {{ product }} repository.
      • CMAKE_BINARY_DIR to the root of the build directory that was generated by CMake and where the aforementioned targets were built.
      • TEST_OUTPUT_DIR to the root of the directory where temporary test data will be generated.
      • PORT_SYNC_PATH to the path of the directory that will be used to synchronize network port allocation. The directory will be created if it does not exist.
    4. Open the catboost/python-package/ut/medium directory from the local copy of the {{ product }} repository.

    5. Run python -m pytest or (if you use pytest-xdist) python -m pytest -n <parallel_worker_count> or python -m pytest -n auto (in the auto case the number of parallel workers will be equal to the total count of detected CPU cores).

    {% note warning %}

    Tests on GPU with CUDA are run if and only if a GPU with installed CUDA drivers is present.

    {% endnote %}
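
Before starting the Python package tests, it can help to confirm that the packages from step 1 are importable by the interpreter you plan to use. The sketch below is an assumption-laden illustration: the helper `missing_modules` is hypothetical, and the tuples list import names rather than package names (scikit-learn is imported as `sklearn`, pytest-xdist as `xdist`, pytest-randomly as `pytest_randomly`).

```python
import importlib.util

# Import names for the packages named in this section, plus catboost itself.
REQUIRED_MODULES = ("catboost", "pytest", "pandas", "ipywidgets", "sklearn", "polars")
# Optional plugins used for parallel runs.
OPTIONAL_MODULES = ("xdist", "pytest_randomly")


def missing_modules(module_names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` with the same interpreter that will run `python -m pytest` returns an empty list when everything is installed.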

  • JVM applier

    Open the catboost/jvm-packages/catboost4j-prediction directory from the local copy of the {{ product }} repository. Run the standard mvn test command. To run tests on GPU as well, add the -DtestOnGPU=1 command-line flag.
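
If the JVM tests are launched from a Python script, the command described above could be assembled as follows. The function name `mvn_test_command` is hypothetical; only `mvn test` and the `-DtestOnGPU=1` flag come from the text.

```python
def mvn_test_command(test_on_gpu=False):
    """Return the Maven invocation for the JVM applier tests (hypothetical helper)."""
    cmd = ["mvn", "test"]
    if test_on_gpu:
        # Flag named in the paragraph above; it enables the GPU tests as well.
        cmd.append("-DtestOnGPU=1")
    return cmd
```

The resulting list can be passed to `subprocess.run` with `cwd` set to the catboost4j-prediction directory.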

  • CatBoost for Apache Spark

    See building CatBoost for Apache Spark from source. Use the standard mvn test command.

YaMake-based build tests

{% note warning %}

The following documentation describes running tests using Ya Make, which applies only to versions prior to this commit.

{% endnote %}

{{ product }} provides tests that compare the resulting data with the canonical data.

The required steps for running these tests depend on the implementation.

{% list tabs %}

  • Command-line version

    1. {% include test-common-tests %}

      1. Open the catboost/pytest directory from the local copy of the {{ product }} repository.

      2. Run the following command:

      ../../ya make -t -A [-Z]
      

      {% include test-replace-cannonical-files %}

    2. {% include test-gpu-specific-tests %}

      1. Open the catboost/pytest/cuda_tests directory from the local copy of the {{ product }} repository.

      2. Run the following command:

      ../../../ya make -DCUDA_ROOT=<path_to_CUDA_SDK> -t -A [-Z]
      

    {% include test-use-vcs-to-analyze-diff %}

  • {{ python-package }}

    1. {% include test-common-tests %}

      1. Open the catboost/python-package/ut/medium directory from the local copy of the {{ product }} repository.

      2. Run the following command:

      ../../../../ya make -t -A [-Z]
      

      {% include test-replace-cannonical-files %}

    2. {% include test-gpu-specific-tests %}

      1. Open the catboost/python-package/ut/medium/gpu directory from the local copy of the {{ product }} repository.

      2. Run the following command:

      ../../../../../ya make -DCUDA_ROOT=<path_to_CUDA_SDK> -t -A [-Z]
      

    {% include test-use-vcs-to-analyze-diff %}

  • {{ r-package }}

    1. Install additional R packages that are required to run tests:

      • caret
      • dplyr
      • jsonlite
      • testthat
    2. Open the R-package directory from the local copy of the {{ product }} repository.

    3. Run the following command:

      R CMD check .
      

    To run tests using the devtools package:

    1. Install devtools.

    2. Run the following command from the R session:

      devtools::test()
      

{% endlist %}
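
The Ya Make commands above differ only in how many `..` components are needed to reach the `ya` script at the repository root and in the optional flags. A minimal sketch of that pattern, assuming POSIX-style paths relative to the repository root (the helper names `ya_relative_path` and `ya_make_test_command` are hypothetical):

```python
from pathlib import PurePosixPath


def ya_relative_path(test_dir: str) -> str:
    """Return the relative path from test_dir (given relative to the
    repository root) back to the ya script at the root."""
    depth = len(PurePosixPath(test_dir).parts)
    return "/".join([".."] * depth + ["ya"])


def ya_make_test_command(test_dir, cuda_root=None, add_z_flag=False):
    """Assemble the command line shown above for a given test directory."""
    cmd = [ya_relative_path(test_dir), "make"]
    if cuda_root is not None:
        # Used for the GPU-specific test directories.
        cmd.append(f"-DCUDA_ROOT={cuda_root}")
    cmd += ["-t", "-A"]
    if add_z_flag:
        # Optional flag shown as [-Z] in the commands above.
        cmd.append("-Z")
    return cmd
```

For example, `ya_relative_path("catboost/pytest")` gives `../../ya`, which matches the command listed for the command-line version.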

Microsoft Visual Studio solution {#compiling-in-windows}

{% note warning %}

A ready-made Microsoft Visual Studio solution was provided until this commit.

For versions after this commit, it is recommended to generate a Microsoft Visual Studio 2019 solution using the corresponding CMake generator.

{% endnote %}

A solution for Visual Studio is available in the {{ product }} repository:

catboost/msvs/arcadia.sln

Coding conventions {#coding-convention}

The following coding conventions must be followed in order to successfully contribute to the {{ product }} project:

Versioning conventions {#versioning-conventions}

Do not change the package version when submitting pull requests. Yandex uses an internal repository for this purpose.

License

By contributing to this project, you agree that your contributions will be licensed under the Apache 2.0 license.