################################
2.1.4 Patch Release (2025 Feb 6)
################################
The 2.1.4 patch release incorporates the following fixes on top of the 2.1.3 release:
#################################
2.1.3 Patch Release (2024 Nov 26)
#################################
The 2.1.3 patch release makes the following bug fixes:
- Handle ``cudf.pandas`` proxy objects properly. (#11014)

#################################
2.1.2 Patch Release (2024 Oct 23)
#################################
The 2.1.2 patch release makes the following bug fixes:
#################################
2.1.1 Patch Release (2024 Jul 31)
#################################
The 2.1.1 patch release makes the following bug fixes:
In addition, it contains several enhancements:
###################
2.1.0 (2024 Jun 20)
###################
We are thrilled to announce the XGBoost 2.1 release. This note first summarizes some general changes and then highlights specific package updates. As we are working on a `new R interface <https://github.com/dmlc/xgboost/issues/9810>`__, this release does not include the R package. We'll update the R package as soon as it's ready. Stay tuned!
.. contents::
  :backlinks: none
  :local:
Networking Improvements
***********************
An important piece of ongoing work for XGBoost is supporting resilience for improved scaling and federated learning on various platforms. The existing networking library in XGBoost, adopted from the RABIT project, could no longer meet the feature demand. We revamped the RABIT module in this release to pave the way for future development. We chose an in-house implementation over an existing library because of the active development status, with frequent new feature requests such as loading extra plugins for federated learning. The new implementation features:
Related PRs (#9597, #9576, #9523, #9524, #9593, #9596, #9661, #10319, #10152, #10125, #10332, #10306, #10208, #10203, #10199, #9784, #9777, #9773, #9772, #9759, #9745, #9695, #9738, #9732, #9726, #9688, #9681, #9679, #9659, #9650, #9644, #9649, #9917, #9990, #10313, #10315, #10112, #9531, #10075, #9805, #10198, #10414).
The existing option of using MPI in RABIT is removed in this release. (#9525)
NCCL is now fetched from PyPI
*****************************
In previous versions, XGBoost statically linked NCCL, which significantly increased the binary size and caused us to hit the PyPI repository size limit. The new release instead loads NCCL dynamically from an external source, reducing the binary size. For the PyPI package, the ``nvidia-nccl-cu12`` package is fetched during installation. With more downstream packages reusing NCCL, we expect user environments to become slimmer in the future as well. (#9796, #9804, #10447)
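As an illustrative sketch only (not XGBoost's actual loader code), one way to check whether the pip-installed NCCL is present is to look for the ``nvidia.nccl`` package; the ``nvidia/nccl/lib`` layout assumed below is how the ``nvidia-nccl-cu12`` wheel is typically laid out.

```python
# Illustrative only: locate the NCCL shared library shipped by the
# nvidia-nccl-cu12 wheel. The nvidia/nccl/lib layout is an assumption
# for this sketch, not XGBoost's own loading logic.
import importlib.util
from pathlib import Path
from typing import Optional


def find_nccl_libdir() -> Optional[Path]:
    """Return the lib directory of a pip-installed NCCL, or None."""
    try:
        spec = importlib.util.find_spec("nvidia.nccl")
    except ModuleNotFoundError:  # the parent "nvidia" namespace is absent
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations))) / "lib"


print(find_nccl_libdir())
```

On a machine without the wheel installed, the helper simply returns ``None`` rather than failing.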
Parts of the Python package now require glibc 2.28+
****************************************************
Starting from 2.1.0, the XGBoost Python package will be distributed in two variants:

- ``manylinux_2_28``: for recent Linux distros with glibc 2.28 or newer. This variant comes with all features enabled.
- ``manylinux2014``: for older Linux distros with glibc older than 2.28. This variant does not support GPU algorithms or federated learning.

The pip package manager will automatically choose the correct variant depending on your system.
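To see which variant applies to your machine, you can compare the glibc version reported by Python against the 2.28 cutoff. This is a convenience sketch, not the selection logic pip itself uses.

```python
# Sketch: decide which XGBoost wheel variant a Linux system can use,
# based on its glibc version. pip performs its own (more thorough)
# manylinux compatibility checks; this is only an approximation.
import platform


def supports_manylinux_2_28(libc: str, version: str) -> bool:
    """True if the reported C library is glibc 2.28 or newer."""
    if libc != "glibc" or not version:
        return False  # musl or unknown libc: assume the older variant
    try:
        major, minor = (int(p) for p in version.split(".")[:2])
    except ValueError:
        return False
    return (major, minor) >= (2, 28)


libc, version = platform.libc_ver()
print("manylinux_2_28" if supports_manylinux_2_28(libc, version) else "manylinux2014")
```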
Starting from May 31, 2025, we will stop distributing the ``manylinux2014`` variant and exclusively
distribute the ``manylinux_2_28`` variant. We made this decision so that our CI/CD pipeline won't have
to depend on software components that have reached end-of-life (such as CentOS 7). We strongly encourage
everyone to migrate to a recent Linux distro in order to use future versions of XGBoost.
.. note::

   If you want to use GPU algorithms or federated learning on an older Linux distro, you have two alternatives:
Multi-output
************
We continue the work on multi-target and vector leaf in this release:
- ``XGBoosterTrainOneIter``: this new function supports strided matrices and CUDA inputs. In addition, custom objectives now return the correct shape for prediction. (#9508)
- The ``hinge`` objective now supports multi-target regression. (#9850)

Please note that the feature is still in progress and not suitable for production use.
Federated Learning
******************
Progress has been made on federated learning with improved support for column-split, including the following updates:
Ongoing work for SYCL support
*****************************
XGBoost is developing a SYCL plugin for SYCL devices, starting with the ``hist`` tree method. (#10216, #9800, #10311, #9691, #10269, #10251, #10222, #10174, #10080, #10057, #10011, #10138, #10119, #10045, #9876, #9846, #9682) XGBoost now supports running inference on SYCL devices, and work on adding SYCL support for training is ongoing.
Looking ahead, we plan to complete training support in coming releases and then focus on improving test coverage for SYCL, particularly for Python tests.
Optimizations
*************
Deprecation and breaking changes
********************************
Package-specific breaking changes are outlined in respective sections. Here we list general breaking changes in this release:
- Universal binary JSON is now the default format for saving models. (#9947, #9958, #9954, #9955) See https://github.com/dmlc/xgboost/issues/7547 for more info.
- ``XGBoosterGetModelRaw`` is now removed after being deprecated in 1.6. (#9617)
- ``XGDMatrixSetDenseInfo`` and ``XGDMatrixSetUIntInfo`` are now deprecated. Use the array-interface-based alternatives instead.

Features
********
This section lists some new features that are general to all language bindings. For package-specific changes, please visit respective sections.
- deviance. (#9757)
- ``lambdarank_normalization`` parameter. (#10094)
- ``QuantileDMatrix`` on CPU. (#10043)

Bug fixes
*********
- Fix ``FieldEntry`` constructor specialization syntax error. (#9980)
- ``lambdarank_pair_method``. (#10098)
- Stop ``gblinear`` from treating categorical features as numerical. (#9946)

Document
********
Here is a list of documentation changes not specific to any XGBoost package.
- ``base_score``. (#9882)

Python package
**************
Other than the networking changes, we have some optimizations and documentation updates for Dask:
- Use ``from xgboost import dask`` instead of ``import xgboost.dask`` to avoid pulling in unnecessary dependencies for non-dask users. (#9742)

PySpark has several new features along with some small fixes:
- ``verbosity=3``. (#10172)

For the Python package, ``eval_metric``, ``early_stopping_rounds``, and ``callbacks`` are now removed from the ``fit`` method in the sklearn interface. They were deprecated in 1.6. Use the parameters with the same names in the constructors instead. (#9986)
Following is a list of new features in the Python package:
- Support ``cudf.pandas`` (#9602), ``torch.Tensor`` (#9971), and more scipy types (#9881).
- ``random_state`` (#9743)
- ``DMatrix`` with ``None`` input. (#10052)
- ``enable_categorical`` (#9877, #9884)

JVM packages
************
Here is a list of JVM-specific changes. Like the PySpark package, the JVM package also gains stage-level scheduling.
Maintenance
***********
CI
==