#################################
3.0.3 Patch Release (Jul 30 2025)
#################################

  • Fix NDCG metric with non-exp gain. (:pr:11534)
  • Avoid using mean intercept for rmsle. (:pr:11588)
  • [jvm-packages] add setNumEarlyStoppingRounds API (:pr:11571)
  • Avoid implicit synchronization in GPU evaluation. (:pr:11542)
  • Remove CUDA check in the array interface handler (:pr:11386)
  • Fix check in GPU histogram. (:pr:11574)
  • Support Rapids 25.06 (:pr:11504)
  • Adding enable_categorical to the sklearn .apply method (:pr:11550)
  • Make xgboost.testing compatible with scikit-learn 1.7 (:pr:11502)
  • Add support for building xgboost wheels on Win-ARM64 (:pr:11572, :pr:11597, :pr:11559)

#################################
3.0.2 Patch Release (May 25 2025)
#################################

  • Dask 2025.4.0 scheduler info compatibility fix (:pr:11462)
  • Fix CUDA virtual memory fallback logic on WSL2 (:pr:11471)

#################################
3.0.1 Patch Release (May 13 2025)
#################################

  • Use nvidia-smi to detect the driver version and handle old drivers that don't support virtual memory. (:pr:11391)
  • Optimize deep trees for GPU external memory. (:pr:11387)
  • Small fix for page concatenation with external memory (:pr:11338)
  • Build xgboost-cpu for manylinux_2_28_x86_64 (:pr:11406)
  • Workaround for different Dask versions (:pr:11436)
  • Output models now use denormal floating-point instead of nan. (:pr:11428)
  • Fix aarch64 CI. (:pr:11454)

###################
3.0.0 (2025 Feb 27)
###################

3.0.0 is a milestone for XGBoost. This note summarizes some general changes and then lists package-specific updates. The bump in the major version reflects a reworked R package along with a significant update to the JVM packages.

.. contents::
  :backlinks: none
  :local:


External Memory Support
=======================


This release features a major update to the external memory implementation, with improved performance, a new :py:class:~xgboost.ExtMemQuantileDMatrix for more efficient data initialization, and broader feature coverage, including categorical data and quantile regression support. Additionally, GPU-based external memory is reworked to support using CPU memory as a data cache. Last but not least, we worked on distributed training with external memory, along with initial support in the Spark package.

  • A new :py:class:~xgboost.ExtMemQuantileDMatrix class for fast data initialization with the hist tree method. The new class supports both CPU and GPU training; a minimal usage sketch follows this list. (:pr:10689, :pr:10682, :pr:10886, :pr:10860, :pr:10762, :pr:10694, :pr:10876)
  • External memory now supports distributed training (:pr:10492, :pr:10861). In addition, the Spark package can use external memory (host memory) when the device is set to GPU. The default package on Maven doesn't support RMM yet; for better performance, one needs to compile XGBoost from source for now. (:pr:11186, :pr:11238, :pr:11219)
  • Improved performance with new optimizations for both the hist tree method and the approx (:py:class:~xgboost.DMatrix) method. (:pr:10529, :pr:10980, :pr:10342)
  • New demos and documents for external memory, including distributed training. (:pr:11234, :pr:10929, :pr:10916, :pr:10426, :pr:11113)
  • Reduced binary cache size and memory allocation overhead by not writing the cut matrix. (:pr:10444)
  • Broader feature coverage: categorical data and all objective functions, quantile regression among them. In addition, various prediction types like SHAP values are supported. (:pr:10918, :pr:10820, :pr:10751, :pr:10724)
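
As a brief illustration of external-memory training with the new class, here is a hedged sketch; the batch files and the load_batch helper are placeholders, not part of the XGBoost API:

.. code-block:: python

   import os
   from typing import Callable, List, Tuple

   import numpy as np
   import xgboost


   def load_batch(path: str) -> Tuple[np.ndarray, np.ndarray]:
       """Hypothetical helper: load one preprocessed (X, y) batch from disk."""
       arrays = np.load(path)
       return arrays["X"], arrays["y"]


   class Iterator(xgboost.DataIter):
       """Yield data batch by batch for external-memory training."""

       def __init__(self, file_paths: List[str]) -> None:
           self._file_paths = file_paths
           self._it = 0
           super().__init__(cache_prefix=os.path.join(".", "cache"))

       def next(self, input_data: Callable) -> bool:
           if self._it == len(self._file_paths):
               return False  # No more batches.
           X, y = load_batch(self._file_paths[self._it])
           input_data(data=X, label=y)
           self._it += 1
           return True

       def reset(self) -> None:
           self._it = 0  # Rewind for the next pass over the data.


   it = Iterator(["batch-0.npz", "batch-1.npz"])  # placeholder file names
   # Quantization happens while iterating; the result feeds the hist tree method.
   Xy = xgboost.ExtMemQuantileDMatrix(it)
   booster = xgboost.train({"tree_method": "hist"}, Xy, num_boost_round=8)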

The GPU-based external memory training implementation also received significant updates. (:pr:10924, :pr:10895, :pr:10766, :pr:10544, :pr:10677, :pr:10615, :pr:10927, :pr:10608, :pr:10711)

  • GPU-based external memory supports both batch-based and sampling-based training. Before the 3.0 release, XGBoost concatenated the data during training and stored the cache on disk. In 3.0, XGBoost can stage the data in host memory and fetch it batch by batch. (:pr:10602, :pr:10595, :pr:10606, :pr:10549, :pr:10488, :pr:10766, :pr:10765, :pr:10764, :pr:10760, :pr:10753, :pr:10734, :pr:10691, :pr:10713, :pr:10826, :pr:10811, :pr:10810, :pr:10736, :pr:10538, :pr:11333)
  • XGBoost can now utilize NVLink-C2C for GPU-based external memory training and can handle up to terabytes of data.
  • Support prediction cache (:pr:10707).
  • Automatic page concatenation for improved GPU utilization (:pr:10887).
  • Improved quantile sketching algorithm for batch-based inputs. See the :ref:new features <3_0_features> section for more info.
  • Optimization for nearly-dense inputs; see the :ref:optimization <3_0_optimization> section for more info.

See the latest document for details: :doc:/tutorials/external_memory. The PyPI package (pip install) doesn't have RMM support, which is required by the GPU external memory implementation. To experiment, you can compile XGBoost from source or wait for the RAPIDS conda package to become available.

.. _3_0_networking:


Networking
==========


Continuing the work from the previous release, we updated the network module to improve reliability. (:pr:10453, :pr:10756, :pr:11111, :pr:10914, :pr:10828, :pr:10735, :pr:10693, :pr:10676, :pr:10349, :pr:10397, :pr:10566, :pr:10526)

The timeout option is now supported for NCCL using the NCCL asynchronous mode (:pr:10850, :pr:10934, :pr:10945, :pr:10930).

In addition, a new :py:class:~xgboost.collective.Config class is added for users to specify various options for distributed training, including timeout and tracker port. Both the Dask interface and the PySpark interface support the new configuration. (:pr:11003, :pr:10281, :pr:10983, :pr:10973)
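
As a sketch of the new configuration class (the field and keyword names below follow our reading of the 3.0 API and should be treated as assumptions):

.. code-block:: python

   from xgboost import collective
   from xgboost import dask as dxgb

   # Communicator options for distributed training; field names are assumptions.
   coll_cfg = collective.Config(
       retry=2,                     # reconnection attempts
       timeout=300,                 # collective-operation timeout (seconds)
       tracker_host_ip="10.0.0.1",  # placeholder address
       tracker_port=9091,           # placeholder port
   )

   # Assuming an existing Dask `client` and a `DaskDMatrix` named `dtrain`:
   output = dxgb.train(
       client,
       {"tree_method": "hist"},
       dtrain,
       num_boost_round=8,
       coll_cfg=coll_cfg,
   )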


SYCL
====


Continuing the work on the SYCL integration, there are significant improvements in feature coverage for this release, from more training parameters and more objectives to distributed training, along with various optimizations (:pr:10884, :pr:10883).

Starting with 3.0, the SYCL plugin is close to feature-complete; users can start working on SYCL devices for in-core training and inference. Newly introduced features include:

  • Dask support for distributed training (:pr:10812)

  • Various training procedures, including split evaluation (:pr:10605, :pr:10636), grow policy (:pr:10690, :pr:10681), and cached prediction (:pr:10701).

  • Updates for objective functions. (:pr:11029, :pr:10931, :pr:11016, :pr:10993, :pr:11064, :pr:10325)

  • Ongoing work for float32-only devices. (:pr:10702)

Other related PRs (:pr:10842, :pr:10543, :pr:10806, :pr:10943, :pr:10987, :pr:10548, :pr:10922, :pr:10898, :pr:10576)
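
For context, training on a SYCL device is selected through the device parameter. A minimal, hedged sketch (assumes a build with the SYCL plugin enabled; consult the plugin documentation for the exact device strings):

.. code-block:: python

   import numpy as np
   import xgboost

   X, y = np.random.rand(256, 8), np.random.rand(256)
   Xy = xgboost.DMatrix(X, y)

   # "sycl:gpu" targets a SYCL GPU; "sycl:cpu" targets a SYCL CPU device.
   params = {"tree_method": "hist", "device": "sycl:gpu"}
   booster = xgboost.train(params, Xy, num_boost_round=8)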

.. _3_0_features:


Features
========


This section describes new features in the XGBoost core. For language-specific features, please visit the corresponding sections.

  • A new initialization method for objectives derived from GLM. The new method is based on the mean value of the input labels and changes the estimated base_score. (:pr:10298, :pr:11331)

  • The :py:class:xgboost.QuantileDMatrix can be used with all prediction types for both CPU and GPU.

  • In prior releases, one had to make a copy of the booster to release memory held by internal tree methods. We formalized the procedure into a new booster method :py:meth:~xgboost.Booster.reset / :cpp:func:XGBoosterReset; see the sketch after this list. (:pr:11042)

  • The OpenMP thread setting is exposed in the XGBoost global configuration. Users can use it to work around hardcoded OpenMP environment variables. (:pr:11175)

  • We improved learning-to-rank tasks with better hyper-parameter configuration and distributed training support.

    • In 3.0, all three distributed interfaces, including Dask, Spark, and PySpark, support sorting the data based on query ID. The option for the :py:class:~xgboost.dask.DaskXGBRanker is enabled by default and can be opted out of. (:pr:11146, :pr:11007, :pr:11047, :pr:11012, :pr:10823, :pr:11023)

    • Also for learning to rank, a new parameter lambdarank_score_normalization is introduced to make one of the normalizations optional. (:pr:11272)

    • The lambdarank_normalization parameter now uses the number of pairs when normalizing the mean pair strategy. Previously, the gradient was used for both the topk and mean strategies. (:pr:11322)

  • We have improved GPU quantile sketching to reduce memory usage. The improvement helps the construction of the :py:class:~xgboost.QuantileDMatrix and the new :py:class:~xgboost.ExtMemQuantileDMatrix.

    • A new multi-level sketching algorithm is employed to reduce the overall memory usage with batched inputs.
    • In addition to algorithmic changes, internal memory usage estimation and the quantile container are also updated. (:pr:10761, :pr:10843)
    • The change introduces two more parameters for the :py:class:~xgboost.QuantileDMatrix and :py:class:~xgboost.DataIter, namely, max_quantile_batches and min_cache_page_bytes.
  • Work continues on improving support for categorical features. This release supports plotting trees with stats for categorical nodes (:pr:11053). In addition, some preparation work is ongoing for automatically re-coding categories. (:pr:11094, :pr:11114, :pr:11089) These are feature enhancements instead of blocking issues.

  • Implement weight-based feature importance for vector-leaf trees. (:pr:10700)

  • Reduced logging in the DMatrix construction. (:pr:11080)
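
A minimal sketch of the new reset method mentioned above (random data as a placeholder):

.. code-block:: python

   import numpy as np
   import xgboost

   X, y = np.random.rand(256, 8), np.random.rand(256)
   booster = xgboost.train(
       {"tree_method": "hist"}, xgboost.DMatrix(X, y), num_boost_round=8
   )

   # Release memory held by internal training caches; the trained model itself
   # is kept and remains usable for prediction.
   booster.reset()
   predt = booster.predict(xgboost.DMatrix(X))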

.. _3_0_optimization:


Optimization
============


In addition to the external memory and quantile sketching improvements, we have a number of optimizations and performance fixes.

  • GPU tree methods now use significantly less memory for both dense inputs and near-dense inputs. (:pr:10821, :pr:10870)
  • For near-dense inputs, GPU training is much faster for both hist (about 2x) and approx.
  • Quantile regression on CPU can now handle imbalanced trees much more efficiently. (:pr:11275)
  • Small optimization for DMatrix construction to reduce latency. Also, C users can now reuse the :cpp:func:ProxyDMatrix <XGProxyDMatrixCreate()> for multiple inference calls. (:pr:11273)
  • CPU prediction performance for :py:class:~xgboost.QuantileDMatrix has been improved (:pr:11139) and is now on par with the normal DMatrix.
  • Fixed a performance issue for running inference using CPU with extremely sparse :py:class:~xgboost.QuantileDMatrix (:pr:11250).
  • Optimize CPU training memory allocation for improved performance. (:pr:11112)
  • Improved RMM (RAPIDS Memory Manager) integration. Now, with the help of :py:func:~xgboost.config_context, all memory allocated by XGBoost should be routed to RMM. As a bonus, all Thrust algorithms now use the asynchronous execution policy; see the sketch after this list. (:pr:10873, :pr:11173, :pr:10712, :pr:10562)
  • When used without RMM, XGBoost is more careful with its use of the caching allocator to avoid holding too much device memory. (:pr:10582)
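
As a hedged illustration of the RMM integration described above (pool settings and data are placeholders):

.. code-block:: python

   import cupy as cp
   import rmm
   from rmm.allocators.cupy import rmm_cupy_allocator

   import xgboost

   # Route CuPy (and, below, XGBoost) allocations through an RMM pool.
   rmm.reinitialize(pool_allocator=True)
   cp.cuda.set_allocator(rmm_cupy_allocator)

   X, y = cp.random.rand(1024, 16), cp.random.rand(1024)
   with xgboost.config_context(use_rmm=True):
       Xy = xgboost.QuantileDMatrix(X, y)
       booster = xgboost.train({"device": "cuda"}, Xy, num_boost_round=8)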

Breaking Changes
================


This section lists breaking changes that affect all packages.

  • Remove the deprecated DeviceQuantileDMatrix. (:pr:10974, :pr:10491)
  • Support for saving the model in the deprecated binary format has been removed. Users can still load old models in 3.0. (:pr:10490)
  • Support for the legacy (blocking) CUDA stream is removed (:pr:10607)
  • XGBoost now requires CUDA 12.0 or later.

Bug Fixes
=========


  • Fix the quantile error metric (pinball loss) with multiple quantiles. (:pr:11279)
  • Fix a potential access error when running prediction in a multi-threaded environment. (:pr:11167)
  • Check the correct dump format for the gblinear. (:pr:10831)

Documentation
=============


  • A new tutorial for advanced usage with custom objective functions. (:pr:10283, :pr:10725)
  • The new online document site now shows documents for all packages including Python, R, and JVM-based packages. (:pr:11240, :pr:11216, :pr:11166)
  • Lots of enhancements. (:pr:10822, :pr:11137, :pr:11138, :pr:11246, :pr:11266, :pr:11253, :pr:10731, :pr:11222, :pr:10551, :pr:10533)
  • Consistent use of cmake in documents. (:pr:10717)
  • Add a brief description for using the offset from the GLM setting (like Poisson). (:pr:10996)
  • Cleanup document for building from source. (:pr:11145)
  • Various fixes. (:pr:10412, :pr:10405, :pr:10353, :pr:10464, :pr:10587, :pr:10350, :pr:11131, :pr:10815)
  • Maintenance. (:pr:11052, :pr:10380)

Python Package
==============


  • The feature_weights parameter in the sklearn interface is now defined as a scikit-learn parameter. (:pr:9506)

  • Initial support for Polars; categorical features are not yet supported. (:pr:11126, :pr:11172, :pr:11116)

  • Reduce pandas DataFrame overhead and the overhead of various imports. (:pr:11058, :pr:11068)

  • Better xlabel in :py:func:~xgboost.plot_importance (:pr:11009)

  • Validate the reference dataset for training. The :py:func:~xgboost.train function now throws an error if a :py:class:~xgboost.QuantileDMatrix is used as a validation dataset without a reference; see the sketch at the end of this section. (:pr:11105)

  • Fix misleading errors when feature names are missing during inference (:pr:10814)

  • Add stacklevel to the Python warning callback. The change helps improve warning messages for the Python package. (:pr:10977)

  • Remove circular reference in DataIter. It helps reduce memory usage. (:pr:11177)

  • Add checks for invalid inputs for cv. (:pr:11255)

  • Update Python project classifiers. (:pr:10381, :pr:11028)

  • Support doc links for the sklearn module. Users can now find links to the documentation in a Jupyter notebook. (:pr:10287)

  • Dask

    • Prevent training from hanging due to aborted workers. (:pr:10985) This helps make Dask XGBoost robust against worker failures: when a worker is killed, training now fails with an exception instead of hanging.
    • Optional support for client-side logging. (:pr:10942)
    • Fix LTR with empty partition and NCCL error. (:pr:11152)
    • Update to work with the latest Dask. (:pr:11291)
    • See the :ref:3_0_features section for changes to ranking models.
    • See the :ref:3_0_networking section for changes with the communication module.
  • PySpark

    • Expose Training and Validation Metrics. (:pr:11133)
    • Add barrier before initializing the communicator. (:pr:10938)
    • Extend support for columnar input to CPU (GPU-only previously). (:pr:11299)
    • See the :ref:3_0_features section for changes to ranking models.
    • See the :ref:3_0_networking section for changes with the communication module.
  • Document updates (:pr:11265).

  • Maintenance. (:pr:11071, :pr:11211, :pr:10837, :pr:10754, :pr:10347, :pr:10678, :pr:11002, :pr:10692, :pr:11006, :pr:10972, :pr:10907, :pr:10659, :pr:10358, :pr:11149, :pr:11178, :pr:11248)

  • Breaking changes

    • Remove deprecated feval. (:pr:11051)

    • Remove dask from the default import. (:pr:10935) Users are now required to import the XGBoost Dask interface through:

      .. code-block:: python

         from xgboost import dask as dxgb

      instead of:

      .. code-block:: python

         import xgboost as xgb
         xgb.dask

      The change helps avoid introducing dask into the default import set.

    • Bump Python requirement to 3.10. (:pr:10434)

    • Drop support for datatable. (:pr:11070)
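
Returning to the validation-dataset check noted earlier in this section, a minimal sketch with random placeholder data:

.. code-block:: python

   import numpy as np
   import xgboost

   rng = np.random.default_rng(0)
   X_train, y_train = rng.random((512, 8)), rng.random(512)
   X_valid, y_valid = rng.random((128, 8)), rng.random(128)

   Xy_train = xgboost.QuantileDMatrix(X_train, y_train)
   # A validation QuantileDMatrix must reference the training data's quantile
   # cuts through `ref`; omitting it now raises an error.
   Xy_valid = xgboost.QuantileDMatrix(X_valid, y_valid, ref=Xy_train)

   booster = xgboost.train(
       {"tree_method": "hist"},
       Xy_train,
       num_boost_round=8,
       evals=[(Xy_valid, "valid")],
   )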


R Package
=========


We have been reworking the R package for a few releases now. In 3.0, we will start publishing a new R package on R-universe before moving toward a CRAN update. The new package features a much more ergonomic interface that is also more idiomatic for R users. In addition, a range of new features is introduced in the package. To name a few, the new package includes categorical feature support, QuantileDMatrix, and an initial implementation of external memory training. To test the new package:

.. code-block:: R

   install.packages('xgboost', repos = c('https://dmlc.r-universe.dev', 'https://cloud.r-project.org'))

Also, we finally have an online documentation site for the R package featuring both vignettes and API references (:pr:11166, :pr:11257). A good starting point for the new interface is the new xgboost() function. We won't list all the feature gains here, as there are too many! Please visit :doc:/R-package/index for more info. There's a migration guide (:pr:11197) there if you are coming from a previous version of the XGBoost R package.

  • Support for the MSVC build was dropped due to incompatibility with R headers. (:pr:10355, :pr:11150)
  • Maintenance (:pr:11259)
  • Related PRs. (:pr:11171, :pr:11231, :pr:11223, :pr:11073, :pr:11224, :pr:11076, :pr:11084, :pr:11081, :pr:11072, :pr:11170, :pr:11123, :pr:11168, :pr:11264, :pr:11140, :pr:11117, :pr:11104, :pr:11095, :pr:11125, :pr:11124, :pr:11122, :pr:11108, :pr:11102, :pr:11101, :pr:11100, :pr:11077, :pr:11099, :pr:11074, :pr:11065, :pr:11092, :pr:11090, :pr:11096, :pr:11148, :pr:11151, :pr:11159, :pr:11204, :pr:11254, :pr:11109, :pr:11141, :pr:10798, :pr:10743, :pr:10849, :pr:10747, :pr:11022, :pr:10989, :pr:11026, :pr:11060, :pr:11059, :pr:11041, :pr:11043, :pr:11025, :pr:10674, :pr:10727, :pr:10745, :pr:10733, :pr:10750, :pr:10749, :pr:10744, :pr:10794, :pr:10330, :pr:10698, :pr:10687, :pr:10688, :pr:10654, :pr:10456, :pr:10556, :pr:10465, :pr:10337)

JVM Packages
============


The XGBoost 3.0 release features a significant update to the JVM packages, in particular the Spark package. There are breaking changes in packaging and some parameters; please visit the :doc:migration guide </jvm/xgboost_spark_migration> for related changes. The work brings new features and a more unified feature set between the CPU and GPU implementations. (:pr:10639, :pr:10833, :pr:10845, :pr:10847, :pr:10635, :pr:10630, :pr:11179, :pr:11184)

  • Automatic partitioning for distributed learning to rank. See the :ref:features <3_0_features> section above (:pr:11023).
  • Resolve Spark compatibility issue (:pr:10917)
  • Support missing values when constructing a DMatrix with an iterator (:pr:10628)
  • Fix transform performance issue (:pr:10925)
  • Honor skip.native.build option in xgboost4j-gpu (:pr:10496)
  • Support array features type for CPU (:pr:10937)
  • Change default missing value to NaN for better alignment (:pr:11225)
  • Don't cast to float if it's already float (:pr:10386)
  • Maintenance. (:pr:10982, :pr:10979, :pr:10978, :pr:10673, :pr:10660, :pr:10835, :pr:10836, :pr:10857, :pr:10618, :pr:10627)

Maintenance
===========


Code maintenance includes refactoring (:pr:10531, :pr:10573, :pr:11069), cleanups (:pr:11129, :pr:10878, :pr:11244, :pr:10401, :pr:10502, :pr:11107, :pr:11097, :pr:11130, :pr:10758, :pr:10923, :pr:10541, :pr:10990), and improvements for tests (:pr:10611, :pr:10658, :pr:10583, :pr:11245, :pr:10708), along with fixes for various warnings in compilers and test dependencies (:pr:10757, :pr:10641, :pr:11062, :pr:11226). There are also miscellaneous updates, including some dev scripts and profiling annotations (:pr:10485, :pr:10657, :pr:10854, :pr:10718, :pr:11158, :pr:10697, :pr:11276).

Lastly, dependency updates (:pr:10362, :pr:10363, :pr:10360, :pr:10373, :pr:10377, :pr:10368, :pr:10369, :pr:10366, :pr:11032, :pr:11037, :pr:11036, :pr:11035, :pr:11034, :pr:10518, :pr:10536, :pr:10586, :pr:10585, :pr:10458, :pr:10547, :pr:10429, :pr:10517, :pr:10497, :pr:10588, :pr:10975, :pr:10971, :pr:10970, :pr:10949, :pr:10947, :pr:10863, :pr:10953, :pr:10954, :pr:10951, :pr:10590, :pr:10600, :pr:10599, :pr:10535, :pr:10516, :pr:10786, :pr:10859, :pr:10785, :pr:10779, :pr:10790, :pr:10777, :pr:10855, :pr:10848, :pr:10778, :pr:10772, :pr:10771, :pr:10862, :pr:10952, :pr:10768, :pr:10770, :pr:10769, :pr:10664, :pr:10663, :pr:10892, :pr:10979, :pr:10978).


CI
==


  • The CI is reworked to use RunsOn to integrate custom CI pipelines with GitHub Actions. The migration helps us reduce the maintenance burden and makes the CI configuration more accessible to others. (:pr:11001, :pr:11079, :pr:10649, :pr:11196, :pr:11055, :pr:10483, :pr:11078, :pr:11157)

  • Other maintenance work includes various small fixes, enhancements, and tooling updates. (:pr:10877, :pr:10494, :pr:10351, :pr:10609, :pr:11192, :pr:11188, :pr:11142, :pr:10730, :pr:11066, :pr:11063, :pr:10800, :pr:10995, :pr:10858, :pr:10685, :pr:10593, :pr:11061)