.. _migration_guide:

Migrating code from previous XGBoost versions
=============================================
XGBoost's R language bindings had large breaking changes between versions 1.x and 2.x. R code that was working with past XGBoost versions might require modifications to work with the newer versions. This guide outlines the main differences:
Function xgboost()
------------------
- Previously, the 'y' (label) variable passed to ``xgboost()`` had to be encoded in the format used by the XGBoost core library - meaning: binary variables had to be encoded to 0/1, bounds for survival objectives had to be passed as different arguments, among others. In the newest versions, 'y' doesn't need to be manually encoded beforehand: it should be passed as an R object of the class that regression functions from base R and core R packages expect for the corresponding XGBoost objective - e.g. classification problems should be passed a ``factor``, survival problems a ``Surv`` object, regression problems a numeric vector, and so on. Learning-to-rank is not supported by ``xgboost()``, but is supported by ``xgb.train()``.
- Previously, ``xgboost()`` accepted both a ``params`` argument and named arguments under ``...``. Now all training parameters should be passed as named arguments, and all accepted parameters are explicit function arguments with in-package documentation. Some parameters are not allowed, as they are determined automatically from the rest of the data - such as the number of classes for multi-class classification, which is determined automatically from 'y'. As well, parameters that have synonyms or which are accepted under different possible names (e.g. "eta" and "learning_rate") now accept only their more descriptive form (so "eta" is not accepted, but "learning_rate" is).
- Models produced by ``xgboost()`` are now returned with a different class, ``"xgboost"``, which is a subclass of ``"xgb.Booster"`` but with more metadata and a ``predict`` method with different defaults.
- ``xgboost()`` is now meant for interactive usage only. Package developers who wish to incorporate the XGBoost package are highly recommended to use ``xgb.train()`` instead, which is a lower-level function that closely mimics the same function from the Python package and is meant to be less subject to breaking changes.
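A minimal sketch of the new interface, assuming the post-2.x R bindings are installed (the dataset and parameter choices here are illustrative only):

.. code-block:: r

    library(xgboost)

    # 'y' is passed as a factor - no manual 0/1 or integer encoding needed;
    # the multi-class objective is inferred from the number of factor levels.
    data(iris)
    x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
    y <- iris$Species

    model <- xgboost(
      x, y,
      learning_rate = 0.3,  # note: the synonym "eta" is no longer accepted
      nrounds = 10
    )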
Function xgb.train()
--------------------

- Previously, ``xgb.train()`` allowed arguments under both a ``params`` list and as named arguments under ``...``. Now, all training arguments should be passed under ``params``.
- There is a new function ``xgb.params()`` which can generate a list to pass to the ``params`` argument. ``xgb.params()`` is simply a function with named arguments that lists everything accepted by ``xgb.train()`` and offers in-package documentation for all of the arguments, returning a simple named list.
- Some arguments are now meant to be passed to ``xgb.DMatrix()`` instead (e.g. the arguments for categorical features or for feature names).
- The mechanism for custom callbacks in ``xgb.train()`` has largely been re-written. See the documentation of ``xgb.Callback`` for details.
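A sketch of the new calling convention, assuming the post-2.x bindings: all training parameters go under ``params``, built here with ``xgb.params()`` so that argument names are checked and documented:

.. code-block:: r

    library(xgboost)

    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    model <- xgb.train(
      params = xgb.params(
        objective = "binary:logistic",
        learning_rate = 0.3,
        max_depth = 4
      ),
      data = dtrain,
      nrounds = 10
    )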
Function xgb.DMatrix()
----------------------

Function xgb.cv()
-----------------
Previously, ``xgb.cv()`` accepted the same kinds of inputs as ``xgboost()``; now it accepts only ``xgb.DMatrix`` objects.
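A short sketch of the restriction, assuming the post-2.x bindings (dataset and parameters are illustrative):

.. code-block:: r

    library(xgboost)

    # xgb.cv() no longer accepts raw x/y inputs - build an xgb.DMatrix first
    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    cv_res <- xgb.cv(
      params = xgb.params(objective = "binary:logistic", max_depth = 3),
      data = dtrain,
      nrounds = 10,
      nfold = 5
    )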
Method predict
--------------

- The defaults differ depending on whether the model was produced through ``xgboost()`` or through ``xgb.train()``. Function ``xgboost()`` is more geared towards interactive usage, and thus the 'predict' method on such objects (class ``"xgboost"``) by default performs more data validations, such as checking that column names match and reordering them otherwise. The 'predict' method for models created through ``xgb.train()`` (class ``"xgb.Booster"``) has the same defaults as before - so, for example, it will not reorder columns to match names under the default behavior.
- The 'predict' method for ``"xgboost"`` objects (produced by ``xgboost()``, not by ``xgb.train()``) can now control the type of prediction to make through an argument ``type``, similarly to the 'predict' methods in the 'stats' package of base R - e.g. one can now do ``predict(model, type = "class")``; while the 'predict' method for ``"xgb.Booster"`` objects (produced by ``xgb.train()``), just like before, controls those through separate arguments such as ``outputmargin``.
- The argument that controls the range of iterations (trees) used for prediction now follows base-1 indexing and works similarly to R's ``seq`` function. Note that the syntax for "use all trees" and "use trees up to the early-stopped criteria" has changed (see the documentation for details).
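A sketch of the ``type`` argument on ``"xgboost"`` objects, assuming the post-2.x bindings (the exact set of accepted ``type`` values should be checked against the in-package documentation):

.. code-block:: r

    library(xgboost)

    data(iris)
    x <- iris[, 1:4]
    y <- iris$Species

    model <- xgboost(x, y, nrounds = 5)

    # stats-style prediction types on the "xgboost" class
    pred_class <- predict(model, x, type = "class")     # predicted factor levels
    pred_prob  <- predict(model, x, type = "response")  # class probabilities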
Booster objects
---------------

- The structure of these objects has been modified - they are now represented as a simple R "ALTLIST" (a special kind of 'list' object) with additional attributes.
- These objects can no longer be modified by adding more fields to them, but metadata can be added to them as attributes.
- The objects distinguish between two types of attributes:
  - R-level attributes (accessed and set through ``attributes(model)`` and ``attributes(model)$field <- val``), which allow arbitrary R objects. Many attributes are automatically added by the model-building functions, such as evaluation logs (a ``data.table`` with metrics calculated per iteration), which previously were model fields.
  - C-level attributes, accessed through ``xgb.attributes(model)``. These C-level attributes are shareable through serialized models across different XGBoost interfaces, while the R-level ones are specific to the R interface. Some attributes that are standard among language bindings of XGBoost, such as the best iteration, are kept as C-level attributes.

- Previously, models that were just de-serialized from an on-disk format required calling the method 'xgb.Booster.complete' on them to finish the full de-serialization process before being usable, or would otherwise call this method on their own automatically at the first call to 'predict'. Serialization is now handled more gracefully, and there are no additional functions/methods involved - i.e. if one saves a model to disk with ``saveRDS()`` and then reads it back with ``readRDS()``, the model will be fully loaded straight away, without needing to call additional methods on it.
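A sketch of the two attribute kinds and of the simplified RDS round-trip, assuming the post-2.x bindings (attribute names here are made up for illustration):

.. code-block:: r

    library(xgboost)

    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    model <- xgb.train(
      params = xgb.params(objective = "binary:logistic"),
      data = dtrain, nrounds = 5
    )

    # R-level attribute: arbitrary R object, visible only to the R interface
    attributes(model)$my_note <- list(author = "me", date = Sys.Date())

    # C-level attribute: shared with other XGBoost language bindings
    xgb.attributes(model)$run_id <- "experiment-01"

    # Round-trip through RDS - no xgb.Booster.complete step needed anymore
    saveRDS(model, "model.rds")
    restored <- readRDS("model.rds")
    pred <- predict(restored, dtrain)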
Deprecations
------------

By default, XGBoost might recognize that some parameter has been removed or renamed relative to a previous version, and still accept the same function call as before with the renamed or removed arguments, but issue a deprecation warning along the way that highlights the change.
These behaviors will be removed in future versions, and function calls which currently produce deprecation warnings will stop working. To make sure that code calling XGBoost keeps working, ensure that it doesn't issue deprecation warnings.
Optionally, these deprecation warnings can be turned into errors (while still keeping other types of warnings as warnings) through the option ``xgboost.strict_mode`` - example:
.. code-block:: r

    options("xgboost.strict_mode" = TRUE)
It can also be controlled through an environment variable ``XGB_STRICT_MODE=1``, which takes precedence over the R option - e.g.:
.. code-block:: r

    Sys.setenv("XGB_STRICT_MODE" = "1")
It is highly recommended for package developers to enable this option during their package checks to ensure better compatibility with XGBoost.