#########
Intercept
#########

.. versionadded:: 2.0.0

Since 2.0.0, XGBoost supports estimating the model intercept (named ``base_score``)
automatically based on targets upon training. The behavior can be controlled by setting
``base_score`` to a constant value. The following snippet disables the automatic
estimation:

.. tabs::

    .. code-tab:: py

        import xgboost as xgb

        clf = xgb.XGBClassifier(n_estimators=10)
        clf.set_params(base_score=0.5)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")

        # Set base_score parameter directly
        model <- xgboost(
            x = agaricus.train$data,
            y = factor(agaricus.train$label),
            base_score = 0.5,
            nrounds = 10
        )

In addition, the 0.5 here represents the value after applying the inverse link function;
see the end of the document for a description.

Other than the ``base_score``, users can also provide global bias via the data field
``base_margin``, which is a vector or a matrix depending on the task. With multi-output
and multi-class, the ``base_margin`` is a matrix with size ``(n_samples, n_targets)`` or
``(n_samples, n_classes)``.

.. tabs::

    .. code-tab:: py

        import xgboost as xgb
        from sklearn.datasets import make_classification

        X, y = make_classification()

        clf = xgb.XGBClassifier()
        clf.fit(X, y)
        # Request for raw prediction
        m = clf.predict(X, output_margin=True)

        clf_1 = xgb.XGBClassifier()
        # Feed the prediction into the next model
        # Using base margin overrides the base score, see below sections.
        clf_1.fit(X, y, base_margin=m)
        clf_1.predict(X, base_margin=m)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")

        # Train first model
        model_1 <- xgboost(
            x = agaricus.train$data,
            y = factor(agaricus.train$label),
            nrounds = 10
        )

        # Request for raw prediction
        m <- predict(model_1, agaricus.train$data, type = "raw")

        # Feed the prediction into the next model using base_margin
        # Using base margin overrides the base score, see below sections.
        model_2 <- xgboost(
            x = agaricus.train$data,
            y = factor(agaricus.train$label),
            base_margin = m,
            nrounds = 10
        )

        # Make predictions with base_margin
        pred <- predict(model_2, agaricus.train$data, base_margin = m)
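
For the matrix case mentioned earlier, the following is a minimal multi-class sketch. The
synthetic data and the zero-valued margin are illustrative assumptions, not part of the
original example:

.. code-block:: python

    import numpy as np
    import xgboost as xgb
    from sklearn.datasets import make_classification

    # Hypothetical synthetic data for illustration.
    X, y = make_classification(
        n_samples=128, n_classes=3, n_informative=4, random_state=0
    )
    n_classes = 3

    # One margin column per class, shape (n_samples, n_classes).
    # A zero margin adds no bias and, like any base_margin, overrides base_score.
    margin = np.zeros((X.shape[0], n_classes), dtype=np.float32)

    Xy = xgb.DMatrix(X, y, base_margin=margin)
    booster = xgb.train(
        {"objective": "multi:softprob", "num_class": n_classes}, Xy, num_boost_round=4
    )
    predt = booster.predict(Xy)  # shape: (n_samples, n_classes)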

It specifies the bias for each sample and can be used for stacking an XGBoost model on top
of other models, see :ref:`sphx_glr_python_examples_boost_from_prediction.py` for a worked
example. When ``base_margin`` is specified, it automatically overrides the ``base_score``
parameter. If you are stacking XGBoost models, then the usage should be relatively
straightforward, with the previous model providing raw prediction and a new model using
the prediction as bias. For more customized inputs, users need to take extra care of the
link function. Let :math:`F` be the model and :math:`g` be the link function. Since
``base_score`` is overridden when sample-specific ``base_margin`` is available, we will
omit it here:

.. math::

   g(E[y_i]) = F(x_i)

When the base margin :math:`b` is provided, it's added to the raw model output :math:`F`:

.. math::

   g(E[y_i]) = F(x_i) + b_i

and the output of the final model is:

.. math::

   g^{-1}(F(x_i) + b_i)

Take the gamma deviance objective ``reg:gamma``, which has a log link function, as an
example:

.. math::

   \ln{(E[y_i])} = F(x_i) + b_i \\
   E[y_i] = \exp{(F(x_i) + b_i)}

As a result, if you are feeding outputs from models like GLMs with a corresponding
objective function, make sure the outputs are not yet transformed by the inverse link
(activation).
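
As an illustration, the following sketch stacks an XGBoost model on top of a GLM with a
log link. The synthetic data and scikit-learn's ``GammaRegressor`` are assumptions for
this example; since ``GammaRegressor.predict`` returns values on the response scale, we
take the log to recover the untransformed linear predictor before passing it as
``base_margin``:

.. code-block:: python

    import numpy as np
    import xgboost as xgb
    from sklearn.linear_model import GammaRegressor

    # Hypothetical synthetic gamma-distributed data for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = rng.gamma(shape=2.0, scale=np.exp(0.3 * X[:, 0]) / 2.0)

    glm = GammaRegressor().fit(X, y)
    # predict() applies the inverse link (exp); undo it to obtain the raw margin.
    margin = np.log(glm.predict(X))

    Xy = xgb.DMatrix(X, y, base_margin=margin)
    booster = xgb.train({"objective": "reg:gamma"}, Xy, num_boost_round=10)
    # The same margin must be supplied at prediction time, here via the DMatrix.
    predt = booster.predict(Xy)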

In the case of ``base_score`` (intercept), it can be accessed through
:py:meth:`~xgboost.Booster.save_config` after estimation. Unlike the ``base_margin``, the
returned value represents a value after applying the inverse link. With logistic
regression and the logit link function as an example, given the ``base_score`` as 0.5,
:math:`g(intercept) = logit(0.5) = 0` is added to the raw model output:

.. math::

   E[y_i] = g^{-1}{(F(x_i) + g(intercept))}

and 0.5 is the same as :math:`base\_score = g^{-1}(0) = 0.5`. This is more intuitive if
you remove the model and consider only the intercept, which is estimated before the model
is fitted:

.. math::

   E[y] = g^{-1}{(g(intercept))} \\
   E[y] = intercept

For some objectives like MAE, there are closed-form solutions, while for others the
intercept is estimated with a one-step Newton method.
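
For instance, here is a minimal sketch of retrieving the estimated intercept after
training. It assumes the JSON layout produced by :py:meth:`~xgboost.Booster.save_config`,
where ``base_score`` is stored under the ``learner_model_param`` field:

.. code-block:: python

    import json

    import xgboost as xgb
    from sklearn.datasets import make_regression

    # Hypothetical regression data for illustration.
    X, y = make_regression(random_state=0)
    booster = xgb.train(
        {"objective": "reg:squarederror"}, xgb.DMatrix(X, y), num_boost_round=1
    )

    config = json.loads(booster.save_config())
    # For squared error the link is the identity, so this is close to y.mean().
    base_score = float(config["learner"]["learner_model_param"]["base_score"])
    print(base_score)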

******
Offset
******

The ``base_margin`` is a form of offset in GLM. Using the Poisson objective as an
example, we might want to model the rate instead of the count:

.. math::

   rate = \frac{count}{exposure}

And the offset is defined as the log link applied to the exposure variable:
:math:`\ln{exposure}`. Let :math:`c` be the count and :math:`\gamma` be the exposure,
substituting the response :math:`y` in our previous formulation of base margin:

.. math::

   g(\frac{E[c_i]}{\gamma_i}) = F(x_i)

Substitute :math:`g` with :math:`\ln` for Poisson regression:

.. math::

   \ln{\frac{E[c_i]}{\gamma_i}} = F(x_i)

We have:

.. math::

   E[c_i] &= \exp{(F(x_i) + \ln{\gamma_i})} \\
   E[c_i] &= g^{-1}(F(x_i) + g(\gamma_i))

As you can see, we can use the ``base_margin`` for modeling with an offset, similar to
GLMs.
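
For instance, a minimal sketch of supplying :math:`\ln{\gamma_i}` as the offset for a
Poisson count model; the synthetic counts and exposures are assumptions for illustration:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Hypothetical counts observed under varying exposure.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 3))
    exposure = rng.uniform(0.5, 2.0, size=512)
    rate = np.exp(0.3 * X[:, 0])
    count = rng.poisson(rate * exposure)

    # log(exposure) enters as the offset via base_margin.
    Xy = xgb.DMatrix(X, count, base_margin=np.log(exposure))
    booster = xgb.train({"objective": "count:poisson"}, Xy, num_boost_round=10)
    # Predictions include the offset, i.e. expected counts at the given exposure.
    predt = booster.predict(Xy)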

*******
Example
*******

The following example shows the relationship between ``base_score`` and ``base_margin``,
using binary logistic regression with the logit link function:

.. tabs::

    .. code-tab:: py

        import numpy as np
        from scipy.special import logit
        from sklearn.datasets import make_classification

        import xgboost as xgb

        X, y = make_classification(random_state=2025)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")

        X <- agaricus.train$data
        y <- agaricus.train$label

The intercept is a valid probability (0.5); it's used as the initial estimate of the
probability of obtaining a positive sample.

.. tabs::

    .. code-tab:: py

        intercept = 0.5

    .. code-tab:: r R

        intercept <- 0.5

First we use the intercept to train a model:

.. tabs::

    .. code-tab:: py

        booster = xgb.train(
            {"base_score": intercept, "objective": "binary:logistic"},
            dtrain=xgb.DMatrix(X, y),
            num_boost_round=1,
        )
        predt_0 = booster.predict(xgb.DMatrix(X, y))

    .. code-tab:: r R

        # First model with base_score
        model_0 <- xgboost(
            x = X, y = factor(y),
            base_score = intercept,
            objective = "binary:logistic",
            nrounds = 1
        )
        predt_0 <- predict(model_0, X)

Apply :py:func:`~scipy.special.logit` to obtain the "margin":

.. tabs::

    .. code-tab:: py

        # Apply logit function to obtain the "margin"
        margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
        Xy = xgb.DMatrix(X, y, base_margin=margin)

        # Second model with base_margin
        # 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
        booster = xgb.train(
            {"base_score": 0.2, "objective": "binary:logistic"},
            dtrain=Xy,
            num_boost_round=1,
        )
        predt_1 = booster.predict(Xy)

    .. code-tab:: r R

        # Apply logit function to obtain the "margin"
        logit_intercept <- log(intercept / (1 - intercept))
        margin <- rep(logit_intercept, length(y))

        # Second model with base_margin
        # 0.2 is a dummy value to show that `base_margin` overrides `base_score`
        model_1 <- xgboost(
            x = X, y = factor(y),
            base_margin = margin,
            base_score = 0.2,
            objective = "binary:logistic",
            nrounds = 1
        )
        predt_1 <- predict(model_1, X, base_margin = margin)

Compare the results:

.. tabs::

    .. code-tab:: py

        np.testing.assert_allclose(predt_0, predt_1)

    .. code-tab:: r R

        all.equal(predt_0, predt_1, tolerance = 1e-6)