####
DART
####
XGBoost mostly combines a huge number of regression trees with a small learning rate. In this situation, trees added early are significant and trees added late are unimportant.
Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.
This tutorial describes the dropout mode for tree models. Dropout is controlled by
parameters such as ``rate_drop``. The legacy ``dart`` booster name remains available for
compatibility.
**************
Original paper
**************
Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." [`PMLR <http://proceedings.mlr.press/v38/korlakaivinayak15.pdf>`__, `arXiv <https://arxiv.org/abs/1505.01866>`__]
********
Features
********
- Drop trees in order to solve the over-fitting.

Because of the randomness introduced in the training, expect the following few differences:

- Training can be slower than ``gbtree`` because the random dropout prevents usage of the prediction buffer.
- The early stop might not be stable, due to the randomness.

************
How it works
************
- In :math:`m`-th training round, suppose :math:`k` trees are selected to be dropped.
- Let :math:`D = \sum_{i \in \mathbf{K}} F_i` be the leaf scores of dropped trees and :math:`F_m = \eta \tilde{F}_m` be the leaf scores of a new tree.
- The objective function is as follows:

.. math::

  \mathrm{Obj} = \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right) .
- :math:`D` and :math:`F_m` are overshooting, so a scale factor is applied:

.. math::

  \hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
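As a concrete illustration of the scaled update above, the following toy sketch evaluates the prediction for a single data point. Every number in it (the per-tree leaf scores, the dropped set, the scale factors ``a`` and ``b``) is a made-up assumption for illustration, not XGBoost output.

```python
# Toy illustration of one DART update for a single data point
# (hypothetical leaf scores; not the XGBoost implementation).
tree_scores = [0.30, 0.20, 0.10, 0.05]   # F_i for trees already in the ensemble
dropped = {1, 3}                          # indices K of the dropped trees

kept_sum = sum(f for i, f in enumerate(tree_scores) if i not in dropped)
D = sum(tree_scores[i] for i in dropped)  # leaf scores of the dropped trees

eta = 0.1
F_new = 0.25 * eta                        # new tree's scaled leaf score F_m = eta * F~_m
a, b = 1.0, 1.0                           # generic scale factors from the formula

# y_hat = sum over kept trees + a * (sum over dropped trees + b * F_m)
y_hat = kept_sum + a * (D + b * F_new)
print(round(y_hat, 4))  # 0.675
```

With ``a = b = 1`` this reduces to plain boosting plus the restored dropped trees; the normalization schemes below choose ``a`` so the combined contribution of the dropped trees and the new tree stays close to :math:`D`.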
**********
Parameters
**********
Dropout uses the same tree parameters as ``gbtree``, such as ``eta``, ``gamma``,
``max_depth``, and others.
Additional parameters are noted below:
* ``sample_type``: type of sampling algorithm.

  - ``uniform``: (default) dropped trees are selected uniformly.
  - ``weighted``: dropped trees are selected in proportion to weight.

* ``normalize_type``: type of normalization algorithm.
  - ``tree``: (default) New trees have the same weight as each of the dropped trees.

    .. math::

      a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
      &= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
      &\sim a \left( 1 + \frac{\eta}{k} \right) D \\
      &= a \frac{k + \eta}{k} D = D , \\
      &\quad a = \frac{k}{k + \eta}
  - ``forest``: New trees have the same weight as the sum of the dropped trees (forest).

    .. math::

      a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
      &= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
      &\sim a \left( 1 + \eta \right) D \\
      &= a (1 + \eta) D = D , \\
      &\quad a = \frac{1}{1 + \eta} .
* ``rate_drop``: dropout rate.

  - range: [0.0, 1.0]

* ``skip_drop``: probability of skipping dropout.

  - If a dropout is skipped, new trees are added in the same manner as ``gbtree``.
  - range: [0.0, 1.0]
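The two normalization factors derived above, and the way ``rate_drop`` and ``skip_drop`` combine, can be checked with a few lines of arithmetic. The values of ``eta``, ``k``, and the tree count below are arbitrary assumptions for illustration.

```python
# Scale factors from the derivations above (toy values for eta and k).
eta = 0.1  # learning rate
k = 5      # number of trees dropped in this round

a_tree = k / (k + eta)      # "tree" normalization:   a = k / (k + eta)
a_forest = 1 / (1 + eta)    # "forest" normalization: a = 1 / (1 + eta)
print(round(a_tree, 4), round(a_forest, 4))  # 0.9804 0.9091

# Assuming each round skips dropout with probability skip_drop and
# otherwise drops each existing tree independently with probability
# rate_drop, the expected number of trees dropped per round is:
rate_drop, skip_drop, n_trees = 0.1, 0.5, 50
expected_dropped = (1 - skip_drop) * rate_drop * n_trees
print(expected_dropped)  # 2.5
```

Both factors are slightly below 1, which is exactly the shrinkage needed so that restoring the dropped trees together with the new tree does not overshoot :math:`D`.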
*************
Sample Script
*************
.. code-block:: python

  import xgboost as xgb
  # read in data
  dtrain = xgb.DMatrix('demo/data/agaricus.txt.train?format=libsvm')
  dtest = xgb.DMatrix('demo/data/agaricus.txt.test?format=libsvm')
  # specify parameters via map
  param = {'max_depth': 5,
           'learning_rate': 0.1,
           'objective': 'binary:logistic',
           'sample_type': 'uniform',
           'normalize_type': 'tree',
           'rate_drop': 0.1,
           'skip_drop': 0.5}
  num_round = 50
  bst = xgb.train(param, dtrain, num_round)
  # make prediction
  preds = bst.predict(dtest)
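The script above needs the demo data files. To see the mechanics end to end without them, here is a pure-Python sketch of DART-style boosting on a toy regression task using the ``forest`` normalization. The "trees" are just constant predictors fit to residuals, so every name and number in it is an illustrative assumption, not XGBoost's implementation.

```python
# Pure-Python sketch of DART-style boosting with "forest" normalization.
# "Trees" are constants fit to residuals, so the ensemble can only learn
# the mean of the targets; the point is the dropout bookkeeping.
import random

random.seed(0)

ys = [1.0, 3.0, 5.0, 7.0]  # toy targets; their mean is 4.0
eta = 0.3
rate_drop = 0.3
skip_drop = 0.2
n_rounds = 30

trees = []  # each "tree" is a constant added to the prediction

for m in range(n_rounds):
    # pick the set of trees to drop this round
    if random.random() < skip_drop:
        dropped = set()
    else:
        dropped = {i for i in range(len(trees)) if random.random() < rate_drop}

    # fit the new "tree" to residuals computed WITHOUT the dropped trees
    base = sum(t for i, t in enumerate(trees) if i not in dropped)
    new_tree = eta * sum(y - base for y in ys) / len(ys)

    if dropped:
        # "forest" normalization: scale the dropped trees and the new
        # tree by a = 1 / (1 + eta) so the ensemble does not overshoot
        a = 1.0 / (1.0 + eta)
        for i in dropped:
            trees[i] *= a
        new_tree *= a
    trees.append(new_tree)

final = sum(trees)
print(round(final, 2))  # converges toward the mean of ys, i.e. 4.0
```

Dropping the normalization step (setting ``a = 1``) makes each dropout round overshoot the target, which is the behavior the scale-factor derivation above is designed to prevent.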