
From my understanding, DART drops trees in order to address over-fitting. However, when tuning with the xgboost package, rate_drop defaults to 0. I understand this is a parameter to tune, but what if the optimal model suggests rate_drop = 0? Are we effectively using gbtree then?

1 Answer


Yes, if rate_drop=0, we effectively have zero drop-outs, so we are using a "standard" gradient boosting machine.

It is important to be aware that when predicting with a DART booster we should switch off the drop-out procedure; during training, the DART booster is expected to perform drop-outs. Most DART implementations have a way to control this; XGBoost's predict() has an argument named training specifically for that reason. If we run a DART booster in training mode, we will get different results every time we re-run the prediction.

Here is some code showcasing what was described above. First the training:

import xgboost as xgb  # Version '1.0.2'
import numpy as np
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
N = 90  # the first 90 rows of iris contain only classes 0 and 1, so binary:logistic applies
X = iris.data[:N, :4]  # all four features
y = iris.target[:N]
dtrain = xgb.DMatrix(X, label=y)

Xt = iris.data[N:, :4]  # the remaining rows serve as a test set (features only)
dtest = xgb.DMatrix(Xt)

param_real_dart = {'booster': 'dart', 
              'objective': 'binary:logistic', 
              'rate_drop': 0.10,
              'skip_drop': 0.5}

param_gbtr = {'booster': 'gbtree', 
              'objective': 'binary:logistic'}

param_fake_dart = {'booster': 'dart', 
                   'objective': 'binary:logistic', 
                   'rate_drop': 0.00,
                   'skip_drop': 0.5}

num_round = 50
bst_gbtr = xgb.train(param_gbtr, dtrain, num_boost_round=num_round)
bst_real_dart = xgb.train(param_real_dart, dtrain, num_boost_round=num_round)
bst_fake_dart = xgb.train(param_fake_dart, dtrain, num_boost_round=num_round)

OK, and now let's compare the predictions at test time:

np.array_equal(bst_gbtr.predict(dtest), bst_fake_dart.predict(dtest))
# True (for `rate_drop`: 0 we get the same results as with a regular booster)
np.array_equal(bst_gbtr.predict(dtest), bst_real_dart.predict(dtest))
# False (for `rate_drop` not 0, we get different results from a regular booster)
np.array_equal(bst_real_dart.predict(dtest, training=True), 
               bst_real_dart.predict(dtest, training=True))
# False (a DART booster returns different results when in training mode)
np.array_equal(bst_real_dart.predict(dtest, training=False), 
               bst_real_dart.predict(dtest, training=False))
# True (a DART booster (should) return consistent results when testing)
usεr11852
  • This may be impossible to answer, but can you explain how it is possible to get two different answers when I run my model using gbtree vs dart using the exact same data and parameters? – Jack Armstrong Apr 28 '20 at 16:40
  • I suspect that the reason is that the DART booster is (potentially by accident) in training mode. Please see my amended answer, where I have some additional information on this matter (and a quick example). – usεr11852 Apr 28 '20 at 19:24
  • That makes sense. I am using XGBRegressor(), but through a GridSearchCV(). So I guess I would have to get the best model from the GridSearch, then re-create the model with xgb.train() and call predict(dtest, training=False) (see the sketch after these comments). – Jack Armstrong Apr 28 '20 at 19:32
  • Yes. You are not alone... I have come across similar issues myself when using DART boosters. :) – usεr11852 Apr 28 '20 at 19:35
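
Following up on the comments above, here is a minimal sketch (not from the original thread; the data and parameter values are made up for illustration) of how one might get deterministic predictions from a DART model fitted through the scikit-learn wrapper. The fitted XGBRegressor (or the best_estimator_ of a GridSearchCV) exposes the underlying Booster via get_booster(), and that Booster's predict() accepts the same training flag used above, so re-training via xgb.train() is not strictly required:

import numpy as np
import xgboost as xgb  # sketched against the same 1.0.2 API as above

# toy regression data, only so the example runs end to end
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# a DART regressor; rate_drop is forwarded to the underlying booster
reg = xgb.XGBRegressor(booster='dart', n_estimators=50, rate_drop=0.10)
reg.fit(X, y)

# pull out the low-level Booster (with GridSearchCV this would be
# search.best_estimator_.get_booster()) and predict with drop-outs switched off
booster = reg.get_booster()
preds = booster.predict(xgb.DMatrix(X), training=False)

Whether you keep the scikit-learn wrapper or re-train with xgb.train() is mostly a matter of convenience; the key point is that the final predictions are made with training=False so that no trees are dropped at prediction time.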