From my understanding, DART drops trees in order to combat over-fitting. However, when tuning with the xgboost package, rate_drop defaults to 0. I understand this is a parameter to tune, but what if the optimal model suggested rate_drop = 0? Are we effectively using gbtree then?
Yes: if rate_drop=0, no trees are ever dropped, so we are effectively using a "standard" gradient boosting machine (gbtree).
It is important to be aware that when predicting with a DART booster we should stop the drop-out procedure; the DART booster is only expected to perform drop-outs during training. Most DART booster implementations have a way to control this; XGBoost's predict() has an argument named training for exactly that reason. If we predict with a DART booster in training mode, drop-out remains active and we get different results every time we re-run the prediction.
Here is some code showcasing what was described above. First the training:
import xgboost as xgb # Version '1.0.2'
import numpy as np
from sklearn import datasets
# import some data to play with
iris = datasets.load_iris()
N = 90
X = iris.data[:N, :4] # we only take the first four features.
y = iris.target[:N]
dtrain = xgb.DMatrix(X, label=y)
Xt = iris.data[N:, :4] # we only take the first four features.
dtest = xgb.DMatrix(Xt)
param_real_dart = {'booster': 'dart',
                   'objective': 'binary:logistic',
                   'rate_drop': 0.10,
                   'skip_drop': 0.5}
param_gbtr = {'booster': 'gbtree',
              'objective': 'binary:logistic'}
param_fake_dart = {'booster': 'dart',
                   'objective': 'binary:logistic',
                   'rate_drop': 0.00,
                   'skip_drop': 0.5}
num_round = 50
bst_gbtr = xgb.train(param_gbtr, dtrain, num_boost_round=num_round)
bst_real_dart = xgb.train(param_real_dart, dtrain, num_boost_round=num_round)
bst_fake_dart = xgb.train(param_fake_dart, dtrain, num_boost_round=num_round)
OK, and now at prediction time:
np.array_equal(bst_gbtr.predict(dtest), bst_fake_dart.predict(dtest))
# True (for `rate_drop`: 0 we get same results as with a regular booster)
np.array_equal(bst_gbtr.predict(dtest), bst_real_dart.predict(dtest))
# False (for `rate_drop` not 0, we get different results from a regular booster)
np.array_equal(bst_real_dart.predict(dtest, training=True),
               bst_real_dart.predict(dtest, training=True))
# False (a DART booster returns different results when training)
np.array_equal(bst_real_dart.predict(dtest, training=False),
               bst_real_dart.predict(dtest, training=False))
# True (a DART booster (should) return consistent results when testing)
usεr11852
- This may be impossible to answer, but can you explain how it is possible to get two different answers when I run my model using gbtree vs dart with the exact same data and parameters? – Jack Armstrong Apr 28 '20 at 16:40
- I suspect that the reason is the DART booster is (potentially by accident) in training mode. Please see my amended answer, where I have some additional information on this matter (and a quick example). – usεr11852 Apr 28 '20 at 19:24
- That makes sense. I am using XGBRegressor() but through GridSearchCV(). So I guess I would have to get the best model from the grid search, re-create the model with xgb.train(), and then call predict(dtest, training=False). – Jack Armstrong Apr 28 '20 at 19:32
- Yes. You are not alone: I have come across similar issues myself when using DART boosters. :) – usεr11852 Apr 28 '20 at 19:35