When I fit a boosted model with gbm on the Caravan data set and predict whether there will be a purchase, all of the predicted values on the test set are positive. I thought I was supposed to log-transform the predictions to get a probability, and then classify a prediction as "Yes" if it is greater than 50% and "No" otherwise. I am not sure how to interpret these values: which ones correspond to "Yes" and which to "No"?
```r
library(ISLR2)
library(gbm)
data(Caravan)

# First 1,000 observations for training, the rest for testing
Caravan.train = Caravan[1:1000, ]
Caravan.test = Caravan[-c(1:1000), ]

caravan.boost = gbm(Purchase ~ ., data = Caravan.train,
                    distribution = "bernoulli", n.trees = 1000,
                    interaction.depth = 4, shrinkage = 0.01)

# Relative influence of each predictor: keep the nonzero ones, sorted
caravan.inf = summary(caravan.boost)
caravan.inf2 = caravan.inf[caravan.inf$rel.inf != 0, ]
caravan.sort = caravan.inf2[order(-caravan.inf2$rel.inf), ]
# The most important variables are:
# PPERSAUT, PBRAND, MKOOPKLA, MGODGE, MOPLHOOG, MOSTYPE, MINK3045 and
# MBERMIDD.

caravan.pred = predict(caravan.boost, newdata = Caravan.test, n.trees = 1000)
```
```
> caravan.pred
 [1] 1.0371814 1.0948238 1.0139326 1.1473339 1.2249934 1.2160427 1.0242501
 [8] 1.1058666 1.1312732 0.9999499 1.0495137 0.9754117 1.0317667 1.0338068
[15] 1.0238197 1.0538183 1.1044108 1.1857257 0.9775377 1.0494232 0.9663663 [remaining data removed for clarity]
```
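For `distribution = "bernoulli"`, `predict.gbm` returns values on the link (log-odds) scale by default (`type = "link"`), while `type = "response"` returns probabilities. Going from log-odds f to a probability takes the inverse logit, 1/(1 + exp(-f)), which base R provides as `plogis()`. A minimal sketch, assuming the printed values are log-odds, applying it to a few of the predictions shown above:

```r
# A few values printed by predict(caravan.boost, newdata = Caravan.test)
f <- c(1.0371814, 1.0948238, 1.0139326, 0.9999499, 0.9754117)

# Inverse logit: maps log-odds on (-Inf, Inf) to probabilities on (0, 1)
p <- plogis(f)  # same as 1 / (1 + exp(-f))
print(round(p, 3))

# Classify with a 0.5 cutoff: "Yes" if p > 0.5, otherwise "No"
pred_class <- ifelse(p > 0.5, "Yes", "No")
print(pred_class)

# With the fitted model, probabilities could be requested directly instead:
# predict(caravan.boost, newdata = Caravan.test, n.trees = 1000, type = "response")
```

Because `plogis(f) > 0.5` exactly when `f > 0`, a 0.5 cutoff on the probabilities is the same as a 0 cutoff on the log-odds, so predictions that are all positive would all be classified as "Yes".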
When I try to transform them with log(p/(1 - p)), I get lots of NaNs:
```
> log(caravan.pred/(1-caravan.pred))
 [1]      NaN      NaN      NaN      NaN      NaN      NaN      NaN
 [8]      NaN      NaN 9.902012      NaN 3.680591      NaN      NaN
[15]      NaN      NaN      NaN      NaN 3.773200      NaN 3.358013 [remaining data removed for clarity]
Warning message:
In log(caravan.pred/(1 - caravan.pred)) : NaNs produced
```
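The NaNs follow from the direction of the transform: log(p/(1 - p)) is the logit, which maps a probability in (0, 1) to log-odds. For any input above 1 the ratio p/(1 - p) is negative, so `log()` returns NaN, and only the inputs below 1 (such as 0.9754117) give finite values. A minimal base-R check using three of the printed predictions:

```r
x <- c(1.0371814, 0.9999499, 0.9754117)

# Logit: only finite for inputs strictly between 0 and 1
logit <- suppressWarnings(log(x / (1 - x)))
print(logit)  # NaN for the input above 1

# qlogis() is base R's logit and plogis() is its inverse;
# plogis() accepts any real number and always returns a value in (0, 1)
probs <- plogis(x)
print(probs)
```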
Purchase variable. – whuber Feb 18 '23 at 15:42
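The gbm documentation describes `distribution = "bernoulli"` as logistic regression for 0-1 outcomes, whereas `Purchase` in ISLR2's `Caravan` is a factor with levels "No" and "Yes", stored internally as the integer codes 1 and 2. A common approach, shown here as a minimal sketch, is to recode the response explicitly before fitting so that 1 means "Yes"; the predictions are then log-odds of "Yes". The short factor below is a hypothetical stand-in for `Caravan$Purchase`.

```r
# Hypothetical stand-in for Caravan$Purchase: a factor with levels "No" and "Yes"
purchase <- factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes"))

# A factor's underlying integer codes start at 1, so "No" = 1 and "Yes" = 2
codes <- as.integer(purchase)

# Explicit 0/1 coding: "Yes" = 1, "No" = 0
purchase01 <- ifelse(purchase == "Yes", 1, 0)

print(codes)
print(purchase01)

# Applied to the real data before splitting and fitting (assumed workflow):
# Caravan$Purchase <- ifelse(Caravan$Purchase == "Yes", 1, 0)
```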