
This post relates to an earlier post I published here a few days ago, in which I was having issues with prediction error rates; specifically, the classification tree I grew underperforms a naive intercept-only model, which has no predictors and simply bets on the majority class of the zero-one coding of the binary response variable.

Then I went ahead and cross-validated the model, and the results below show that a single-node tree is best, with the least prediction error. Does this align with the problem in the post I referred to previously? And how can it be resolved? Thanks a lot!
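For concreteness, this is the naive baseline I am comparing against: always predict the majority class of h2. A minimal sketch (using the same usedta, class.train, and h2 objects that appear in the output below):

# Misclassification rate of the intercept-only (majority-class) rule
baseline.tab <- table(usedta[class.train, ]$h2)
baseline.err <- min(baseline.tab) / sum(baseline.tab)
baseline.err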

> set.seed(47306)
> cv.h2 <- cv.tree(tree.h2, FUN=prune.misclass)
> cv.h2
$size
[1] 26  9  6  4  1

$dev
[1] 270 270 270 270 270

$k
[1] -Inf 0.00 1.00 2.50 2.67

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"
> min.error = which.min(cv.h2$dev)
> min.error
[1] 1
> table(usedta[class.train,]$h2)

1poorHlth 0goodHlth 
      270      1305 
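If it helps, here is how I read the numbers above (a minimal check, assuming the cv.h2 object and the table just shown): with FUN=prune.misclass, the $dev component is a cross-validated misclassification count rather than a likelihood deviance, so it can be compared directly with the minority-class count from the table.

# $dev from prune.misclass is a misclassification count, so compare it
# with the majority-class baseline directly
n.minority <- min(table(usedta[class.train, ]$h2))   # 270 in the table above
cv.h2$dev                                            # every entry is 270
min(cv.h2$dev) == n.minority                         # TRUE: no subtree beats always predicting the majority class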

  • Here the real problem, I guess, is that all the deviances are equal regardless of the number of terminal nodes or how I grow the tree. Any thoughts? – WaterWood Aug 27 '20 at 15:35

0 Answers