I'm going through Chapter 8 of "Introduction to Statistical Learning", which introduces decision trees. My question is specific to the three measures used when pruning a classification tree (i.e., classification error rate, Gini Index, and cross-entropy).
With regard to building classification trees, the chapter states that "classification error is not sufficiently sensitive for tree-growing, and in practice, the Gini Index and cross-entropy are preferred".
However, it also states that "Any of these three approaches might be used when pruning the tree, but the classification error rate is preferable if prediction accuracy of the final pruned tree is the goal."
I have two questions about this:
- Given that the classification error rate is not sensitive enough for tree-growing, why should it be preferred over the Gini Index and cross-entropy when pruning, if prediction accuracy is the goal? What advantage does it have over the other two measures?
- If the classification error rate is preferred for pruning, in what instances would we use the Gini Index or cross-entropy instead?
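To make the "sensitivity" issue concrete, here is a small sketch of the three measures (the function names and the example split counts are my own, chosen to mirror the standard two-class illustration). It shows two candidate splits of a parent node with 400 observations per class that have identical misclassification error, yet different Gini and cross-entropy values, because one split produces a pure child node:

```python
import math

def class_error(counts):
    """Classification error rate: 1 - max proportion of any class."""
    n = sum(counts)
    return 1 - max(counts) / n

def gini(counts):
    """Gini Index: sum of p * (1 - p) over classes."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def entropy(counts):
    """Cross-entropy: -sum of p * log2(p) over non-empty classes."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def weighted(measure, children):
    """Average a node measure over child nodes, weighted by node size."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * measure(c) for c in children)

# Two candidate splits of a (400, 400) parent node.
split_a = [(300, 100), (100, 300)]  # both children impure
split_b = [(200, 400), (200, 0)]    # second child is pure

for name, split in [("A", split_a), ("B", split_b)]:
    print(name,
          round(weighted(class_error, split), 4),
          round(weighted(gini, split), 4),
          round(weighted(entropy, split), 4))
```

Both splits have a weighted classification error of 0.25, so error rate cannot distinguish them; the Gini Index and cross-entropy both favor split B for its pure child, which is the sense in which they are "more sensitive" to node purity during growing. My question is why that extra sensitivity stops being an advantage once we move to pruning.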