1

I would like to use the bias-variance trade-off to evaluate training-set size in a classification problem. There are two classes which are imbalanced (roughly 70/30), and it seems that the commonly used misclassification error is not good enough here. Which performance measures should I use in this case?

Eitan

2 Answers

1

You could use precision or recall, or the F1 score, which is a combination of the two.

Precision is the number of true positives divided by the number of predicted positives (the sum of true positives and false positives).

Recall is the number of true positives divided by the number of actual positives (the sum of true positives and false negatives).

Which balance of precision and recall you want depends on your problem. For example, if you only want to predict y = 1 when you are very confident, aim for higher precision (at the cost of lower recall).

If you want a single number evaluation, the F1 score is calculated as follows: 2 * ((P*R)/(P+R)) with P being precision and R being recall.
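To make the formulas concrete, here is a minimal Python sketch (scikit-learn is my assumption, the question doesn't name a library) computing all three measures on made-up labels:

```python
# Minimal sketch: precision, recall and F1 with scikit-learn (toy labels, purely illustrative).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # actual labels (1 = positive class)
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]  # classifier predictions

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 2 * P * R / (P + R)
```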

Mien
  • Thanks! I was not familiar with the F1 measure. Is there any reason for using F1 instead of the Kappa statistic? Can it be generalized to multiple classes as well? – Eitan Apr 05 '15 at 08:19
  • I'm not that familiar with the Kappa statistic so I won't comment on it. The measures I mentioned are usually used in the context of skewed classes (so if one class appears a lot more than the other, as in your example). I guess you could apply this to multiple classes, if you divide your classes over several comparisons, in a one-vs-all classification way. However, having multiple classes, in which one (or more) is skewed, is a little odd in my eyes. If you have multiple classes and 1 class is skewed, something is wrong with your "division" of classes. – Mien Apr 08 '15 at 06:31
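For what it's worth, here is a small sketch of that one-vs-all generalization (again assuming scikit-learn, whose multiclass F1 averages the per-class scores; the labels are made up):

```python
# Sketch: F1 generalized to several classes via per-class (one-vs-rest) scores.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]  # three made-up classes
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 2]

print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean of the per-class F1 scores
print(f1_score(y_true, y_pred, average='micro'))  # F1 from the pooled TP/FP/FN counts
```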
0

I'm a bit confused about why you're mentioning the bias-variance trade-off, but the F1 score is indeed a good, simple metric for dealing with the imbalance problem. Suppose you label your classes as positive and negative, and their respective distribution is 30/70. If your classifier always predicts the negative class, then the true positive, true negative, false negative, and false positive counts are:

TP = 0

TN = 70

FN = 30

FP = 0

Thus, the classifier's accuracy will be:

ACC = (TP + TN) / (TP + TN + FP + FN) = 70%

But the F1 score will be F1 = 2*TP / (2*TP + FP + FN) = 0, which clearly shows that this "dumb" classifier cannot predict the positive class at all.
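A quick sketch in plain Python, just plugging the counts above into both formulas, to double-check:

```python
# Counts for the always-negative classifier from the example above.
TP, TN, FN, FP = 0, 70, 30, 0

accuracy = (TP + TN) / (TP + TN + FP + FN)  # 0.7
f1 = 2 * TP / (2 * TP + FP + FN)            # 0.0
print(accuracy, f1)
```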

  • Thanks! Why are you confused? The bias-variance analysis needs some performance measure that best represents the train and test performance as model complexity increases. For imbalanced classes one needs a better measure than misclassification error. Correct me if I'm wrong.. – Eitan Apr 05 '15 at 08:25