
My problem is that I have created four candidate models that I am comparing mainly via the following performance measures: F-measure, recall, precision, accuracy, and visual ROC assessment.

The problem is that, as you can see from the table, SVM_linear_test performs best by F-measure. This model corresponds to the blue line in the ROC chart. The red line in the ROC chart corresponds to SVM_RBF_test, which performs best according to the ROC assessment. Having read a lot recently about performance measures for binary classifiers, I have not come across a discussion of what seems obvious in my example: the ROC curve is built from true- and false-positive rates and does not reflect precision, so in my case a ROC assessment is not worth much. One could of course debate whether we would want to assign more weight to the positive class, but let us leave that discussion aside here.

Numerical performance measures

[table of numerical performance measures]
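
For reference, here is a minimal sketch of how such measures can be computed side by side for two candidate models (scikit-learn; the synthetic data and the two SVMs below are placeholders, not the actual models behind the table):

```python
# Minimal sketch: computing accuracy, precision, recall, F1 and ROC AUC
# for two candidate models. The dataset and models are placeholders,
# not the actual SVM_linear_test / SVM_RBF_test models from the table.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM_linear": SVC(kernel="linear", random_state=0),
    "SVM_RBF": SVC(kernel="rbf", random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)              # hard class labels for F1 etc.
    score = model.decision_function(X_test)   # continuous scores for ROC AUC
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "precision=%.3f" % precision_score(y_test, pred),
          "recall=%.3f" % recall_score(y_test, pred),
          "F1=%.3f" % f1_score(y_test, pred),
          "ROC AUC=%.3f" % roc_auc_score(y_test, score))
```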

1 Answer


We know from statistical theory that in the absence of prior information the log likelihood is an optimal criterion for estimating unknown parameters such as regression coefficients. Because of that it is very advantageous to use the log likelihood in judging model performance; this leads to optimal power and precision. So think about using a log-likelihood-based generalized $R^2$ measure. This solves the problem that the ROC area is too insensitive to detect real differences in predictive discrimination. In addition, the ROC curves you have drawn, while common in standard practice, have an enormous ink:information ratio. ROC curves are usually drawn because of their frequency of use in the literature, but I have not seen an example where looking at the curves leads to insights.
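
One log-likelihood-based choice is Nagelkerke's $R^2$, which rescales the Cox-Snell $R^2 = 1 - \exp\{\frac{2}{n}(\mathrm{LL}_0 - \mathrm{LL}_m)\}$ by its maximum attainable value $1 - \exp\{\frac{2}{n}\mathrm{LL}_0\}$. A minimal sketch, assuming you have observed 0/1 outcomes and predicted probabilities from a fitted binary model:

```python
# Minimal sketch of a log-likelihood-based generalized R^2 (Nagelkerke's R^2),
# computed from predicted probabilities of a fitted binary model.
# y and p are placeholders for the observed 0/1 outcomes and the model's
# predicted probabilities.
import numpy as np

def nagelkerke_r2(y, p):
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    n = len(y)
    # Log likelihood of the fitted model
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Log likelihood of the intercept-only (null) model
    p0 = y.mean()
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    # Cox-Snell R^2, then rescaled so the maximum attainable value is 1
    r2_cs = 1.0 - np.exp(2.0 / n * (ll_null - ll_model))
    r2_max = 1.0 - np.exp(2.0 / n * ll_null)
    return r2_cs / r2_max
```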

Frank Harrell