I use the machine learning software WEKA for data mining on biological data. I would describe my dataset as unbalanced: it comprises around 2000 instances, split into four classes of 900, 500, 350, and 160 instances that are very important to keep in the dataset, plus some smaller, less important classes that are nice to have but can be removed if they confuse the learning too much.
Currently I am comparing many different classifiers. I am not a very experienced statistician, but I read that ROC curves are commonly used to evaluate the performance of machine learning classifiers. However, I also read that ROC has drawbacks when it comes to unbalanced datasets.
Is there a better measure for my dataset among the ones that the WEKA output features (or that can be calculated from them)? This is what the output looks like (here with the iris dataset):
=== Stratified cross-validation ===
Correctly Classified Instances         144               96      %
Incorrectly Classified Instances         6                4      %
Kappa statistic                          0.94
Mean absolute error                      0.035
Root mean squared error                  0.1586
Relative absolute error                  7.8705 %
Root relative squared error             33.6353 %
Total Number of Instances              150
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.98      0         1           0.98     0.99        0.99       Iris-setosa
               0.94      0.03      0.94        0.94     0.94        0.952      Iris-versicolor
               0.96      0.03      0.941       0.96     0.95        0.961      Iris-virginica
Weighted Avg.  0.96      0.02      0.96        0.96     0.96        0.968
=== Confusion Matrix ===
  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica
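
To illustrate what I mean by "calculated from them": as far as I understand, the "Weighted Avg." row weights each class by its size, so large classes dominate it, whereas an unweighted (macro) average treats every class equally. Below is a minimal sketch in plain Python (not WEKA code) that recomputes per-class precision, recall, and F-measure, their macro averages, and Cohen's kappa from the confusion matrix above. The function names are my own and purely illustrative, and whether macro averaging is actually the right choice for my data is exactly what I am unsure about.

```python
# Confusion matrix copied from the WEKA output above
# (rows = actual class, columns = predicted class).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 2, 48],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # actual k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(values):
    """Unweighted mean: every class counts the same, regardless of its size."""
    values = list(values)
    return sum(values) / len(values)


def cohen_kappa(cm):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


metrics = per_class_metrics(cm)
macro_recall = macro_average(m[1] for m in metrics)  # also called balanced accuracy
macro_f1 = macro_average(m[2] for m in metrics)
kappa = cohen_kappa(cm)

for name, (p, r, f) in zip(labels, metrics):
    print(f"{name:16s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"macro recall={macro_recall:.3f} macro F1={macro_f1:.3f} kappa={kappa:.3f}")
```

For the balanced iris data the macro and weighted averages nearly coincide, and the kappa value matches the one WEKA reports. On my unbalanced dataset I would expect the two kinds of average to differ.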