I have computed AUC values from ROC curves based on logistic regressions. First, I split each of my two datasets (D1, D2) into three different drivers; let us call them L, La, and CC. I also kept all three together in one set, LLaCC.
- The data are split 80:20 into train and test sets (N > 1,000,000 data points).
- The logistic regression is performed on the train dataset.
- The model is evaluated via the area under the curve (AUC) method on the test dataset.
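
To make the three steps above concrete, here is a minimal, dependency-free sketch of the procedure for a single dataset/driver combination. It is not my actual code: the synthetic data, the single feature, the function names (`make_data`, `split_80_20`, `fit_logistic`, `auc`, `run`), and the learning rate are all assumptions for illustration. In practice a library would do the fitting; this only shows what is being computed.

```python
import math
import random

def make_data(n, seed=0):
    """Hypothetical synthetic data: one feature x and a binary label y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(1.5 * x - 0.5)))  # assumed true model
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80:20 into train and test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 4 // 5
    return rows[:cut], rows[cut:]

def fit_logistic(train, lr=0.5, epochs=200):
    """Fit slope w and intercept b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in train:
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def run(n=2000, seed=0):
    """Split, fit on train only, then score the held-out test set."""
    train, test = split_80_20(make_data(n, seed), seed)
    w, b = fit_logistic(train)
    scores = [w * x + b for x, _ in test]  # monotone in predicted probability
    labels = [y for _, y in test]
    return auc(scores, labels), len(train), len(test)

if __name__ == "__main__":
    test_auc, n_train, n_test = run()
    print(f"train={n_train} test={n_test} AUC={test_auc:.3f}")
```

Each run yields one AUC per dataset/driver combination, but that single number is computed from every observation in the test set.
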
Therefore, we have AUC values for the two datasets (D1, D2) and the four driver sets (L, La, CC, and LLaCC):

| Dataset | L   | La   | CC   | LLaCC |
|---------|-----|------|------|-------|
| D1      | 0.5 | 0.6  | 0.89 | 0.93  |
| D2      | 0.5 | 0.75 | 0.81 | 0.86  |

I have been asked whether these differences are significant, I assume both within and between groups. However, I do not know whether this is even possible. Are these not too few estimates to compare statistically? NB: No, this is not a school assignment.