I am a geologist attempting to apply discriminant function analysis (DFA) to surface features I have mapped in ArcGIS. I have 4 dimensionless sorting variables calculated for each feature, and about 85% of my data points have been classified into one of 4 group types. I didn't check whether the data were normally distributed before I ran the first analysis. Under cross-validation, the DFA correctly classified 70.7% of all data points, and one group was correctly classified 91.8% of the time (very good). The problem is Box's M test: SPSS reports its significance as 0.000, well below the default 0.05 level, so the assumption of equal group covariance matrices is rejected. Because that technically means the results are not robust, the analysis is probably useless.
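For reference, Box's M compares each group's covariance matrix S_i with the pooled within-group covariance S_p via M = (N - k) ln|S_p| - Σ (n_i - 1) ln|S_i|. Below is a minimal Python/NumPy sketch using the chi-square approximation; SPSS reports an F approximation instead, so its numbers will differ somewhat. The function name `box_m` and the synthetic data are illustrative assumptions, not taken from my analysis.

```python
import numpy as np
from scipy import stats

def box_m(groups):
    """Box's M test for equal covariance matrices (chi-square approximation).

    groups: list of 2-D arrays, one per group, each of shape (n_i, p).
    Returns (M, chi2_stat, df, p_value).
    """
    k = len(groups)
    p = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    N = ns.sum()
    # Unbiased within-group covariance matrices (n_i - 1 denominator)
    covs = [np.atleast_2d(np.cov(g, rowvar=False)) for g in groups]
    # Pooled within-group covariance, weighted by degrees of freedom
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)
    # slogdet returns (sign, log|det|); covariances should be positive definite
    M = (N - k) * np.linalg.slogdet(pooled)[1] - sum(
        (n - 1) * np.linalg.slogdet(S)[1] for n, S in zip(ns, covs)
    )
    # Box's small-sample correction factor for the chi-square approximation
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * (
        np.sum(1.0 / (ns - 1)) - 1.0 / (N - k)
    )
    chi2_stat = M * (1 - c)
    df = p * (p + 1) * (k - 1) / 2
    p_value = stats.chi2.sf(chi2_stat, df)
    return M, chi2_stat, df, p_value

# Synthetic demo: 4 groups x 4 variables
rng = np.random.default_rng(0)
same = [rng.normal(size=(100, 4)) for _ in range(4)]
diff = [rng.normal(scale=s, size=(100, 4)) for s in (1, 1, 2, 3)]
print(box_m(same)[3])  # equal covariances: p-value usually not small
print(box_m(diff)[3])  # unequal covariances: p-value essentially 0
```

Box's M is known to be very sensitive to non-normality and to large sample sizes, which is one reason to look at it alongside the normality checks.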
I went back and checked the data for normality in MATLAB. Using a Box-Cox transform, I was able to make the pooled data (all groups combined) more normal, judging by the kstest function applied to the transformed data. However, the data within the individual groups (as opposed to the pooled data, which is what I tested first) are non-normal, and I cannot justify applying different transformations to different groups.
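The workflow above (one transformation for everyone, then a within-group check) can be sketched in Python with SciPy: estimate a single Box-Cox lambda per variable from the pooled data, apply that same lambda to every group, then run a Kolmogorov-Smirnov test within each group. This is an illustration under assumptions, not a translation of my MATLAB code. The helper name `pooled_boxcox_then_groupwise_ks` and the lognormal demo data are hypothetical, and Box-Cox requires strictly positive values. Standardizing with the sample mean and SD before `kstest` makes the test conservative (a Lilliefors-type correction would be stricter).

```python
import numpy as np
from scipy import stats

def pooled_boxcox_then_groupwise_ks(X, labels):
    """Fit one Box-Cox lambda per variable on the pooled data, apply it to
    all groups, then test normality within each group with a KS test.

    X: (n, p) array of strictly positive values; labels: length-n group labels.
    Returns (lambdas, X_transformed, {group: [KS p-value per variable]}).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    lambdas = []
    Xt = np.empty_like(X)
    for j in range(X.shape[1]):
        # With lmbda=None, boxcox estimates lambda by maximum likelihood
        Xt[:, j], lam = stats.boxcox(X[:, j])
        lambdas.append(lam)
    pvals = {}
    for g in np.unique(labels):
        sub = Xt[labels == g]
        ps = []
        for j in range(sub.shape[1]):
            # Standardize within the group, then compare with N(0, 1)
            # (MATLAB's kstest also tests against a standard normal by default)
            z = (sub[:, j] - sub[:, j].mean()) / sub[:, j].std(ddof=1)
            ps.append(stats.kstest(z, "norm").pvalue)
        pvals[g] = ps
    return np.array(lambdas), Xt, pvals

# Synthetic demo: 4 groups, 4 positive, skewed variables
rng = np.random.default_rng(0)
labels_demo = np.repeat([1, 2, 3, 4], 40)
X_demo = rng.lognormal(mean=labels_demo[:, None] * 0.2, sigma=0.5, size=(160, 4))
lam, Xt_demo, pv = pooled_boxcox_then_groupwise_ks(X_demo, labels_demo)
print("lambdas:", lam)
for g, ps in pv.items():
    print(g, np.round(ps, 3))
```

Because every group shares the same lambda, the transformation is monotone and identical across groups, which keeps group comparisons on a common scale.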
My questions are the following:
- Given how well the analysis cross-validated despite the Box's M problem, are the results okay?
- Is there anything I can do to salvage the analysis if the results are not okay?
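SPSS's cross-validated classification is leave-one-out: each case is classified by the functions derived from all the other cases. Below is a minimal NumPy sketch of that procedure for linear discriminant analysis, assuming equal prior probabilities (which I believe is the SPSS default). It is not SPSS's implementation and may not reproduce its numbers exactly. The names `lda_fit`, `lda_predict` and `loo_accuracy` are hypothetical helpers.

```python
import numpy as np

def lda_fit(X, y, classes):
    """Class means and inverse pooled within-group covariance."""
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled covariance: this is the equal-covariance assumption Box's M tests
    resid = np.vstack([X[y == c] - m for c, m in zip(classes, means)])
    S = resid.T @ resid / (len(X) - len(classes))
    return means, np.linalg.inv(S)

def lda_predict(x, means, S_inv, classes, log_priors):
    """Assign x to the class with the largest linear discriminant score."""
    scores = (
        means @ S_inv @ x
        - 0.5 * np.einsum("ij,jk,ik->i", means, S_inv, means)
        + log_priors
    )
    return classes[np.argmax(scores)]

def loo_accuracy(X, y):
    """Leave-one-out hit rate, overall and per group."""
    classes = np.unique(y)
    log_priors = np.log(np.full(len(classes), 1.0 / len(classes)))  # equal priors
    correct = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i  # refit without case i
        means, S_inv = lda_fit(X[mask], y[mask], classes)
        correct[i] = lda_predict(X[i], means, S_inv, classes, log_priors) == y[i]
    per_group = {c: correct[y == c].mean() for c in classes}
    return correct.mean(), per_group

# Synthetic demo: 4 groups x 4 variables with shifted means
rng = np.random.default_rng(0)
y_demo = np.repeat([1, 2, 3, 4], 30)
X_demo = np.eye(4)[y_demo - 1] * 2.0 + rng.normal(size=(120, 4))
overall, per_group = loo_accuracy(X_demo, y_demo)
print(overall, per_group)
```

Every refit in `lda_fit` uses the same pooled covariance, so the leave-one-out hit rate is computed under the equal-covariance assumption rather than testing it.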
I think I know the answers to both questions, but I want to make sure there is nothing I can do before I give up entirely and descope to a linear or log regression analysis. Thanks in advance for the answers!
Here is a picture of the SPSS DFA graph, if it helps:
