I am evaluating a Quadratic Discriminant Analysis (QDA) classifier on a high-dimensional feature set. The features come from highly non-Gaussian distributions. However, when I transform the features to each have a Gaussian distribution in two different ways, the resulting classifiers perform worse than QDA applied to the raw features on the following three metrics (evaluation sketched after the list):
- Accuracy (classes are balanced)
- Area under the ROC
- A probabilistic metric
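
For reference, the evaluation looks roughly like this minimal sketch (synthetic data stands in for my features, and log loss stands in for the probabilistic metric; both are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, roc_auc_score, log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my data: balanced binary classes, many features.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
proba = qda.predict_proba(X_test)

print("accuracy:", accuracy_score(y_test, proba.argmax(axis=1)))
print("ROC AUC :", roc_auc_score(y_test, proba[:, 1]))
print("log loss:", log_loss(y_test, proba))
```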
The first way of transforming each feature to a Gaussian distribution disregarded the class. For each feature, it estimated the parameters of the corresponding distribution, evaluated that distribution's CDF (or an approximation of it) at each data point, and then applied the standard Gaussian's inverse CDF, i.e., a per-feature probability integral transform, roughly as sketched below.
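
A minimal sketch of that transform, with an exponential distribution standing in for whatever family each feature actually follows:

```python
import numpy as np
from scipy import stats

def gaussianize(X):
    """Per-feature probability integral transform to a standard Gaussian."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        # Fit the chosen family per feature (exponential is a placeholder).
        loc, scale = stats.expon.fit(X[:, j])
        # Map to (0, 1) via the fitted CDF.
        u = stats.expon.cdf(X[:, j], loc=loc, scale=scale)
        # Guard against u = 0 or 1, where the inverse CDF is infinite.
        u = np.clip(u, 1e-6, 1 - 1e-6)
        # Apply the standard Gaussian's inverse CDF.
        Z[:, j] = stats.norm.ppf(u)
    return Z
```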
The second way did the same thing, but it used the class labels and fit and applied the transform to the data from each class independently, as sketched below.
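
Reusing `gaussianize` from the sketch above, the class-conditional fit on the training data looks roughly like this (at test time the labels are unknown, so applying the fitted per-class CDFs there is a separate step I've omitted):

```python
def gaussianize_per_class(X, y):
    """Fit and apply the per-feature transform within each class separately."""
    Z = np.empty_like(X, dtype=float)
    for c in np.unique(y):
        mask = (y == c)
        # Fit and transform using only this class's data.
        Z[mask] = gaussianize(X[mask])
    return Z
```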
Any idea why this occurs?
I have confirmed that it is not due to ...
- A bug in the code
- The feature distributions shifting between the train and test sets