I am evaluating a Quadratic Discriminant Analysis (QDA) classifier on a high-dimensional feature set. The features come from highly non-Gaussian distributions. However, when I transform the features to each have a Gaussian distribution in two different ways, the resulting classifiers perform worse than QDA applied to the raw features on the following three metrics (evaluation sketched after the list):
- Accuracy (classes are balanced)
- Area under the ROC
- A probabilistic metric
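
For reference, the evaluation looks roughly like this minimal sketch (synthetic data stands in for my features, and log loss stands in for the probabilistic metric; both are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, roc_auc_score, log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my data: balanced binary classes, many features.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
proba = qda.predict_proba(X_test)

print("accuracy:", accuracy_score(y_test, proba.argmax(axis=1)))
print("ROC AUC :", roc_auc_score(y_test, proba[:, 1]))
print("log loss:", log_loss(y_test, proba))
```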
The first way of transforming each feature to a Gaussian distribution disregarded the class. For each feature, it estimated the parameters of the corresponding distribution, evaluated that distribution's CDF (or an approximation of it) at each data point, and then applied the standard Gaussian's inverse CDF, i.e., a per-feature probability integral transform, roughly as sketched below.
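
A minimal sketch of that transform, with an exponential distribution standing in for whatever family each feature actually follows:

```python
import numpy as np
from scipy import stats

def gaussianize(X):
    """Per-feature probability integral transform to a standard Gaussian."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        # Fit the chosen family per feature (exponential is a placeholder).
        loc, scale = stats.expon.fit(X[:, j])
        # Map to (0, 1) via the fitted CDF.
        u = stats.expon.cdf(X[:, j], loc=loc, scale=scale)
        # Guard against u = 0 or 1, where the inverse CDF is infinite.
        u = np.clip(u, 1e-6, 1 - 1e-6)
        # Apply the standard Gaussian's inverse CDF.
        Z[:, j] = stats.norm.ppf(u)
    return Z
```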
The second way did the same thing, but it used the class labels and fit and applied the transform to the data from each class independently, as sketched below.
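
Reusing `gaussianize` from the sketch above, the class-conditional fit on the training data looks roughly like this (at test time the labels are unknown, so applying the fitted per-class CDFs there is a separate step I've omitted):

```python
def gaussianize_per_class(X, y):
    """Fit and apply the per-feature transform within each class separately."""
    Z = np.empty_like(X, dtype=float)
    for c in np.unique(y):
        mask = (y == c)
        # Fit and transform using only this class's data.
        Z[mask] = gaussianize(X[mask])
    return Z
```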
Any idea why this occurs?
I have confirmed that it is not due to ...
- A bug in the code
- The feature distributions shifting between the train and test sets