
I have a dataset with tens of thousands of samples and only 7 features, and I want to use linear discriminant analysis (LDA) to classify the samples into 2 classes. I am using scikit-learn's LDA implementation in Python. However, the results are strange: the classification is dominated by a single feature, as shown in the attached figure. The LDA coefficients are:

9259.89517281, -191.60635381, 403.78787046, -79.33404653, -93.1753486, 110.23429443, -41.93012569

Since all feature values are >= 0 and scaled between 0 and 10, the coefficients indicate that the classification is dominated by the first feature. I think all features should matter for the classification. Is there a way to fix this? I have tried the shrinkage option provided by the package, but it does not seem to work.
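For reference, here is roughly how I am fitting the model, including the shrinkage variant I tried. This is a minimal sketch: the synthetic X and y below are placeholders for my real feature matrix and labels.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder data: 7 features scaled to [0, 10], binary labels.
# In my case there are tens of thousands of samples.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20000, 7))
y = rng.integers(0, 2, size=20000)

# Plain LDA -- with my data, coef_ comes out dominated by the first feature.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.coef_)

# Shrinkage requires the 'lsqr' or 'eigen' solver; shrinkage='auto'
# uses the Ledoit-Wolf covariance estimate. This did not change the
# picture for me.
lda_shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
lda_shrunk.fit(X, y)
print(lda_shrunk.coef_)
```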

Elkan
  • Imagine that you have two classes (clouds) in 2D space, and the clouds are separated only along the X axis and not along the Y axis. Then, obviously, feature X is the only "important" discriminator of the classes. – ttnphns May 12 '18 at 11:16
  • Your question has nothing to do with the number of data points. – ttnphns May 12 '18 at 11:18
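A minimal sketch of the scenario ttnphns describes in the first comment, using synthetic illustrative data: two Gaussian clouds separated only along the X axis, where LDA correctly puts nearly all of its weight on the X feature.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two Gaussian "clouds" in 2D, separated only along the X axis.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
class1 = rng.normal(loc=[5.0, 0.0], scale=1.0, size=(1000, 2))
X = np.vstack([class0, class1])
y = np.repeat([0, 1], 1000)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)  # the X weight is large; the Y weight is near zero
```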

0 Answers