
Can I presume that the higher sum(pca.explained_variance_ratio_) is, the better the separation of groups?

I want to run PCA on 100 random samples and plot only the one with the best separation. Is picking the highest summed explained variance ratio the way to go?

For example

The 1st PCA(3) has a summed explained variance ratio of 0.7

The 2nd PCA(3) has a summed explained variance ratio of 0.9

Can I assume the 2nd one will give me a better plot?

Thanks!

Pitouille

1 Answer


PCA does not optimise the separation between the groups, and the variances of the principal components are not normally informative about group separation.
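
A minimal sketch of why this is so, on synthetic data that is purely an assumption for illustration (the group shift, spreads, and sample size are made up, and `standardized_gap` is a hypothetical helper, not a library function). Both groups share a large spread along one axis and differ only by a small shift along the other, so PC1 takes almost all of the variance while the groups only separate on PC2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # points per group (arbitrary choice for this sketch)

# Both groups share a large spread along x; they differ only by a shift along y.
x = rng.normal(0.0, 10.0, size=2 * n)
y = np.concatenate([rng.normal(-1.0, 0.2, size=n), rng.normal(1.0, 0.2, size=n)])
X = np.column_stack([x, y])
labels = np.repeat([0, 1], n)

# PCA via SVD of the centred data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)
scores = Xc @ Vt.T  # column k holds the scores on PC(k+1)


def standardized_gap(score, labels):
    """Absolute difference of the two group means over the pooled within-group SD."""
    a, b = score[labels == 0], score[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


gap_pc1 = standardized_gap(scores[:, 0], labels)
gap_pc2 = standardized_gap(scores[:, 1], labels)
print("explained variance ratio:", explained_variance_ratio.round(3))
print("group gap on PC1:", round(gap_pc1, 2), "| on PC2:", round(gap_pc2, 2))
```

Here the component with nearly all of the explained variance carries essentially no group information, so ranking candidate PCAs by summed explained variance would not rank them by separation.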

  • To expand a bit on the second point, you could have a PC1 that explains 10% of the variation yet completely explains the separation between the groups in the data. Conversely, you could have a PC1 that explains 90% of the variation in the data, yet the groups may not be linearly separable in principal component space at all. – alan ocallaghan Oct 11 '21 at 12:50
  • Thank you for your answers! Do you have any idea how to tell from the scores which separation is the best? – Noob Programmer Oct 11 '21 at 13:50
  • PCA is not made for this; discriminant coordinates (discriminant functions) may help you, see https://en.wikipedia.org/wiki/Linear_discriminant_analysis#Discriminant_functions – Christian Hennig Oct 11 '21 at 19:36
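
As a rough sketch of the discriminant-coordinate suggestion in the last comment, the snippet below compares the first principal component with the first linear discriminant on made-up data. It assumes scikit-learn is available and that group labels are known; the data, the feature scales, and the `standardized_gap` helper are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n = 150  # points per group (arbitrary choice for this sketch)

# Hypothetical data: three features with very different spreads.
X = rng.normal(0.0, [8.0, 3.0, 0.5], size=(2 * n, 3))
labels = np.repeat([0, 1], n)
X[labels == 1, 2] += 2.0  # the groups differ only in the low-variance feature

# Unsupervised projection: direction of maximum total variance.
pca = PCA(n_components=1).fit(X)
pc1 = pca.transform(X)[:, 0]

# Supervised projection: direction that best separates the labelled groups.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, labels)
ld1 = lda.transform(X)[:, 0]


def standardized_gap(score, labels):
    """Absolute difference of the two group means over the pooled within-group SD."""
    a, b = score[labels == 0], score[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


gap_pca = standardized_gap(pc1, labels)
gap_lda = standardized_gap(ld1, labels)
print("PC1 explained variance ratio:", pca.explained_variance_ratio_[0].round(3))
print("group gap on PC1:", round(gap_pca, 2), "| on LD1:", round(gap_lda, 2))
```

If the goal is to pick the best of several candidate projections, a label-aware score such as the gap above (or any other separation measure computed from the scores and the labels) is a more direct criterion than the explained variance ratio. Note that LDA uses the labels, so the gap it reports on the same data it was fitted to is optimistic.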