
I'm aware of the issues with using PCA for feature selection (https://blog.kxy.ai/5-reasons-you-should-never-use-pca-for-feature-selection/ and https://towardsdatascience.com/pca-is-not-feature-selection-3344fb764ae6). However, I need to consider alternative methods of feature selection beyond just looking at correlation, so for now I'm exploring whether PCA might be useful.

My original dataset consists of thousands of proteins (rows = samples, columns = proteins), where the values are concentrations. Values were centered and log-transformed prior to PCA.

Using PCA as an unsupervised way to "feature select", I selected the first 30 PCs, which account for ~80% of the explained variance. For each feature, I multiplied its loadings by the proportion of variance explained by each PC, then summed these values across the PCs. Finally, I chose a cut-off and kept only the features whose final loading value is >= that cut-off (supposedly the most important in relation to the PCs).
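For concreteness, here is a minimal sketch of the procedure as I understand it, in Python with scikit-learn (the matrix, the 80% target, and the top-10% cut-off are all hypothetical stand-ins; note that sklearn's `components_` rows are unit-norm eigenvectors rather than loadings in the strict covariance sense):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the protein matrix: rows = samples, columns = proteins.
rng = np.random.default_rng(0)
X = np.log(rng.lognormal(size=(100, 500)))  # log-transformed concentrations

pca = PCA().fit(X)  # PCA centers the columns itself; no scaling (covariance-style)
evr = pca.explained_variance_ratio_
k = int(np.searchsorted(np.cumsum(evr), 0.80)) + 1  # PCs covering ~80% of variance

# components_ rows are unit eigenvectors; weight |values| by variance explained
abs_loadings = np.abs(pca.components_[:k])            # shape (k, n_features)
importance = (abs_loadings * evr[:k, None]).sum(axis=0)

cutoff = np.quantile(importance, 0.90)  # illustrative cut-off: keep the top ~10%
selected = np.flatnonzero(importance >= cutoff)
```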

My confusion lies in what exactly a high loading for a feature means in relation to that feature in the original dataset. For example, in this link (https://towardsdatascience.com/pca-is-not-feature-selection-3344fb764ae6), the author explains: "The only way PCA is a valid method of feature selection is if the most important variables are the ones that happen to have the most variation in them". In other words, does this mean features with high loadings show the most variation in the original dataset?

Ultimately, I would like to select features which do not show a large variation in the original dataset across samples, so should I select those with a lower loading?

If I'm on the wrong track entirely: given the definition of loadings as "the covariances/correlations between the original features and the unit-scaled components", if features with low loadings are not informative/not important for the PCA, what does that tell me about the variance (or otherwise) of the feature in the original dataset?
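That definition implies a concrete identity I can check numerically: with covariance-based PCA, a feature's squared loadings summed over all components reproduce its variance, so a feature with uniformly low loadings necessarily has low variance in the original data. A small sketch with synthetic data (numpy only, variances chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
# Five features with deliberately different scales
X = rng.normal(size=(200, 5)) * np.array([0.1, 0.5, 1.0, 2.0, 5.0])

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
# loadings = eigenvector * sqrt(eigenvalue):
# the covariance between each feature and each unit-scaled PC
loadings = eigvecs * np.sqrt(eigvals)   # shape (n_features, n_PCs)

# Squared loadings of a feature across ALL PCs sum to that feature's variance
print(np.allclose((loadings**2).sum(axis=1), np.diag(cov)))  # True
```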

I'm aware of similar questions on Stack/Cross-Validated, but none that clarify this point.

Using PCA for feature selection?

Using principal component analysis (PCA) for feature selection

Any nudges in the right direction would be appreciated.

aim6789
  • I have multiplied the loadings by the proportion of variance explained. Why would you need to do this? A squared loading already carries the information about the magnitude of variance explained. The variance explained by a component is the sum of its squared loadings. – ttnphns May 15 '23 at 06:41
  • @ttnphns that's a good point - I could instead look at the sum of (absolute) loadings for each variable across the PCs explaining 80% of the variance (similar to this paper https://www.sciencedirect.com/science/article/pii/S2666827021000852). I've also tried extracting the features below or equal to my cut off from each PC and then getting a unique list from this. My uncertainty is around what it means if a feature is deemed as having a low loading after this: that it contributes little to the generation of the PCs, or that it has a smaller variance in the original dataset, or something else? – aim6789 May 15 '23 at 12:58
  • If the PCA was done on the correlations, a low loading (when the number of PCs extracted is not big) means the variable "splits off the flock": it is orthogonal to most of the other variables and thus is not represented well by any of the PCs. But if the analysis was done on covariances, a low loading may be due to the above reason or to the variable's low variance. – ttnphns May 15 '23 at 16:14
  • Check this https://stats.stackexchange.com/q/53/3277 – ttnphns May 15 '23 at 16:15
  • @ttnphns thank you very much for your help! I'm looking at different combinations of centering and/or scaling, but in the instance where the data has only been centered and log-transformed (not scaled per se, so a covariance matrix I believe, using prcomp in R with scale=FALSE), a high loading could be due to a higher variance. But why might that be better in terms of feature selection? Would this be because greater variance could be useful for discriminating between, say, control vs disease in downstream analyses? – aim6789 May 15 '23 at 21:06
  • Along the same lines, features with a low loading/low variance might perhaps be less influential in class separation - but would this only be applicable if PCA was done on a covariance matrix? – aim6789 May 15 '23 at 21:49
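The distinction ttnphns draws in the comments can also be checked numerically: with correlation-based PCA, every feature's squared loadings sum to 1 (its standardized variance) regardless of its raw variance, so a low loading there cannot be blamed on low variance. A sketch with synthetic data of very different scales:

```python
import numpy as np

rng = np.random.default_rng(2)
# Four features with wildly different raw variances
X = rng.normal(size=(200, 4)) * np.array([0.1, 1.0, 3.0, 10.0])

corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
loadings = eigvecs * np.sqrt(eigvals)   # shape (n_features, n_PCs)

# Each feature's squared loadings sum to 1, whatever its raw variance was
print(np.allclose((loadings**2).sum(axis=1), 1.0))  # True
```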

0 Answers