
I'm currently evaluating the position of individuals from 3 populations of an animal (according to their sex) as a function of the 12 environmental factors present in their habitat. To detect which environmental factors have the most impact, I'm using PCA in R.

I have centered and standardized my data and kept the PCs that have an eigenvalue > 1. I obtained 4 «significant» PCs.
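For concreteness, the selection step described here can be sketched roughly as follows. This is only a minimal sketch on simulated data: the data frame `env`, its column names, and the sample size of 60 are hypothetical stand-ins, not the real dataset.

```r
# Simulated stand-in for the real data: 60 animals x 12 environmental factors.
# The column names and values are hypothetical.
set.seed(1)
env <- as.data.frame(matrix(rnorm(60 * 12), nrow = 60))
names(env) <- paste0("factor", 1:12)

# center = TRUE and scale. = TRUE standardize each column before the PCA
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared standard deviations
eigenvalues <- pca$sdev^2

# Keep the components whose eigenvalue exceeds 1
retained <- which(eigenvalues > 1)
print(round(eigenvalues, 2))
print(retained)
```

With standardized variables the eigenvalues sum to the number of variables (12 here), so an eigenvalue of 1 corresponds to the variance of a single original variable.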

My next step was to define my PCs by determining which factors make up each one. To determine that, I evaluated the contribution (%) of each factor on a scree plot, one per PC, so I have 4 of them. The factors retained have a contribution greater than 1/12 (about 8.3%), i.e. 1 / total number of factors.
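The retention rule above can be sketched like this, assuming that "contribution" means the squared loading expressed as a percentage (that definition is an assumption on my part, and the data are again simulated with hypothetical names):

```r
# Hypothetical simulated data: 60 animals x 12 environmental factors
set.seed(1)
env <- as.data.frame(matrix(rnorm(60 * 12), nrow = 60))
names(env) <- paste0("factor", 1:12)
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Each column of `rotation` is a unit-length eigenvector, so its squared
# entries sum to 1; times 100 they give percentage contributions per PC
contrib <- 100 * pca$rotation^2

# Threshold: contribution expected if all 12 variables contributed equally
threshold <- 100 / ncol(env)

# Variables above the threshold on each of the first four components
above <- contrib[, 1:4] > threshold
selected <- lapply(1:4, function(k) rownames(contrib)[above[, k]])
names(selected) <- colnames(contrib)[1:4]

# Number of those four components on which each variable is retained
times_retained <- rowSums(above)
print(selected)
print(times_retained)
```

Any variable with `times_retained` greater than 1 is in the situation described for Temperature.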

When evaluating these PCs, I noticed that the same variables are retained for more than one PC. For example, Temperature is retained in both PC 2 and PC 3.

As PCA is supposed to combine features/variables and create new features that are uncorrelated, I am wondering: does the fact that Temperature is retained twice cause a problem?
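To make the question precise, here is how I understand the quantities involved. This is a sketch under my own assumptions: $X$ is the standardized data matrix, $v_k$ is the $k$-th unit-length eigenvector of its correlation matrix, $p = 12$, and "contribution" means the squared loading in percent.

```latex
% Component scores and their lack of correlation
z_k = X v_k, \qquad \operatorname{Cov}(z_k, z_l) = 0 \quad \text{for } k \neq l

% Contribution of variable j to component k
c_{jk} = 100 \, v_{jk}^2, \qquad
\sum_{j=1}^{p} c_{jk} = 100, \qquad
\sum_{k=1}^{p} c_{jk} = 100
```

Under these definitions the lack of correlation applies to the scores $z_k$, while each variable's contributions are spread over all $p$ components, so one variable can exceed the $100/p$ threshold on more than one component.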

From reading posts on CV and other websites, I found different possibilities:

  1. Leave it as it is and mention that some variables are retained on more than one PC
  2. Do factor analysis or clustering instead
  3. Assign each variable only to the PC on which it has the highest percentage of contribution

But I'm not sure what to do.

Any guidance would be helpful.

Thank you!

ttnphns
  • Welcome to CV! Could you please explain what the aims or methods of "evaluating environmental factors" are, what it means for a variable to be "repeating more than once," and what a "percentage of contribution" is? Your question sounds a bit like https://stats.stackexchange.com/questions/50537, but it's unclear whether or how your circumstances differ. – whuber Mar 18 '23 at 20:16
  • @whuber Thank you! Yes, I will try my best to clarify: I'm evaluating the position of animals (male vs. female) with respect to the environmental factors of their habitat. I have 12 factors. I obtained 4 PCs that need to be evaluated (because their eigenvalue is higher than 1). To determine which factors define each PC, I made a scree plot. To be retained, a factor's contribution (Y-axis; the variance of the data explained by the factor) must be higher than 1/12 (about 8.3%). Some variables define more than one PC (e.g. temperature is one of the highest-contributing variables in both PC 2 and PC 3). – user383536 Mar 18 '23 at 22:58
  • Please include that information in your post. There's a lot going on here: some standard but ill-advised advice on PCA is recognizable in your description and it's still not clear what specifically you aim to get out of this analysis. So please consider what part of the analysis you want to focus on in your question. – whuber Mar 19 '23 at 13:22
  • @whuber will do. Thank you again for your help. – user383536 Mar 19 '23 at 14:10
