I'm currently evaluating the position of individuals from 3 populations of an animal (grouped by sex) as a function of the 12 environmental factors present in their habitat. To detect which environmental factors have the most impact, I'm using PCA in R.
I have centered and standardized my data and kept the PCs that have an eigenvalue > 1. This gave me 4 "significant" PCs.
My next step was to interpret each PC by determining which factors define it. To do that, I evaluated the contribution (%) of each factor on a scree plot, one per PC, so I have 4 of them. I retained the factors whose contribution is greater than 1/12 (about 8.3%), i.e. 1 / total number of factors.
When evaluating these PCs, I noticed that the same variable can be retained for more than one PC. For example, Temperature is retained in both PC 2 and PC 3.
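To make the retention rule and the overlap concrete, here is a minimal sketch in plain Python. It is not my real data: the three variable names and the loading values are made up for illustration (chosen to be unit-length and mutually orthogonal, as PCA loadings are), and I am assuming the contribution of a variable to a PC is its squared loading expressed as a percentage.

```python
# Toy illustration only: hypothetical variables and loadings, not real data.
variables = ["Temperature", "Humidity", "Depth"]

# Hypothetical loading vectors for two PCs (unit length, orthogonal to each other).
loadings = {
    "PC2": [2 / 3, 2 / 3, 1 / 3],
    "PC3": [2 / 3, -1 / 3, -2 / 3],
}


def contributions(vec):
    """Percent contribution of each variable to one PC: 100 * loading^2 / sum(loading^2)."""
    total = sum(x * x for x in vec)
    return [100 * x * x / total for x in vec]


# Retention threshold: the contribution expected if all variables contributed equally.
threshold = 100 / len(variables)

# Keep, for each PC, the variables whose contribution exceeds the threshold.
retained = {
    pc: [name for name, c in zip(variables, contributions(vec)) if c > threshold]
    for pc, vec in loadings.items()
}

# Variables retained on both PCs.
shared = sorted(set(retained["PC2"]) & set(retained["PC3"]))

print(retained)  # {'PC2': ['Temperature', 'Humidity'], 'PC3': ['Temperature', 'Depth']}
print(shared)    # ['Temperature']
```

In this toy case Temperature passes the threshold on both PCs, which is the same situation I see with my 12 factors.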
As PCA is supposed to combine features/variables into new features that are uncorrelated, I am wondering whether the fact that Temperature is retained twice causes a problem.
From reading posts on CV and other websites, I found different possibilities:
- Leave it as it is and mention that some variables are retained in more than one PC
- Do factor analysis or clustering
- Assign each variable only to the PC where its percentage of contribution is highest
But I'm not sure what to do.
Any guidance would be helpful.
Thank you!