I performed a PCA on the results of 4 different tests. Each test is scored as an integer from 0 to 5.

From what I've read, you are not supposed to conduct a PCA on categorical data, which is similar in structure to my data. However, I've also read that this assumption or rule is often broken/relaxed, as interpretation might be valuable nonetheless.

Out of curiosity I decided to simply see what would happen, and I find that my PCA generates exactly four components. Moreover, each of my test score variables loads perfectly on exactly one of these 4 components.

Loadings:
      PC1 PC2 PC3 PC4
Test1              1  
Test2  1              
Test3          1      
Test4      1          

Q1: I was wondering what is happening here. Could someone (in simple terms) explain what this means and how this could be the case?

Q2: I have a list of 8 more tests which are all measured on a continuous scale. I have also performed a PCA on this data. However, I was hoping to also add these 4 tests to that PCA. I wish to do this as the results of the 4 tests mentioned above are better understood than the results of the remaining 8 tests. Would it be acceptable to include all 12 tests (8 continuous + 4 from above)?

R. Iersel

1 Answer

After continuing to look I finally found the answer for Q1 in: Why does my loading matrix following PCA with a varimax rotation contain only ones and zeros?

The answer is provided by Leonardo Fontenelle. In short: I carried all four components into the varimax rotation. What I should have done instead is first decide how many components to keep, based on the eigenvalues, the scree plot and the percentage of explained variance. From what I found, the usual rules of thumb are an eigenvalue greater than 1, a cumulative explained variance of 70-80% across the retained components, and as many components as it takes for the scree plot to "level off". For a more in-depth explanation see: https://www.theanalysisfactor.com/factor-analysis-how-many-factors/
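To make this concrete, here is a minimal sketch in Python with NumPy. It is not my actual analysis: the data are simulated, and the sample size, the shared latent trait and the 70% threshold are assumptions for illustration. It first shows why keeping all components lets a rotation end up with only ones and zeros: the full eigenvector matrix is square and orthogonal, so an orthogonal rotation exists that maps it onto the identity matrix, and the identity has the highest possible varimax criterion. It then applies the eigenvalue and cumulative-variance rules of thumb.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 subjects, 4 tests scored as integers 0-5.
# A shared latent trait makes the tests correlated (assumption for illustration).
latent = rng.normal(size=(200, 1))
raw = 2.5 + latent + rng.normal(size=(200, 4))
scores = np.clip(np.rint(raw), 0, 5)

# PCA via the eigendecomposition of the correlation matrix.
corr = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]          # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return np.var(loadings ** 2, axis=0).sum()


# Part 1: with all 4 components kept, the eigenvector matrix V is orthogonal,
# so the orthogonal rotation V.T turns it into the identity matrix.
rotated_all = eigvecs @ eigvecs.T
print(np.round(rotated_all, 3))
print("criterion before rotation:", varimax_criterion(eigvecs))
print("criterion at identity:    ", varimax_criterion(np.eye(4)))

# Part 2: decide how many components to keep BEFORE rotating.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
n_kaiser = int((eigvals > 1).sum())                    # eigenvalue > 1 rule
n_70 = int(np.searchsorted(cumulative, 0.70) + 1)      # first count reaching 70%
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative explained variance:", np.round(cumulative, 3))
print("keep (eigenvalue > 1):", n_kaiser, "| keep (>= 70%):", n_70)
```

Only the retained components would then go into the rotation. The scree plot criterion is a visual judgement and is not reproduced here.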

As for Q2: sadly the answer is less clear-cut; it depends. In my case, I would like to combine the 4 categorical tests with my other test scores because these 4 are better understood, which aids the interpretation of the component loadings. That said, doing so is statistically less sound, and there are alternative analyses for purely categorical data.
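If the 12 tests are combined anyway, one practical point is that the 0-5 scores and the continuous scores sit on different scales, so the PCA would normally be run on standardized variables, which is the same as using the correlation matrix. The sketch below uses simulated data only; the sample size, the means and the spreads are made up and are not my results. It does not address whether treating the 0-5 scores as numeric is appropriate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300  # hypothetical number of subjects

# Hypothetical data: 8 continuous tests on an arbitrary scale, 4 integer tests (0-5).
continuous = rng.normal(loc=50, scale=10, size=(n, 8))
ordinal = rng.integers(0, 6, size=(n, 4)).astype(float)
X = np.hstack([continuous, ordinal])

# Standardize every column (mean 0, SD 1) so no test dominates because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA on standardized data is PCA on the correlation matrix.
corr = (Z.T @ Z) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# For comparison: PCA on the raw covariance matrix. Its first component is
# driven almost entirely by the high-variance continuous columns.
cov_vals, cov_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
first_cov_pc = cov_vecs[:, -1]                      # eigh sorts ascending
ordinal_weight_cov = float((first_cov_pc[8:] ** 2).sum())
ordinal_weight_corr = float((eigvecs[8:, 0] ** 2).sum())

print("share of first PC on the 0-5 tests (covariance): ", round(ordinal_weight_cov, 4))
print("share of first PC on the 0-5 tests (correlation):", round(ordinal_weight_corr, 4))
```

After standardization the 0-5 tests can contribute to the components on the same footing as the continuous ones, which is what I need if they are to help interpret the loadings.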

R. Iersel