
I know how to perform PCA (principal component analysis), but I would like to know the steps that should be used for factor analysis.

To perform PCA, let us consider some matrix $A$, for instance:

         3     1    -1
         2     4     0
         4    -2    -5
        11    22    20

I have calculated its correlation matrix B = corr(A):

        1.0000    0.9087    0.9250
        0.9087    1.0000    0.9970
        0.9250    0.9970    1.0000

Then I have done eigenvalue decomposition [V,D] = eig(B), resulting in eigenvectors:

        0.5662    0.8209   -0.0740
        0.5812   -0.4613   -0.6703
        0.5844   -0.3366    0.7383

and eigenvalues:

        2.8877         0         0
             0    0.1101         0
             0         0    0.0022

The general idea behind PCA is to choose the significant components, form a new matrix whose columns are the corresponding eigenvectors, and then project the original matrix onto it (in PCA the data are zero-centered first). But in factor analysis we should, for instance, choose the components whose eigenvalue is greater than $1$, and then apply a rotation of the factors. Please tell me how this is done, for instance in this case.
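For reference, here is a rough R equivalent of the PCA steps I have in mind (a sketch only, using base-R functions in place of the MATLAB calls above; signs and ordering of the eigenvectors may differ from eig):

    # Sketch of the PCA steps described above, in base R (not my original MATLAB code)
    A <- matrix(c( 3,  1,  -1,
                   2,  4,   0,
                   4, -2,  -5,
                  11, 22,  20), nrow = 4, byrow = TRUE)
    B  <- cor(A)                       # correlation matrix, as corr(A) above
    ed <- eigen(B)                     # eigenvalues ed$values, eigenvectors ed$vectors
    keep   <- ed$values > 1            # e.g. "eigenvalue greater than 1" selection
    Xc     <- scale(A, center = TRUE, scale = FALSE)    # zero-centered data
    scores <- Xc %*% ed$vectors[, keep, drop = FALSE]   # projection onto kept components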

Please help me understand the factor analysis steps as compared to the PCA steps.

dato datuashvili

1 Answer


This answer is to show concrete computational similarities and differences between PCA and Factor analysis. For general theoretical differences between them, see questions/answers 1, 2, 3, 4, 5.

Below I will do, step by step, a Principal Component analysis (PCA) of the iris data ("setosa" species only) and then a Factor analysis of the same data. The Factor analysis (FA) will be done by the iterative principal axis factoring (PAF) method, which is based on the PCA approach and thus makes it possible to compare PCA and FA step by step.

Iris data (setosa only):

      id  SLength  SWidth  PLength  PWidth  species
       1      5.1     3.5      1.4      .2  setosa
       2      4.9     3.0      1.4      .2  setosa
       3      4.7     3.2      1.3      .2  setosa
       4      4.6     3.1      1.5      .2  setosa
       5      5.0     3.6      1.4      .2  setosa
       6      5.4     3.9      1.7      .4  setosa
       7      4.6     3.4      1.4      .3  setosa
       8      5.0     3.4      1.5      .2  setosa
       9      4.4     2.9      1.4      .2  setosa
      10      4.9     3.1      1.5      .1  setosa
      11      5.4     3.7      1.5      .2  setosa
      12      4.8     3.4      1.6      .2  setosa
      13      4.8     3.0      1.4      .1  setosa
      14      4.3     3.0      1.1      .1  setosa
      15      5.8     4.0      1.2      .2  setosa
      16      5.7     4.4      1.5      .4  setosa
      17      5.4     3.9      1.3      .4  setosa
      18      5.1     3.5      1.4      .3  setosa
      19      5.7     3.8      1.7      .3  setosa
      20      5.1     3.8      1.5      .3  setosa
      21      5.4     3.4      1.7      .2  setosa
      22      5.1     3.7      1.5      .4  setosa
      23      4.6     3.6      1.0      .2  setosa
      24      5.1     3.3      1.7      .5  setosa
      25      4.8     3.4      1.9      .2  setosa
      26      5.0     3.0      1.6      .2  setosa
      27      5.0     3.4      1.6      .4  setosa
      28      5.2     3.5      1.5      .2  setosa
      29      5.2     3.4      1.4      .2  setosa
      30      4.7     3.2      1.6      .2  setosa
      31      4.8     3.1      1.6      .2  setosa
      32      5.4     3.4      1.5      .4  setosa
      33      5.2     4.1      1.5      .1  setosa
      34      5.5     4.2      1.4      .2  setosa
      35      4.9     3.1      1.5      .2  setosa
      36      5.0     3.2      1.2      .2  setosa
      37      5.5     3.5      1.3      .2  setosa
      38      4.9     3.6      1.4      .1  setosa
      39      4.4     3.0      1.3      .2  setosa
      40      5.1     3.4      1.5      .2  setosa
      41      5.0     3.5      1.3      .3  setosa
      42      4.5     2.3      1.3      .3  setosa
      43      4.4     3.2      1.3      .2  setosa
      44      5.0     3.5      1.6      .6  setosa
      45      5.1     3.8      1.9      .4  setosa
      46      4.8     3.0      1.4      .3  setosa
      47      5.1     3.8      1.6      .2  setosa
      48      4.6     3.2      1.4      .2  setosa
      49      5.3     3.7      1.5      .2  setosa
      50      5.0     3.3      1.4      .2  setosa

We have 4 numeric variables to include in our analyses: SLength, SWidth, PLength, PWidth. The analyses will be based on covariances, which is the same as saying that we analyse centered variables. (If we chose to analyse correlations, that would mean analysing standardized variables. Analysis based on correlations produces different results than analysis based on covariances.) I will not display the centered data. Let's call this data matrix X.

PCA steps:

Step 0. Compute centered variables X and covariance matrix S.

Covariances S (= X'*X/(n-1) matrix; see https://stats.stackexchange.com/a/22520/3277):

    .12424898   .09921633   .01635510   .01033061
    .09921633   .14368980   .01169796   .00929796
    .01635510   .01169796   .03015918   .00606939
    .01033061   .00929796   .00606939   .01110612
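A minimal R sketch of this step (assuming R's built-in iris data, whose column names Sepal.Length etc. differ from the table above; object names are my own):

    # Step 0 sketch: centered setosa data and covariance matrix
    dat <- subset(iris, Species == "setosa", select = -Species)
    X   <- scale(dat, center = TRUE, scale = FALSE)   # centered variables
    S   <- cov(X)                                     # covariance matrix = t(X) %*% X / (n-1)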

Step 1.1. Decompose the data X or the matrix S to get eigenvalues and right eigenvectors. You may use svd or eigen decomposition (see https://stats.stackexchange.com/q/79043/3277).

Eigenvalues L (component variances) and the proportion of overall variance explained:

                    L           Prop
    PC1   .2364556901   .7647237023
    PC2   .0369187324   .1193992401
    PC3   .0267963986   .0866624997
    PC4   .0090332606   .0292145579

Eigenvectors V (cosines of rotation of variables into components):

                       PC1            PC2            PC3            PC4
    SLength    .6690784044    .5978840102   -.4399627716   -.0360771206
    SWidth     .7341478283   -.6206734170    .2746074698   -.0195502716
    PLength    .0965438987    .4900555922    .8324494972   -.2399012853
    PWidth     .0635635941    .1309379098    .1950675055    .9699296890
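In R this could be sketched as follows (continuing from the S computed above; eigenvector signs may be flipped relative to the table):

    # Step 1.1 sketch: eigen decomposition of S (svd of X would give the same directions up to sign)
    e    <- eigen(S)
    L    <- e$values          # eigenvalues = component variances
    V    <- e$vectors         # eigenvectors, columns = PC1..PC4
    prop <- L / sum(L)        # proportion of overall variance explained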

Step 1.2. Decide on the number M of first PCs you want to retain. You may decide it now or later on, it makes no difference, because in PCA the values of the components do not depend on M. Let's set M=2. So, keep only the first 2 eigenvalues and the first 2 eigenvector columns.

Step 2. Compute loadings A. (You may skip this if you don't need to interpret the PCs in any way.) Loadings are eigenvectors normalized to the respective eigenvalues: A value = V value * sqrt(L value). Loadings are the covariances between variables and components.

Loadings A:

                     PC1           PC2
    SLength    .32535081     .11487892
    SWidth     .35699193    -.11925773
    PLength    .04694612     .09416050
    PWidth     .03090888     .02515873

Sums of squares in columns of A are components' variances, the eigenvalues

Standardized (rescaled) loadings. A st. loading is Loading / sqrt(Variable's variance); these loadings are computed if you analyse covariances, and are suitable for interpretation of PCs (if you analyse correlations, A is already standardized).

                     PC1           PC2
    SLength    .92300804     .32590717
    SWidth     .94177127    -.31461076
    PLength    .27032731     .54219930
    PWidth     .29329327     .23873031
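A sketch of Steps 1.2-2 in R (object names continue from the snippets above):

    # Keep M = 2 components and compute loadings and rescaled loadings
    M    <- 2
    A    <- V[, 1:M] %*% diag(sqrt(L[1:M]))   # loadings: A = V * sqrt(L)
    colSums(A^2)                              # equals the first two eigenvalues
    A_st <- sweep(A, 1, sqrt(diag(S)), "/")   # standardized (rescaled) loadings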

Step 3. Compute component scores (values of PCs).

Regression coefficients B to compute standardized component scores are: B = A*diag(1/L) = inv(S)*A

    B                   PC1            PC2
    SLength    1.375948338    3.111670112
    SWidth     1.509762499   -3.230276923
    PLength     .198540883    2.550480216
    PWidth      .130717448     .681462580

Standardized component scores (having variances 1) = X*B:

              PC1             PC2
      .219719506     -.129560000
     -.810351411      .863244439
     -.803442667     -.660192989
    -1.052305574     -.138236265
      .233100923     -.763754703
     1.322114762      .413266845
     -.606159168    -1.294221106
     -.048997489      .137348703
     ...

Raw component scores (having variances = eigenvalues) can of course be computed from the standardized ones. In PCA, they are also computed directly as X*V:

              PC1             PC2
      .106842367     -.024893980
     -.394047228      .165865927
     -.390687734     -.126851118
     -.511701577     -.026561059
      .113349309     -.146749722
      .642900908      .079406116
     -.294755259     -.248674852
     -.023825867      .026390520
     ...
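Step 3 as an R sketch (again continuing from the objects above):

    # Standardized and raw component scores
    B     <- A %*% diag(1 / L[1:M])    # = solve(S) %*% A
    Z     <- X %*% B                   # standardized scores, variances 1
    P_raw <- X %*% V[, 1:M]            # raw scores, variances = eigenvalues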

FA (iterative principal axis extraction method) steps:

Step 0.1. Compute centered variables X and covariance matrix S.

Step 0.2. Decide on the number of factors M to extract. (There exist several well-known methods to help decide; let's omit mentioning them. Most of them require that you do PCA first.) Note that you have to select M before you proceed further because, unlike in PCA, in FA the loadings and factor values depend on M. Let's set M=2.

Step 0.3. Set initial communalities on the diagonal of S. Most often, quantities called "images" are used as initial communalities (see https://stats.stackexchange.com/a/43224/3277). Images are the diagonal elements of the matrix S-D, where D is a diagonal matrix with diagonal = 1 / diagonal of inv(S). (If S is a correlation matrix, the images are the squared multiple correlation coefficients.)

With a covariance matrix, the image is the squared multiple correlation multiplied by the variable's variance.

S with images as initial communalities on the diagonal:

    .07146025   .09921633   .01635510   .01033061
    .09921633   .07946595   .01169796   .00929796
    .01635510   .01169796   .00437017   .00606939
    .01033061   .00929796   .00606939   .00167624
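An R sketch of this step (continuing from the S above; Sr is my own name for the "reduced" matrix):

    # Step 0.3 sketch: put "images" as initial communalities on the diagonal of S
    D  <- diag(1 / diag(solve(S)))     # diagonal matrix with 1 / diag of inv(S)
    Sr <- S                            # reduced covariance matrix
    diag(Sr) <- diag(S - D)            # images = variance * squared multiple correlation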

Step 1. Decompose that modified S to get eigenvalues and right eigenvectors. Use eigen decomposition, not svd. (Some of the last eigenvalues may be negative, because a reduced covariance matrix need not be positive semidefinite.)

Eigenvalues L:

    F1     .1782099114
    F2     .0062074477
          -.0030958623
          -.0243488794

Eigenvectors V:

                        F1             F2
    SLength    .6875564132    .0145988554    .0466389510    .7244845480
    SWidth     .7122191394    .1808121121   -.0560070806   -.6759542030
    PLength    .1154657746   -.7640573143    .6203992617   -.1341224497
    PWidth     .0817173855   -.6191205651   -.7808922917   -.0148062006

Keep only the first M=2 values in L and the first M=2 columns in V.

Step 2.1. Compute loadings A. Loadings are eigenvectors normalized to respective eigenvalues: A value = V value * sqrt(L value).

                        F1             F2
    SLength    .2902513607    .0011502052
    SWidth     .3006627098    .0142457085
    PLength    .0487437795   -.0601980567
    PWidth     .0344969255   -.0487788732

Step 2.2. Compute row sums of squared loadings. These are the updated communalities. Reset the diagonal of S to them.

S with updated communalities on the diagonal:

    .08424718   .09921633   .01635510   .01033061
    .09921633   .09060101   .01169796   .00929796
    .01635510   .01169796   .00599976   .00606939
    .01033061   .00929796   .00606939   .00356942

REPEAT Steps 1-2 many times (say, 25 iterations).
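The iterated Steps 1-2 could be sketched in R as follows (continuing from Sr above; 25 iterations, M = 2):

    # Iterative principal axis factoring: repeat eigen decomposition and communality update
    M <- 2
    for (i in 1:25) {
      e <- eigen(Sr)                                        # Step 1: eigen-decompose reduced S
      A <- e$vectors[, 1:M] %*% diag(sqrt(e$values[1:M]))   # Step 2.1: loadings
      diag(Sr) <- rowSums(A^2)                              # Step 2.2: updated communalities
    }
    e$values       # final eigenvalues of the reduced matrix
    A              # final loadings
    rowSums(A^2)   # final communalities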

Extraction of factors is done.

Let us look at the final eigenvalues of the reduced covariance matrix after the iterations:

Eigenvalues L:

    F1     .2026316056
    F2     .0137096989
           .0005000572
          -.0005882867

The eigenvalues are the factors' (F1 and F2) variances. The overall common variance is .2026316056 + .0137096989 = .2163413036, so F1, for example, explains .2026316056/.2163413036 = 93.7% of the common variance. That 93.7% of the common variance amounts to .2026316056/.3092040816 = 65.5% of the total variability (.3092040816, the total variance, is the trace of the initial, non-reduced covariance matrix). [Note. The eigenvalues .0005000572 and -.0005882867 do not count as common variance; these "dross" eigenvalues are nonzero because the 2-factor model does not reproduce the covariances without error.]

Final loadings A and communalities (row sums of squares in A). Loadings are the covariances between variables and factors. Communality is the degree to which the factors load a variable; it is the "common variance" in the variable.

                        F1             F2            Comm
    SLength    .3125767362    .0128306509    .0978688416
    SWidth     .3187577564   -.0323523347    .1026531808
    PLength    .0476237419    .1034495601    .0129698323
    PWidth     .0324478281    .0423861795    .0028494498

Sums of squares in columns of A are the factors' variances: .2026316056 and .0137096989.

The main goal of factor analysis is to explain correlations or covariances by means of the loadings. A*t(A) gives the restored covariances:

    .0978688416   .0992211576   .0162133990   .0106862785
    .0992211576   .1026531808   .0118336023   .0089717050
    .0162133990   .0118336023   .0129698323   .0059301186
    .0106862785   .0089717050   .0059301186   .0028494498

See that the off-diagonal elements above are quite close to those of the input covariance matrix S:

    .1242489796   .0992163265   .0163551020   .0103306122
    .0992163265   .1436897959   .0116979592   .0092979592
    .0163551020   .0116979592   .0301591837   .0060693878
    .0103306122   .0092979592   .0060693878   .0111061224
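In R, the restored covariances and their residuals could be checked with (a sketch, continuing from the final loadings A in the loop above):

    # Model-implied (restored) covariances vs. the input covariance matrix
    S_hat <- A %*% t(A)       # restored covariances
    round(S - S_hat, 4)       # off-diagonal residuals should be small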

Standardized (rescaled) loadings and communalities. A st. loading is Loading / sqrt(Variable's variance); these loadings are computed if you analyse covariances, and are suitable for interpretation of Fs (if you analyse correlations, A is already standardized).

                        F1             F2            Comm
    SLength    .8867684574    .0364000747    .7876832626
    SWidth     .8409066701   -.0853478652    .7144082859
    PLength    .2742292179    .5956880078    .4300458666
    PWidth     .3078962532    .4022009053    .2565656710

Step 3. Compute factor scores (values of Fs). Unlike component scores in PCA, factor scores are not exact; they are reasonable approximations. Several methods of computation exist (https://stats.stackexchange.com/q/126885/3277). Here is the regression method, which is the same as the one used in PCA.

Regression coefficients B to compute standardized factor scores are: B = inv(S)*A (the original S is used)

    B                    F1             F2
    SLength    1.597852081    -.023604439
    SWidth     1.070410719    -.637149341
    PLength     .212220217    3.157497050
    PWidth      .423222047    2.646300951

Standardized factor scores = X*B. These "standardized" factor scores do not have variance exactly 1; the variance of a factor score is the SS of the regression of the factor on the variables, divided by (n-1).

               F1             F2
      .194641800    -.365588231
     -.660133976    -.042292672
     -.786844270    -.480751358
    -1.011226507     .216823430
      .141897664    -.426942721
     1.250472186     .848980006
     -.669003108    -.025440982
     -.050962459     .016236852
     ...

Factors are extracted as orthogonal. And they are. However, regressionally computed factor scores are not fully uncorrelated. Covariance matrix between the computed factor scores:

            F1      F2
    F1    .864    .026
    F2    .026    .459

Factor variances are the column sums of squared loadings (shown above). You can easily recompute the above "standardized" factor scores into "raw" factor scores having those variances: raw score = st. score * sqrt(factor variance / st. score's variance).
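A rough R sketch of the regression-method factor scores (continuing from the final loadings A and the original S and X above):

    # Step 3 sketch: regression coefficients, factor scores, and their covariance
    B  <- solve(S) %*% A          # note: the original, non-reduced S is used
    FS <- X %*% B                 # "standardized" factor scores
    cov(FS)                       # not exactly diagonal with unit variances: scores are approximate
    # raw scores with variances equal to the factor variances:
    FS_raw <- sweep(FS, 2, sqrt(colSums(A^2) / diag(cov(FS))), "*")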

After the extraction (shown above), optional rotation may take place. Rotation is frequently done in FA; sometimes it is done in PCA in exactly the same way. Rotation rotates the loading matrix A into some form of "simple structure", which greatly facilitates interpretation of the factors (the rotated scores can then be recomputed). Since rotation is not what differentiates FA from PCA mathematically, and because it is a separate large topic, I won't touch it here.

ttnphns
  • When you talk about "images" as initial communalities you give a link to another answer of yours (that discusses various methods of choosing initial communalities), but it does not mention "images". It sounds interesting, would you maybe like to expand that old answer? – amoeba Jun 11 '14 at 14:19
  • But factor analysis still seems a bit strange to me; I am thinking about it now and cannot quite figure it out. – dato datuashvili Jun 21 '14 at 20:11
  • I can recreate your initial communalities exactly with

    dat <- subset(iris, Species=="setosa", select = -c(Species))
    X <- scale(dat, scale=FALSE)                       # centered data
    S <- cov(X)                                        # covariance matrix
    Rinv <- solve(S)                                   # inverse of S
    D <- diag(1 / (diag(Rinv)))                        # diagonal matrix, 1 / diag of inv(S)
    init_communalities <- D * Rinv * D - 2 * D + S     # note: elementwise *, not %*%

    But why is it not instead matrix multiplication D %*% Rinv %*% D - 2 * D + S, which would give a slightly different answer?

    – user1205901 - Слава Україні Mar 29 '24 at 12:28
  • How much different? Point two: one should keep in mind that in textbooks matrix multiplication notation prevails, for conciseness, even when programmatically faster ways to compute the same quantity exist. – ttnphns Mar 29 '24 at 14:06