Exploratory factor analysis has the following mathematical formulation, shown in a screenshot from Wikipedia (https://en.wikipedia.org/wiki/Factor_analysis): [image: model equation from the Wikipedia article]

That means the factors in F are centered at zero, while the mean values of the observations remain in X.

For my analysis, I would like to obtain factors that include the mean values, so that their magnitudes can be compared with those of X.

That is:

X = L * G + e

where G would contain the factors including the mean values, i.e. the "uncentered factor scores".
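
Spelled out, and assuming the screenshot shows the Wikipedia model $X - M = LF + e$ (with $M$ holding the variable means and $e$ the error term; this is an assumption, since the image is not reproduced here), the requested $G$ would have to satisfy:

$$
X = LF + M + e = LG + e \quad\Longrightarrow\quad L\,(G - F) = M .
$$

If $G = F + c\,\mathbf{1}^\top$ for some constant vector $c$ with one entry per factor, this reduces to $Lc = \mu$, where $\mu$ is the vector of variable means. With more variables than factors, that system has an exact solution only when $\mu$ happens to lie in the column space of $L$.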

Does this make sense and if so how is it possible?

I noticed that in implementations such as https://factor-analyzer.readthedocs.io/en/latest/_modules/factor_analyzer/factor_analyzer.html#FactorAnalyzer.transform doing so is not a valid option.

gnm
  • 31
  • FA or PCA analysis is done on a correlation or a covariance matrix. This implies centering of the data before the analysis. Though it is possible to perform FA/PCA on raw data, this usually makes little sense (1, 2). So, what may be the reason to supply a mean (to de-center) to the resultant factor/component scores? Of course, you may add a mean you think is proper, but FA/PCA analysis cannot suggest one since it never uses one. – ttnphns Jul 20 '22 at 15:52
  • Thank you ttnphns for your reply and points. I'll look through the referred post in detail. I'm not sure how I could compute the "uncentered factor scores" (i.e., G from the question above) in a post-processing step in a mathematically and statistically correct way. A practical approach could be to numerically approximate G in the equation X = L * G + e, given the L matrix from the FA (using the centered observations). However, doing so does not seem to be mathematically exact. – gnm Jul 20 '22 at 16:13
  • @ttnphns you wrote that it is possible to perform FA on raw data. How is this done? As you wrote, FA "is done on a correlation or a covariance matrix" and this "implies centering of the data before the analysis". – gnm Jul 22 '22 at 08:38
  • It is possible to base the analysis on the raw sscp matrix. But, mind 1) The results can be meaningless (see the first link in my initial comment), 2) Most existing FA functions will not allow you to "turn off" the centering, so you probably will have to code the function yourself. – ttnphns Jul 22 '22 at 09:35
  • In https://stats.stackexchange.com/a/16335/3277 I'm showing pictures of PCA performed with a point of rotation other than the mean. That might make sense with binary data. However, FA is not just PCA and is not recommended with binary data. – ttnphns Jul 22 '22 at 09:40
  • Thank you ttnphns. With the help of the pictures and explanations from the posts you referred to, I think it makes sense to address my use case with the usual factor scores computed from the centered input data. I posted these considerations below. – gnm Jul 22 '22 at 10:48
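
To illustrate what "turning off the centering" means in the comments above, here is a minimal NumPy sketch that builds either the centered scatter matrix or the raw SSCP matrix and extracts principal axes from it. It is a PCA-style illustration only, not a factor analysis, and the function names `scatter_matrix` and `principal_axes` are made up for this example. Rows of `X` are assumed to be observations.

```python
import numpy as np

def scatter_matrix(X, center=True):
    """Return X'X, computed on centered data (center=True) or on raw data (raw SSCP)."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)  # rotate around the mean
    return X.T @ X              # without centering: rotate around the origin

def principal_axes(S):
    """Eigen-decompose a symmetric matrix, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
vals_centered, axes_centered = principal_axes(scatter_matrix(X, center=True))
vals_raw, axes_raw = principal_axes(scatter_matrix(X, center=False))
```

With raw data the first axis tends to point from the origin towards the data cloud's mean, which is why the results can be hard to interpret.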

2 Answers

In the second post you linked (PCA scores in a for portfolio replication task: stumble over mean-centering question), @ttnphns, you provide an example with centered principal components (denoted PC) and decentered principal components (denoted dePC).

The data matrix provided there also exemplifies my intended use case, in which the higher V1 or V2 is, the better. Below is a screenshot of the data matrix from that example, with the higher of the two values in each row (V1 or V2, PC1 or PC2, dePC1 or dePC2) marked with a green background.

[image: data matrix from the linked example, with the larger value per row highlighted in green]

We can see that the decentered components dePC1 and dePC2 seem to reflect the magnitudes of V1 and V2 (given how these load on them) more closely than the centered PC1 and PC2.

For factor analysis, I'm still not sure how to compute the uncentered factors in a mathematically correct way.
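
One possible, non-exact approach, in line with the numerical approximation mentioned in the comments, would be to project the variable means onto the loadings by least squares and add the resulting offset to the centered scores. The sketch below assumes rows are observations, `loadings` has shape (variables, factors), and `means` is expressed in the same units the analysis used (for an analysis on standardized variables, that would be the means divided by the standard deviations). The function name `decenter_scores` is hypothetical and not part of any library.

```python
import numpy as np

def decenter_scores(scores, loadings, means):
    """Shift centered factor scores by the least-squares solution of loadings @ c = means.

    scores:   (n_obs, n_factors) centered factor scores
    loadings: (n_vars, n_factors) loading matrix L
    means:    (n_vars,) variable means in the units used by the analysis
    Returns the shifted scores and the part of the means the loadings cannot reproduce.
    """
    scores = np.asarray(scores, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    means = np.asarray(means, dtype=float)
    offset, *_ = np.linalg.lstsq(loadings, means, rcond=None)
    residual_mean = means - loadings @ offset  # zero only if means lie in span(L)
    return scores + offset, residual_mean
```

A nonzero `residual_mean` indicates that the means are only approximated, so the shifted scores are a heuristic rather than an exact result of the factor model.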

gnm
  • 31

For the use case of reading the factor scores with a "the higher the better" interpretation, we could use the usual factor scores computed from the centered input data.

These factor scores are on the same scale with respect to their means and variances. Therefore, we can compare the direction and magnitude of the factor scores directly, in light of the loading matrix.

The signs of the factor scores and loadings can be flipped jointly for each factor without affecting the outcome of the factor analysis; only the interpretation of the scores and loadings changes. We may therefore fix the signs by making the loading with the largest absolute value in each factor positive and flipping the other loadings of that factor (and its scores) accordingly. Ideally, the large loadings of a factor all have the same sign, in our case a positive one. If they do, we can interpret the factor scores of a single observation by their size, in our case with the interpretation that more positive scores are more relevant to "explain" the positive values of the observation.
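
The sign convention described above can be sketched in a few lines of NumPy. This is an illustration under the assumption that `loadings` has shape (variables, factors) and `scores` has shape (observations, factors); the function name `align_signs` is made up for this example.

```python
import numpy as np

def align_signs(loadings, scores):
    """Flip each factor so that its largest-magnitude loading is positive.

    Loadings and scores are flipped together, so scores @ loadings.T is unchanged.
    """
    loadings = np.asarray(loadings, dtype=float)
    scores = np.asarray(scores, dtype=float)
    top_rows = np.abs(loadings).argmax(axis=0)  # row of the largest |loading| per factor
    signs = np.sign(loadings[top_rows, np.arange(loadings.shape[1])])
    signs[signs == 0] = 1.0                     # leave all-zero columns untouched
    return loadings * signs, scores * signs

loadings = np.array([[-0.9, 0.2], [-0.5, 0.7], [0.1, -0.3]])
scores = np.array([[1.0, 2.0], [-1.0, 0.5]])
new_loadings, new_scores = align_signs(loadings, scores)
```

In this example only the first factor is flipped, because its largest-magnitude loading (-0.9) is negative.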

gnm
  • 31