
I have a dataset that covers the different states of a country. Each state contains several companies, and in every state one company manages the others (the other companies are branches of this leading company at various levels). I want to normalize (or standardize) this dataset and then use factor analysis to combine different input features into a single performance indicator.

  • Is it possible to normalize the data in each state separately, using the leading company's feature values as the denominator for the other companies in that state?

  • Can we compare a company from one state with a company in another state under this structure (as opposed to using one leading company for the whole dataset)?

  • Does this normalization method affect factor analysis assumptions?

** The leading company for the whole dataset is very large and has very high feature values, which is why I chose this per-state normalization structure. The features also differ in scale and unit of measurement.


1 Answer


First off, the terms normalize and standardize are both used variably and even unpredictably across different branches of statistical science, and beyond, so bitter experience teaches me that you cannot be confident about what is meant unless the equation, or equivalently the computer code, being used is visible or documented.

It sounds as if you want to scale each company so that some measure becomes (value for company)/(value for "big" company in its state). You can do that, but inevitably you set aside thereby the absolute values concerned. Comparisons, particularly between companies in different states, are therefore made more complicated as much as they are made easier. For example, let's say that $A$ is an arbitrary company and $B$ is the big company that is a reference, so that your measure is $A/B$. Then it is easy to see that (e.g.) in one state $A/B$ could be $2/10= 0.2$ and in another state $A/B$ could be $4/40 = 0.1$. Hence, without paradox, $A$ is absolutely bigger in the second state but relatively bigger in the first state. Whether this is what you want is a matter of your substantive goals, which are not evident and in any case likely to be beyond the scope of this forum.
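The arithmetic can be checked directly; this small sketch just reuses the hypothetical numbers from the example above:

```python
# Ratio scaling with a per-state reference company can reverse orderings.
# The numbers are the hypothetical ones from the example above.
a_state1, b_state1 = 2, 10   # company A and leading company B in state 1
a_state2, b_state2 = 4, 40   # company A and leading company B in state 2

ratio1 = a_state1 / b_state1  # 0.2
ratio2 = a_state2 / b_state2  # 0.1

print(a_state2 > a_state1)    # True: A is absolutely bigger in state 2
print(ratio1 > ratio2)        # True: A is relatively bigger in state 1
```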

It is very hard to say much about the consequences for factor analysis. Your scaling is just a linear scaling, but a different linear scaling for each state, so all depends on the detail. It could make matters worse or better. There is certainly no sense in which linear scaling is guaranteed to make data behave better. (The literature is muddy here, if only because "normalize" is often used for some transformation designed to bring a distribution closer to the normal, which is often (rightly or not) thought a good idea for methods like factor analysis.)

In general, I often see people in this forum reaching for something like this. My instinct is that often it's a lot simpler in the long run to keep with the original measurements, which are, or should be, on scales you should understand substantively (as a currency amount, a production total, number of employees, or whatever it is). Scaling like this, then analysing on that scale, and then somehow trying to interpret results correcting for what you have done can be a very roundabout style of analysis. It can be true that measures on different scales can be difficult to compare, but there are many methods that offer a solution there.
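One such method is to work with the correlation matrix rather than the covariance matrix, which makes units of measurement drop out entirely; it is the same as analysing z-scored data. A sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two features on very different scales, e.g. headcount and revenue.
emp = rng.normal(50, 10, 100)
rev = 1000 * emp + rng.normal(0, 5000, 100)
X = np.column_stack([emp, rev])

# PCA on the correlation matrix ignores units entirely; it is identical
# to PCA on the covariance matrix of the z-scored (mean 0, SD 1) data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eig_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
eig_zcov = np.linalg.eigvalsh(np.cov(Z, rowvar=False, ddof=0))
print(np.allclose(eig_corr, eig_zcov))    # True
```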

Nick Cox
  • Thank you, Nick, for your detailed answer. As you mentioned, there are many methods that offer a solution here for different measurement units and scales. What are these methods? I think normalization is a prerequisite of the usual factor analysis. Is this true? – user2991243 Nov 17 '15 at 12:09
  • 2
    I think even experts on factor analysis would need definitions of "usual factor analysis" and "normalization" to answer your last question; that was a point I made in the answer. I sometimes use principal component analysis on a correlation matrix when faced with different units of measurement, but often I use it to find structure that is then often best shown in quite different ways. But much, much more commonly I just use a regression-type model, so that units of measurement become incidental. – Nick Cox Nov 17 '15 at 12:17
  • Thank you again. For more information: I'm using factor analysis with the correlation matrix (not the covariance matrix), Varimax rotation, and principal components as the extraction method (as labelled in SPSS). My normalization method here is just using the leading company as the denominator for the other companies in its state. – user2991243 Nov 17 '15 at 12:21
  • 1
    The only direct way I know to assess the effect of your scaling is to try your multivariate analyses on scaled and unscaled data and think about the results substantively. As you are dividing by different reference values in each state, the correlation structure will be different. As above, it doesn't help to refer to normalization without a precise definition alongside. – Nick Cox Nov 17 '15 at 12:25
  • There are seemingly limitless ways to rescale data. The biggest issue with scaling in PCA is scale invariance. E.g., if you're using the typical default of OLS estimation of the components then inputs with larger moments (e.g., mean or standard deviations) will load more heavily on the solution, distorting it in the process. One approach is to transform the continuous inputs to a mean of 0 and an SD of 1. Another is to use maximum likelihood, which is scale invariant by definition. A third is to use finite mixture model approaches such as latent class models for component estimation. – user78229 Nov 17 '15 at 14:25
  • 1
    @DJohnson Some very puzzling comments there on PCA, i.e. principal component analysis, which at least sensu stricto, has nothing to do with ordinary least squares; there is no error term and arguably not even any estimation, as the entire beast can be regarded as a multivariate transformation. It's certainly true that if you input a covariance matrix to PCA, then results are utterly dependent on the sizes of the SDs, but the means wash out. Try an experiment by adding constants and shifting the means, and see what changes (nothing). – Nick Cox Nov 17 '15 at 14:43
  • 1
    (continued) Perhaps you have in mind some variant of factor analysis with a principal components starting point, for which bets are off until you know the precise witches' brew in particular software. – Nick Cox Nov 17 '15 at 14:44
  • @NickCox Fair enough. But is it true that there is no residual in PCA? Moreover, the method of estimation (e.g., OLS vs ML) seems one of the key decisions in the process of finding components, whether PCA or FA. So, I'm having trouble parsing your statement that there "is no estimation" and am interested in any elaboration on that. – user78229 Nov 17 '15 at 15:11
  • 1
    There is no error term, so residuals are not defined. There aren't any parameters to estimate: you are merely mapping to a different space. This is easiest to see when calculating principal components of 2 variables; it's just a matter of a different set of coordinates, but the data points remain as they were; they are not being reconstructed. I guess you've been let down by literature that muddies the difference between PCA and FA, including some software manuals. – Nick Cox Nov 17 '15 at 15:34
  • 1
    Indeed, it is difficult to be impartial here. PCA is regarded by many FA advocates as just an uninteresting limiting or degenerate case of FA. FA is regarded by some PCA advocates as an unnecessary complication of PCA, with alien ideas such as latent variables. There are people and books who span the range, but they are few. – Nick Cox Nov 17 '15 at 15:35
  • @NickCox To your point, it is very easy to confound PCA and FA. In addition, you are quite right to remind us (me) that PCA does not have an error term. Another key distinction worth noting between these two techniques (and one that goes unmentioned more often than not) is that whereas PCA produces mathematically unique solutions, FA has infinitely many. Thanks for the tutorial. – user78229 Nov 18 '15 at 13:04
  • I naturally agree in spirit. But the "uniqueness" of PCA results is strictly qualified in one precise sense, in so far as the signs of results are quite arbitrary, so comparing the results of one program with another, positive and negative can be flipped. More at e.g. http://stats.stackexchange.com/questions/88880/does-the-sign-of-pca-or-fa-components-have-a-meaning This is not problematic so long as you know about it. – Nick Cox Nov 18 '15 at 14:45
  • Another interesting nuance! – user78229 Nov 19 '15 at 10:19
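The mean-shift experiment suggested in the comments above can be sketched directly with made-up data: adding a constant to each variable shifts its mean but leaves covariance-based PCA untouched, while the arbitrariness of component signs is handled by comparing absolute values.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X_shifted = X + np.array([100.0, -7.0, 3.0])   # shift each variable's mean

# PCA via eigendecomposition of the covariance matrix.
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
vals_s, vecs_s = np.linalg.eigh(np.cov(X_shifted, rowvar=False))

print(np.allclose(vals, vals_s))                  # True: eigenvalues unchanged
print(np.allclose(np.abs(vecs), np.abs(vecs_s)))  # True up to sign flips
```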