
I have the following (simple?) question about statistics: I have a dataset in which I look for correlations between variables and would like to control for differences between factor levels. As an illustration, consider this example: N = 100 people perform 20 tasks, and the time taken for each task serves as the performance measure. I now wish to correlate performance in these tasks with the people's IQ, which is also known. It seems reasonable to somehow account for differences between the tasks, and two ways to do this come to mind:

  1. I first z-standardize the performance measures in each task, then compute a simple correlation with IQ (possibly after aggregating data across tasks)
  2. I allow for a random intercept for each task in a linear mixed model predicting IQ

Are these approaches essentially identical, or do they differ in any way? (A sketch of both options follows below.)
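For concreteness, here is a minimal sketch of both options in Python, assuming a long-format table with hypothetical columns `person`, `task`, `time`, and `iq` (the file name is also made up). Option 2 is written exactly as phrased above, i.e. with IQ as the outcome:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import pearsonr

# Hypothetical long-format data: one row per (person, task) pair.
df = pd.read_csv("performance.csv")  # assumed columns: person, task, time, iq

# Option 1: z-standardize time within each task, average per person,
# then compute a simple correlation with IQ.
df["time_z"] = df.groupby("task")["time"].transform(lambda x: (x - x.mean()) / x.std())
per_person = df.groupby("person").agg(perf=("time_z", "mean"), iq=("iq", "first"))
r, p = pearsonr(per_person["perf"], per_person["iq"])
print(f"Option 1: r = {r:.3f}, p = {p:.3f}")

# Option 2: linear mixed model predicting IQ with a random intercept per task.
m2 = smf.mixedlm("iq ~ time", data=df, groups=df["task"]).fit()
print(m2.summary())
```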

(A similar topic was discussed here: Using z-standardization to account for covariate, but it does not become clear from that discussion whether options 1 and 2 in this example can be seen as equivalent.)

MR13

1 Answer


No, the approaches are not the same.

  • $z$-scaling subtracts the mean and divides by the standard deviation. Intercepts (random or fixed) correct only for the mean; they do not rescale the data in any way.
  • The intercept is the mean after controlling for the other variables in the model, which is not necessarily the same as the arithmetic mean of the sample. The same applies to the rest of the procedure: a simple correlation does not control for other variables.
  • With random intercepts, the estimates are shrunk towards zero (as with $\ell_2$ regularization, or equivalently a Gaussian prior with mean zero). Again, they will not be the same as arithmetic averages calculated for each group; the simulation below illustrates this.
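The shrinkage point can be demonstrated with a small simulation (all parameter values below are arbitrary, chosen only to make the effect visible):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 20, 5                       # few observations per group -> visible shrinkage
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0, 1, n_groups)                # true group effects
y = 10 + u[group] + rng.normal(0, 2, len(group))
df = pd.DataFrame({"y": y, "group": group})

# Intercept-only mixed model with a random intercept per group.
m = smf.mixedlm("y ~ 1", df, groups=df["group"]).fit()
blups = np.array([re.iloc[0] for re in m.random_effects.values()])

# Raw per-group mean deviations from the grand mean, for comparison.
raw = df.groupby("group")["y"].mean().to_numpy() - df["y"].mean()

print(f"spread of raw group deviations: {np.std(raw):.3f}")
print(f"spread of random intercepts:    {np.std(blups):.3f}")
```

With only five observations per group, the estimated random intercepts have noticeably smaller spread than the raw per-group mean deviations; increasing the group size weakens the shrinkage.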
Tim
  • Thank you! Could z-scaling thus be seen as more meaningful in the given example because it accounts for different standard deviations in the tasks' measurements (and there are no additional variables for which random intercepts could control)? – MR13 May 02 '22 at 09:06