I have been looking at structural covariance matrices, i.e. correlations of grey matter volume across many regions (all continuous variables, residualized for covariates). I have built this matrix and found several significant correlations (after correction for multiple comparisons). Now I would like to test whether these correlations are associated with a third variable (disease severity, a continuous variable). How can I do this? I have thought of several ways:

  1. Averaging all possible two-by-two correlations and correlating these averages with disease severity.

  2. Running a linear regression for each pair of regions, with disease severity as an interaction term in the model (though I think I would then need twice as many regressions, since I should also test with X and Y swapped).

  3. I am new to matrix mathematics, but I thought matrix multiplication could help. I am not sure this would be correct, though, as I might end up correlating brainvolume1 × disease severity with brainvolume2 × disease severity. Disease severity would then appear on both sides, so the two terms might be collinear.

Any other suggestions?

utobi
HIL
  • There are some other approaches to quantifying "correlation" with a third variable: https://stats.stackexchange.com/questions/588968/why-is-correlation-only-defined-between-two-variables/589221#589221 – Galen Sep 25 '22 at 03:42

1 Answer


As I understand it, you have a table with N columns, one per region, each holding grams of grey matter per gram of brain, plus an (N+1)-th column for disease severity. The rows are K different patients. The first question you asked is whether your K points lie on a line in N-dimensional space. You answered it with pairwise comparisons, which dissects this N-dimensional space into N(N-1)/2 planes. I would suggest skipping this step and staying in N-dimensional space: you can simply perform a principal component analysis. Significance can be evaluated as described here and here. The second link is not open access, but it is similar to the first; moreover, I have a copy.
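To make the significance step concrete, here is a minimal sketch of one common way to test the first component: a column-wise permutation test on the share of variance it explains. This is an assumption about what such a test could look like, not necessarily the exact procedure in the linked papers. The data are synthetic, and the names `pc1_explained_variance` and `pc1_permutation_pvalue` are made up for illustration.

```python
import numpy as np

def pc1_explained_variance(X):
    """Fraction of total variance carried by the first principal component."""
    # Standardize each region so PCA works on the correlation structure.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, largest first
    return s[0] ** 2 / np.sum(s ** 2)

def pc1_permutation_pvalue(X, n_perm=500, seed=0):
    """Compare the observed PC1 share with a null where regions are unrelated."""
    rng = np.random.default_rng(seed)
    observed = pc1_explained_variance(X)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle every column independently: marginals stay the same,
        # correlations between regions are destroyed.
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = pc1_explained_variance(Xp)
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Synthetic stand-in for the real table: K patients, N regions,
# all regions sharing one latent factor (an assumption for the demo).
rng = np.random.default_rng(42)
K, N = 100, 6
latent = rng.normal(size=(K, 1))
X = 0.8 * latent + 0.6 * rng.normal(size=(K, N))

observed, p_value = pc1_permutation_pvalue(X)
print(f"PC1 explains {observed:.2f} of the variance, permutation p = {p_value:.3f}")
```

With real data, `X` would be the K-by-N matrix of residualized regional volumes.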

This gives you a first principal component (let's call it the grey matter index). Evaluating the stability and significance of the loadings will sort the regions into 'noisy' regions (permuting them does not change the variance explained by the component), 'positive significant loadings' and 'negative significant loadings'. Variables within a significant group are positively correlated with each other and negatively correlated with variables from the other group. You can then look at the correlation between the grey matter index and disease severity. If you rank the data within each column, it is also a good idea to simply perform PCA in the original (N+1)-dimensional space, including disease severity. This computational approach takes some time compared with classic analytic solutions, but it is computer time, not yours. And you will not lose much information to dissecting the N-dimensional space and correcting for multiple comparisons.
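A rough sketch of the last step, correlating the grey matter index with disease severity, might look like the following. Everything here is illustrative: the data are simulated with two groups of regions that load with opposite signs, and `first_component` and `permutation_correlation` are hypothetical helper names, not functions from any library.

```python
import numpy as np

def first_component(X):
    """Return PC1 scores (the 'grey matter index') and PC1 loadings."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]        # one weight per region; overall sign is arbitrary
    scores = Z @ loadings   # one index value per patient
    return scores, loadings

def permutation_correlation(x, y, n_perm=2000, seed=0):
    """Pearson r with a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(np.abs(null) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, p_value

# Simulated data (an assumption for the demo): 4 regions move with the
# latent factor, 4 move against it, and severity tracks the same factor.
rng = np.random.default_rng(1)
K, N = 120, 8
latent = rng.normal(size=K)
signs = np.array([1, 1, 1, 1, -1, -1, -1, -1])
X = 0.8 * latent[:, None] * signs + 0.6 * rng.normal(size=(K, N))
severity = 0.7 * latent + 0.7 * rng.normal(size=K)

index, loadings = first_component(X)
r, p = permutation_correlation(index, severity)
print("loadings:", np.round(loadings, 2))
print(f"index vs severity: r = {r:.2f}, permutation p = {p:.4f}")
```

Because the sign of a principal component is arbitrary, only the magnitude of `r` and the relative signs of the loadings are meaningful. For the rank-based variant, one could replace each column by its ranks before calling `first_component`.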

zlon