
My question is a bit long, with 2 major parts. Here are the variables:

  • Number of cells (C): main dependent variable
  • Disease severity 1 (D1): continuous
  • Disease severity 2 (D2): continuous but only quantifiable on diseased organ
  • Age, Sex
  • Organ side: L or R
  • Lateralization (L or R = 1, L and R = 2)
  • Location of disease in organ
  • Concurrent Disease 1, Concurrent Disease 2
  • N=208

We are trying to reproduce a previously published paper that found a significant association between C and D1. The disease can be present in L, R, or both. Age is a confounding factor because C normally decreases with age. Each affected organ is entered in the database as its own line: two lines if both L and R are affected, and one line if only L or R is affected. Each line contains the data of both organs and has the Lateralization variable. We followed the previously published statistics and found directly conflicting evidence, and we want to show that D1 and D2 are not related to C.

ANALYSIS A.

The analysis we reproduced is as follows:

  1. In entire cohort of both Lateralization 1 and 2: With age as a covariate, partial correlation between: C & D1, C & D2.

    In cohort of only Lateralization 1:

    1. With age as a covariate, partial correlation between: C & D1, C & D2, and difference in C between diseased and nondiseased organ vs difference in D1 between diseased and nondiseased organ
    2. Paired t-test to compare C in diseased organ vs. non-diseased organ
    3. In patients with D1 < 2 (arbitrary cutoff by previous authors): Pearson correlation between C, D1, D2, age
    4. In all Lateralization 1 patients only, subdivided into groups of D1 <2 and D1 ≥2 D: Mann Whitney U tests for age, D1, D2, C

All aforementioned steps repeated for Patients without disease 1, and without disease 2 separately (not looking for interaction between these diseases)

ANALYSIS B.

However, I thought I could also do 2 hierarchical multiple regressions, both with C as the dependent variable. The blocks would unfold as follows:

  1. Age, Sex,
  2. Lateralization, location of disease in organ,
  3. Concurrent Diseases 1 and 2,
  4. D1

and

  1. Age, Sex,
  2. Lateralization, location of disease in organ,
  3. Concurrent Diseases 1 and 2,
  4. D2

ANALYSIS C.

I also ran partial correlations between C (dependent variable) and each of D1 and D2, controlling for all variables from blocks 1-3 of the regressions. I read somewhere that a partial correlation is only good for 3 covariates?

Which is a better analysis to report?

If the multiple regressions are better to report, I have an issue with my results. My regressors of interest are nonsignificant, which is what we want to confirm. But the ANOVAs for each model have p < 0.01. I ran VIF and all of my variables have VIF < 1.5.

EDIT:

here is my output

OUTPUT

EDIT:

changed order of predictors


Jenny H
  • I have reviewed the answers to these questions:

    "One conclusion we can draw from this is that when too many variables are included in a model they can mask the truly significant ones."

    "Even if you had no multicollinearity, you can still get non-significant predictors and an overall significant model if two or more individual predictors are close to significant"

    • I have done the hierarchical regression analysis with only 2 blocks, with 1 predictor in each (age, which is a huge confounder, and D), and I still get the same results for my hierarchical regression.
    – Jenny H Nov 11 '16 at 16:50
  • If some people give rise to two rows in the data-set then have you taken that into account? I wonder whether it would help us if we could see either (a) the first few rows of your data-set (b) some output from the multiple regressions, or both. – mdewey Nov 11 '16 at 17:00
  • @mdewey, some people give rise to 2 rows because they have both organs affected by the disease. The data will be the same in both rows except for a few categorical variables that I use to code my syntax. Therefore, if the row corresponds to the left organ, my syntax assigns the variables of interest to the left organ and ignores the right. – Jenny H Nov 15 '16 at 14:24
  • If I read your output correctly you do have at least one predictor which is statistically significant (age). Is age strongly related to the predictors you are interested in? – mdewey Nov 15 '16 at 14:32
  • The two organs are presumably not independent so you ought to take that into account in your analysis. That is unlikely to help with the problem which is vexing you though. – mdewey Nov 15 '16 at 14:33
  • Older patients will develop concurrent disease 2.. – Jenny H Nov 15 '16 at 17:34

1 Answer


From your output, it seems that you might be placing too much importance on a result that didn't pass an arbitrary statistical significance cutoff yet might still be consistent with the previously reported results.

Note the extremely wide 95% confidence limits for the D1 coefficient in your regression: from -9.7 to +68. Yes, the p-value of 0.14 could be interpreted to mean that D1 is not statistically significant in this data set. But is the coefficient reported previously by others within the confidence limits that you found? Is your point estimate of the coefficient within the confidence limits that were previously reported? If so, then your data do not really refute the prior result that D1 is related to C. Perhaps your sample was simply too small to document that (possibly weak) relationship reliably.
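The link between those confidence limits and the p-value can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the quoted limits form a symmetric 95% interval and uses a normal approximation (multiplier 1.96) in place of the exact t distribution, which is reasonable with roughly 200 residual degrees of freedom. The helper `within_limits` is a hypothetical name, and no coefficient from the prior paper is assumed.

```python
import math

# 95% confidence limits for the D1 coefficient, as quoted from the regression output
lower, upper = -9.7, 68.0

# Assuming a symmetric interval, the point estimate is the midpoint
estimate = (lower + upper) / 2

# Assumption: normal approximation, so half-width = 1.96 * standard error
std_error = (upper - lower) / 2 / 1.96

# Approximate two-sided p-value for the null hypothesis of a zero coefficient
z = estimate / std_error
p_two_sided = math.erfc(abs(z) / math.sqrt(2))


def within_limits(value, lo=lower, hi=upper):
    """Return True if a candidate coefficient lies inside the confidence limits."""
    return lo <= value <= hi


print(f"estimate = {estimate:.2f}, SE = {std_error:.2f}, z = {z:.2f}, p = {p_two_sided:.3f}")
print("zero inside limits:", within_limits(0.0))
```

Under these assumptions the back-calculated p-value should land close to the reported 0.14, which shows that the interval and the p-value carry the same information. A previously reported coefficient, if one were available on the same scale, could be passed to `within_limits` in the same way.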

Edit after seeing the prior paper and additional results:

The prior paper seems not to have done a very thorough job of controlling for covariates, instead performing a set of individual correlations. There is little question that ANOVA or multiple regression, as you have performed, is a better approach. Note that you can't "show that D1 and D2 are not related to C" in this way, but the confidence limits on their coefficients in multiple regression will document the issue at hand.

You have to be careful in what you mean by saying your "ANOVA is significant," as your tables show two different types of results.

The Model Summary tables represent the differences between models as predictors are added sequentially. So the p-values indicate whether each additional model reduces variance significantly from its predecessor. In each case where the model is augmented by adding the D1 (Severity measure) predictor, the corresponding p-value is about 0.13 or 0.14, as it is for D1 in the multiple regression. Not statistically significant with this data set.

The significant ANOVA results that trouble you seem to be those presented in the tables labeled "ANOVA." Yes, the models that contain D1 are significant. But these are tests of a model with all of the specified variables against a model with no variables, as the associated degrees of freedom indicate. That just tells you that the combination of included predictors is significantly better than nothing, not that any individual predictor is "significant." It seems that all models significant in the tables labeled "ANOVA" include age, a significant predictor in the multiple regression. Adding D1 wasn't able to reduce the model to insignificance, but that doesn't necessarily mean that D1 itself is "significant."
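The difference between the two kinds of test can be illustrated with a minimal sketch on simulated data (not the data from the question). Everything here is an assumption made for illustration: the variable names, the simulated effect sizes (cell count depends on age but not on D1), and the small hand-rolled least-squares helpers; in practice a statistics package would do the fitting. The overall F statistic compares the full model with an intercept-only model, while the F-change statistic compares the full model with the model that lacks only D1.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus `columns`."""
    if columns:
        rows = [[1.0] + list(vals) for vals in zip(*columns)]
    else:
        rows = [[1.0] for _ in y]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


# Simulated cohort (assumed values): cell count falls with age, D1 has no effect
rng = random.Random(0)
n = 208
age = [rng.uniform(20, 80) for _ in range(n)]
d1 = [rng.gauss(2, 1) for _ in range(n)]
cells = [3000 - 10 * a + rng.gauss(0, 200) for a in age]

rss_null = rss([], cells)         # intercept only
rss_age = rss([age], cells)       # block 1: age
rss_full = rss([age, d1], cells)  # block 2: age + D1

df_resid = n - 3  # intercept, age, and D1 are estimated in the full model

# "ANOVA" table: full model versus no predictors (2 numerator df)
overall_f = ((rss_null - rss_full) / 2) / (rss_full / df_resid)

# "Model Summary" F change: full model versus the age-only model (1 numerator df)
f_change = (rss_age - rss_full) / (rss_full / df_resid)

print(f"overall F(2, {df_resid}) = {overall_f:.2f}")
print(f"F change for D1, F(1, {df_resid}) = {f_change:.2f}")
```

In this simulation the overall F is large because age alone explains much of the variance, while the F change for D1 stays small. That is the same pattern as a significant "ANOVA" table alongside a non-significant D1 coefficient.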

I still caution, however, that your work does not rule out a weak possible contribution of D1 to cell count. You certainly have, however, shown that age is an important predictor.

EdM
  • Thank you for your response. Unfortunately the previous report only reported a bivariate correlation so I am unable to compare the confidence limits.. I repeated exactly what the previous report did (in terms of correlations) and found no significant correlation. In addition, the previous paper had N~90 whereas we have 208. – Jenny H Nov 15 '16 at 14:26
  • @JennyH a link to the paper whose results you are trying to reproduce would be helpful; it would best be added to the text of your question. – EdM Nov 15 '16 at 15:18
  • The Pubmed ID: 24858021 Thank you! – Jenny H Nov 15 '16 at 17:32
  • I would like to note: the previous authors didn't control for concurrent diseases 1 and 2. I have a draft that reproduced their analyses and found no significant results but I also feel like I should be reporting a regression for the sake of correct statistical analyses. – Jenny H Nov 15 '16 at 17:39
  • Seeing the prior paper with its poor control for covariates helped explain a lot. Just note that, for a strict comparison, the prior paper excluded individuals with disease in both eyes, I suppose so that they could test astigmatism differences between affected and unaffected eyes. I assume you have some way of addressing that issue. You can't really "show that D1 and D2 are not related to C" but you can set limits on the relations and document the contributions of the other covariates that were not so well addressed in the prior paper. – EdM Nov 15 '16 at 18:29
  • Thank you for your response! The previous paper didn't exclude the concurrent disease 1 (diabetes -- systemic disease) or 2 (cataract). Their measure of severity was astigmatism, so in a unilateral patient population they compared the difference in astigmatism and all other variables with GROUP analyses. In the beginning we chose to do a partial correlation with paired t-tests.. Then I was reading that partial correlations shouldn't be used when there are too many covariates. That's when I decided to go ahead with the regression. – Jenny H Nov 15 '16 at 19:13
  • My thinking is this: if the patient has the disease in both eyes and astigmatism is their severity measure, eyes can be considered independent because they will both have astigmatism especially because astigmatism and cell count are continuous variables. We should see that a patient's more severe eye will have more severe astigmatism. – Jenny H Nov 15 '16 at 19:14
  • We intend to document the contributions of our covariates, but we also strongly believe that pterygium severity is not associated with lower cell counts. Our analyses have been presented at multiple conferences and have received a lot of support.

    I do think it's relevant and appropriate to perform a regression but I can't get over the fact that my ANOVA is significant..

    – Jenny H Nov 15 '16 at 19:14
  • I have added my output with concurrent diseases first -- only adding one covariate at a time. – Jenny H Nov 15 '16 at 19:25
  • thank you for your input. Since age is the largest confounding variable, is it sufficient to say that with a partial correlation controlling for age, cell count and disease severity were not correlated? r=0.078, p=0.265 – Jenny H Nov 16 '16 at 20:31
  • I would prefer to express the result in terms of the multiple regression coefficients, as that makes it easier for others to compare their results against yours and gives more useful data for subsequent meta-analysis. – EdM Nov 17 '16 at 01:11