I have recently faced a weird way of finding a significant variable.
Let the model be
$Cholesterol=Sex+FamilyHistory + Sex*FamilyHistory$
where the * denotes the interaction, sex and FamilyHistory are both two-level factors, and we assume that the study is nested inside cities. Further, assume that the fitted model is a linear mixed model with City in the random effect [random intercept].
Having the model above, we are interested to find out whether Sex has a significant effect on response.
What I normally do is to fit the model and look at the p-value of Sex coefficient in the marginal anova (simply using summary(nlme fit) in R). However, I have been told that I should make a null model like
$Cholesterol=FamilyHistory$
Or maybe! ($Cholesterol=FamilyHistory + Sex*FamilyHistory$)
and compare it with the full model using ANOVA (simply anova(full.model, null.model) in R).
So which one do you recommend?
UPDATE: as @Stefan requested, I give an example in R using the built-in example on lme manual. Sorry that the example is in a different context.
Let the full model be [it is the same as fm2 in the lme example in R]
full <-
lme(
distance ~ age + Sex,
data = Orthodont,
random = ~ 1,
method = 'ML'
)
Now the goal of the analysis is to find whether Sex is significant or not (obviously on the response)!
What I do is to look at the p-value from the sex coefficient from the full model.
r1 = summary(full)
r1$tTable
Value Std.Error DF t-value p-value
(Intercept) 17.7067130 0.83154591 80 21.293729 1.057125e-34
age 0.6601852 0.06209293 80 10.632212 5.740140e-17
SexFemale -2.3210227 0.74306676 25 -3.123572 4.478461e-03
that suggests a significant (in $\alpha =10^{-2}$) deviation for females than males. However, I have been told that the true way of finding the Sex effect is to form a null model by excluding Sex and comparing the resulting model with the full one:
null <-
lme(
distance ~ age , # Here Sex is removed
data = Orthodont,
random = ~ 1,
method = 'ML'
)
anova(full, null)
Model df AIC BIC logLik Test L.Ratio p-value
null 1 4 451.3895 462.1181 -221.6948
full 2 5 444.8565 458.2671 -217.4282 1 vs 2 8.533057 0.0035
As you see the pvalue here ($0.0035$) is slightly smaller than the p-value from the full model ($4.478461e-03$). I understand that the statistics are different, in fact, anova uses the $\chi^2$ distribution and test of coefficients applies $T$ distribution. However, I need to know which one is the proper one!
Sexis significant or not. Those p-values will change depending on what you define as your reference level. Since you are dealing with factors, I agree with @SalMangiafico that you could simply usecar::Anova()to get a table that shows the significance of the individual factors, includingSex. Comparing models usinganova(), tests (as far as I understand) whether the explanatory power of one model is higher than the other... – Stefan Feb 10 '18 at 16:22Sexfor example (full vs. null model) you can figure out whether one model accounts for more variation in the data than the other but not whether the effectSexis significant in itself. For example you could add another covariate (or factor) to the model that increases the explanatory power of the model but it may not show up significant in itself when you look at it in a ANOVA-type table. – Stefan Feb 10 '18 at 16:28anova()can be used to do that, i.e. compare competing models relative to each other with respect to variation explained. ... – Stefan Feb 10 '18 at 18:57Sexout of the model will decrease the variation explained and henceSexshould be kept in the model (as can be seen by the lower AIC, ... , as well as the p-value). I think it depends if you want to know whetherSexin general is an important variable for your model, or whether you want to specifically know whether there is a significant difference in Cholesterol between males and females for which I would docar::Anova(). ... – Stefan Feb 10 '18 at 18:58?anovaunder the heading Value. – Stefan Feb 10 '18 at 19:04anova(null, full)will answer whether addingSexwill significantly increase the variance explained in the model. I am not sure if you can conclude from this test that the factors of sex are significantly different as well... For this I would useAnova()and a posthoc test if you have more than two levels. Maybe someone else here on CV will be able to answer whether using the p-value fromanova(null, full), or from (car::Anova(full)), or fromsummary(full), can all be interpret the same way. I think this is the answer you are after, correct? – Stefan Feb 11 '18 at 15:30anova.lme(full, null)), an F test (anova.lme(full)), or a Wald chi-square test (car::Anova(full))... – Sal Mangiafico Feb 11 '18 at 21:25