Follow-up question on comparing predictor in regressions over two groups

Question

Using a previous Stack Overflow thread as a guide:

I have two groups, and I want to see how changes in a variable are predicted by the variable 'baseline'. I then want to see whether the groups differ in this prediction. Following previous threads, I did something like:

fm1 <- lm(change ~ baseline, DF)
fm3 <- lm(change ~ groups/(baseline - 1), DF)

And then a comparison of the models:

anova(fm1, fm3)

I was wondering if someone could clarify what this comparison actually means. I guess we are comparing a model where equality of coefficients is assumed (fm1) vs. a model where equality of coefficients is not assumed?

EdM · Answer 1 · 2022-06-25T18:34:17.377

First, an answer to how to interpret the models and comparison. Second, a warning based on the names you have chosen for your variables.

First:

I guess we are comparing a model where equality of coefficients is assumed (fm1) vs. a model where equality of coefficients is not assumed?

It's a little more complicated than that. Yes, model fm1 assumes that the slope of the relationship is the same for all groups, but model fm3 includes several extra coefficient estimates. I'll use the sample data from your linked page as an example, where there are 3 age groups and the regression is of weight on height.

DF
#   age height weight
# 1   1     56    140
# 2   1     60    155
# 3   1     64    143
# 4   2     56    117
# 5   2     60    125
# 6   2     64    133
# 7   3     74    245
# 8   3     75    241
# 9   3     82    269
DF$age <- as.factor(DF$age)
fm1 <- lm(weight ~ height, DF)
fm3 <- lm(weight ~ age/(height - 1), DF)

You can see the extra complexity by examining the regression coefficient estimates of the two models:

coef(fm1)
# (Intercept)      height 
# -226.891667    6.108333 
coef(fm3)
#        age1        age2        age3 age1:height age2:height age3:height 
#  123.500000    5.000000   -7.701754    0.375000    2.000000    3.368421

Model fm1 has an Intercept (the estimated weight for the impossible situation when height = 0) and a single slope estimate (the height coefficient).

In model fm3 you have separate estimates for each age group (implicitly for height = 0 in each group). The -1 in the formula for fm3 omits the overall model intercept, so those age coefficients can be thought of as group-specific intercepts. (Although that works in this simple case, you should generally be very careful about omitting intercepts.) You also have separate slopes for height within each group, as desired.
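To make the "group-specific intercepts and slopes" reading concrete, here is a minimal sketch that rebuilds the sample data shown above and checks that the fm3 coefficients match separate per-group regressions. The names by_group and fm_int are introduced here for illustration only.

```r
# Sample data as printed above
DF <- data.frame(
  age    = factor(rep(1:3, each = 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

fm3 <- lm(weight ~ age/(height - 1), DF)

# Fit weight ~ height separately within each age group;
# result is a 2 x 3 matrix (rows: intercept, slope; columns: groups)
by_group <- sapply(levels(DF$age), function(g) {
  coef(lm(weight ~ height, data = DF[DF$age == g, ]))
})

# Intercepts and slopes from fm3, laid out the same way for comparison
rbind(intercept = coef(fm3)[1:3], slope = coef(fm3)[4:6])
by_group

# The usual interaction model is the same fit with different coefficient coding
fm_int <- lm(weight ~ age * height, DF)
all.equal(fitted(fm3), fitted(fm_int))
```

The point estimates agree with the per-group fits; what fm3 adds is a single pooled residual variance across groups.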

The anova() comparison tests whether the fit of model fm3 is sufficiently better than that of fm1 to justify the 4 extra parameters that fm3 must estimate (6 coefficients versus 2).
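As a sketch of what that comparison computes, the F statistic can be rebuilt by hand from the residual sums of squares and residual degrees of freedom of the two models. The variable names below (rss1, F_manual, and so on) are mine, not part of any API.

```r
# Sample data as printed above
DF <- data.frame(
  age    = factor(rep(1:3, each = 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

fm1 <- lm(weight ~ height, DF)            # 2 coefficients
fm3 <- lm(weight ~ age/(height - 1), DF)  # 6 coefficients

# Residual sums of squares and residual degrees of freedom
rss1 <- deviance(fm1); df1 <- df.residual(fm1)
rss3 <- deviance(fm3); df3 <- df.residual(fm3)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
extra_params <- df1 - df3
F_manual <- ((rss1 - rss3) / extra_params) / (rss3 / df3)
p_manual <- pf(F_manual, extra_params, df3, lower.tail = FALSE)

cmp <- anova(fm1, fm3)
cmp
c(F = F_manual, p = p_manual)
```

Note that this tests all 4 extra parameters jointly, so a significant result does not by itself say whether the groups differ in slope, in intercept, or in both.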

I find the approaches in the second answer on the page you linked, and in the Cross Validated page linked from that answer, to be superior and more generally useful in more complicated scenarios.
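I can't reproduce those answers here, but one common alternative in that spirit is to keep the intercept and compare a parallel-slopes model against an interaction model, which isolates the question of whether slopes differ between groups. Treat this as a sketch of that general idea rather than the exact code from the linked answers; fm_add and fm_int are names I made up.

```r
# Sample data as printed above
DF <- data.frame(
  age    = factor(rep(1:3, each = 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

# Group-specific intercepts, one common slope
fm_add <- lm(weight ~ age + height, DF)

# Group-specific intercepts and group-specific slopes
fm_int <- lm(weight ~ age * height, DF)

# Tests only the 2 age:height interaction terms, i.e. slope differences
cmp_slopes <- anova(fm_add, fm_int)
cmp_slopes

# Interaction coefficients are slope differences relative to the reference group
coef(fm_int)
```

With two groups instead of three, the interaction contributes a single coefficient, and its t-test in summary(fm_int) addresses the slope difference directly.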

Second:

When I see change as an outcome and baseline as a predictor, I worry. See this page for reasons why. You will generally be better off modeling the actual later outcome value instead of its change.
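One reason that is often given (not necessarily the one on the linked page) is that change already contains baseline, so the two regressions are tied together algebraically: the slope of change on baseline is exactly the slope of the later value on baseline, minus 1. Here is a small simulation to illustrate; the data, the variable name followup, and all the numbers are invented for the example.

```r
set.seed(1)

# Hypothetical data: a later measurement that depends on baseline plus noise
n        <- 200
baseline <- rnorm(n, mean = 50, sd = 10)
followup <- 10 + 0.8 * baseline + rnorm(n, sd = 5)
change   <- followup - baseline

# Slope when modeling the later value directly
b_follow <- coef(lm(followup ~ baseline))["baseline"]

# Slope when modeling the change score
b_change <- coef(lm(change ~ baseline))["baseline"]

# The change-score slope is the follow-up slope shifted down by exactly 1
c(followup = unname(b_follow), change = unname(b_change))
all.equal(unname(b_change), unname(b_follow) - 1)
```

So a negative association between change and baseline can appear even when the later value tracks baseline closely, which is part of why modeling the later value with baseline as a covariate is usually easier to interpret.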
