When It Matters
I agree with Frank that statistical significance really shouldn't be the major criterion for interpreting your data. If we simply step away from the statistics for a second, why would this interaction make sense? What is driving this interaction? Sometimes interactions are found by chance. Other times they arise from complex phenomena. You need to consider this before you even move forward with interpretation.
Let's say we have a scenario where we record the measured temperature on a given day and the caloric intake of some frogs for that same day (see the frog diet example below).

Sadly, frogs don't always get to live to have another meal. We decide to record the number of frog deaths that day given the number of flies they eat and the temperature outside. There is likely a much better way to construct this experiment (e.g. survival rates over time), but we will keep the example simple for now. Let us say that, during the course of measuring the frogs, we notice that temperature doesn't really have any practical effect on its own (frogs generally don't die from temperature alone), nor does food intake (I'm not a frog expert, but let's just assume they're efficient with calories).
However, after getting our measurements, our data ends up showing that the combination of high temperature and low caloric intake results in a substantial number of frog deaths. We could perhaps determine, with the trepidation warranted by an observational study, that the interplay between temperature and caloric intake is driving the frog deaths, even though neither alone makes a difference. We surmise that exposure to extreme heat together with a lack of proper nutrition causes higher death rates, and this is then reported to the scientific community. This is the key to Frank's point about subject-matter knowledge: the context of the question matters, and whether or not our regression fits the mold can only be rightfully determined by that knowledge.
Notice that I have not highlighted statistical significance at all, a point I will show to be potentially misleading in the next part of this answer.
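To make the scenario concrete before moving on, here is a minimal, purely hypothetical sketch of what such frog data could look like in R. Every number below (temperatures, calorie counts, thresholds, death probabilities) is made up solely for illustration:

#### Hypothetical Frog Scenario (illustration only) ####
set.seed(42)
n <- 500
temp <- rnorm(n, mean = 85, sd = 10) # hypothetical daily temperature (F)
calories <- rnorm(n, mean = 50, sd = 15) # hypothetical daily caloric intake
# Neither predictor matters on its own here; only the combination of extreme
# heat and poor nutrition raises the chance a frog dies that day:
p_death <- ifelse(temp > 95 & calories < 40, .40, .05)
died <- rbinom(n, size = 1, prob = p_death)
summary(glm(died ~ temp * calories, family = binomial))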
When It Doesn't Matter
We can easily simulate a similar scenario where the main effects are statistically non-significant and the interaction is significant. By creating very tiny effects, we can also show that this distinction may not really matter. Below I simulate some data in R to mimic this:
#### Simulate Data ####
set.seed(123)
x <- rnorm(1e4) # normally distributed predictor
z <- rnorm(1e4) # normally distributed control
#### Create Beta Weights and Error ####
b0 <- 0 # intercept
b1 <- .0001 # slope 1
b2 <- .0001 # slope 2
b3 <- .05 # interaction
e <- rnorm(1e4) # normal error term
y <- b0 + (b1*x) + (b2*z) + (b3 * x * z) + e # linear construction of y
df <- data.frame(x,z,y)
Note a few things before we fit the regression:
- The sample size is large ($n = 10,000$), so we should have excellent power to detect even minor effects (a rough check of the implied precision appears after this list).
- The $\beta$ weights for the intercept and slopes are quite tiny ($\beta = .0001$), and only the interaction has been given slightly more magnitude so that it flags our significance tests ($\beta_3 = .05$).
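As a rough check on that first point, with $\sigma = 1$, a roughly unit-variance predictor, and predictors that are essentially uncorrelated, the standard error of a slope is approximately $\sigma/(\sqrt{n}\,\mathrm{sd}(x)) \approx .01$, which lines up with the Std. Error column in the output below:

#### Rough Precision Check ####
1 / (sqrt(1e4) * sd(x)) # approximate slope standard error, about .01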
We fit the regression in R below:
#### Fit Regression ####
fit <- lm(y ~ x*z)
summary(fit)
With the following output:
Call:
lm(formula = y ~ x * z)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9996 -0.6647  0.0066  0.6733  3.8627 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.006126   0.010077   0.608    0.543    
x           -0.012349   0.010091  -1.224    0.221    
z            0.004928   0.010062   0.490    0.624    
x:z          0.053967   0.010239   5.271 1.39e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.008 on 9996 degrees of freedom
Multiple R-squared:  0.002928,    Adjusted R-squared:  0.002629 
F-statistic: 9.784 on 3 and 9996 DF,  p-value: 1.926e-06
Some things should already stick out:
- The interaction is the only "significant" term, but we must consider the scale of all of the betas. If we return to our frog example, with caloric intake and degrees Fahrenheit as predictors, are these effects practically meaningful?
- The adjusted $R^2 = .002629$, which means the model has very little explanatory value despite being statistically significant (see the quick check below).
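As a quick sketch of just how little even the "significant" interaction buys us, we can compare the $R^2$ with and without it (this foreshadows the model comparison section at the end):

#### How Much R-Squared Does the Interaction Add? ####
summary(lm(y ~ x * z))$r.squared -
  summary(lm(y ~ x + z))$r.squared # increase in R-squared from the interaction alone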
Plotting the data can of course elucidate this point. If we simply look at the scatterplots of $x$ and $z$ against $y$, we see two giant clouds of data which seem to have essentially no association with the outcome:
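One way (among many) to produce those scatterplots, assuming the simulated objects are still in the workspace:

#### Plot Raw Scatterplots ####
par(mfrow = c(1, 2)) # two panels side by side
plot(x, y, main = "x vs. y")
plot(z, y, main = "z vs. y")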

What if we check the interaction between $x$ and $z$? We can do this by first creating some simple slopes:
#### Create Sequence of Values Controlling for +/- SD and Mean of Z ####
newdata1 <- data.frame(
  x = seq(
    min(x),
    max(x),
    length.out = 200 # sequence of data to get prediction line for x
  ),
  z = mean(z) - (1 * sd(z)) # 1 SD below mean of z
)
newdata2 <- data.frame(
  x = seq(
    min(x),
    max(x),
    length.out = 200
  ),
  z = mean(z) # mean of z
)
newdata3 <- data.frame(
  x = seq(
    min(x),
    max(x),
    length.out = 200
  ),
  z = mean(z) + (1 * sd(z)) # one SD above mean of z
)
#### Get Prediction Data ####
pred1 <- predict(fit,newdata = newdata1)
pred2 <- predict(fit,newdata = newdata2)
pred3 <- predict(fit,newdata = newdata3) # predicts model with new data
Then we plot the scatterplot of $x$ and $y$ again, only this time we overlay the predictions when $z$ is at its mean or one standard deviation above/below that mean (indicated by the red lines):
#### Plot Interaction ####
par(mfrow=c(1,1))
plot(x, y, main = "Interaction Plot")
lines(newdata1$x,
      pred1,
      col = "red",
      lwd = 5)
lines(newdata2$x,
      pred2,
      col = "red",
      lwd = 5)
lines(newdata3$x,
      pred3,
      col = "red",
      lwd = 5)
Shown below:

Now we can determine some additional facts:
- While our interaction is statistically significant, the plotted lines show the interaction isn't incredibly strong (they almost overlap; the implied simple slopes are quantified in the sketch after this list).
- Looking at the axes of the plot, scale really matters. Note that the values only range between about $[-4,4]$. Values on a small scale can be considered large in some contexts (for example, if $x$ is blood alcohol concentration) or essentially meaningless in others (if $x$ is income in thousands of dollars). The context and research question will determine whether these effects and interactions are truly meaningful.
- We still don't know from our simulated example what any of this means. I have purposely left out what this simulation was supposed to emulate to illustrate an important point: what we measure should matter for answering questions of scientific interest. If we don't have any background knowledge driving that inference, then it is hard to determine whether the result is indeed important. Something like statistical significance has an otherwise limited role in answering those questions.
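To put a number on that first point, the model-implied simple slope of $x$ is $\beta_1 + \beta_3 z$, so we can compute it at low, average, and high values of $z$ directly from the fitted coefficients (a quick sketch using the objects above):

#### Quantify the Simple Slopes ####
b <- coef(fit)
z_vals <- mean(z) + c(-1, 0, 1) * sd(z) # 1 SD below the mean, the mean, 1 SD above
b["x"] + b["x:z"] * z_vals # implied slope of x at each value of z; all quite small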
Model Comparison
Most of this discussion has not really addressed what to do in terms of model comparison. Consider, for our simulated example, fitting the models with and without an interaction:
#### Fit Models ####
fit1 <- lm(y ~ x + z)
fit2 <- lm(y ~ x * z)
summary(fit1)
summary(fit2)
As shown here:
> summary(fit1)

Call:
lm(formula = y ~ x + z)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5771 -0.6799 -0.0077  0.6883  4.3606 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.006785   0.010013  -0.678    0.498  
x            0.020496   0.010027   2.044    0.041 *
z           -0.003838   0.009998  -0.384    0.701  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.001 on 9997 degrees of freedom
Multiple R-squared:  0.0004315,   Adjusted R-squared:  0.0002316 
F-statistic: 2.158 on 2 and 9997 DF,  p-value: 0.1156

> summary(fit2)

Call:
lm(formula = y ~ x * z)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4854 -0.6830 -0.0075  0.6905  4.3599 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.007047   0.010005  -0.704   0.4812    
x            0.020198   0.010018   2.016   0.0438 *  
z           -0.003170   0.009990  -0.317   0.7510    
x:z          0.044362   0.010166   4.364 1.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1 on 9996 degrees of freedom
Multiple R-squared:  0.002332,    Adjusted R-squared:  0.002033 
F-statistic: 7.789 on 3 and 9996 DF,  p-value: 3.43e-05
Comparing these two models by their significant terms doesn't really say much. What we really need is some other criterion, such as AIC/BIC or a chi-square (likelihood ratio) test. We can quickly check these to see if the interaction term is indeed important to include:
#### Check Model Comparison ####
anova(fit1,fit2, test = "Chisq")
AIC(fit1)
AIC(fit2)
The chi-square test is unsurprisingly statistically significant, given the sheer number of data points (and thus degrees of freedom):
Analysis of Variance Table

Model 1: y ~ x + z
Model 2: y ~ x * z
  Res.Df   RSS Df Sum of Sq  Pr(>Chi)    
1   9997 10023                           
2   9996 10004  1    19.057 1.278e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, our AIC values differ very little here:
> AIC(fit1)
[1] 28409.57
> AIC(fit2)
[1] 28392.54
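BIC, which penalizes each extra parameter more heavily at this sample size ($\log(10{,}000) \approx 9.2$ per parameter versus AIC's 2), can serve as a stricter check; a quick sketch using the fitted objects above (output not shown):

#### Check BIC as Well ####
BIC(fit1)
BIC(fit2)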
So again, we need to consider the data, the theory, and other important factors when comparing such models even in a strictly statistical way.