
I have fitted my data to a linear model in R using generalized least squares, as follows:

library(nlme)  # gls() is from the nlme package
mod <- gls(Y ~ A * B + Z, data = dat)

and used weights due to heteroscedasticity as follows:

mod_varpower <- update(mod, weights = varPower())

and then I ran an ANOVA (first question here: let's say A and B are factors and Z is a continuous variable; in that case I have basically done an ANCOVA, right?):

library(car)
Anova(mod_varpower)  # Analysis of variance

which gives me the following output:

> Anova(mod_varpower)  # Analysis of variance
Analysis of Deviance Table (Type II tests)

Response: Y
    Df  Chisq Pr(>Chisq)  
A    2 4.9937    0.08234 .
B    2 7.3418    0.02545 *
Z    1 6.3873    0.01149 *
A:B  4 7.9339    0.09403 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As we can see here, I don't have a significant interaction between A and B, so according to Engqvist (2005) I should drop the interaction effect. However, now to my question: if I do look into contrasts of contrasts, I get the following:

library(emmeans)
emmip(mod_varpower, A ~ B)  # look at interactions

emm_s.t <- emmeans(mod_varpower, ~ A * B)  # look at contrasts

> contrast(emm_s.t, interaction = "pairwise")
 A_pairwise          B_pairwise          estimate   SE  df t.ratio p.value
 Metabolite - Normal Metabolite - Normal   -0.171 1.63 140  -0.104  0.9170
 Metabolite - Low    Metabolite - Normal   -2.036 1.42 172  -1.430  0.1545
 Cotton - Low        Metabolite - Normal   -1.865 1.43 186  -1.305  0.1934
 Metabolite - Normal Metabolite - Low      -2.448 1.49 138  -1.639  0.1035
 Metabolite - Low    Metabolite - Low      -3.246 1.33 185  -2.441  0.0156
 Cotton - Low        Metabolite - Low      -0.798 1.39 183  -0.574  0.5666
 Metabolite - Normal Cotton - Low          -2.277 1.47 183  -1.550  0.1229
 Metabolite - Low    Cotton - Low          -1.211 1.37 156  -0.881  0.3796
 Cotton - Low        Cotton - Low           1.066 1.45 167   0.733  0.4646

Degrees-of-freedom method: satterthwaite

where the 5th row shows a p-value below 0.05, which means we have a significant interaction contrast, if I'm not mistaken?

So on what basis can we remove the interaction term, as proposed by Engqvist (2005)? Should I go by the ANOVA output or by the contrasts of contrasts?

EDIT

If I look at the correlation, the two are highly correlated and, to my understanding, linearly related. I must mention that stats is not my strong side, so some of my questions might be a bit obvious.

> Mod <- cor.test(dat$Z, dat$Y, method = "spearman", use = "complete.obs")
> Mod

	Spearman's rank correlation rho

data:  dat$Z and dat$Y
S = 1131356, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6047263 

[scatter plot of Z against Y]

Blanca

1 Answer


It's risky to make decisions based on the arbitrary p < 0.05 test for "significance." Also, once you start doing that, you get into apparent inconsistencies like what you show.

It's certainly possible for one specific contrast to pass that test even if an overall evaluation doesn't. It's also possible to have an overall test indicate that there are "significant" differences among scenarios while no particular pairwise comparisons are "significant." That has much less to do with the underlying reality and more to do with the different numbers of degrees of freedom used in overall versus specific tests, multiple-comparison corrections, and the arbitrary p-value cutoff.
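As a concrete illustration (a minimal sketch, assuming the mod_varpower object from your question), the emmeans package can compute the overall joint test of each term alongside the individual interaction contrasts, so you can see both kinds of test computed from the same fit:

library(emmeans)
# joint (overall) F tests for each model term; the analogue of the
# Type II chi-square tests shown in the question
joint_tests(mod_varpower)
# individual interaction contrasts; each uses fewer degrees of
# freedom and tests a much narrower hypothesis
contrast(emmeans(mod_varpower, ~ A * B), interaction = "pairwise")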

The decision has mostly to do with what you are trying to accomplish with the model. For a predictive model you typically want to include as many terms as reasonable without overfitting. If that's your purpose in modeling, you would probably keep an interaction with a p-value of 0.09 (which isn't that far from "significant").

If you are trying to produce a simple model that adequately describes your results, then you might remove it. For example, Frank Harrell suggests pre-specifying interactions when starting to develop a model, but then removing or keeping all interactions based on a global test. See Section 4.12.1 of his Regression Modeling Strategies, with an example of such a choice in Section 21.3.
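For a gls fit, one way to implement such a global test of the interaction is a likelihood-ratio comparison of models with and without the A:B term. A sketch, assuming the objects from your question; note that gls models must be refit with method = "ML" before their fixed effects can be compared this way:

# refit with maximum likelihood, since REML fits are not comparable
# across different fixed-effects structures
mod_full  <- update(mod_varpower, method = "ML")
# drop the interaction only (A, B and Z stay in the model)
mod_noint <- update(mod_full, model. = . ~ . - A:B)
anova(mod_noint, mod_full)  # global likelihood-ratio test of A:B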

You have modeled Z as linearly related to outcome. That's a pretty strong assumption and might be affecting the apparent "significance" of your other terms if the true relationship isn't linear. It can help to model continuous predictors flexibly, for example with regression splines. That might be the ultimate solution to your problem.
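A sketch of that suggestion, assuming the data frame and model from your question (the df = 3 for the spline is an arbitrary starting choice, not a recommendation):

library(nlme)
library(splines)
# model Z with a natural cubic spline instead of a straight line;
# fit both versions with ML so they can be compared
mod_line   <- gls(Y ~ A * B + Z, data = dat,
                  weights = varPower(), method = "ML")
mod_spline <- gls(Y ~ A * B + ns(Z, df = 3), data = dat,
                  weights = varPower(), method = "ML")
anova(mod_line, mod_spline)  # does the extra flexibility help?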

EdM
  • Exactly! Data analysis is not an algorithm where you can follow some flow chart that says "if this, do this". You have to think.

    A surprising number of people object to this.

    – Peter Flom Jul 08 '23 at 13:26
  • As for what I'm trying to do: I'm trying to produce a simple model that describes my results.

    As for Z: I've added an edit with a correlation plot for Z and the rho value. As you mentioned in the last paragraph, Z being linearly related to Y is a strong assumption, but I have the correlation plot and literature showing that Z has a strong positive influence on Y.

    However, A and B could also influence Z, so I want to make sure that what I see influencing Y from A and B is actually due to A and B and not to Z. I don't know if this makes sense to you.

    – Blanca Jul 08 '23 at 18:30
  • @Blanca what you are trying to do makes sense. You clearly have to adjust for Z in the model. My suggestion is to try something more flexible than a simple linear adjustment for it. The Spearman correlation test you report is non-parametric, only evaluating the ranks of observations. It doesn't indicate linearity. See this page and its links. Even if you did a Pearson test for linear correlation, a curvilinear association of Z with outcome could still have a strong Pearson correlation coefficient. – EdM Jul 08 '23 at 21:33
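To illustrate EdM's last point with made-up numbers: a clearly curved relationship can still produce a high Pearson (and Spearman) correlation, so a large coefficient alone does not establish linearity.

# simulated example (hypothetical data): y is a curved function of z
set.seed(42)
z <- runif(200, min = 0, max = 10)
y <- sqrt(z) + rnorm(200, sd = 0.2)
cor(z, y, method = "pearson")   # high despite the curvature
cor(z, y, method = "spearman")  # also high: monotone, not linear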