I fitted an OLS regression with an interaction term in R and used the emmeans package to examine the interaction further. I am not sure whether I am interpreting the interaction term correctly.

  • My research question: Does depression moderate the association between abuse and delinquency?
  • Hypothesis: Abuse experience will have a greater negative impact on delinquency for people with high levels of depression compared to those with low depression.

IV: abuse (yes=1, no=0); Moderator: depression (high depression=1, low depression=0); DV: delinquency (continuous)

Here is my code:

library(emmeans)  # emmeans(), contrast(), emmip()
library(ggplot2)  # ggplot(), geom_bar()

reg <- lm(delinquency ~ abuse + age + depression + NBsafety + abuse*depression, data=test)
summary(reg)

# Estimated marginal means for the four abuse-by-depression cells
reg_a <- emmeans(reg, ~abuse*depression)
reg_a

# abuse1 - abuse0 within each level of depression
contrast(reg_a, "revpairwise", by="depression", adjust="none")

emmip(reg, depression~abuse, CIs=TRUE)

reg_b <- emmip(reg, depression~abuse, CIs=TRUE, plotit=FALSE)
reg_b

p <- ggplot(data=reg_b, aes(x=depression, y=yvar, fill=abuse)) +
  geom_bar(stat="identity", position="dodge")
p

And, I got these results:

Call:
lm(formula = delinquency ~ abuse + age + depression + NBsafety + 
    abuse * depression, data = test)

Residuals:
   Min     1Q Median     3Q    Max
-8.494 -3.130 -1.424  1.780 28.770

Coefficients:
                   Estimate Std. Error t value             Pr(>|t|)
(Intercept)         6.90783    0.41027  16.837 < 0.0000000000000002
abuse1              0.89760    0.28890   3.107              0.00191
age                -0.09954    0.01507  -6.607       0.000000000047
depression1         1.35707    0.27387   4.955       0.000000767370
NBsafety1          -0.88643    0.18767  -4.723       0.000002439185
abuse1:depression1  1.12343    0.57906   1.940              0.05247

Residual standard error: 4.716 on 2714 degrees of freedom
  (13 observations deleted due to missingness)
Multiple R-squared: 0.06073, Adjusted R-squared: 0.059
F-statistic: 35.1 on 5 and 2714 DF, p-value: < 0.00000000000000022

> reg_a <- emmeans(reg, ~abuse*depression)
> reg_a
 abuse depression emmean    SE   df lower.CL upper.CL
 0     0            3.95 0.111 2714     3.73     4.16
 1     0            4.84 0.268 2714     4.32     5.37
 0     1            5.30 0.251 2714     4.81     5.80
 1     1            7.32 0.437 2714     6.47     8.18

Results are averaged over the levels of: NBsafety
Confidence level used: 0.95

> contrast(reg_a, "revpairwise", by="depression", adjust="none")
depression = 0:
 contrast        estimate    SE   df t.ratio p.value
 abuse1 - abuse0    0.898 0.289 2714   3.107  0.0019

depression = 1:
 contrast        estimate    SE   df t.ratio p.value
 abuse1 - abuse0    2.021 0.504 2714   4.011  0.0001

Results are averaged over the levels of: NBsafety

> emmip(reg, depression~abuse, CIs=TRUE)
> emmip(reg, abuse~depression, CIs=TRUE)
> reg_b <- emmip(reg, depression~abuse, CIs=TRUE, plotit=FALSE)
> reg_b
 depression abuse yvar    SE   df  LCL  UCL tvar xvar
 0          0     3.95 0.111 2714 3.73 4.16 0    0
 1          0     5.30 0.251 2714 4.81 5.80 1    0
 0          1     4.84 0.268 2714 4.32 5.37 0    1
 1          1     7.32 0.437 2714 6.47 8.18 1    1

Results are averaged over the levels of: NBsafety
Confidence level used: 0.95

> ggplot(data=reg_b, aes(x=depression, y=yvar, fill=abuse)) + geom_bar(stat="identity", position="dodge")

[Images not shown: the emmip interaction plot and the ggplot bar chart.]

If I assume that the interaction term was significant (although it was not, p=0.05247), can I conclude that the impact of abuse on delinquency is stronger for people with high levels of depression than for those with low depression? I ask because the emmeans contrast showed a greater difference for depression=1 (abuse1 - abuse0 = 2.021) than for depression=0 (abuse1 - abuse0 = 0.898), and the slope in the graph is steeper for depression=1 than for depression=0.

Could you please tell me whether I am correct?

JoeJoe
  • You don't strictly need emmeans for this interaction analysis because abuse and depression are both binary variables and they don't interact with the other two predictors. (On the other hand, the line plot is informative; the barplot not really.) The interaction is a difference of differences: [(abuse1 - abuse0) | depression = 1] - [(abuse1 - abuse0) | depression = 0] = 2.021 - 0.898 = 1.123. This is exactly how much steeper the slope is for depressed people. – dipetkov Apr 23 '23 at 02:05
  • A further comment: I think phrases like "the strength of the impact of abuse on delinquency" vaguely point towards a causal explanation of the regression results. With observational data, this language is not helpful (and may in fact be misleading) unless you also have a convincing causal DAG to present. – dipetkov Apr 23 '23 at 19:15

1 Answer

Had the interaction coefficient been "statistically significant," your interpretation would be correct. You wouldn't even need to do all of the other calculations to come to that conclusion, as a positive abuse1:depression1 interaction coefficient means that the predicted outcome for people with both abuse = 1 and depression = 1 is larger than you would have predicted by adding the individual abuse1 and depression1 coefficients. The other calculations and plots, however, are certainly useful for demonstrating the practical implications of the model.
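
As a minimal sketch of that arithmetic, the two abuse contrasts can be rebuilt from the coefficients alone. The numbers are copied from the summary(reg) output in the question; the variable names are only illustrative.

```r
# Coefficients copied from the summary(reg) output in the question
b_abuse       <- 0.89760  # abuse1: abuse contrast when depression = 0
b_interaction <- 1.12343  # abuse1:depression1

effect_low  <- b_abuse                  # abuse1 - abuse0 at depression = 0
effect_high <- b_abuse + b_interaction  # abuse1 - abuse0 at depression = 1

# The gap between the two contrasts is the interaction coefficient itself
round(c(low = effect_low, high = effect_high,
        interaction = effect_high - effect_low), 3)
# low 0.898, high 2.021, interaction 1.123
```

These match the two contrast() estimates, which is why the coefficient table is enough here.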

In frequentist statistical analysis, however, you can't claim that there is any moderation by depression status, as the interaction coefficient didn't pass your pre-specified significance threshold. That probably says more about the limitations of frequentist analysis than about the interaction.
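
One way to see how close the call is, offered as a rough sketch rather than part of the original analysis, is to compute a 95% confidence interval for the interaction from the reported estimate, standard error, and residual degrees of freedom.

```r
# Values copied from the abuse1:depression1 row of summary(reg)
est <- 1.12343
se  <- 0.57906
df  <- 2714    # residual degrees of freedom

t_crit <- qt(0.975, df)                  # two-sided 95% critical value
ci <- est + c(-1, 1) * t_crit * se
round(ci, 2)
```

The interval runs from just below 0 to roughly 2.26, so the data are compatible with anything from no moderation to a sizeable one. On the fitted model itself, confint(reg) should give the same interval.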

You seem to have a very large data set and only a few predictors, so it's possible that a more flexible model might have been a better choice. For example, you only have a single linear term for age, while strictly linear associations are seldom correct. A regression spline for age might have been a better choice.
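
A minimal sketch of what that could look like, assuming a natural cubic spline with 3 degrees of freedom (an arbitrary choice for illustration). The data frame sim is simulated here so that the code runs on its own; it is not your data, and with your data you would use data = test instead.

```r
library(splines)  # ns(); ships with base R

set.seed(1)
n <- 500
# Simulated stand-in for `test`; variable names follow the question
sim <- data.frame(
  abuse      = factor(rbinom(n, 1, 0.2)),
  depression = factor(rbinom(n, 1, 0.25)),
  NBsafety   = factor(rbinom(n, 1, 0.5)),
  age        = runif(n, 12, 40)
)
# Outcome with a deliberately curved age effect (assumption for the demo)
sim$delinquency <- 5 + (sim$abuse == "1") + 1.3 * (sim$depression == "1") -
  0.002 * (sim$age - 25)^2 + rnorm(n, sd = 4)

fit_linear <- lm(delinquency ~ abuse * depression + age + NBsafety, data = sim)
fit_spline <- lm(delinquency ~ abuse * depression + ns(age, df = 3) + NBsafety,
                 data = sim)

# The linear-age model is nested in the spline model, so an F-test applies
anova(fit_linear, fit_spline)
```

The spline replaces the single age slope with three basis coefficients, and the F-test asks whether that extra flexibility is supported by the data.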

The binary depression predictor also could be limiting your power. If you have a continuous measure of depression, using it directly would be better, and it could also be modeled as a spline. See this page for the problems introduced by categorizing a potentially continuous predictor. See Frank Harrell's Regression Modeling Strategies for further advice on building and testing regression models; Chapter 2 discusses ways to build flexible models with continuous predictors.
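
To illustrate the comparison, here is a small simulated sketch. The continuous score dep_score, the effect sizes, and the median split are all assumptions made up for the demonstration, and the score enters linearly rather than as a spline to keep it short.

```r
set.seed(2)
n <- 1000
abuse     <- rbinom(n, 1, 0.2)
dep_score <- rnorm(n)  # hypothetical continuous depression score
delinquency <- 4 + 0.9 * abuse + 0.7 * dep_score +
  0.5 * abuse * dep_score + rnorm(n, sd = 4)

# Median split of the same score, mimicking a high/low depression indicator
dep_high <- as.integer(dep_score > median(dep_score))

fit_cont <- lm(delinquency ~ abuse * dep_score)
fit_bin  <- lm(delinquency ~ abuse * dep_high)

# Interaction rows: estimate, standard error, t value, p value
summary(fit_cont)$coefficients["abuse:dep_score", ]
summary(fit_bin)$coefficients["abuse:dep_high", ]
```

Rerunning this over many seeds and counting how often each interaction falls below your threshold would give a rough idea of the power lost by dichotomizing.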

EdM