I have a logistic regression with outcome C, and I am interested in the interaction between two categorical variables: one (let's call it A) is a continuous variable cut into 20 quantile groups (ventiles), the other (B) is a categorical variable that can take 3 values. I use ventiles because the effect of A on C is highly non-linear, and this is a simple way to account for it.

I ran the model and used predict.glm to predict the outcome (dependent variable) for several values of the covariates (independent variables). I used both type = "link" and type = "response".

library(marginaleffects)

reg <- glm(C ~ A * B, family = "binomial")

dataPlotLink <- predict(reg, newdata = datagrid(model = reg, A = levels(A), B = levels(B)), type = "link")

dataPlotResponse <- predict(reg, newdata = datagrid(model = reg, A = levels(A), B = levels(B)), type = "response")
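
As a minimal sketch of how the two prediction scales relate, the following uses simulated toy data (hypothetical, not the data from the question) and base R's `expand.grid()` in place of `datagrid()`. It only illustrates that, for a binomial GLM with the default logit link, the `type = "response"` prediction is the inverse logit of the `type = "link"` prediction.

```r
set.seed(42)
# Hypothetical toy data standing in for the question's A, B and C
d <- data.frame(
  A = factor(sample(paste0("q", 1:4), 500, replace = TRUE)),
  B = factor(sample(c("b1", "b2", "b3"), 500, replace = TRUE))
)
d$C <- rbinom(500, 1, plogis(-1 + 0.3 * as.integer(d$A) + 0.4 * as.integer(d$B)))

fit <- glm(C ~ A * B, family = binomial, data = d)

# One row per combination of factor levels
grid <- expand.grid(A = levels(d$A), B = levels(d$B))
grid$link <- unname(predict(fit, newdata = grid, type = "link"))
grid$response <- unname(predict(fit, newdata = grid, type = "response"))

# The response-scale prediction is the inverse logit of the link-scale one
same <- isTRUE(all.equal(plogis(grid$link), grid$response))
print(same)
```

Note that `predict.glm` selects the scale through its `type` argument; an unrecognized argument name is silently absorbed and the default link scale is returned.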

The results suggest that the difference in predicted probability between groups of B grows as A increases. But this is mainly due to the transformation from the logit scale to predicted probabilities, since the model shows no interaction effect between A and B (see graphs below).
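
To illustrate why a constant gap on the link scale need not stay constant on the probability scale, here is a small sketch with made-up numbers (the log-odds values and the gap of 1 are hypothetical, not estimates from the model above). It applies the inverse logit, `plogis()`, to two groups that differ by a fixed amount in log-odds.

```r
# Hypothetical log-odds for a reference group across increasing values of A
eta_ref <- c(-5, -4, -3, -2, -1)
# A second group shifted by a constant on the link scale (no interaction)
gap <- 1
eta_other <- eta_ref + gap

# Inverse logit maps log-odds to probabilities
p_ref <- plogis(eta_ref)
p_other <- plogis(eta_other)

diff_link <- eta_other - eta_ref   # constant by construction
diff_prob <- p_other - p_ref       # depends on the baseline probability

print(data.frame(eta_ref, diff_link, diff_prob = round(diff_prob, 3)))
```

In this toy case the difference in probability widens as the baseline moves from near 0 toward 0.5, even though the difference in log-odds never changes.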

So my question is: how do I interpret those results? Can I say that, in light of the second plot, as A increases, the effect of A on C is higher for group 1 than for group 3? Or, in other words, that the relationship between A and C differs across levels of B (on the predicted probability scale)?

[Image: the two plots referred to above]

Maël
  • 3
    Please edit the question to show how you modeled the interaction between A and B. Also, categorizing the continuous A variable into 20 groups is probably costing you a lot of unnecessary degrees of freedom and hurting your model. A smooth continuous model of A, for example with regression splines, would be highly preferable. See this page, for example. – EdM Sep 12 '22 at 12:18
  • I added additional information regarding the model. I could indeed use regression splines, but they also have their limitations (e.g. higher SE at the extremes), from what I know. Do you think that would solve the interpretation problem? – Maël Sep 12 '22 at 12:30
  • 1
    A proper test of the interaction would combine information from all of the interaction terms, perhaps best via a likelihood-ratio test (anova()) between the model with the interaction (~A*B) and without (A+B). The very large number of interaction coefficients will make it unlikely for that to reach "statistical significance," but please edit the question to show the result of that test. A regression spline would involve many fewer individual and interaction coefficients and provide more power to detect a true interaction (if it exists). – EdM Sep 12 '22 at 13:06
  • "More impacted" is a subjective judgment; if you can formalize it, there won't be ambiguity. Both plots show the same result, just in one case it's transformed, but the transformation preserves the ordering. – Tim Sep 12 '22 at 13:13
  • @Tim I tried to make it better. They indeed show the same result, but since the transformation into predicted probabilities distorts the linearity of the log-odds scale, we end up with non-parallel trends on the second plot. So how do we deal with those non-parallel trends? What do they mean in terms of how the relationship between A and C changes across levels of B? – Maël Sep 12 '22 at 13:24
  • @Maël It depends on whether you are asking about the difference in log-odds (link) or in probabilities. That is why I said this is a question you need to ask yourself. – Tim Sep 12 '22 at 13:26
  • 2
    Hi Mael, what you observe is a common (and widely discussed) problem in any GLM with a non-linear link function - for the logistic regression, see e.g. here https://stats.oarc.ucla.edu/stata/seminars/deciphering-interactions-in-logistic-regression/ – Florian Hartig Sep 12 '22 at 13:37
  • @Florian Hartig Very good reference. I was totally unaware of that issue. Logistic interactions are a complex concept indeed. Tim, knowing that now, I understand your comment. – Maël Sep 12 '22 at 13:46
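
The likelihood-ratio test suggested in the comments (comparing ~A*B against ~A+B with anova()) could be sketched as follows. The data here are simulated and purely hypothetical, with an additive truth on the log-odds scale; the variable names mirror the question, but none of the numbers come from it.

```r
set.seed(1)
n <- 3000
# Simulated stand-ins for the question's variables (hypothetical data)
x <- rnorm(n)
A <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.05)),
         include.lowest = TRUE)               # 20 quantile groups (ventiles)
B <- factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))
# True model is additive on the log-odds scale: no A-by-B interaction
C <- rbinom(n, 1, plogis(0.5 * x + 0.5 * (B == "b2") + 1 * (B == "b3")))

fit_add <- glm(C ~ A + B, family = binomial)
fit_int <- glm(C ~ A * B, family = binomial)

# Likelihood-ratio (chi-squared) test of all interaction terms jointly
lrt <- anova(fit_add, fit_int, test = "Chisq")
print(lrt)
```

With 20 levels of A and 3 levels of B, the interaction uses (20 - 1) * (3 - 1) = 38 degrees of freedom, which is the cost in power that the comment refers to.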

0 Answers