4

Lets say I want to run a logistic GLMM like so:

Binary Outcome 
~ Subject Composite Score 1 * Subject Composite Score 2 * Item-Level Variable 
+ Random Effect Item
+ Random Effect Subject

I was reading in Logistic Regression Models by Joseph Hilbe that interactions between multiple continuous variables is erroneous for standard logistic regressions (from Page 190):

enter image description here

Given this is the case, my question is: if the presence of an interaction is supposed to occur between two continuous variables in a logistic regression, is there no way of capturing this? Can one artificially create a binary/categorical variable instead (such as "high x" vs "low x") without issues? Is it better to just make an additive model with easily interpretable coefficients?

There is one additional thing I should mention. The binary outcome can also be transformed into a subject-level sum score. However, given it is a composite, I'm not sure what would be the best way to model this in GLMM, as I don't believe it would have a normal distribution.

Edit

For context, this is the page directly proceeding it. I believe this is the first part of the book that addresses this issue: enter image description here

  • 1
    Nah, the author just means "there are many ways variables can interact aside from the 'linear' interaction term x1*x2". Frankly I think the author is being a little dramatic about the possibility for nonlinear interactions, which is by no means constrained to the logistic regression problem. – John Madden Aug 22 '22 at 01:02
  • 1
    Interactions are routine. Is there more context to this? – Dave Aug 22 '22 at 01:05
  • I just edited the question to include the page directly preceding this one. – Shawn Hemelstrand Aug 22 '22 at 01:07
  • @JohnMadden so then the model I proposed at the beginning would be okay? – Shawn Hemelstrand Aug 22 '22 at 01:09
  • 1
    @ShawnHemelstrand Yes. – John Madden Aug 22 '22 at 01:10
  • In the previous book I was reading from him on this topic, he also mentioned with regards to a categorical x continuous interaction: "The interaction term is significant. It makes no difference if the main effects terms are significant or not. Only the interaction term is interpreted for this model." Is this also false? I'm not sure why R for example would even include the * operator if that was the case. – Shawn Hemelstrand Aug 22 '22 at 01:34

1 Answers1

5

I was reading in Logistic Regression Models by Joseph Hilbe that interactions between multiple continuous variables is erroneous for standard logistic regressions (from Page 190):

You are overinterpreting, he does not say it is erroneous, only that it is often difficult to interpret. In general, as shown below, interactions (and other complex models) are often difficult to interprete/communicate, so visual summaries might be preferable.

An alternative to linear x linear interactions is to use a gam (generalized additive model) with smooth interactions, for instance, as below, represented via tensor product splines. Some R code:

library(mgcv)
library(Fahrmeir)
data(credit)
library(gratia)

mod0 <- gam(Y ~ ti(Mes, k=4) + ti(DM, k=4) + ti(Mes, DM, k=4) + Cuenta + Ppag + Uso + Sexo + Estc, family=binomial, data=credit)

The fitted interaction can be shown graphically:

enter image description here

The code for the graphic is simply

gratia::draw(mod0)

The summary for the model is:

summary(mod0)

Family: binomial Link function: logit

Formula: Y ~ ti(Mes, k = 4) + ti(DM, k = 4) + ti(Mes, DM, k = 4) + Cuenta + Ppag + Uso + Sexo + Estc

Parametric coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.1208 0.2678 -0.451 0.651825
Cuentagood running -1.9211 0.2091 -9.186 < 2e-16 *** Cuentabad running -0.6360 0.1810 -3.513 0.000443 *** Ppagpre mal pagador 1.0594 0.2580 4.107 4.01e-05 *** Usoprofesional 0.4092 0.1656 2.471 0.013478 *
Sexohombre -0.1386 0.2275 -0.609 0.542379
Estcvive solo 0.4462 0.2243 1.990 0.046636 *


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms: edf Ref.df Chi.sq p-value
ti(Mes) 2.257 2.625 21.07 8.49e-05 *** ti(DM) 2.587 2.831 19.45 0.000141 *** ti(Mes,DM) 2.961 3.575 16.71 0.002464 **


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = 0.228 Deviance explained = 20% UBRE = 0.0074576 Scale est. = 1 n = 1000

Clarification Additional questions in comments

  1. Can you run a similar model for GLMM?

In principle yes, but in practice it depends on implementations. But R's mgcv can fit some GLMM's, so see ifd the models you are interested in can be fit with mgcv.

  1. How does one interpret the summary/plot? I can't see the Y value in the plot so I'm confused how that works. I've never read much about GAMs so this is new to me.

The plot I have shown above is on the scale of the linear predictor. We can however make some other plots, applying the inverse link function to show results on the response (probability of default, in this example) scale. One package that is useful for this is visreg (you can search this site for "visreg" to find additional examples).

The plots I will show below are conditional plots, that means they are based on fixing the variables not shown at some constant values. I use the defaults, which are for

  • continuous variables: the median
  • factors: the most frequent (or modal) level
library(visreg)

visreg(mod0, scale="response") # Not shown

Interaction:

visreg2d(mod0, "Mes", "DM", data=credit) # Scale of linear predictor # Not shown

visreg2d(mod0, "Mes", "DM", data=credit, scale="response")

2d interaction plot of Mes and DM

Remember that this plot is with all the factor variables kept at their most frequent values. Specifically, for this plot it means loans to men, which do not live alone, and for private (not professional) use ... For a more complete presentation of results, make additional plots using additional arguments to specify the conditioning variables.

The plot is interesting, it shows that the loans with highest default risk is the ones with large amounts (DM, deutche marks) and short duration (Mes) and the ones with small amounts and large duration!

One problem with this plot is that it does not show uncertainty of the fit.

  • 1
    Two questions: 1) Can you run a similar model for GLMM? 2) How does one interpret the summary/plot? I can't see the Y value in the plot so I'm confused how that works. I've never read much about GAMs so this is new to me. – Shawn Hemelstrand Aug 22 '22 at 04:09
  • 1
    Thank you for clarifying. I will read up on GAMs, but Im not sure I will be able to use them for the current project. In any case I appreciate the new information. – Shawn Hemelstrand Aug 23 '22 at 01:00