Interpreting interaction between a categorical and centered continuous variable (binary response)

Question

In my model, in which I'm attempting to infer which covariates affect whether a fish has an empty stomach or not (1=empty, 0=not empty), I decided to grand-mean center the variable "SL" (standard length) so that the intercept would make more sense (instead of when SL=0). However, I'm not sure how to interpret the interaction in the summary output when one of the covariates is centered. My categorical variable is "fZone" (factor Zone, my location variable).

center_sl = grand-mean centered standard length of each fish caught
fZone = location of catch (3 levels)
> table(c_neb5$fZone)
Rankin    West Whipray 
    201     436      42
c_neb5$center_sl <- scale(c_neb5$SL, scale=FALSE)
mod2 <- bam(empty ~
             center_sl +
             fZone +
             center_sl:fZone + 
             ...,
         data = c_neb5, 
         method = 'fREML', 
         discrete = TRUE, 
         family = binomial(link = "logit"), 
         select = FALSE)

EDIT: Full model summary

> summary(mod2)
Family: binomial 
Link function: logit
Formula:
empty ~ center_sl + fZone + center_sl:fZone + s(sal) + s(temp) + 
    s(ToD) + s(fStation, bs = "re") + s(fCYR, bs = "re") + s(fStation, 
    fCYR, bs = "re") + s(fStation, CYR.std, bs = "re")
Parametric coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)            -1.298719   0.291203  -4.460  8.2e-06 ***
center_sl              -0.038851   0.011985  -3.242  0.00119 ** 
fZoneWest               0.122594   0.311480   0.394  0.69389
fZoneWhipray           -0.327579   0.639371  -0.512  0.60841
center_sl:fZoneWest    -0.002926   0.014650  -0.200  0.84169
center_sl:fZoneWhipray  0.061163   0.025891   2.362  0.01816 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
                          edf  Ref.df Chi.sq  p-value
s(sal)              1.783e+00   2.231  1.590 0.558432
s(temp)             1.128e+00   1.236  2.972 0.134198
s(ToD)              2.112e+00   2.637 16.235 0.000755 ***
s(fStation)         1.096e-04  82.000  0.000 0.619807
s(fCYR)             4.740e+00  12.000 14.165 0.009002 ** 
s(fCYR,fStation)    9.693e+00 237.000 11.111 0.205201
s(CYR.std,fStation) 1.258e+01  80.000 23.798 0.008646 **
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) =   0.16   Deviance explained = 18.1%
fREML =  990.1  Scale est. = 1         n = 679

My interpretation is that...(Intercept)=-1.298719 means the average size fish has a exp(-1.298719)= 0.272 odds of an empty stomach; fZoneWest=0.122594 the odds of empty stomach in West compared to my ref. level (Rankin) increase by exp(0.122594)=1.130425; and center_sl:fZoneWest=-0.002926 means for every 1 unit above average in size, the odds of an empty stomach decrease by exp(-0.002926)=0.9970783, compared to my ref. level. Am I on the right track? Any advice or corrections are greatly appreciated! The data is 679 rows in size, so the best I could do was post a subset of it down below.

Subset of my data:
example_data <- c_neb5[sample(nrow(c_neb5), 10), ]
> dput(example_data)
structure(list(CYR_Keyfield = c("C-2018-10-6-255", "C-2017-6-26-278", 
"C-2018-9-16-291", "C-2017-10-9-265", "C-2010-11-10-167", "C-2019-10-30-169", 
"C-2018-10-6-279", "C-2022-7-10-241", "C-2017-9-4-70", "C-2022-6-23-241"
), Species = c("Cynoscion nebulosus", "Cynoscion nebulosus", 
"Cynoscion nebulosus", "Cynoscion nebulosus", "Cynoscion nebulosus", 
"Cynoscion nebulosus", "Cynoscion nebulosus", "Cynoscion nebulosus", 
"Cynoscion nebulosus", "Cynoscion nebulosus"), ID = c("201810255_86", 
"20176278_52", "20189291_39", "201710265_100", "201011167_61", 
"201910169_54", "201810279_75", "20227241_46", "2017970_91", 
"20226241_34"), SL = c(33.58, 20.12, 50.25, 23.18, 68.72, 14.85, 
73.49, 61.84, 13.26, 25.79), empty = c(0, 0, 0, 0, 0, 1, 0, 0, 
1, 0), DateTime = structure(c(1538842500, 1498499220, 1537107120, 
1507558920, 1289399700, 1572449400, 1538837160, 1657460040, 1504530660, 
1656001620), class = c("POSIXct", "POSIXt"), tzone = ""), CYR = c(2018L, 
2017L, 2018L, 2017L, 2010L, 2019L, 2018L, 2022L, 2017L, 2022L
), Month = c(10L, 6L, 9L, 10L, 11L, 10L, 10L, 7L, 9L, 6L), DoY = c(279, 
177, 259, 282, 314, 303, 279, 191, 247, 174), ToD = c(12.25, 
13.7833333333333, 10.2, 10.3666666666667, 9.58333333333333, 11.5, 
10.7666666666667, 9.56666666666667, 9.18333333333333, 12.45), 
    JDay = c(5129, 4662, 5109, 4767, 2242, 5518, 5129, 6502, 
    4732, 6485), Zone = c("Rankin", "Rankin", "Whipray", "Rankin", 
    "West", "West", "Rankin", "Rankin", "West", "Rankin"), Station = c(255, 
    278, 291, 265, 167, 169, 279, 241, 70, 241), Standard_collection_station = c(0, 
    0, 0, 0, 0, 0, 0, 1, 1, 1), Latitude = c(25.085, 25.145, 
    25.118, 25.135, 25.106, 25.081, 25.133, 25.0750000309199, 
    25.132, 25.0750000309199), Longitude = c(-80.802, -80.809, 
    -80.76, -80.823, -80.917, -80.893, -80.797, -80.8159999921917, 
    -80.941, -80.8159999921917), sal = c(38.27, 41.01, 33.61, 
    26.75, 32, 36.18, 36.42, 40.08, 38.1, 39.07), temp = c(27.856, 
    32.2, 31.791, 29.512, 19.3, 28.398, 27.6679999999999, 30.243, 
    29.71, 29.262), fCYR = structure(c(9L, 8L, 9L, 8L, 2L, 10L, 
    9L, 12L, 8L, 12L), levels = c("2009", "2010", "2011", "2012", 
    "2013", "2015", "2016", "2017", "2018", "2019", "2021", "2022"
    ), class = "factor"), fMonth = structure(c(8L, 4L, 7L, 8L, 
    9L, 8L, 8L, 5L, 7L, 4L), levels = c("1", "3", "5", "6", "7", 
    "8", "9", "10", "11", "12"), class = "factor"), fStation = structure(c(60L, 
    69L, 76L, 63L, 40L, 42L, 70L, 57L, 11L, 57L), levels = c("20", 
    "21", "22", "23", "24", "40", "54", "65", "67", "68", "70", 
    "71", "73", "101", "105", "106", "107", "111", "112", "117", 
    "118", "119", "122", "123", "124", "130", "133", "134", "135", 
    "137", "143", "144", "145", "146", "147", "156", "157", "158", 
    "159", "167", "168", "169", "171", "172", "173", "174", "175", 
    "176", "224", "225", "226", "227", "229", "237", "239", "240", 
    "241", "253", "254", "255", "256", "257", "265", "266", "267", 
    "268", "269", "270", "278", "279", "280", "281", "282", "284", 
    "290", "291", "292", "294", "301", "302", "312", "609"), class = "factor"), 
    fZone = structure(c(1L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 2L, 1L
    ), levels = c("Rankin", "West", "Whipray"), class = "factor"), 
    CYR.std = c(9L, 8L, 9L, 8L, 1L, 10L, 9L, 13L, 8L, 13L), center_sl = structure(c(-6.70160530191458, 
    -20.1616053019146, 9.96839469808542, -17.1016053019146, 28.4383946980854, 
    -25.4316053019146, 33.2083946980854, 21.5583946980854, -27.0216053019146, 
    -14.4916053019146), dim = c(10L, 1L)), center_sal = structure(c(1.43373534609722, 
    4.17373534609722, -3.22626465390278, -10.0862646539028, -4.83626465390278, 
    -0.656264653902781, -0.416264653902779, 3.24373534609722, 
    1.26373534609722, 2.23373534609722), dim = c(10L, 1L)), center_temp = structure(c(-1.51357879234165, 
    2.83042120765835, 2.42142120765835, 0.142421207658348, -10.0695787923417, 
    -0.971578792341653, -1.70157879234175, 0.873421207658346, 
    0.340421207658348, -0.107578792341652), dim = c(10L, 1L))), row.names = c(495L, 
364L, 303L, 652L, 404L, 375L, 469L, 676L, 508L, 675L), class = "data.frame")

Look at this question among others on this site for how to interpret coefficients in models with interactions. With an interaction in default R coding, individual coefficients are for the case when the interacting predictor is at its reference level or at 0, as presented to the model. Interaction coefficients are for the extra association with outcome, beyond what's predicted from the individual coefficients, when both predictors are away from reference/0. — EdM, May 23 '23 at 20:47
I'm confused...why are you using a generalized additive model (GAM) with terms that are all linear? Wouldn't this be better suited to functions like lmer or glmer which are suited for linear models? — Shawn Hemelstrand, May 23 '23 at 22:00
@ShawnHemelstrand, the formula after the summary() function shows the whole model. — Nate, May 24 '23 at 02:24
I can see that, but where are all your non-parametric coefficients that are supposed to accompany your splines? — Shawn Hemelstrand, May 24 '23 at 02:59
@EdM, would a centered covariate merely adjust your description by saying something akin to that extra association with the outcome is now associated with changes above and below the mean of the covariate, rather than every 1 unit away from 0 (of the covariate)? — Nate, May 24 '23 at 13:10
@ShawnHemelstrand; full model summary now included. I thought I'd save space by just including the part my question related to. — Nate, May 24 '23 at 14:18

score 1 · Accepted Answer · answered May 24 '23 at 16:27

(Intercept)=-1.298719 means the average size fish has a exp(-1.298719)= 0.272 odds of an empty stomach

Only when fZone=Rankin, the reference level. Interpretation holds because you have presented "standard length" to the model as centered to 0 at the average size fish.

fZoneWest=0.122594 the odds of empty stomach in West compared to my ref. level (Rankin) increase by exp(0.122594)=1.130425

Only for the average fish size, at center_sl=0. I find it confusing, particularly in more complicated models, to work in the odds scale. I prefer to work in the coefficient scale, where coefficients add, and only exponentiate to odds (or convert to probability) at the end of the calculations.
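A minimal sketch of that workflow, using only the coefficients printed in the question's summary(mod2). It assumes the smooth and random-effect terms contribute zero to the linear predictor, so it illustrates the arithmetic and is not a substitute for predict() on the fitted model. The variable names are mine.

```r
# Coefficients copied from the summary(mod2) output in the question
b0     <- -1.298719  # (Intercept): fZone = Rankin, center_sl = 0
b_west <-  0.122594  # fZoneWest
b_whip <- -0.327579  # fZoneWhipray

# Add on the log-odds (coefficient) scale first: average-length fish per zone
logodds_avg <- c(Rankin  = b0,
                 West    = b0 + b_west,
                 Whipray = b0 + b_whip)

# Convert only at the end
odds_avg <- exp(logodds_avg)     # odds of an empty stomach
prob_avg <- plogis(logodds_avg)  # probability of an empty stomach

round(cbind(logodds_avg, odds_avg, prob_avg), 3)
```

The ratio odds_avg["West"] / odds_avg["Rankin"] recovers exp(0.122594), which is why that odds ratio applies only at center_sl=0.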

center_sl:fZoneWest=-0.002926 means for every 1 unit above average in size, the odds of an empty stomach decrease by exp(-0.002926)=0.9970783, compared to my ref. level

That's incorrect. The interaction is the extra change beyond what you would predict based on the reference-level coefficients. At the fZone reference level Rankin, you have a center_sl coefficient of -0.038851 for the log-odds difference per unit of center_sl. The center_sl:fZoneWest=-0.002926 is what you add to that coefficient when fZone=West. So exp(-0.038851-0.002926)=0.959 is the factor by which the odds change per unit increase in center_sl when fZone=West; exp(-0.002926)=0.997 by itself is only the ratio of the West factor to the Rankin factor.
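The same addition can be done for all three zones at once. This is a sketch using the estimates from the question's output; the variable names are mine, and standard errors for the summed slopes would need the coefficient covariance matrix from the fitted model, which is not shown here.

```r
# Slope terms copied from the summary(mod2) output in the question
b_sl      <- -0.038851  # center_sl: slope at the reference level (Rankin)
b_sl_west <- -0.002926  # center_sl:fZoneWest: extra slope in West
b_sl_whip <-  0.061163  # center_sl:fZoneWhipray: extra slope in Whipray

# Zone-specific slopes on the log-odds scale: reference slope plus interaction
slope <- c(Rankin  = b_sl,
           West    = b_sl + b_sl_west,
           Whipray = b_sl + b_sl_whip)

# Multiplicative change in odds per 1-unit increase in center_sl, by zone
or_per_unit <- exp(slope)

round(cbind(slope, or_per_unit), 4)
```

Note that the summed slope for Whipray is positive, so the direction of the length association there differs from Rankin and West, at least in the point estimates.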

In this type of model, centering a continuous predictor doesn't affect its own coefficient or its interaction coefficients; it changes only the Intercept and the coefficients of the predictors with which it interacts. In your model, if you didn't center SL, the SL coefficient would still be -0.038851 for the slope at the fZone reference level Rankin, but the Intercept and the coefficients for the other fZone levels would be those corresponding to an (extrapolated) SL value of 0.
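You can check this numerically. The sketch below uses simulated data (not your fish data) and a plain glm() rather than bam(), both of which are my simplifying assumptions; the data-generating values are arbitrary.

```r
set.seed(1)

# Simulated stand-in data: 3 zones, a length variable, a binary outcome
n    <- 600
zone <- factor(sample(c("Rankin", "West", "Whipray"), n, replace = TRUE))
SL   <- runif(n, 10, 80)
eta  <- -0.2 - 0.03 * SL + 0.5 * (zone == "West") +
        0.04 * SL * (zone == "Whipray")
d <- data.frame(empty = rbinom(n, 1, plogis(eta)), SL = SL, zone = zone)
d$center_sl <- d$SL - mean(d$SL)  # grand-mean centering

# Same model, with and without centering
fit_raw <- glm(empty ~ SL * zone,        family = binomial, data = d)
fit_ctr <- glm(empty ~ center_sl * zone, family = binomial, data = d)

# Rows 2, 5, 6 (slope and interactions) agree; rows 1, 3, 4 differ
round(cbind(raw = unname(coef(fit_raw)), centered = unname(coef(fit_ctr))), 4)

# The centered zone coefficient is the raw one evaluated at the mean length
shifted_west <- coef(fit_raw)[["zoneWest"]] +
                mean(d$SL) * coef(fit_raw)[["SL:zoneWest"]]
```

Here shifted_west should match coef(fit_ctr)[["zoneWest"]] up to numerical tolerance, which is the sense in which the zone coefficients in a centered model describe differences at the average length.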

Shawn Hemelstrand has a helpful recent answer that might help you think this through. In that answer, think of $X$ as your original continuous SL and the $Z$ and $W$ covariates as the two dummy variables that represent your fZone. Then work through what happens to the interpretation of regression coefficients when you replace $X$ with $X-\bar X$ in those equations (equivalent to your converting SL to center_sl).
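One way to write out that exercise, as a sketch. The $\beta$ symbols and the form of the linear predictor $\eta$ are my notation for the parametric part of your model and may not match the linked answer exactly; the smooth terms are left out. With the uncentered predictor:

$$\eta = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 W + \beta_4 XZ + \beta_5 XW$$

Substituting $X = (X-\bar X) + \bar X$ and collecting terms gives:

$$\eta = (\beta_0 + \beta_1 \bar X) + \beta_1 (X-\bar X) + (\beta_2 + \beta_4 \bar X) Z + (\beta_3 + \beta_5 \bar X) W + \beta_4 (X-\bar X) Z + \beta_5 (X-\bar X) W$$

The slope $\beta_1$ and the interaction coefficients $\beta_4$ and $\beta_5$ are unchanged. The intercept and the two dummy-variable coefficients pick up terms in $\bar X$, so they now describe the log-odds, and the zone differences in log-odds, at the average length instead of at $X=0$.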
