Adding interactions to logistic regression leads to high SEs

Question

I am trying to test whether there is a significant interaction between an ordinal (A) and categorical variable (B) in R using glm. When I create a model that only includes the interaction term A:B, the model runs fine and I get a reasonable estimate. When I run the "full" model X ~ A+B+A*B, I get an unreasonably high standard error. However, when I run each term on its own X ~ A or X ~ B, I also get reasonable estimates. I suspect it might have something to do with near-perfect fit for one combination of my ordinal and categorical variables but I'm not sure. Any ideas on what is going on? Is it bad form to just have a model with only an interaction term A:B and not the A+B+A*B?

model1 <- glm(X~A:B,     family=binomial(logit))
model2 <- glm(X~A,       family=binomial(logit))
model3 <- glm(X~B,       family=binomial(logit))
model4 <- glm(X~A+B+A*B, family=binomial(logit))

summary(model1)
             Estimate Std. Error z value Pr(>|z|)   
(Int)          3.4320     1.1497   2.985  0.00283 **
A:B [no]      -1.3857     0.6813  -2.034  0.04195 * 
A:B [yes]     -2.2847     0.8017  -2.850  0.00437 **

summary(model2)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)    2.9572     1.0792   2.740  0.00614 **
A             -1.5221     0.6495  -2.343  0.01911 *

summary(model3)
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)    1.2809     0.5055   2.534   0.0113 *
B[yes]        -1.1268     0.6406  -1.759   0.0786 .

summary(model4)
             Estimate Std. Error z value Pr(>|z|)
(Intercept)     36.66    4125.28   0.009    0.993
A              -18.10    2062.64  -0.009    0.993
B[yes]         -34.24    4125.28  -0.008    0.993
A:B[yes]        16.46    2062.64   0.008    0.994

> dput(my.data)
structure(list(X = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 
                               1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 
                               2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 
                               1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("0", "1"), 
                               class = "factor"), 
               A = structure(c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
                               2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
                               1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 
                               1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("1", "2"), 
                               class = "factor"), 
               B = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
                               2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
                               2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                               2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L), .Label = c("no","yes"),
                               class = "factor")), 
              .Names = c("X", "A", "B"), row.names = c(NA, -49L), class = "data.frame")

Have you stored A and B as factors? It doesn't look like A is being treated as categorical, unless it's dichotomous (just like B)? Regardless, this is perplexing; any data you can share to make your example reproducible? Anyway, yes, model1 is bad form because main effects ought to be included. P.S. model4 would do this automatically if coded as X~A*B. — Nick Stauner, May 28 '14 at 16:24
I added the data at the bottom of the original post. I recoded A to be a factor, and I still have a similar problem. — confuser, May 28 '14 at 16:50
Replace your B with a new variable like new.B=as.factor(B). Also change A with new.A=ordered(A) and re-fit. — Stat, May 28 '14 at 17:48
@Stat: Having seen the data, I don't think it will matter. Everything is dichotomous anyway. I'm guessing this is due to the model achieving probabilities of zero or one for certain combinations of the predictors, but I haven't had time to check this guess yet. — Nick Stauner, May 28 '14 at 17:50
@NickStauner I think that is the problem but I am not sure what to do in this situation since it has never come up. I explored nonparametric bootstrapping to get error estimates, but I'm not sure if that is appropriate here or not, especially given the small sample size. — confuser, May 28 '14 at 18:10
I think I found the answer(s) to the problem here: http://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression I have been searching for this answer for months! I chose to use the BayesGLM in the arm package. It seems to give reasonable estimates. — confuser, May 28 '14 at 19:28
(i) Note that when using factors in R, A*B isn't an interaction term but a full model. The interaction term alone is A:B, and A*B means A+B+A:B (but this isn't why you are having a problem). (ii) The first thing you check when this happens is whether the interaction is nearly a linear combination of A and B. — Glen_b, May 28 '14 at 23:04
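
The formula point in the last comment can be checked directly with terms(), which expands a formula without needing any data. A minimal sketch; the helper name term_labels is made up for illustration:

```r
# Hypothetical helper: list the terms R expands a model formula into.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(X ~ A:B)            # interaction term only
term_labels(X ~ A * B)          # main effects plus interaction
term_labels(X ~ A + B + A * B)  # same terms as A * B; duplicates are dropped
```

So X ~ A+B+A*B and X ~ A*B describe the same model, and the redundancy in the first form is harmless.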

score 6 · Answer 1 · edited Apr 13 '17 at 12:44

As @NickStauner and you have surmised, this is due to separation.

It is always worth looking at your data! When your data are binary, this is less obvious, but you can see a lot with table(). For example, another problem that causes SEs to expand is multicollinearity (which we think of with continuous variables, but can happen with binary covariates as well). Here's a quick check to see if A is collinear with B:

summary(my.data)
# X      A        B     
# 0:17   1:26   no :23  
# 1:32   2:23   yes:26
with(my.data, table(A, B))
#    B
# A   no yes
#   1 10  16
#   2 13  10

So, we don't see anything suspicious there. Now we can check for separation:

with(my.data, table(A, X, B))
# , , B = no
# 
#    X
# A    0  1
#   1  0 10
#   2  5  8
# 
# , , B = yes
# 
#    X
# A    0  1
#   1  5 11
#   2  7  3
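
The same check can be automated. The sketch below rebuilds the 49 observations from the cell counts printed above and flags any combination of A, B and X that never occurs. The object names (d, counts, empty) are illustrative, and the row order differs from my.data, which does not affect the counts:

```r
# Rebuild the data from the cell counts in table(A, X, B) above.
A_cell <- c("1", "2", "1", "2")
B_cell <- c("no", "no", "yes", "yes")
n0 <- c(0, 5, 5, 7)    # number of X = 0 in each A-by-B cell
n1 <- c(10, 8, 11, 3)  # number of X = 1 in each A-by-B cell
d <- data.frame(
  X = factor(rep(c(0, 1), times = c(sum(n0), sum(n1)))),
  A = factor(c(rep(A_cell, times = n0), rep(A_cell, times = n1))),
  B = factor(c(rep(B_cell, times = n0), rep(B_cell, times = n1)))
)

# Count outcomes per cell and keep the combinations that were never observed.
counts <- as.data.frame(with(d, table(A, B, X)))
empty <- counts[counts$Freq == 0, ]
empty
```

Any row in empty marks a predictor combination in which one outcome is missing, which is exactly the condition that lets a model with the A:B term fit that cell perfectly.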

The culprit is that there are no instances of X = 0 when A = 1 and B = "no". To check, we can add such an observation and re-run the analysis:

my.data.a = rbind(my.data, c(0, 1, "no"))
tail(my.data.a)
#    X A   B
# 45 1 1 yes
# 46 0 1 yes
# 47 1 2  no
# 48 1 1  no
# 49 1 1 yes
# 50 0 1  no

The fake observation shows up in the 50th row. Let's run the analysis and compare the output:

model4a <- glm(X~A+B+A*B, family=binomial(logit), data=my.data.a)

summary(model4)
# ...
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)    18.57    2062.64   0.009    0.993
# A2            -18.10    2062.64  -0.009    0.993
# Byes          -17.78    2062.64  -0.009    0.993
# A2:Byes        16.46    2062.64   0.008    0.994
# ...
# 
# Number of Fisher Scoring iterations: 17

summary(model4a)
# ...
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)  
# (Intercept)   2.3026     1.0486   2.196   0.0281 *
# A2           -1.8326     1.1935  -1.535   0.1247  
# Byes         -1.5141     1.1792  -1.284   0.1991  
# A2:Byes       0.1968     1.4804   0.133   0.8942  
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# ...
# 
# Number of Fisher Scoring iterations: 4

With the fake observation added in, there is no separation in that combination of factor levels, and the SEs look normal.

Another indication that separation was to blame is that the number of Fisher scoring iterations was very high (17), whereas 4 is more typical of the Newton-Raphson search algorithm. It just kept going further and further out looking for the minimum deviance. Because of the separation, no finite set of coefficients attains that minimum, but eventually the rate of decrease drops below some threshold and the algorithm stops. In that region, the deviance is very flat, so you get very large SEs.
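
The runaway search can be made visible by capping the number of iterations. This sketch rebuilds the data from the cell counts in the three-way table (object and function names are illustrative) and refits the full model with maxit set to 5, 10 and 15 through glm.control(). Under separation the intercept, which is the log-odds of X = 1 in the A = 1, B = "no" cell, and its standard error should keep growing rather than settle:

```r
# Rebuild the data from the cell counts in table(A, X, B).
A_cell <- c("1", "2", "1", "2")
B_cell <- c("no", "no", "yes", "yes")
n0 <- c(0, 5, 5, 7)    # number of X = 0 in each A-by-B cell
n1 <- c(10, 8, 11, 3)  # number of X = 1 in each A-by-B cell
d <- data.frame(
  X = factor(rep(c(0, 1), times = c(sum(n0), sum(n1)))),
  A = factor(c(rep(A_cell, times = n0), rep(A_cell, times = n1))),
  B = factor(c(rep(B_cell, times = n0), rep(B_cell, times = n1)))
)

# Fit the full model but stop after at most k Fisher scoring iterations.
# Non-convergence warnings are expected here, so they are silenced.
fit_capped <- function(k) {
  suppressWarnings(
    glm(X ~ A * B, family = binomial(logit), data = d,
        control = glm.control(maxit = k))
  )
}

caps <- c(5, 10, 15)
res <- t(sapply(caps, function(k) {
  s <- summary(fit_capped(k))$coefficients
  c(maxit = k,
    intercept = s["(Intercept)", "Estimate"],
    se = s["(Intercept)", "Std. Error"])
}))
res
```

With the default settings the fit stops after 17 iterations only because the change in deviance falls below glm's tolerance, not because the estimates have stabilised.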

Remember that adding a fake observation is not a valid analysis, so throw model4a away! There is an excellent answer discussing how to deal with separation here: How to deal with perfect separation in logistic regression?
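
One simple option that needs no extra packages, sketched here under assumptions: the deviance of the separated model still converges even though its coefficients do not, so the interaction can be tested by comparing the deviances of the additive and full models instead of reading the Wald z values. With only 49 observations the chi-square approximation is rough, so treat the p-value as a guide. The data are rebuilt from the cell counts in the three-way table, and the object names are illustrative:

```r
# Rebuild the data from the cell counts in table(A, X, B).
A_cell <- c("1", "2", "1", "2")
B_cell <- c("no", "no", "yes", "yes")
n0 <- c(0, 5, 5, 7)    # number of X = 0 in each A-by-B cell
n1 <- c(10, 8, 11, 3)  # number of X = 1 in each A-by-B cell
d <- data.frame(
  X = factor(rep(c(0, 1), times = c(sum(n0), sum(n1)))),
  A = factor(c(rep(A_cell, times = n0), rep(A_cell, times = n1))),
  B = factor(c(rep(B_cell, times = n0), rep(B_cell, times = n1)))
)

# Additive model versus the model that adds the A:B interaction.
m_add  <- glm(X ~ A + B, family = binomial(logit), data = d)
m_full <- glm(X ~ A * B, family = binomial(logit), data = d)

# Likelihood-ratio (deviance) test for the interaction, on 1 df.
lrt <- anova(m_add, m_full, test = "Chisq")
lrt
```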

Thanks! This is almost the exact same approach I took at the beginning...I added an extra observation to see if it would run, and it did. Prior to this analysis, I had no idea that perfect separation existed. — confuser, May 29 '14 at 10:32

score 3 · Answer 2 · edited Apr 13 '17 at 12:44

Sure enough, you have "perfect" prediction with the interaction term; subset(my.data,A==1&B=='no') yields all 1s for X. The Bayesian alternative you've already chosen is one way to go in handling this. As Avitus and Scortchi have suggested, Firth's (1993) method of penalizing the model to reduce bias is another. Here's how that performs by default (had to convert the data back to numeric to get it to run):

library(logistf)
# Firth-penalized fit; the factors are converted to their integer codes first
summary(logistf(X ~ A * B, lapply(my.data, as.numeric)))

Model fitted by Penalized ML; Confidence intervals and p-values by Profile Likelihood 

                  coef  se(coef) lower 0.95 upper 0.95     Chisq          p
(Intercept)  8.7500000 13.253712  -14.45312  31.640625 10.176059 0.00142276
A           -2.5489062  6.982541  -14.16579  13.233453  6.302140 0.01205923
B           -2.3105651  6.984261  -13.93029  14.597185  4.436414 0.03518007
A:B          0.7167941  3.728474   -7.71979   6.651535  1.552630 0.21274756

Likelihood ratio test=18.18618 on 3 df, p=0.0004026211, n=49
Wald test = 1.084365 on 3 df, p = 0.7808497

Scortchi's answer suggests the hlr package offers yet another option (among others I won't review here), but I haven't been able to make it work for these data...

Reference

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38. Retrieved from http://www.stat.duke.edu/~scs/Courses/Stat376/Papers/GibbsFieldEst/BiasReductionMLE.pdf.
