I am trying to test whether there is a significant interaction between an ordinal (A) and categorical variable (B) in R using glm. When I create a model that only includes the interaction term A:B, the model runs fine and I get a reasonable estimate. When I run the "full" model X ~ A+B+A*B, I get an unreasonably high standard error. However, when I run each term on its own X ~ A or X ~ B, I also get reasonable estimates. I suspect it might have something to do with near-perfect fit for one combination of my ordinal and categorical variables but I'm not sure. Any ideas on what is going on? Is it bad form to just have a model with only an interaction term A:B and not the A+B+A*B?
model1 <- glm(X~A:B, family=binomial(logit))
model2 <- glm(X~A, family=binomial(logit))
model3 <- glm(X~B, family=binomial(logit))
model4 <- glm(X~A+B+A*B, family=binomial(logit))
summary(model1)
Estimate Std. Error z value Pr(>|z|)
(Int) 3.4320 1.1497 2.985 0.00283 **
A:B [no] -1.3857 0.6813 -2.034 0.04195 *
A:B [yes] -2.2847 0.8017 -2.850 0.00437 **
summary(model2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.9572 1.0792 2.740 0.00614 **
A -1.5221 0.6495 -2.343 0.01911 *
summary(model3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.2809 0.5055 2.534 0.0113 *
B[yes] -1.1268 0.6406 -1.759 0.0786 .
summary(model4)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 36.66 4125.28 0.009 0.993
A -18.10 2062.64 -0.009 0.993
B[yes] -34.24 4125.28 -0.008 0.993
A:B[yes] 16.46 2062.64 0.008 0.994
> dput(my.data)
structure(list(X = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("0", "1"),
class = "factor"),
A = structure(c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("1", "2"),
class = "factor"),
B = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L), .Label = c("no","yes"),
class = "factor")),
.Names = c("X", "A", "B"), row.names = c(NA, -49L), class = "data.frame")
Bwith a new variable likenew.B=as.factor(B). Also changeAwithnew.A=ordered(A)and re-fit. – Stat May 28 '14 at 17:48A*Bisn't an interaction term but a full model. The interaction term alone isA:B, andA*BmeansA+B+A:B(but this isn't why you are having a problem). (ii) The first thing you check when this happens is whether the interaction is nearly a linear combination of A and B. – Glen_b May 28 '14 at 23:04