I am fitting a logistic regression with a few adjustment variables. When I add an interaction term between a numeric variable (age) and a binary one, the estimates and the ORs become very large.
I know that with an interaction between two categorical variables this can happen when, for example, one combination of categories is not represented in the data. But I don't see how it can happen with a numeric variable. What should I check to find out what causes the high estimates when I add this interaction?
When I say a high OR, in my example it goes to: OR (CI) = 2353592.821 [6.017; 2.16e+12].
I've seen this post, but it is based on an interaction between two categorical variables:
Adding interactions to logistic regression leads to high SEs
As suggested in the first answer there, it could be due to quasi-complete separation of the data.
# data used
d = structure(list(y = c(0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0,
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0),
x1 = structure(c(75, 76, 77, 78, 79, 81, 82, 83, 81, 82, 80, 81, 98, 80, 83, 80, 84, 86,
82, 88, 86, 87, 87, 90, 90, 91, 85, 89, 89, 94, 91, 95, 85, 96,
88, 94, 91, 76, 86, 88, 88, 85, 83, 87, 85, 75, 79, 88, 88, 92,
77, 89, 86, 87, 87, 80, 88, 89, 81, 82, 82, 82, 82, 86, 94, 88,
77, 84, 83, 96, 83, 86, 94, 90, 79, 89, 80, 95, 79, 84, 88, 82,
92, 76, 89, 83, 83, 82, 87, 94, 83, 87, 75, 79, 78, 93, 81, 96,
87, 92, 76, 95, 82, 77, 85, 88, 76, 88, 77, 100, 84, 98, 86,
78, 95, 84, 82, 81, 86, 86, 84, 85, 82, 88, 87, 92, 82, 88, 95,
87, 85)),
group = c("A", "A", "B",
"B", "A", "B", "A", "A", "B", "A", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "A", "A", "B", "A", "B", "B",
"A", "A", "A", "A", "A", "B", "B", "A", "A", "A", "B", "A", "A",
"A", "A", "A", "A", "A", "A", "B", "B", "B", "A", "B", "A", "A",
"B", "B", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B",
"A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B", "B", "B",
"B", "A", "B", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "B", "A", "A", "B", "A", "A", "B", "B",
"A", "A", "A", "B", "B", "B", "A", "B", "A", "A", "B", "B", "B",
"B", "A", "B", "A", "A", "B", "B", "A", "B", "B", "A")),
row.names = c(NA, -131L),
class = c("tbl_df", "tbl", "data.frame"))
# plots
plot(y ~ x1, subset = group == "A", data=d)
plot(y ~ x1, subset = group == "B", data=d)
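To complement the plots with a numeric check, here is a sketch of how one could look for quasi-complete separation within each group. `d_sim` is a simulated stand-in with the same structure as `d` above (131 rows, x1 in 75–100, groups A/B); to run the check on the real data, apply the same `tapply` call to `d` instead.

```r
## Simulated stand-in for `d` (same shape), only so the sketch is self-contained
set.seed(1)
d_sim <- data.frame(
  y     = rbinom(131, 1, 0.4),
  x1    = round(runif(131, 75, 100)),
  group = sample(c("A", "B"), 131, replace = TRUE)
)

## Range of x1 for each outcome within each group: quasi-complete separation
## involving the interaction would show up as x1 ranges for y = 0 and y = 1
## that do not overlap within one of the groups.
with(d_sim, tapply(x1, list(y = y, group = group), range))
```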
# logistic regression
# the individual models work fine
uni1 <- glm(y ~ x1, data = d, family = binomial)
uni2 <- glm(y ~ group, data = d, family = binomial)
round(exp(cbind(OR=coef(uni1), confint(uni1))), 3)
round(exp(cbind(OR=coef(uni2), confint(uni2))), 3)
# adding the interaction
m2 <- glm(y ~ x1 * group, data = d, family = binomial)
summary(m2)
round(exp(cbind(OR=coef(m2), confint(m2))), 3)
# results of m2
                    OR    2.5 %     97.5 %
(Intercept)      0.000    0.000      0.745
x1               1.097    0.986      1.231
groupB      199823.765    1.148   7.10e+10
x1:groupB        0.885    0.764      1.017
As you can see if you run this code, the OR for groupB in m2 (the model with the interaction) is incredibly large, yet the plots show no separation. That is why I am at a loss to explain it.
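One thing that may be worth checking: in `y ~ x1*group` the groupB coefficient is the group difference in log-odds at x1 = 0, which is far outside the observed age range (75–100), so the huge OR might just reflect that extrapolation rather than separation. The sketch below uses simulated data `d_sim` with the same shape as `d` (substitute `d` to try it on the data above); centring x1 is only a reparameterisation, so the deviance is unchanged, but the groupB main effect is then evaluated at the mean age instead of at zero.

```r
## Simulated stand-in for `d` (same shape), only so the sketch is self-contained
set.seed(42)
d_sim <- data.frame(
  x1    = round(runif(131, 75, 100)),
  group = sample(c("A", "B"), 131, replace = TRUE)
)
d_sim$y <- rbinom(131, 1, plogis(-8 + 0.09 * d_sim$x1))

## Same model, raw vs centred age
m_raw       <- glm(y ~ x1 * group, data = d_sim, family = binomial)
d_sim$x1c   <- d_sim$x1 - mean(d_sim$x1)
m_ctr       <- glm(y ~ x1c * group, data = d_sim, family = binomial)

## Identical fit (same deviance), but the main-effect OR for group is now
## the comparison at the mean age, not an extrapolation to age 0
c(raw = deviance(m_raw), centred = deviance(m_ctr))
exp(coef(m_raw)["groupB"])   # group OR at x1 = 0 (far outside the data)
exp(coef(m_ctr)["groupB"])   # group OR at mean age
```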

