I received abnormal results when using glmrob (from the R package robustbase) to assess the association of a binary exposure X0 with a binary outcome Y via logistic regression. The odds ratio for X0 jumped from 16 (exp(2.74)) to a whopping 200 (exp(5.3)) after adjusting for the covariates X1, X2, and X3.
X0 is a binary exposure with only one exposed observation in the Y = 0 group (see row 2 of the sorted data below); glmrob considers this observation an outlier and downweights it. Specifically, 1/32 (3%) of the Y = 0 observations were exposed (X0 = 1), versus 9/27 (33%) of the Y = 1 observations.
Contingency table:

|     | X0=0 | X0=1 |
|-----|------|------|
| Y=0 | 31   | 1    |
| Y=1 | 18   | 9    |
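For reference, the unadjusted odds ratio can be reproduced directly from these cell counts in base R (a quick sketch; the cross-product ratio matches the glmrob coefficient below):

```r
# Rebuild the 2x2 table from the counts above and compute the
# unadjusted odds ratio as the cross-product ratio.
tab <- matrix(c(31, 1,
                18, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(Y = c("0", "1"), X0 = c("0", "1")))

or_unadj <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
or_unadj       # about 15.5
log(or_unadj)  # about 2.74, matching the unadjusted X0 coefficient
```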
Is it appropriate to use glmrob for data with such a sparse cell? The adjusted odds ratio seems implausibly high compared to the unadjusted one. Is there a more appropriate method or tuning parameter for such data? What do you suggest? References to justify the change would be greatly appreciated.
Reproducible code is below.
Best regards, Ali
library(dplyr)
data <- data.frame(
Y=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1),
X0=c(0,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0),
X1=c('B','B','C','B','A','B','B','B','B','B','A','B','B','B','B','B','A','B','A','B','A','A','C','B','B','B','B','B','C','B','B','B','B','A','B','B','B','B','B','B','B','B','B','A','B','B','A','B','C','B','B','B','B','A','B','B','A','B','A'),
X2=c(1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,1,0,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1,1,0,1,1,1),
X3 = c(18.9,5.4,18.4,18.6,15.3,18.2,17.3,17,12.6,6.9,17.6,15.4,16.1,8.4,11.6,16.2,10.7,14.9,19.1,21,16.5,13.2,19.3,16.7,18.3,18.5,13.8,17.7,17.7,16.8,16.9,16.2,14.6,15.8,13.4,16.2,15,21.2,20.3,20.7,13.8,17.5,18.8,10.2,8.2,9.8,14.9,16.6,19.6,16.3,15.9,16.2,15,15,15.7,19.9,18.1,18.2,19)
)
> data <- data[order(data$Y, decreasing = FALSE), ]
> row.names(data) <- NULL
> data
Y X0 X1 X2 X3
1 0 0 A 0 17.6
2 0 1 B 1 15.4
3 0 0 B 1 16.1
4 0 0 B 1 8.4
5 0 0 B 1 11.6
6 0 0 B 0 16.2
7 0 0 A 1 10.7
8 0 0 B 0 14.9
9 0 0 B 0 14.6
10 0 0 A 1 15.8
11 0 0 B 1 13.4
12 0 0 B 0 16.2
13 0 0 B 1 15.0
14 0 0 B 1 21.2
15 0 0 B 1 20.3
16 0 0 B 0 20.7
17 0 0 B 0 13.8
18 0 0 B 0 17.5
19 0 0 B 1 18.8
20 0 0 A 0 10.2
21 0 0 B 0 8.2
22 0 0 B 0 9.8
23 0 0 A 1 14.9
24 0 0 B 1 16.6
25 0 0 C 0 19.6
26 0 0 B 1 16.3
27 0 0 B 1 15.9
28 0 0 B 1 16.2
29 0 0 B 1 15.0
30 0 0 A 1 15.0
31 0 0 B 1 15.7
32 0 0 B 0 19.9
33 1 0 B 1 18.9
34 1 1 B 0 5.4
35 1 1 C 1 18.4
36 1 1 B 1 18.6
37 1 0 A 1 15.3
38 1 0 B 1 18.2
39 1 1 B 0 17.3
40 1 0 B 1 17.0
41 1 0 B 1 12.6
42 1 0 B 1 6.9
43 1 0 A 1 19.1
44 1 0 B 1 21.0
45 1 0 A 1 16.5
46 1 0 A 1 13.2
47 1 0 C 0 19.3
48 1 1 B 0 16.7
49 1 0 B 1 18.3
50 1 1 B 1 18.5
51 1 0 B 0 13.8
52 1 0 B 1 17.7
53 1 0 C 0 17.7
54 1 1 B 1 16.8
55 1 0 B 1 16.9
56 1 0 B 1 16.2
57 1 1 A 1 18.1
58 1 1 B 1 18.2
59 1 0 A 1 19.0
> robustbase::glmrob(Y ~ X0 , data=data, family = binomial(link = "logit")) %>% summary()
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.5436 0.2963 -1.834 0.0666 .
X0 2.7408 1.1396 2.405 0.0162 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
58 weights are ~= 1. The remaining one is
2
0.4483
> robustbase::glmrob(Y ~ X0 + X1 + X2 + X3, data=data, family = binomial(link = "logit")) %>% summary()
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.45216 2.06462 -1.672 0.0945 .
X0 5.29736 2.54417 2.082 0.0373 *
X1B -0.58513 0.77066 -0.759 0.4477
X1C 2.30901 1.77568 1.300 0.1935
X2 2.07431 1.12265 1.848 0.0646 .
X3 0.09954 0.10882 0.915 0.3603
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
54 weights are ~= 1. The remaining 5 ones are
2 25 41 42 51
0.1180 0.8981 0.9436 0.7106 0.3551
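The heavily downweighted observation 2 is the lone exposed control. A quick check (a sketch using only the Y/X0 margin, not the full covariate set) shows why pulling its weight toward zero inflates the X0 coefficient: without that observation, every exposed subject has Y = 1 (complete separation), and the maximum-likelihood estimate diverges.

```r
# Sketch: collapse the data to its Y/X0 margin (counts 31, 1, 18, 9)
# and refit the unadjusted model with and without the single exposed
# control (Y = 0, X0 = 1).
d <- data.frame(
  Y  = rep(c(0, 0, 1, 1), times = c(31, 1, 18, 9)),
  X0 = rep(c(0, 1, 0, 1), times = c(31, 1, 18, 9))
)

fit_all <- glm(Y ~ X0, data = d, family = binomial)
coef(fit_all)["X0"]  # about 2.74, matching the glmrob estimate above

# Without that one row there are no exposed controls: glm warns that
# fitted probabilities of 0 or 1 occurred, and the X0 coefficient
# runs off toward +Inf.
d2 <- d[!(d$Y == 0 & d$X0 == 1), ]
fit_drop <- suppressWarnings(glm(Y ~ X0, data = d2, family = binomial))
coef(fit_drop)["X0"]
```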
# Using regular glm doesn’t cause this issue
> glm(Y ~ X0 + X1 + X2 , data=data, family = binomial(link = "logit")) %>% summary()
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.5188 0.9734 -1.560 0.11869
X0 3.2219 1.2200 2.641 0.00827 **
X1B -0.4763 0.7288 -0.654 0.51343
X1C 2.2179 1.5590 1.423 0.15484
X2 1.5971 0.8384 1.905 0.05679 .
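Putting the three estimates on the odds-ratio scale makes the discrepancy concrete (plugging in the coefficients reported above):

```r
# Exponentiate the reported log odds ratios for X0.
exp(c(glmrob_unadjusted = 2.7408,   # ~15.5
      glmrob_adjusted   = 5.29736,  # ~200
      glm_adjusted      = 3.2219))  # ~25
```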
In my study I have used glmrob for all regression analyses for the sake of consistency, so if I switch to standard glm for this one model, I will need to justify the one-time change.