
The predictor variable has a correlation of -0.98 with the binary response (classes 1 and 2). However, its logistic regression coefficient comes out insignificant. The model's residual deviance is also very low (32.644, versus a null deviance of 709.712). What could be the reason for this? How can I get a model in which this coefficient is significant? The code and output are below.

fit <- glm(blType==1 ~ gammaSupRatio, family = binomial(link = "logit"), data = df)
summary(fit)

Call:
glm(formula = blType == 1 ~ gammaSupRatio, family = binomial(link = "logit"), data = df)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.15354  -0.15354   0.00002   0.00002   2.98214

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -4.4348     0.5808  -7.636 2.24e-14 ***
gammaSupRatio  18.0006  2008.1727   0.009    0.993

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 709.712  on 511  degrees of freedom
Residual deviance:  32.644  on 510  degrees of freedom
AIC: 36.644

Number of Fisher Scoring iterations: 21
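The `z value` column in this output is a Wald test, which is the estimate divided by its standard error. Under (near-)perfect separation the standard error grows much faster than the estimate (the Hauck-Donner phenomenon mentioned in the comments), so the Wald test can report p = 0.993 for a predictor that fits almost perfectly. A likelihood-ratio test compares deviances instead and does not break down in the same way. As a minimal sketch, the two deviances printed in the output are enough to compute it by hand:

```r
# Deviances copied from the summary(fit) output in the question
null_dev  <- 709.712  # intercept-only model, 511 df
resid_dev <- 32.644   # model with gammaSupRatio, 510 df

# Likelihood-ratio (deviance difference) statistic for adding gammaSupRatio
lr_stat <- null_dev - resid_dev
lr_df   <- 511 - 510  # one extra parameter

# Upper-tail chi-square probability of the LR statistic
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

cat("LR chi-square:", lr_stat, "on", lr_df, "df, p =", format.pval(p_value), "\n")
```

With the fitted object itself, `anova(fit, test = "Chisq")` or `drop1(fit, test = "LRT")` should report the same comparison. A statistic of about 677 on 1 degree of freedom indicates that `gammaSupRatio` is highly significant; it is only the Wald z value that is misleading here.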

Emre Kara
  • Is "-0.98" Pearson correlation? – Zhanxiong Feb 21 '23 at 16:26
  • @Zhanxiong Yes it is. – Emre Kara Feb 21 '23 at 16:27
  • You have perfect (or near-perfect) separation. Very high log-odds for the coefficient but near-infinite standard errors, because your model fits (almost) perfectly. See this page and others about the Hauck-Donner phenomenon. – EdM Feb 21 '23 at 16:49
  • Have you tried using Firth logistic regression, i.e., logistf::flic(), instead of regular logistic regression? It is specifically designed to deal with cases of near-perfect separation and is less biased than ordinary logistic regression. – Noah Feb 21 '23 at 16:54
  • There is almost perfect separation. Class 2 has high values of the predictor and class 1 has low values, with 3 exceptions. The message of the data is clear. However, the predictor should still be in the model, and I need an explainable model. – Emre Kara Feb 21 '23 at 17:00
  • In that case, the first link by @EdM seems to be a duplicate that addresses how to deal with such a situation. – Dave Feb 21 '23 at 17:26
  • @Dave The linked question is about perfect separation itself. The question here is about the insignificant coefficient of a predictor that nearly perfectly separates the response. Although both problems stem from the same cause, this outcome isn't covered in the linked question. – Emre Kara Feb 21 '23 at 17:31
  • @Noah That method seems to deal well with separation issues. It also provides an interpretable model with statistical tests on the independent variables. Thanks for the answer. – Emre Kara Feb 21 '23 at 17:33
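The Firth approach suggested in the comments can be sketched with the CRAN package logistf. This is a minimal illustration on hypothetical data (the data frame `d` and its columns `x` and `y` are made up, with `x` separating the two classes completely), not a reanalysis of the question's data. It fits ordinary `glm()` and Firth's penalized likelihood via `logistf()` side by side; by default `logistf()` bases its p-values on the penalized profile likelihood rather than on Wald z values. The `flic()` function named in the comment comes from the same package and adds an intercept correction so that the average predicted probability matches the observed event rate; plain `logistf()` is used here to keep the sketch short.

```r
# Assumes the CRAN package logistf is installed: install.packages("logistf")
library(logistf)

# Hypothetical data: class y = 1 has low x, class y = 0 has high x,
# and the two ranges do not overlap (complete separation)
d <- data.frame(
  x = c(seq(0, 1, length.out = 50), seq(1.05, 2, length.out = 50)),
  y = rep(c(1, 0), each = 50)
)

# Ordinary maximum likelihood: the estimate for x drifts towards -Inf,
# so glm() typically warns and reports a very large standard error
ml_fit <- suppressWarnings(glm(y ~ x, family = binomial, data = d))
print(coef(summary(ml_fit)))

# Firth's penalized likelihood keeps the estimate finite under separation
firth_fit <- logistf(y ~ x, data = d)
summary(firth_fit)

firth_coef <- coef(firth_fit)  # (Intercept), x
firth_p    <- firth_fit$prob   # p-values from the penalized profile likelihood
```

On data like this, the Firth estimate for `x` should be finite and negative with a small p-value, while the maximum likelihood standard error is inflated; the exact numbers depend on the data.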
