I am running a logistic regression model where the outcome variable is Neurologic Complications, and there are various factors whose impact I am examining. One of the factors (HTN_new1), a categorical variable, has a strangely high standard error, which is throwing off the subsequent confidence interval and OR calculations. What could be the cause of this inflated value, and how can I go about fixing it? Note: I have already checked the original Excel sheet and the R data frame for entry errors and could not find anything that stood out.

Input:

NeuroLogit2 <- glm(`Neurologic Complication?` ~ stroke_comorbid + HTN_new +
                     `anesthesia type` + `Over 75yo?` + Gender_new + Embol_Collateralart +
                     carotid.subclavian + `Spinal drain?`,
                   data = Tevar.new, family = binomial)
summary(NeuroLogit2)

Output:

Call:
glm(formula = `Neurologic Complication?` ~ stroke_comorbid + 
    HTN_new + `anesthesia type` + `Over 75yo?` + Gender_new + 
    Embol_Collateralart + carotid.subclavian + `Spinal drain?`, 
    family = binomial, data = Tevar.new)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.09673  -0.37157  -0.27390  -0.00009   2.87970  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)           -20.90519 1153.29897  -0.018   0.9855  
stroke_comorbid1        1.40348    0.57747   2.430   0.0151 *
HTN_new1               16.59862 1153.29876   0.014   0.9885
`anesthesia type`1      1.49715    0.77617   1.929   0.0537 .
`Over 75yo?`1           0.17094    0.51136   0.334   0.7382  
Gender_new1             0.00523    0.54231   0.010   0.9923  
Embol_Collateralart1   -0.58778    1.14262  -0.514   0.6070  
carotid.subclavian1     0.28837    0.64745   0.445   0.6560  
`Spinal drain?`1        1.03701    0.53742   1.930   0.0537 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 144.76  on 324  degrees of freedom
Residual deviance: 118.84  on 316  degrees of freedom
AIC: 136.84

Number of Fisher Scoring iterations: 18
  • Or it could be that for one of your classes there's only 1 observation, for example HTN_new (a cross-tab sketch follows these comments). – StupidWolf Jun 08 '20 at 20:16
  • This usually happens in complete or quasi-complete separation, and I'm willing to bet that is what is happening. Essentially, there is a hyperplane which perfectly separates the 1s from the 0s. Have you tried Firth's logistic regression? – Demetri Pananos Jun 08 '20 at 20:25
  • @DemetriPananos I have not, how would I go about that? Sorry I'm a bit new to all of this :/ – bdg67 Jun 08 '20 at 20:31
  • @bdg67 That's ok. Try fitting your data with the logistf package. You might have to run install.packages('logistf'). – Demetri Pananos Jun 08 '20 at 20:33
  • @DemetriPananos Will I still be able to find CIs, ORs, and the AUC/C-statistic using this package? I'm also not sure what the equivalent of the glm function is in the logistf package? – bdg67 Jun 08 '20 at 20:40
  • You can still find confidence intervals and odds ratios (albeit with a little bit of effort), and you can still calculate the AUC, but you will need to first get probabilities from your model. Here is how to do that. I'm not even sure this will work, so before going all in I would just run the fit and see if the penalization that Firth's logistic regression uses helps quell the problem (a sketch of the fit follows these comments). – Demetri Pananos Jun 08 '20 at 20:49
  • @DemetriPananos Interestingly, now another factor seems to have wonky results lol – bdg67 Jun 08 '20 at 20:54
  • See https://stats.stackexchange.com/questions/254124/why-does-logistic-regression-become-unstable-when-classes-are-well-separated – kjetil b halvorsen Jun 08 '20 at 20:57
  • The highly-voted threads in https://stats.stackexchange.com/questions/tagged/separation?tab=Votes should help. – Sycorax Jun 08 '20 at 21:17
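
Following up on the cell-count and separation comments above, a minimal cross-tab check (a sketch; the data frame and column names are taken from the model call in the question, and a zero or near-zero cell in a predictor-by-outcome table is the classic sign of separation):

# Look for empty or near-empty cells for the suspect predictor
with(Tevar.new, table(HTN_new, `Neurologic Complication?`))

# Also check that each level of HTN_new has more than a handful of observations
with(Tevar.new, table(HTN_new))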
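
And a sketch of the Firth fit suggested above, assuming the logistf package (odds ratios and profile-penalized-likelihood confidence intervals come from the fitted object; pROC is one assumed option for the AUC/C-statistic):

# install.packages("logistf")   # once, if not already installed
library(logistf)

# Same formula and data frame as the glm call in the question
NeuroFirth <- logistf(`Neurologic Complication?` ~ stroke_comorbid + HTN_new +
                        `anesthesia type` + `Over 75yo?` + Gender_new +
                        Embol_Collateralart + carotid.subclavian + `Spinal drain?`,
                      data = Tevar.new)
summary(NeuroFirth)

# Odds ratios with profile-penalized-likelihood confidence intervals
exp(cbind(OR    = NeuroFirth$coefficients,
          lower = NeuroFirth$ci.lower,
          upper = NeuroFirth$ci.upper))

# Predicted probabilities are stored on the fit; AUC/C-statistic via pROC (assumed)
# install.packages("pROC")
library(pROC)
auc(roc(Tevar.new$`Neurologic Complication?`, NeuroFirth$predict))

The penalization shrinks coefficients that would otherwise run off to infinity under separation, so the standard error and confidence interval for HTN_new1 should come back finite.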

0 Answers