
I am fitting a logistic regression of the response variable on dose, age, PS, menopausal status and PairID. The data come from a case-control study in which controls were matched to cases on age, PS and menopausal status. The ID of each matched set is recorded as PairID and is also included as a covariate in the model.

The model is:

    mod2 <- glm(collapsed ~ MaxDose + Age + PS + menopausal + PairID, 
                family = binomial(link = "logit"), data=dat)

The model output shows the estimated coefficients:

    summary(mod2)
    Deviance Residuals: 
         Min        1Q    Median        3Q       Max  
    -0.89847  -0.36858  -0.05885  -0.02615   3.00978  

    Coefficients: (2 not defined because of singularities)
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -32.73273   11.42767  -2.864 0.004179 ** 
    MaxDose           0.07984    0.02287   3.491 0.000482 ***
    Age               0.62496    0.27268   2.292 0.021910 *  
    PS1              -0.39054    0.89232  -0.438 0.661628    
    PS2              17.09581 1268.45566   0.013 0.989247    
    PS3              15.76341 1833.21516   0.009 0.993139    
    menopausalpost  -12.55850    5.28287  -2.377 0.017444 *  
    menopausalmale  -23.87464   10.51685  -2.270 0.023200 *  
    PairID2           5.14332    3.02191   1.702 0.088754 .  
    PairID3           3.60030    2.20299   1.634 0.102200    
    PairID4         -13.42706    6.47253  -2.074 0.038036 *  
    PairID5          -0.16041    1.50785  -0.106 0.915276    
    PairID6         -13.20065 1268.45572  -0.010 0.991697    
    PairID7         -19.62178 1833.21665  -0.011 0.991460    
    PairID8          -7.08125    3.55520  -1.992 0.046393 *  
    PairID9          -0.05779    1.84367  -0.031 0.974994    
    PairID10         -6.59761    3.44665  -1.914 0.055593 .  
    PairID11          3.13177    1.55304   2.017 0.043744 *  
    PairID12          9.04193    4.16930   2.169 0.030106 *  
    PairID13         15.00443    7.03528   2.133 0.032946 *  
    PairID14          1.00987    1.58307   0.638 0.523527    
    PairID15          5.45367    2.91475   1.871 0.061336 .  
    PairID16         -3.13399    2.18068  -1.437 0.150672    
    PairID17          6.85244    3.63025   1.888 0.059081 .  
    PairID18          5.60300    3.02755   1.851 0.064217 .  
    PairID19               NA         NA      NA       NA    
    PairID20        -11.53556    5.23541  -2.203 0.027569 *  
    PairID21        -11.09677    5.23554  -2.120 0.034048 *  
    PairID22          6.15099    3.16279   1.945 0.051799 .  
    PairID23               NA         NA      NA       NA    
    PairID24        -18.62866 1268.45749  -0.015 0.988283    
    PairID25          4.83989    1.81209   2.671 0.007565 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 226.13  on 540  degrees of freedom
    Residual deviance: 167.62  on 511  degrees of freedom
    AIC: 227.62

    Number of Fisher Scoring iterations: 17

From this output, it seems that the covariate menopausal has a significant effect. However, if I apply an analysis of deviance to test the significance of each covariate, menopausal shows Df = 0. Does anyone know why?

    drop1(mod2, test = "Chisq")
    Single term deletions

    Model:
    collapsed ~ MaxDose + Age + PS + menopausal + PairID
               Df Deviance    AIC    LRT  Pr(>Chi)    
    <none>          167.62 227.62                     
    MaxDose     1   213.08 271.08 45.462 1.557e-11 ***
    Age         1   175.22 233.22  7.602  0.005829 ** 
    PS          3   173.49 227.49  5.868  0.118227    
    menopausal  0   167.62 227.62  0.000              
    PairID     22   186.31 202.31 18.684  0.664773 
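
One possible reading of the `Df = 0` row, sketched on simulated data: if menopausal status is identical for everyone within a matched set (which matching on it would imply), its dummy columns are exact sums of PairID dummy columns. Dropping menopausal then leaves the model's column space unchanged, so `drop1()` reports zero degrees of freedom and zero change in deviance. The data frame `toy`, its sizes and its effect sizes are all made up for illustration; only the variable names are borrowed from the model above.

```r
# Simulated stand-in for the real data; every value here is hypothetical.
set.seed(1)
n_sets   <- 6    # number of matched sets (assumed)
set_size <- 20   # subjects per matched set (assumed)

toy <- data.frame(PairID = factor(rep(seq_len(n_sets), each = set_size)))

# Matching makes menopausal status constant within each matched set.
status_by_set  <- rep(c("pre", "post", "male"), length.out = n_sets)
toy$menopausal <- factor(status_by_set[as.integer(toy$PairID)],
                         levels = c("pre", "post", "male"))

toy$MaxDose   <- runif(nrow(toy), 0, 60)
toy$collapsed <- rbinom(nrow(toy), 1, plogis(-1 + 0.03 * toy$MaxDose))

fit <- glm(collapsed ~ MaxDose + menopausal + PairID,
           family = binomial(link = "logit"), data = toy)

# Two PairID coefficients are aliased with the menopausal dummies and are NA.
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# Dropping menopausal does not change the fit: Df = 0 and LRT = 0.
d1 <- drop1(fit, test = "Chisq")
print(d1)
```

In this toy fit the number of `NA` coefficients equals the number of menopausal dummy columns, the same pattern as the "2 not defined because of singularities" note in the summary above. Whether that is what happens in the real data depends on menopausal status really being constant within each PairID, which can be checked with `table(dat$PairID, dat$menopausal)`.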

tiantianchen
  • I don't know, but you have some very large coefficients there with enormous standard errors. Separation? – Scortchi - Reinstate Monica Nov 29 '13 at 21:15
  • I need to mention that the data come from a case-control study where controls were matched/paired based on age, PS and menopausal. The pair ID is then recorded as PairID and is also put in the model. Could it be the relation between PairID and the rest of the covariates that leads to the strange output? – tiantianchen Nov 29 '13 at 21:25
  • So the model has 540 degrees of freedom. So the case-control matching is about 1:10 oversampling controls and you have approximately 54-60 cases? Yet you've adjusted for both the "pairID" - really a cluster ID - as well as the factors used to generate the matching? This sounds like a job for conditional logistic regression, rather. The model fit you've provided is complete nonsense, there's total separation from overadjustment. The rule of 10-20 events per variable means you're adjusting for an order of magnitude more variables than can be reliably estimated. – AdamO Oct 18 '22 at 20:20
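
The conditional logistic regression suggested in the last comment could look roughly like the sketch below. It uses `clogit()` and `strata()` from the `survival` package on simulated data with one case and four controls per matched set; the data frame `toy`, the set sizes and the dose effect of 0.05 are assumptions made for illustration, not values from the study. Covariates that are constant within a matched set cannot be estimated in this model, so only the exposure is included here.

```r
library(survival)  # provides clogit() and strata()

# Simulated matched sets; all values are hypothetical.
set.seed(2)
n_sets   <- 50   # number of matched sets (assumed)
set_size <- 5    # one case plus four controls per set (assumed)

toy <- data.frame(PairID = rep(seq_len(n_sets), each = set_size))
toy$MaxDose   <- runif(nrow(toy), 0, 60)
toy$collapsed <- 0L

# Within each set, pick one case with probability increasing in dose.
for (id in seq_len(n_sets)) {
  rows     <- which(toy$PairID == id)
  case_row <- sample(rows, 1, prob = exp(0.05 * toy$MaxDose[rows]))
  toy$collapsed[case_row] <- 1L
}

# Condition on the matched set instead of estimating one coefficient per set.
cfit <- clogit(collapsed ~ MaxDose + strata(PairID), data = toy)
print(summary(cfit))
```

With the real data the analogous call would be along the lines of `clogit(collapsed ~ MaxDose + strata(PairID), data = dat)`, assuming `collapsed` is coded 0/1. Matching variables that still vary within a set could be added as ordinary covariates.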
