
I am running a probit regression with glm. I got a non-significant value (p ~ 0.98) for an interaction that other statistical methods suggested should be significant. Looking into it, I think it's because one of the cells had zero positive ("YES") outcomes. When I artificially change a data point so that there is one positive outcome, I get a significant result of p < 0.02, even though this actually decreases the difference in the effect of the cue ("Cue") between the two conditions ("Condition").

The code in R is glm(Response~Cue*Condition, family=binomial(link=probit), data=data)

The data look like this:

Condition = 0
            Response = 0   Response = 1
  Cue = 0             49              3
  Cue = 1             25             25

Condition = 1
            Response = 0   Response = 1
  Cue = 0             48              3
  Cue = 1             47              0

The counts aren't exact, but they show the pattern: the subject responds "appropriately" 50% of the time to Cue 1 in Condition 0 but does not do so in Condition 1 (and in fact responds slightly more often to the incorrect cue in this condition, although this isn't significant). Changing the last row of counts to 46 1 gives the significant interaction I was expecting.

My intuitions are hazy, but I imagine this has something to do with the probit function being undefined for probabilities of 0 or 1. Does this seem correct? If so, why does it return a non-significant value rather than a warning? More to the point, is there a way to deal with this? I can imagine building in a correction that gives a non-zero estimate for the probability based on the number of trials, but is there a standard, implemented solution?

Tiffany

3 Answers


As Greg Snow suggested, you have quasi-complete separation: the Condition=1, Cue=1 combination perfectly predicts an outcome of 0. In such cases the corresponding parameter estimate is infinite, and the Wald test (which is what you are looking at in the summary output) does not really work:

> dd <- data.frame(Condition=c(0,0,0,0,1,1,1,1),
+                  Cue=c(0,0,1,1,0,0,1,1), 
+                  Response=c(0,1,0,1,0,1,0,1),
+                  Count=c(49,3,25,25,48,3,47,0))
> 
> 
> m1 <- glm(Response ~ Cue*Condition, weight=Count, data=dd, family=binomial(link="probit"))
> summary(m1)

Call:
glm(formula = Response ~ Cue * Condition, family = binomial(link = "probit"), 
    data = dd, weights = Count)

Deviance Residuals: 
      1        2        3        4        5        6        7        8  
-2.4132   4.1371  -5.8871   5.8871  -2.4125   4.1230  -0.0007   0.0000  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.574445   0.279914  -5.625 1.86e-08 ***
Cue             1.574445   0.331312   4.752 2.01e-06 ***
Condition       0.009718   0.396566   0.025    0.980    
Cue:Condition  -5.716964 203.238776  -0.028    0.978    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 172.51  on 6  degrees of freedom
Residual deviance: 115.07  on 3  degrees of freedom
AIC: 123.07

Number of Fisher Scoring iterations: 14

However, likelihood-ratio (deviance-based) tests do work:

> anova(m1, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: probit

Response: Response

Terms added sequentially (first to last)


              Df Deviance Resid. Df Resid. Dev P(>|Chi|)    
NULL                              6     172.51              
Cue            1   16.045         5     156.47 6.187e-05 ***
Condition      1   23.546         4     132.92 1.220e-06 ***
Cue:Condition  1   17.850         3     115.07 2.390e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

The anova method adds terms sequentially, so it is fine for the interaction term (which comes last). If you want to test the effect of another term, you need to fit a second model omitting that term and then compare the two models:

> m2 <- glm(Response ~ Cue+Condition, weight=Count, data=dd, family=binomial(link="probit"))
> anova(m1, m2, test="Chisq")
Analysis of Deviance Table

Model 1: Response ~ Cue * Condition
Model 2: Response ~ Cue + Condition
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)    
1         3     115.07                          
2         4     132.92 -1   -17.85  2.39e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Aniko
  • One more thing might be worth mentioning here. Even setting the separation issue aside, in general Wald tests are inferior to likelihood ratio tests. These circumstances contrive to make that really obvious, but LR tests really are better overall. – gung - Reinstate Monica Apr 17 '12 at 15:27
  • Thank you! From googling, I think I understand the problem. It also seems there may be corrections available (www.cas.sc.edu/poli/psrw/Zorn_PA_Final.pdf), but I can't easily implement them, and I understand Aniko's model-fitting suggestion better, so I'll try that going forward. – Tiffany Apr 17 '12 at 17:17
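The corrections mentioned in the comment above (the linked Zorn paper describes Firth-type penalized maximum likelihood) do have R implementations. A minimal sketch, assuming the brglm2 package (not mentioned in the thread) and the aggregated data frame dd from the answer above; the exact interface may differ between package versions:

library(brglm2)

## Bias-reduced (Firth-type) fit via glm's method argument; the separated
## cell typically gets a finite, shrunken estimate instead of diverging
m1_br <- glm(Response ~ Cue * Condition, weights = Count, data = dd,
             family = binomial(link = "probit"), method = "brglmFit")
summary(m1_br)

Even with such a correction, the likelihood-ratio comparisons above remain the safer way to test the interaction.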

Let me address a few preliminary issues first. As a terminological matter, there is no such thing as "logistic regression with a probit link". Rather, there is the generalized linear model, which can use a variety of different links (of which logit is one and probit is another) and different distributions of $Y$. Thus, you are doing probit regression. The maximum likelihood methods used to fit these models are not troubled by 0's, unless there's something very weird about your data that I don't recognize. When and whether your statistical software returns a warning versus results will depend on many factors, most importantly which software you are using, but as I say, it's not clear that anything is amiss.

An interaction means that the nature of the effect of one covariate depends on the level of another. For example, $Y$ may increase as $X_1$ increases when $X_2$ is low, but decrease as $X_1$ increases when $X_2$ is high. However, that is a different issue from the main effect of a covariate (cf. "decreases the difference between the conditions"). It is quite possible that that "cell" (n.b., I know nothing of the structure of your data or the design of your study) has a great deal of leverage. Without knowing more about your data, my first guess would be that you should accept the results the software is returning. I do believe that arbitrarily changing your data to get the results you expect is generally considered a no-no.
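To make the crossover pattern described above concrete, here is a small simulated sketch (hypothetical numbers, not the OP's data): the effect of x1 on the response reverses sign depending on x2, which is what a non-zero interaction term captures.

set.seed(1)
n  <- 500
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
## true model on the probit scale: x1 raises P(Y = 1) when x2 = 0,
## but lowers it when x2 = 1 (a crossover interaction)
eta <- -0.5 + 1.5 * x1 - 0.5 * x2 - 3 * x1 * x2
y   <- rbinom(n, 1, pnorm(eta))   # pnorm() is the inverse probit link
coef(glm(y ~ x1 * x2, family = binomial(link = "probit")))

The fitted interaction coefficient should come out strongly negative, reflecting the reversal.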

  • Thanks for correcting the question title and terminology. I've given cell counts to try to explain my data better. I'm on a very, very slow computer right now, so I'll double-check everything you've said again in the morning and see if I can add any more then. – Tiffany Apr 17 '12 at 06:18
  • Now that I've seen what your data looks like, I have to say that my suggestion is wrong. The problem isn't leverage. Greg & Aniko are right that it's about separation. @Aniko's answer really gives you what you need here. – gung - Reinstate Monica Apr 17 '12 at 15:23
  • Yes, well, my original description of the interaction was incomplete/confusing. Thanks again for taking the time to point out what was confusing about my question. – Tiffany Apr 17 '12 at 17:21

We don't have enough information to really know what is going on: how are you computing the p-value? What do the graphs of your data look like? etc. One possibility is the Hauck-Donner effect (you can google the term to find many discussions of it). Another possibility is complete separation in the interaction term (which may indicate that you are overfitting the model for the data given).
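A quick way to check the separation possibility is to cross-tabulate the counts and look for empty cells: a zero in one Cue-by-Condition cell is exactly what produces (quasi-)complete separation. A minimal sketch in base R, using the aggregated counts from the question (the same long format as the dd data frame in Aniko's answer):

## aggregated counts from the question, in long format
dd <- data.frame(Condition = c(0, 0, 0, 0, 1, 1, 1, 1),
                 Cue       = c(0, 0, 1, 1, 0, 0, 1, 1),
                 Response  = c(0, 1, 0, 1, 0, 1, 0, 1),
                 Count     = c(49, 3, 25, 25, 48, 3, 47, 0))

## a zero count in this table flags the cell driving the separation
xtabs(Count ~ Response + Cue + Condition, data = dd)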

Greg Snow