I am running a probit regression with glm. I got a non-significant p-value (p ≈ 0.98) for an interaction that other statistical methods suggested should be significant. Looking into it, I think the cause is that one of the cells has zero positive ("YES") outcomes. When I artificially change one data point so that the cell has a single positive outcome, I get a significant result (p < 0.02), even though this change actually decreases the difference in the effect of the cue ("Cue") between the two conditions ("Condition").
The code in R is:

    glm(Response ~ Cue * Condition, family = binomial(link = probit), data = data)
The data look like this:

Condition = 0
             Response
               0     1
    Cue 0     49     3
    Cue 1     25    25

Condition = 1
             Response
               0     1
    Cue 0     48     3
    Cue 1     47     0
The counts aren't quite exact, but they show the pattern: the subject responds "appropriately" 50% of the time to Cue 1 in Condition 0 but does not do so in Condition 1 (and in fact responds slightly more often in that condition to the incorrect cue, although this isn't significant). Changing the last row of counts (Cue 1, Condition 1) from 47 0 to 46 1 gives the significant interaction I was expecting.
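To make this reproducible, here is a minimal sketch that rebuilds the approximate counts from the tables and fits the same model in aggregated form (successes and failures per cell rather than one row per trial). The names `interaction_p`, `counts`, `yes` and `no` are made up for this example, and because the counts are approximate, the p-values will not match my real data exactly.

```r
# Wald p-value for the Cue:Condition interaction, given the counts in the
# last cell (Cue = 1, Condition = 1). The other cells come from the tables.
interaction_p <- function(no_last, yes_last) {
  counts <- data.frame(
    Cue       = c(0, 1, 0, 1),
    Condition = c(0, 0, 1, 1),
    no        = c(49, 25, 48, no_last),   # Response = 0
    yes       = c(3, 25, 3, yes_last)     # Response = 1
  )
  # Same model as above, fitted to aggregated (yes, no) counts per cell
  fit <- glm(cbind(yes, no) ~ Cue * Condition,
             family = binomial(link = "probit"), data = counts)
  coef(summary(fit))["Cue:Condition", "Pr(>|z|)"]
}

p_zero_cell <- interaction_p(47, 0)  # counts as in the table
p_one_yes   <- interaction_p(46, 1)  # one response changed to "YES"
print(c(zero_cell = p_zero_cell, one_yes = p_one_yes))
```

Comparing the two printed values shows the jump I am describing: the only difference between the two fits is a single observation in the last cell.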
My intuitions are hazy, but I imagine this has something to do with the probit function being undefined for probabilities of 0 or 1. Does this seem correct? If so, why does it return a non-significant value rather than a warning? More to the point, is there a way to deal with this? I can imagine building in a correction that gives a non-zero estimate for the probability based on the number of trials, but is there a standard, implemented solution?
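For concreteness, the kind of home-made correction I have in mind looks like the sketch below: add a small constant to every cell so that no estimated probability is exactly 0, then refit. The value 0.5 is an arbitrary choice on my part, not something I know to be the standard solution, and the variable names are made up for this example. Note that glm warns about non-integer counts when fitted this way.

```r
# Approximate counts from the tables above
counts <- data.frame(
  Cue       = c(0, 1, 0, 1),
  Condition = c(0, 0, 1, 1),
  no        = c(49, 25, 48, 47),  # Response = 0
  yes       = c(3, 25, 3, 0)      # Response = 1
)

# Ad hoc correction (assumption): add 0.5 to every cell
adj <- 0.5
counts$no_adj  <- counts$no  + adj
counts$yes_adj <- counts$yes + adj

# Refit the same probit model on the adjusted counts
fit_adj <- glm(cbind(yes_adj, no_adj) ~ Cue * Condition,
               family = binomial(link = "probit"), data = counts)
p_adj <- coef(summary(fit_adj))["Cue:Condition", "Pr(>|z|)"]
print(p_adj)
```

This avoids the empty cell, but the result depends on the constant I pick, which is why I would prefer a principled, already-implemented approach.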