3

I have used glm() to model some data I have. The code looks like the following:

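# Preallocate the deviance matrix (one row per ddm column, one column per ppm column)
mdfit_dev <- matrix(NA_real_, nrow = 90, ncol = 90)
for (ddm_idx in 1:90) {
    for (ppm_idx in 1:90) {
        ddm <- cuse_ddm[[3 + ddm_idx]]
        ppm <- cuse_ppm[[3 + ppm_idx]]
        # I() is required: inside a formula, x^2 is a crossing operator, not a square
        mdfit <- glm(cuse[[4]] ~ ddm + I(ddm^2) + ppm + I(ppm^2) + ddm:ppm,
                     family = poisson(link = log))
        mdfit_dev[ddm_idx, ppm_idx] <- deviance(mdfit)
    }
}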

For each "case", I have about 90 different data columns for ddm and 90 for ppm, which is why the two for loops are nested. I believe this is correct because a post-doc in stats ran the same model in MATLAB and got the same results.

However, my next task is to use a zero-inflated Poisson distribution, as I have a lot of zeros in my dataset. Some of these zeros are "true" zeros and some of them are false.

How can I modify my code to use glm() for this distribution?

masfenix
  • 481
  • Could you clarify what you mean by a false zero? – Glen_b Mar 24 '14 at 03:24
  • cuse[[4]] is the number of cases per week. There are 240 weeks. The number of cases is reported by someone. In some weeks there were indeed 0 cases. In other weeks, the person was too lazy to count, did not show up to work, or forgot to count for that week. This is a false zero. – masfenix Mar 24 '14 at 03:49
  • 1
    Thanks. So missing/NA and 0 are both coded as 0. – Glen_b Mar 24 '14 at 03:58
  • Yes, correct. I wish I could send you some data, but unfortunately I am under a contract. I think http://stats.stackexchange.com/questions/45262/zero-inflated-count-models-in-r-what-is-the-real-advantage is what I am looking for, but I don't know what regressors are. – masfenix Mar 24 '14 at 04:03
  • I'm likely glad you can't. (If it was too big to include in the question, I don't want it anyway; in many cases, it may be better to make up a small example that shows the essential features of what you're dealing with.) – Glen_b Mar 24 '14 at 04:09
  • Thanks @Glen_b, I did update that comment. Would you know anything about regressors? – masfenix Mar 24 '14 at 04:10
  • 1
    Do you know what regression is? As in the first sentence here? In Poisson regression (and glms more generally), regressors (predictors, independent variables) play the same conceptual role as in multiple linear regression. – Glen_b Mar 24 '14 at 04:14
  • 1
    This question appears to be off-topic because it is about asking for code. – gung - Reinstate Monica Mar 27 '14 at 18:20
  • 3
    @gung Yes, this post does ask for code. But it also implicitly raises a question of how one could correctly handle mis-coded data in which true zeros are confounded with missing values. That question could only be answered here, not on SO, and a good answer would likely be much more useful than a pedestrian answer that points the OP to some blackbox code (which, if employed, would likely give bad results). – whuber Mar 27 '14 at 18:42
  • @whuber, good point. – gung - Reinstate Monica Mar 27 '14 at 18:52

1 Answer

2

zeroinfl() in the pscl package fits the zero-inflated Poisson regression model.
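
As a rough sketch of what a call could look like (not the OP's data: the data frame `dat`, the regressors `ddm` and `ppm`, and the 30% share of false zeros are all made up for illustration), the formula takes the count model before the `|` and the zero-inflation model after it. Here the zero part is intercept-only, which assumes the chance of a false zero does not depend on the regressors. I am not certain `deviance()` is defined for these fits, so the sketch uses `-2 * logLik(fit)` as a stand-in for comparing models.

```r
set.seed(1)

# Simulated stand-in for the real data (all values here are hypothetical)
n   <- 240                                # number of weeks, as in the question
ddm <- rnorm(n)                           # hypothetical regressor
ppm <- rnorm(n)                           # hypothetical regressor
lambda <- exp(0.5 + 0.3 * ddm - 0.2 * ppm)

# 1 marks a "false" zero: the week was not counted and got recorded as 0
structural_zero <- rbinom(n, 1, 0.3)
cases <- ifelse(structural_zero == 1, 0L, rpois(n, lambda))
dat <- data.frame(cases, ddm, ppm)

# Fit only if pscl is installed
if (requireNamespace("pscl", quietly = TRUE)) {
  # Count model left of "|", zero-inflation model right of it (intercept only)
  fit <- pscl::zeroinfl(cases ~ ddm + I(ddm^2) + ppm + I(ppm^2) + ddm:ppm | 1,
                        data = dat, dist = "poisson")
  print(summary(fit))
  # -2 * log-likelihood, used here in place of deviance() for comparing fits
  print(-2 * as.numeric(logLik(fit)))
}
```

Inside the OP's double loop, the `glm()` call would be swapped for the `zeroinfl()` call and the `-2 * logLik` value stored in the matrix. Whether a zero-inflated model is the right way to handle zeros that are really missing values is a separate question, as the comments under the question point out.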

pyoi
  • 36
  • Welcome to the site, @pyoi. This isn't really an answer to the OP's question, it is more of a comment. Please only use the "Your Answer" field to provide answers. I know it's frustrating, but you will be able to comment anywhere when your reputation >50. Alternatively, you could expand this to make it more of an answer. Since you are new here, you may want to take our tour, which contains information for new users. – gung - Reinstate Monica Mar 27 '14 at 18:22