What coefficients to include in logit component of zero-inflated and hurdle models?

Question

I'm new to statistics, so I'm hoping for an ELI5 explanation! I need to use a hurdle (or zero-inflated) model to try to replicate someone else's methodology on a newer dataset for my undergraduate dissertation.

I ran the model using the pscl package in R and the plotting looks like what I'd expect, though I'm cautious because admittedly I barely understand this stuff.

I'm researching whether there is a link between a UK member of parliament's margin of victory at the previous election (majority) and the number of motions they proposed (count).

Here's what my data looks like:

The variable that has an excess of zero counts is the "count" variable -- 186/371 are zeros. There are zeros in the majority column too, but that column is a predictor, so I don't want them inflated, if that makes sense. There is a considerable difference in the number of motions proposed between the parties.

Here's the model:

m1 <- hurdle(count ~ majority | politicalParty, data = zinb, dist = "negbin")

I understand that there is a count component (count ~ majority) and the "logit" component (politicalParty), but I don't really understand what coefficients to include on either side. For instance, should I also include the count and majority in the logit model?

I also don't really understand this line from UCLA's example: "Since zero inflated negative binomial has both a count model and a logit model, each of the two models should have good predictors. The two models do not necessarily need to use the same predictors." How do I know which model's predicted values are being used?

score 1 · Answer 1 · answered Apr 29 '19 at 11:48

Given that you have only two regressors (one of which should be coded as a categorical factor in R), I would simply include both of them in both parts, possibly along with their interaction. You can then look at the corresponding confidence intervals, Wald or likelihood ratio tests, or information criteria (AIC/BIC) to see which of the two regressors actually has an effect in which component of the model.
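
As a rough sketch of what that could look like with pscl: the data below are simulated stand-ins (the data frame `zinb_sim`, the party labels "A"/"B"/"C", and all effect sizes are made up for illustration and are not your data), so only the model formulas and the comparison steps carry over. It fits your original specification and the "both regressors in both parts" specification, then compares them by AIC and a hand-computed likelihood ratio test.

```r
library(pscl)  # provides hurdle()

# --- Made-up stand-in data (NOT the asker's data; all numbers are assumptions)
set.seed(42)
n <- 371
zinb_sim <- data.frame(
  majority       = runif(n, 0, 40),
  politicalParty = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
party_shift <- unname(c(A = 0, B = 0.5, C = -0.5)[as.character(zinb_sim$politicalParty)])

# Draw strictly positive counts from a zero-truncated negative binomial
# by redrawing any zeros (simple rejection sampling).
rztnb <- function(mu, size) {
  y <- rnbinom(length(mu), mu = mu, size = size)
  zero <- y == 0
  while (any(zero)) {
    y[zero] <- rnbinom(sum(zero), mu = mu[zero], size = size)
    zero <- y == 0
  }
  y
}

p_pos  <- plogis(-0.3 + 0.02 * zinb_sim$majority + party_shift)  # P(count > 0)
mu_pos <- exp(1 + 0.01 * zinb_sim$majority + party_shift)        # size of positive counts
zinb_sim$count <- ifelse(rbinom(n, 1, p_pos) == 1, rztnb(mu_pos, size = 1.5), 0)

# --- Original specification: majority in the count part, party in the zero part
m_orig <- hurdle(count ~ majority | politicalParty,
                 data = zinb_sim, dist = "negbin")

# --- Both regressors in both parts (left of "|" = count part, right = zero part)
m_both <- hurdle(count ~ majority + politicalParty | majority + politicalParty,
                 data = zinb_sim, dist = "negbin")

# Wald tests per coefficient, reported separately for each component
summary(m_both)

# Wald-type confidence intervals (rows are prefixed count_ / zero_)
confint(m_both)

# Information criteria: lower AIC is preferred
aic_tab <- AIC(m_orig, m_both)
print(aic_tab)

# Likelihood ratio test of m_orig (nested) against m_both
lr_stat <- as.numeric(2 * (logLik(m_both) - logLik(m_orig)))
df_diff <- attr(logLik(m_both), "df") - attr(logLik(m_orig), "df")
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

With your own data you would replace `zinb_sim` with `zinb` and skip the simulation. To try the interaction as well, `majority * politicalParty` can be used on either side of the `|`.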
