I have a count variable that I would like to predict using a categorical variable with 4 levels. I would like to decide whether I should use Poisson, negative binomial (NB), or zero-inflated negative binomial (ZINB) regression, which seem to be the most common choices for count outcome variables.
I fitted three models (a Poisson, an NB, and a ZINB model) and compared their AIC values. The ZINB model has the lowest AIC, which is surprising because the number of 0s is not especially high. But I started to wonder: should I even consider using ZINB if I have NO theoretical reason to assume that 0s can come from two different sources? As far as I understand, that is the situation zero-inflated regression is designed to model. In my case, the outcome is the frequency of attentional lapses in a given amount of time, and I don't think the sample contains people who could never have any lapses, in addition to people who could have lapses but didn't happen to have any. Am I safe to use negative binomial if this assumption of ZINB doesn't seem to hold?
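To make the "number of 0s is not especially high" impression concrete, one option is to compare the observed number of zeros with the number a plain Poisson fit would predict, and to look at the Pearson dispersion statistic. Below is a minimal sketch in Python using only the standard library. The counts and the group labels `A` to `D` are invented for illustration and are not my data. It relies on the fact that, with a single categorical predictor, the Poisson maximum-likelihood fitted mean for each level is simply that level's sample mean.

```python
import math

# Hypothetical lapse counts per participant, grouped by the 4-level predictor.
# These numbers are made up purely for illustration.
counts_by_group = {
    "A": [0, 1, 2, 0, 3, 1],
    "B": [2, 4, 1, 0, 5, 3],
    "C": [0, 0, 1, 2, 1, 0],
    "D": [3, 6, 2, 8, 0, 4],
}


def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_summary(groups):
    """Fit a Poisson model with one categorical predictor and summarise it.

    Assumes every group has a strictly positive mean.
    """
    loglik = 0.0
    expected_zeros = 0.0
    pearson = 0.0
    n = 0
    for ys in groups.values():
        mu = sum(ys) / len(ys)  # fitted mean for this level = group mean
        for y in ys:
            loglik += poisson_logpmf(y, mu)
            expected_zeros += math.exp(-mu)  # P(Y = 0) under the fit
            pearson += (y - mu) ** 2 / mu
            n += 1
    k = len(groups)  # one estimated mean per level
    return {
        "aic": 2 * k - 2 * loglik,
        "observed_zeros": sum(y == 0 for ys in groups.values() for y in ys),
        "expected_zeros": expected_zeros,
        "dispersion": pearson / (n - k),  # Pearson chi-square / residual df
    }


summary = poisson_summary(counts_by_group)
for name, value in summary.items():
    print(name, round(value, 3))
```

A dispersion value well above 1 points towards overdispersion (which NB can absorb), while observed zeros clearly exceeding the expected zeros is what would hint at zero inflation. The same comparison could be repeated with the zero probability implied by a fitted NB model.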
EDIT: A more general version of this same question (as suggested by Bence in the comments): "if I have an independent, identically distributed sample from an unknown distribution, is it acceptable to fit a distribution to it that I can find no theoretical justification for if this distribution gives a better fit in terms of AIC than another distribution that I can justify?"