I am an MBA student taking some statistics courses, and a colleague recommended this site as a useful resource. So far there seems to be a lot of interesting information here!
We are learning about "Count Data" - for example, we are interested in learning how to make a model that predicts the number of complaints a customer might file. In a previous stats course, I learned about basic regression models and the Poisson Distribution. Now, I am trying to understand how these two can be put together!
1 - The first question I had was why Poisson Regression should be used for Count Data instead of a "vanilla linear regression". I understand the basic argument: Count Data is by definition discrete and non-negative, so you would rather use a model built for that kind of outcome (i.e. Poisson Regression) ... but to me, this seems like a formality. Couldn't I just use a linear regression model and round the predictions to the nearest whole number?
2 - We started learning about something called "Overdispersion" and "Zero Inflated Data". Overdispersion is when the variance is larger than the model expects, and Zero Inflated Data is data with a lot of zeros (this is apparently handled by "Hurdle Models"). Our prof explained to us that a standard Poisson Distribution only has a single "parameter" for the "mean" and no separate one for the "variance", therefore a Poisson Regression will assume the variance is equal to the mean. But in real data, the variance is often larger than the mean - this is Overdispersion, and the standard Poisson Regression is not built to handle it. Correct?
3 - I read online (https://aip.scitation.org/doi/pdf/10.1063/5.0040330) that "If the equi-dispersion is not met, the Poisson Regression is no longer appropriate to model the data. Moreover, the resulted model will yield biased parameter estimation and underestimate the standard error". Why does this happen?
4 - And finally, our prof talked a bit about "Quasi Poisson Models" and "Negative Binomial Regression" (I think both of these are similar if I recall correctly). Supposedly, these models can handle the problem of Overdispersion. Is this because the Negative Binomial Probability Distribution has an additional parameter that can try to model non-constant variances?
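To make points 2 and 4 concrete, here is a minimal simulation sketch of how I currently picture the difference. It uses only the Python standard library, and the helper names (`sample_poisson`, `sample_gamma_poisson`) and the parameter values are my own assumptions for illustration, not something from our course. It draws plain Poisson counts and counts whose rate is itself Gamma-distributed (which, as far as I understand, is one way to arrive at the Negative Binomial), then compares the mean and variance of each.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_gamma_poisson(mu, shape, rng):
    """Draw a count whose Poisson rate is Gamma-distributed with mean mu.

    The resulting mixture is a negative binomial with
    variance mu + mu**2 / shape, so a smaller shape means more overdispersion.
    """
    rate = rng.gammavariate(shape, mu / shape)  # Gamma mean = shape * scale = mu
    return sample_poisson(rate, rng)


rng = random.Random(0)            # fixed seed so the run is repeatable
mu, shape, n = 3.0, 1.5, 20000    # illustrative values only

poisson_counts = [sample_poisson(mu, rng) for _ in range(n)]
nb_counts = [sample_gamma_poisson(mu, shape, rng) for _ in range(n)]

for name, counts in [("Poisson", poisson_counts), ("Gamma-Poisson", nb_counts)]:
    m, v = mean(counts), pvariance(counts)
    print(f"{name}: mean={m:.2f} variance={v:.2f} variance/mean={v / m:.2f}")
```

If my understanding is right, the Poisson sample should show a variance/mean ratio close to 1, while the Gamma-Poisson sample should show a ratio well above 1 even though both have the same mean. That extra `shape` parameter is what I take to be the "additional parameter" mentioned in point 4.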
Although our assignments are pretty straightforward, I would still like to "peek behind the curtain" and try to understand some of the background material. All help is appreciated!