I got a good OLS fit for integer variables, do I still need to use count data methods?

Question

First of all, I'm not a statistician, but I'm teaching myself some methods I require for a project I'm doing now.

I have a 2D dataset of N observations. For the ith observation, the first entry is the estimated number of lines in the ith chapter of a book, and the second entry is the actual number of lines in that chapter.

All the values are large (so I guess there's no point in using a Poisson distribution), and after taking logs of both entries I get a nice straight line. There is no heteroscedasticity, and the residuals appear to be normally distributed. My question is: do I nevertheless need to use methods for count data? If not, how can I justify that? If so, would a negative binomial distribution do the trick?

If, as you suggest, the variance assumption is okay, then it's probably not going to lead to any problems at all just dealing with OLS regression. — Glen_b, May 28 '14 at 00:23
See http://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 — kjetil b halvorsen, Dec 02 '16 at 18:46

score 4 · Accepted Answer · answered May 27 '14 at 23:42

Usually, if all the integer values are large and if there are a lot of different values, ordinary regression works OK. Indeed, many typical dependent variables are of this nature, at least as recorded. For example, the weight of adult humans is recorded in pounds or kg, not in fractions of an ounce or gram.

The key thing is to look at the assumptions of the model.
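
As a rough illustration of what checking the assumptions can look like, here is a minimal sketch in plain Python. It is not the asker's analysis: the data are simulated, and the names (`estimated`, `actual`, `ols_fit`, `pearson`, `skewness`) are hypothetical. It fits OLS on the log-log scale, then computes two crude diagnostics: the correlation between fitted values and absolute residuals (a sizeable value hints at heteroscedasticity) and the skewness of the residuals (a value far from zero hints at non-normality). Residual plots and a Q-Q plot would normally accompany these numbers.

```python
import math
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def skewness(r):
    """Sample skewness (third standardized moment) of a sequence."""
    n = len(r)
    m = sum(r) / n
    var = sum((ri - m) ** 2 for ri in r) / n
    return (sum((ri - m) ** 3 for ri in r) / n) / var ** 1.5


# ASSUMPTION: simulated stand-in data, not the asker's chapters.
# Large integer "line counts" with multiplicative noise.
rng = random.Random(0)
estimated = [rng.randint(200, 2000) for _ in range(200)]
actual = [max(1, round(e * math.exp(rng.gauss(0, 0.1)))) for e in estimated]

# Regress log(actual) on log(estimated).
log_x = [math.log(e) for e in estimated]
log_y = [math.log(a) for a in actual]
intercept, slope = ols_fit(log_x, log_y)

fitted = [intercept + slope * xi for xi in log_x]
residuals = [yi - fi for yi, fi in zip(log_y, fitted)]

# Crude diagnostics: both should be close to zero if the usual
# OLS assumptions (constant variance, symmetric errors) look reasonable.
spread_vs_fit = pearson(fitted, [abs(r) for r in residuals])
resid_skew = skewness(residuals)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"corr(fitted, |resid|)={spread_vs_fit:.3f} skewness={resid_skew:.3f}")
```

If both diagnostics are near zero on the real data, that supports staying with ordinary regression; if the spread of the residuals clearly changes with the fitted values, a count model or a different transformation becomes worth considering.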

Peter Flom

Even if one is "counting lines", it has more the flavour of a measurement problem, so the usual linear regression seems fine. Specifically, there is no reason to expect a Poisson or similar distribution. – kjetil b halvorsen Apr 10 '16 at 09:08
