First of all, I'm not a statistician, but I'm teaching myself some methods I require for a project I'm doing now.

I have a 2D dataset of N observations. For the ith observation, the first entry is the estimated number of lines in the ith chapter of a book, and the second entry is the actual number of lines in that chapter.

All the values are large (so I guess there's no point in using a Poisson distribution), and after taking logs of both entries the points fall on a nice straight line. There is no heteroscedasticity, and the residuals appear to be normally distributed. My question is: do I nevertheless need to use methods for count data? If not, how can I justify that? If so, would a negative binomial distribution do the trick?

Alfonso
  • If, as you suggest, the variance assumption is okay, then it's probably not going to lead to any problems at all just dealing with OLS regression. – Glen_b May 28 '14 at 00:23
  • See http://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 – kjetil b halvorsen Dec 02 '16 at 18:46

1 Answer

Usually, if all the integer values are large and there are many distinct values, ordinary regression works fine. Indeed, many typical dependent variables are of this nature, at least as recorded. For example, the weight of adult humans is recorded in pounds or kg, not in fractions of an ounce or gram.

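The key thing is to check the assumptions of the model: for ordinary least squares, that means a roughly linear relationship, constant residual variance, and approximately normal residuals.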

Peter Flom
  • Even if one is "counting lines", it has more the flavour of a measurement problem, so the usual linear regression seems fine. Specifically, there is no reason to expect a Poisson or similar distribution. – kjetil b halvorsen Apr 10 '16 at 09:08