23

A Poisson regression is a GLM with a log-link function.

An alternative way to model non-normally distributed count data is to preprocess the counts by taking the log (or rather, log(1 + count) to handle zeros). If you do a least-squares regression on the log-count responses, is that related to a Poisson regression? Can it handle similar phenomena?

whuber
  • 322,774
  • 7
    How do you plan on taking logarithms of any counts that are zero? – whuber Mar 19 '11 at 18:47
  • 4
    Definitely not equivalent. An easy way to see this is to look at what would happen if you observed zero counts. (Comment created before seeing @whuber's comment. Apparently this page didn't refresh appropriately on my browser.) – cardinal Mar 19 '11 at 18:52
  • OK, I obviously should say, log(1+count). Obviously not equivalent, but wondering if there was a relationship, or if they can handle similar phenomena. – Brendan OConnor Mar 21 '11 at 00:45
  • 2
    There is useful discussion of this issue here: http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ – Michael Bishop Dec 09 '11 at 23:55
  • See https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 – kjetil b halvorsen Jun 10 '22 at 16:10

2 Answers

25

On the one hand, in a Poisson regression, the left-hand side of the model equation is the logarithm of the expected count: $\log(E[Y|x])$.

On the other hand, in a "standard" linear model, the left-hand side is the expected value of the normal response variable: $E[Y|x]$. In particular, the link function is the identity function.

Now, let us say that $Y$ is a Poisson variable and that you intend to normalise it by taking the log: $Y' = \log(Y)$. Because $Y'$ is supposed to be approximately normal, you plan to fit the standard linear model, for which the left-hand side is $E[Y'|x] = E[\log(Y)|x]$. But, in general, $E[\log(Y) | x] \neq \log(E[Y|x])$. As a consequence, these two modelling approaches are different.
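
To see this numerically, here is a minimal simulation sketch (not part of the original answer), assuming numpy and statsmodels are available; the simulated data and the true coefficients (0.5 and 1.0) are arbitrary choices. The Poisson GLM targets $\log(E[Y|x])$ and recovers those coefficients, whereas least squares on $\log(1+Y)$ targets $E[\log(1+Y)|x]$ and returns different ones.

```python
# Illustrative sketch only: compares a Poisson GLM (log link) with
# least squares on log(1 + y), using simulated Poisson counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(0, 2, size=n)
# True model: log E[Y | x] = 0.5 + 1.0 * x, with Y | x ~ Poisson
y = rng.poisson(np.exp(0.5 + 1.0 * x))

X = sm.add_constant(x)

# Poisson regression: models log(E[Y | x]) directly.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Least squares on log(1 + Y): models E[log(1 + Y) | x] instead.
ols_fit = sm.OLS(np.log1p(y), X).fit()

print("Poisson (log link) coefficients:", poisson_fit.params)  # close to [0.5, 1.0]
print("OLS on log(1 + y) coefficients: ", ols_fit.params)      # a different target, different values
```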

ocram
  • 21,851
  • 8
    Actually, $\mathbb{E}(\log(Y) | X)$ is never equal to $\log(\mathbb{E}(Y | X))$ unless $\mathbb{P}(Y = f(X) | X ) = 1$ for some $\sigma(X)$-measurable function $f$, i.e., unless $Y$ is fully determined by $X$. – cardinal Mar 19 '11 at 18:50
  • 1
    @cardinal. Very well put. – suncoolsu Mar 20 '11 at 12:34
10

I see two important differences.

First, the predicted values (on the original scale) behave differently: in log-linear least squares they represent conditional geometric means, whereas in the log-link Poisson model they represent conditional means. Since the data in this type of analysis are often right-skewed, the conditional geometric mean will underestimate the conditional mean.

A second difference is the implied distribution: lognormal versus Poisson. This corresponds to a different heteroskedasticity structure for the residuals: residual variance proportional to the squared expected value (lognormal) versus residual variance proportional to the expected value (Poisson).
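
A rough simulation sketch of the first point (not part of the original answer), again assuming numpy and statsmodels are available and using arbitrary simulated Poisson counts: back-transformed log-OLS predictions track a conditional geometric mean and tend to sit below the conditional mean, while Poisson-GLM predictions target the conditional mean.

```python
# Illustrative sketch only: compare predictions on the original count scale.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(0, 2, size=n)
mu = np.exp(1.0 + 0.8 * x)          # true conditional mean E[Y | x]
y = rng.poisson(mu)                 # right-skewed count response

X = sm.add_constant(x)
pois_pred = sm.GLM(y, X, family=sm.families.Poisson()).fit().predict(X)
ols_pred = np.expm1(sm.OLS(np.log1p(y), X).fit().predict(X))  # back-transformed log-OLS

mask = (x > 1.4) & (x < 1.6)        # inspect one slice of x
print("empirical mean of y:     ", y[mask].mean())
print("Poisson-GLM prediction:  ", pois_pred[mask].mean())
print("back-transformed log-OLS:", ols_pred[mask].mean())  # typically smaller
```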

ludo
  • 101