23

A Poisson regression is a GLM with a log-link function.

An alternative way to model non-normally distributed count data is to preprocess the counts by taking the log (or rather, log(1 + count) to handle zeros). If you do a least-squares regression on the log-count responses, is that related to a Poisson regression? Can it handle similar phenomena?

whuber
  • 322,774
  • 7
    How do you plan on taking logarithms of any counts that are zero? – whuber Mar 19 '11 at 18:47
  • 4
    Definitely not equivalent. An easy way to see this is to look at what would happen if you observed zero counts. (Comment created before seeing @whuber's comment. Apparently this page didn't refresh appropriately on my browser.) – cardinal Mar 19 '11 at 18:52
  • OK, I obviously should say, log(1+count). Obviously not equivalent, but wondering if there was a relationship, or if they can handle similar phenomena. – Brendan OConnor Mar 21 '11 at 00:45
  • 2
    There is useful discussion of this issue here: http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ – Michael Bishop Dec 09 '11 at 23:55
  • See https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 – kjetil b halvorsen Jun 10 '22 at 16:10

2 Answers

25

On the one hand, in a Poisson regression, the left-hand side of the model equation is the logarithm of the expected count: $\log(E[Y|x])$.

On the other hand, in a "standard" linear model, the left-hand side is the expected value of the normal response variable: $E[Y|x]$. In particular, the link function is the identity function.

Now, let us say that $Y$ is a Poisson variable and that you intend to normalise it by taking the log: $Y' = \log(Y)$. Because $Y'$ is supposed to be approximately normal, you plan to fit the standard linear model, for which the left-hand side is $E[Y'|x] = E[\log(Y)|x]$. But, in general, $E[\log(Y) | x] \neq \log(E[Y|x])$. As a consequence, these two modelling approaches are different.
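
To see this numerically, here is a minimal simulation sketch (not part of the original answer), assuming numpy and statsmodels are available; the simulated data and the true coefficients (0.5 and 1.0) are arbitrary choices. The Poisson GLM targets $\log(E[Y|x])$ and recovers those coefficients, whereas least squares on $\log(1+Y)$ targets $E[\log(1+Y)|x]$ and returns different ones.

```python
# Illustrative sketch only: compares a Poisson GLM (log link) with
# least squares on log(1 + y), using simulated Poisson counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(0, 2, size=n)
# True model: log E[Y | x] = 0.5 + 1.0 * x, with Y | x ~ Poisson
y = rng.poisson(np.exp(0.5 + 1.0 * x))

X = sm.add_constant(x)

# Poisson regression: models log(E[Y | x]) directly.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Least squares on log(1 + Y): models E[log(1 + Y) | x] instead.
ols_fit = sm.OLS(np.log1p(y), X).fit()

print("Poisson (log link) coefficients:", poisson_fit.params)  # close to [0.5, 1.0]
print("OLS on log(1 + y) coefficients: ", ols_fit.params)      # a different target, different values
```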

ocram
  • 21,851
  • 8
    Actually, $\mathbb{E}(\log(Y) | X)$ is never equal to $\log(\mathbb{E}(Y | X))$ unless $\mathbb{P}(Y = f(X) | X ) = 1$ for some $\sigma(X)$-measurable function $f$, i.e., unless $Y$ is fully determined by $X$. – cardinal Mar 19 '11 at 18:50
  • 1
    @cardinal. Very well put. – suncoolsu Mar 20 '11 at 12:34
10

I see two important differences.

First, the predicted values (on the original scale) behave differently: in log-linear least squares they represent conditional geometric means, whereas in the log-link Poisson model they represent conditional means. Since the data in this type of analysis are often right-skewed, the conditional geometric mean will underestimate the conditional mean.

A second difference is the implied distribution: lognormal versus Poisson. This corresponds to a different heteroskedasticity structure for the residuals: residual variance proportional to the squared expected value (lognormal) versus residual variance proportional to the expected value (Poisson).
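
A rough simulation sketch of the first point (not part of the original answer), again assuming numpy and statsmodels are available and using arbitrary simulated Poisson counts: back-transformed log-OLS predictions track a conditional geometric mean and tend to sit below the conditional mean, while Poisson-GLM predictions target the conditional mean.

```python
# Illustrative sketch only: compare predictions on the original count scale.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(0, 2, size=n)
mu = np.exp(1.0 + 0.8 * x)          # true conditional mean E[Y | x]
y = rng.poisson(mu)                 # right-skewed count response

X = sm.add_constant(x)
pois_pred = sm.GLM(y, X, family=sm.families.Poisson()).fit().predict(X)
ols_pred = np.expm1(sm.OLS(np.log1p(y), X).fit().predict(X))  # back-transformed log-OLS

mask = (x > 1.4) & (x < 1.6)        # inspect one slice of x
print("empirical mean of y:     ", y[mask].mean())
print("Poisson-GLM prediction:  ", pois_pred[mask].mean())
print("back-transformed log-OLS:", ols_pred[mask].mean())  # typically smaller
```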

ludo
  • 101