
I am trying to understand GLMs by fitting them myself on some Poisson data. My understanding is that if I have my $Y$ values and $X$ values, then using the log link function, $\log[Y] = mX + b$ for some $m, b$. Now the question is to find the coefficients which best explain my $Y$ variable. When I read about it online, people exponentiate this expression and optimize the likelihood over $m, b$. My question is: why can this not be done through simple linear regression on the $\log[Y]$ values, exponentiating after fitting a simple linear curve? Does this not give a simple closed-form solution? Is there something wrong with this approach?
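For concreteness, here is a minimal sketch of both fits on simulated counts, assuming NumPy and statsmodels are available; the variable names, seed, and coefficient values are illustrative and not part of the original question:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulated Poisson data (coefficients chosen arbitrarily).
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, 300)
X = sm.add_constant(x)                      # design matrix: intercept and x
y = rng.poisson(np.exp(0.3 + 1.0 * x))      # counts with log E[Y] = 0.3 + 1.0*x

# Proposed approach: ordinary least squares on log(y), then exponentiate.
# Zero counts must be dropped (or fudged), which is already a warning sign.
pos = y > 0
ols_fit = sm.OLS(np.log(y[pos]), X[pos]).fit()

# Poisson GLM with log link: models log E[Y], not E[log Y].
glm_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

print("OLS on log(y):", ols_fit.params)
print("Poisson GLM:  ", glm_fit.params)     # the two coefficient vectors differ
```

On data like this the two coefficient vectors generally disagree, and the answer below explains where the gap comes from.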

  • GLMs consider $\log \mu$, where $\mu$ is the expected value of $Y$. That's different than considering $\mathbb{E}[\log Y]$, which has the expectation and the logarithm interchanged. – Ben Jun 15 '22 at 15:44
  • What would you do with the logarithm upon observing a count of zero, which always has a positive probability for any Poisson distribution? For various accounts of what Poisson regression actually is, please search this site. https://stats.stackexchange.com/questions/69820 answers the same question for logistic regression. – whuber Jun 15 '22 at 16:08

1 Answer


You could use a linearized form of the equation and apply ordinary linear regression, but this introduces two errors:

  • You ignore the non-homogeneity (heteroscedasticity) of the error distribution. The variance of $Y$, and also the variance of the transformed variable $\log(Y)$, is not the same for different values of $X$; for a Poisson variable the variance equals the mean. You need to perform the linear regression with weights $w_i = \hat{Y}_i$ to correct for this, where $\hat{Y}_i$ is the fitted value (more about that later).

  • You ignore the non-linearity in how the errors affect the response once a link function is applied to it (i.e. the error distribution is modelled for $Y$, not for $\log(Y)$, and that needs to be corrected).

    This is similar to how the lognormal distribution (the exponential of a normally distributed variable with mean $\mu$ and standard deviation $\sigma$) has mean $\exp(\mu + 0.5 \sigma^2)$ rather than $\exp(\mu)$ (a one-line check via the moment generating function follows after this list).

    That is an additional correction: instead of using $Y_i^\prime = \log(Y_i)$ you use the working response $Y_i^\prime = \log(\hat{Y}_i) + (Y_i - \hat{Y}_i)/\hat{Y}_i$, which is the first-order expansion of $\log(Y_i)$ around the fitted value $\hat{Y}_i$ (and, unlike $\log(Y_i)$, is well defined when $Y_i = 0$).
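As a quick check of that lognormal fact, the mean follows directly from the normal moment generating function, $\mathbb{E}[e^{tZ}] = \exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)$ for $Z \sim \mathcal{N}(\mu, \sigma^2)$; evaluating at $t = 1$ gives

$$\mathbb{E}[e^{Z}] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^2\right) \neq \exp(\mu) = \exp(\mathbb{E}[Z]).$$

The same exchange of expectation and non-linear transformation is what separates $\log \mathbb{E}[Y]$ from $\mathbb{E}[\log Y]$ in the comments above.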

So to find the solution $\hat{Y}_i$ with a linear regression, you need to use two corrections that depend on the solution $\hat{Y}_i$ itself. That is why there is no closed form and an iterative procedure is used: start from an initial guess, use it to compute the weights and corrected responses, refit, and repeat with the updated fit until the estimates stop changing. This is the iteratively reweighted least squares (IRLS) algorithm.
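To make the procedure concrete, here is a minimal sketch of that iteration in Python, using only NumPy; the simulated data, starting values, and iteration cap are illustrative assumptions, not part of the original answer:

```python
import numpy as np

# Hypothetical simulated Poisson data (names and values are illustrative).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_true = np.array([0.5, 1.2])              # b, m
y = rng.poisson(np.exp(X @ beta_true))        # counts, zeros are allowed

# IRLS for a Poisson GLM with log link: repeated *weighted* least squares
# on the working response z = log(mu_hat) + (y - mu_hat) / mu_hat,
# with weights w = mu_hat -- exactly the two corrections described above.
beta = np.zeros(X.shape[1])                   # starting point
mu = np.full(n, y.mean())                     # initial fitted values
for _ in range(25):
    eta = np.log(mu)
    z = eta + (y - mu) / mu                   # working (corrected) response
    w = mu                                    # weights, since Var(Y_i) = mu_i
    WX = X * w[:, None]
    beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
    mu = np.exp(X @ beta)

print("IRLS estimate:", beta)                 # should be close to beta_true
```

Each pass is just a weighted least squares fit of the corrected response on $X$; the only thing preventing a closed form is that both the weights and the correction involve the current fit. This is, in essence, what standard GLM fitting routines do under the hood.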

See also What is the objective function to optimize in glm with gaussian and poisson family?

  • I understand. I was assuming $\log[Y] = mX + b$, where errors in the $\log[Y]$ term are typical Gaussian noise (as in linear regression). In fact it is $\log[\mathbb{E}[Y]] = mX + b$. – bGe Jun 16 '22 at 03:27