In Poisson regression, there are two common ways to relate the dependent variable $y$ to the independent variables $x$:
- $E[y|x] = w^Tx$
- $E[y|x] = e^{w^Tx}$
The corresponding log-likelihoods for a single observation $(x, y)$ are:
- $ l(w) = \log{p(y|x)} = y\log(w^Tx)-w^Tx-\log{y!} $
- $ l(w) = \log{p(y|x)} = yw^Tx-e^{w^Tx}-\log{y!} $
To estimate the parameters $w$, we can use, for example, gradient ascent with the following gradients:
- $ \frac{\partial l(w)}{\partial w} = (\frac{y}{w^Tx}-1)x $
- $ \frac{\partial l(w)}{\partial w} = (y-e^{w^Tx})x $
For the linear model, we apparently need the constraint $w^Tx > 0$ for every $x$, since the Poisson mean must be positive and $\log(w^Tx)$ is undefined otherwise.
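To make the two parameterisations concrete, here is a minimal sketch in plain Python of the per-observation log-likelihoods and gradients written above, together with a finite-difference check. The function names (`loglik_linear`, `grad_loglinear`, and so on) and the values of `w`, `x`, `y` are my own illustrative choices, not part of any library.

```python
import math

def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def loglik_linear(w, x, y):
    """Log-likelihood for the identity link, E[y|x] = w^T x."""
    mu = dot(w, x)
    if mu <= 0:
        raise ValueError("linear model needs w^T x > 0")
    # math.lgamma(y + 1) equals log(y!)
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def grad_linear(w, x, y):
    """Gradient (y / (w^T x) - 1) * x."""
    mu = dot(w, x)
    return [(y / mu - 1.0) * xi for xi in x]

def loglik_loglinear(w, x, y):
    """Log-likelihood for the log link, E[y|x] = exp(w^T x)."""
    eta = dot(w, x)
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

def grad_loglinear(w, x, y):
    """Gradient (y - exp(w^T x)) * x."""
    eta = dot(w, x)
    return [(y - math.exp(eta)) * xi for xi in x]

def numeric_grad(f, w, x, y, h=1e-6):
    """Central finite-difference approximation of df/dw."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((f(wp, x, y) - f(wm, x, y)) / (2 * h))
    return g

# Illustrative values only; w^T x = 0.9 > 0, so both models are defined here.
w = [0.5, 0.2]
x = [1.0, 2.0]
y = 3

print(grad_linear(w, x, y), numeric_grad(loglik_linear, w, x, y))
print(grad_loglinear(w, x, y), numeric_grad(loglik_loglinear, w, x, y))
```

The analytic and numerical gradients should agree to several decimal places, and `loglik_linear` raises an error as soon as $w^Tx \le 0$, which is exactly the constraint mentioned above.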
My question is:
What are the differences between these two models?
More specifically, why or when should we choose one over the other? Is there some data that can be modelled by one but not the other? What are their pros and cons?
(I noticed that for the log-linear model, (online) gradient ascent is quite unstable, because $e^{w^Tx}$ can become very large even for moderate $w$ and $x$.)
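To illustrate the instability I mean, here is a small sketch of a single online gradient-ascent step for both models at the same point. The values of `w`, `x`, `y` and the learning rate `lr` are assumptions chosen only to show the effect, not taken from real data.

```python
import math

def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Moderate, made-up values: w^T x = 10.
w = [1.0, 1.0]
x = [5.0, 5.0]
y = 3
lr = 0.01  # assumed learning rate

eta = dot(w, x)

# Log-linear model: gradient (y - exp(w^T x)) * x, with exp(10) roughly 2.2e4.
grad_log = [(y - math.exp(eta)) * xi for xi in x]
# Linear model: gradient (y / (w^T x) - 1) * x.
grad_lin = [(y / eta - 1.0) * xi for xi in x]

# One gradient-ascent step, w <- w + lr * gradient.
w_log = [wi + lr * gi for wi, gi in zip(w, grad_log)]
w_lin = [wi + lr * gi for wi, gi in zip(w, grad_lin)]

print("log-linear gradient:", grad_log, "-> new w:", w_log)
print("linear gradient:    ", grad_lin, "-> new w:", w_lin)
```

With these numbers the log-linear gradient is on the order of $-10^5$ per component, so a single step moves $w$ by about a thousand, whereas the linear-model gradient at the same point is $-3.5$ per component.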