
Here are two models (with R code to provide some context):

Model 1:

Take the log of the response variable $y$, then fit a Gamma GLM with an identity link (note that in R the family is `Gamma`, and its default link is actually the inverse, so the identity link must be requested explicitly):

glm(log(y) ~ a + b, family = Gamma(link = "identity"), data = ...)

Model 2:

Fit a Gamma GLM with a log link, without log-transforming the response:

glm(y ~ a + b, family = Gamma(link = "log"), data = ...)

When I generate predictions from these two models, they differ slightly but materially, and I have trouble understanding why.
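For context, the gap can be seen directly by fitting both specifications to simulated data. A minimal sketch (coefficients and shape parameter are illustrative; the mean is kept well above 1 so that log(y) is positive and a valid Gamma response):

```r
set.seed(1)
n <- 500
a <- runif(n); b <- runif(n)

# Simulate y from a Gamma with log-linear mean (the Model 2 assumption);
# mu is kept large so that y > 1 and log(y) stays positive for Model 1
mu <- exp(3 + 0.5 * a - 0.3 * b)
y  <- rgamma(n, shape = 50, rate = 50 / mu)   # E[y] = mu

m1 <- glm(log(y) ~ a + b, family = Gamma(link = "identity"))  # Model 1
m2 <- glm(y ~ a + b, family = Gamma(link = "log"))            # Model 2

# Back-transformed Model 1 fitted values do not match Model 2's
summary(exp(fitted(m1)) - fitted(m2))
```

The two fits differ because they assume different distributions: Model 1 says log(y) is Gamma, Model 2 says y is Gamma with log(E[y]) linear in the predictors.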

Stefan
Ruser
  • Perhaps the distinction more readily becomes clear by considering the simplest possible versions of these models; namely, log(y) ~ 1 versus y ~ 1 (with log link). What are the assumed distributions of $y$ in each case? – whuber May 29 '19 at 16:11
  • I would think that both models assume Gamma as the underlying distribution. It's just a matter of when to apply the log. However, I fail to see why that would make a difference. – Ruser May 29 '19 at 16:31
  • When the logarithm of a variable has a Gamma distribution, the original variable does not have a Gamma distribution. For a well-known example of this phenomenon, compare Normal to Lognormal variates or Uniform to Exponential variates. – whuber May 29 '19 at 17:47
  • It might be useful to simulate the distribution of the log of a gamma random variate and see that it's left skew rather than right skew. Note that in a GLM the link function doesn't transform the random variable. – Glen_b May 30 '19 at 01:11
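Following Glen_b's suggestion, a short simulation (the shape parameter and sample size are arbitrary choices) shows that the log of a Gamma variate is left-skewed while the Gamma variate itself is right-skewed, so log(y) cannot itself follow a Gamma distribution:

```r
set.seed(42)
x <- rgamma(1e5, shape = 2, rate = 1)  # Gamma draws: right-skewed
z <- log(x)                            # their logs: left-skewed

# Sample skewness (third standardized moment)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew(x)  # positive: right skew
skew(z)  # negative: left skew
hist(z, breaks = 100)
```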

0 Answers