
Here are two models (with R code to provide some context):

Model 1:

Take the log of the response variable $y$, then fit a Gamma GLM with an identity link (note that in R the family is `Gamma`, and its default link is actually the inverse, so the identity link must be requested explicitly):

glm(log(y) ~ a + b, family = Gamma(link = "identity"), data = ...)

Model 2:

Fit a Gamma GLM with a log link, without log-transforming the response:

glm(y ~ a + b, family = Gamma(link = "log"), data = ...)

When I generate predictions from these two models, they differ slightly but materially, and I have trouble understanding why.
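For context, the gap can be seen directly by fitting both specifications to simulated data. A minimal sketch (coefficients and shape parameter are illustrative; the mean is kept well above 1 so that log(y) is positive and a valid Gamma response):

```r
set.seed(1)
n <- 500
a <- runif(n); b <- runif(n)

# Simulate y from a Gamma with log-linear mean (the Model 2 assumption);
# mu is kept large so that y > 1 and log(y) stays positive for Model 1
mu <- exp(3 + 0.5 * a - 0.3 * b)
y  <- rgamma(n, shape = 50, rate = 50 / mu)   # E[y] = mu

m1 <- glm(log(y) ~ a + b, family = Gamma(link = "identity"))  # Model 1
m2 <- glm(y ~ a + b, family = Gamma(link = "log"))            # Model 2

# Back-transformed Model 1 fitted values do not match Model 2's
summary(exp(fitted(m1)) - fitted(m2))
```

The two fits differ because they assume different distributions: Model 1 says log(y) is Gamma, Model 2 says y is Gamma with log(E[y]) linear in the predictors.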

Stefan
Ruser
  • Perhaps the distinction more readily becomes clear by considering the simplest possible versions of these models; namely, log(y) ~ 1 versus y ~ 1 (with log link). What are the assumed distributions of $y$ in each case? – whuber May 29 '19 at 16:11
  • I would think that both models assume Gamma as the underlying distribution. It's just a matter of when to apply the log. However, I fail to see why that would make a difference. – Ruser May 29 '19 at 16:31
  • When the logarithm of a variable has a Gamma distribution, the original variable does not have a Gamma distribution. For a well-known example of this phenomenon, compare Normal to Lognormal variates or Uniform to Exponential variates. – whuber May 29 '19 at 17:47
  • It might be useful to simulate the distribution of the log of a gamma random variate and see that it's left skew rather than right skew. Note that in a GLM the link function doesn't transform the random variable. – Glen_b May 30 '19 at 01:11
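Following Glen_b's suggestion, a short simulation (the shape parameter and sample size are arbitrary choices) shows that the log of a Gamma variate is left-skewed while the Gamma variate itself is right-skewed, so log(y) cannot itself follow a Gamma distribution:

```r
set.seed(42)
x <- rgamma(1e5, shape = 2, rate = 1)  # Gamma draws: right-skewed
z <- log(x)                            # their logs: left-skewed

# Sample skewness (third standardized moment)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew(x)  # positive: right skew
skew(z)  # negative: left skew
hist(z, breaks = 100)
```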

0 Answers