I am fitting an exponential model with a GLM (Gaussian error, log link) to each of 1000 trials, which gives me 1000 slope-intercept pairs that are moderately correlated. I want to understand the reason for this correlation, and in particular whether it is simply an artifact of the y variable being measured at more or less the same values of x in every trial, so that the scale of x is essentially fixed in each regression (see this post).

My question is: do we expect the slope and intercept to be correlated in this setting in the same way they are in OLS regression, or does the presence of this correlation suggest something different, given that:

  1. This is not the correlation within the sampling distribution of the slope and intercept for a particular trial; it is closer to the correlation in the population distribution of estimates across all trials.
  2. This is not OLS regression; it is closer to nonlinear regression.

1 Answer

I think a proper answer here would appeal to the covariance matrix of the coefficients, but let me try something else.

First, let's remind ourselves that a Gaussian GLM with log link models $\log(E[Y])$ as a linear combination of the covariates. This is different from taking the log of $y$ and doing a linear regression, which models $E[\log(Y)]$. We can still recover the former from the latter: if the errors on the log scale are normal, multiplying the back-transformed prediction $\exp(x^T \hat{\beta})$ by a factor of $\exp(\hat{\sigma}^2/2)$ gives an estimate of $E[Y]$, where $\hat{\sigma}^2$ is the estimated residual variance (the MSE) of the log-scale regression. I wrote a little bit about this on my blog.
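
As a short sketch of where that factor comes from, assume (this is an assumption, not something the data guarantee) that the log-scale errors are normal with constant variance $\sigma^2$. The lognormal mean then gives:

$$
\log(Y) = x^T \beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
\;\;\Longrightarrow\;\;
E[Y \mid x] = \exp(x^T \beta)\, E[e^{\varepsilon}] = \exp\!\left(x^T \beta + \frac{\sigma^2}{2}\right),
$$

so that, with a single covariate,

$$
\log(E[Y \mid x]) = \left(\beta_0 + \frac{\sigma^2}{2}\right) + \beta_1 x .
$$

Only the intercept moves; the slope is the same on both scales under this assumption.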

So while a Gaussian GLM with log link is different from modelling the log of $y$, the two are related and we can study one to learn about the other. The relation between them happens to be an adjustment to the intercept by an estimated constant on the log scale.

Clearly, the coefficients of the linear model for $\log(y)$ are correlated, for the reasons explained in the post you've linked. If they are correlated, then they remain correlated when one of them is shifted by a constant, such as $\hat{\sigma}^2/2$.
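
To check this numerically, here is a minimal simulation sketch, not the asker's actual setup. It assumes a fixed x grid, identical true parameters in every trial, and multiplicative lognormal noise; the values `A_TRUE`, `B_TRUE`, `SIGMA` and all helper names are made up for illustration. Each trial is fitted twice: by OLS on $\log(y)$, and by least squares on the original scale, which is the maximum-likelihood fit for a Gaussian GLM with log link. The usual OLS value for the intercept-slope correlation, $-\bar{x}/\sqrt{\overline{x^2}}$, is printed for comparison.

```python
import math
import random


def fit_ols(x, z):
    """Closed-form simple linear regression of z on x; returns (intercept, slope)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope


def fit_log_link_gaussian(x, y, start, n_iter=50, tol=1e-10):
    """Least-squares fit of y ~ exp(a + b*x) by Gauss-Newton, started at `start`."""
    a, b = start
    for _ in range(n_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Normal equations (J'J) delta = J'r, with Jacobian rows [mu, mu * x].
        s00 = sum(m * m for m in mu)
        s01 = sum(m * m * xi for m, xi in zip(mu, x))
        s11 = sum(m * m * xi * xi for m, xi in zip(mu, x))
        g0 = sum(m * (yi - m) for m, yi in zip(mu, y))
        g1 = sum(m * xi * (yi - m) for m, xi, yi in zip(mu, x, y))
        det = s00 * s11 - s01 * s01
        da = (s11 * g0 - s01 * g1) / det
        db = (s00 * g1 - s01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


random.seed(0)
x = [0.25 * i for i in range(9)]          # same x grid in every trial (assumed)
A_TRUE, B_TRUE, SIGMA = 1.0, 0.5, 0.2     # hypothetical true values

ols_fits, glm_fits = [], []
for _ in range(500):
    # Multiplicative lognormal noise keeps y positive, so log(y) is defined.
    y = [math.exp(A_TRUE + B_TRUE * xi + random.gauss(0.0, SIGMA)) for xi in x]
    log_fit = fit_ols(x, [math.log(yi) for yi in y])
    ols_fits.append(log_fit)
    glm_fits.append(fit_log_link_gaussian(x, y, start=log_fit))

r_ols = pearson([a for a, _ in ols_fits], [b for _, b in ols_fits])
r_glm = pearson([a for a, _ in glm_fits], [b for _, b in glm_fits])

# OLS intercept-slope correlation implied by the design alone.
xbar = sum(x) / len(x)
r_theory = -xbar / math.sqrt(sum(xi * xi for xi in x) / len(x))

print(f"OLS on log(y):      {r_ols:.3f}")
print(f"Gaussian log-link:  {r_glm:.3f}")
print(f"OLS design formula: {r_theory:.3f}")
```

Under these assumptions both correlations come out negative, and the OLS-on-log value should sit close to the design formula. The GLM value need not match it, because the log-link fit effectively weights observations by $\mu^2$. If the true slope and intercept also vary from trial to trial, as they may in real data, that variation adds its own contribution to the across-trial correlation, which this sketch deliberately leaves out.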

  • Thank you, this was helpful.

    Based on your response, (1) the slope-intercept correlation should be the same for the OLS and GLM cases, and (2) I should be able to use the equation I shared in the linked post to approximate the expected correlation.

    However, I don’t find either of these statements to be true. I find the correlation for the OLS example is -0.4 while for the GLM it is -0.5. Based on the equation in the post, the expected correlation is -0.7, which is exactly the correlation I get if I don’t fit an exponential model, but a line without transforming y. How could this be?

    – Applesauce26 Dec 13 '23 at 12:37
  • @Applesauce26 Points (1) and (2) are not what you should be taking from my answer. You asked if you should expect the parameters to be correlated in the same way as they are in OLS. My answer shows they should be correlated, because the Gaussian GLM with log link can be interpreted as an OLS fit on the log scale. The size of that correlation will be different because the Hessian of the log-likelihood will likely be different. – Demetri Pananos Dec 13 '23 at 15:58
  • Apologies, I did not mean to misconstrue your answer. What I gathered is that we expect there to be correlation in both the GLM and OLS cases, given the relationship you highlighted between E(log(Y)) and log(E(Y)). Points 1 and 2 were my own follow-up logic, which I am still struggling to comprehend. Maybe this is a separate question, but what causes the size of the observed correlation to differ from the expected correlation? – Applesauce26 Dec 13 '23 at 16:29