sample code:
set.seed(123)
n = 500
x = rnorm(n)
a = -5
b = 0.5
y = exp(a + b * x + rnorm(n))
plot(x, y, pch = 20)
m = glm(y ~ x, family = gaussian(link = "log"))
sample_coefs = MASS::mvrnorm(n = 5000, mu = m$coefficients, Sigma = vcov(m))
sample_x = seq(-4, 4, 0.05)
sample_lines = sample_coefs |>
  apply(1, \(pars) exp(pars[1] + sample_x * pars[2])) |>
  apply(1, \(q) quantile(q, c(0.1, 0.9)))
polygon(x = c(sample_x, rev(sample_x)), y = c(sample_lines[1,], rev(sample_lines[2,])), col = rgb(1, 0, 0, 0.5))
y_true = exp(a + sample_x * b)
lines(sample_x, y_true, col = "blue", lwd = 2)
y_est = exp(m$coefficients[1] + sample_x * m$coefficients[2])
lines(sample_x, y_est, lwd = 2)
# summary.glm has no $sigma; the residual sd is the square root of the dispersion
y_buffer_90 = qnorm(p = 0.9, sd = sqrt(summary(m)$dispersion))
# gaussian errors are additive on the response scale, so add the buffer to y_est
lines(x = sample_x, y = y_est + y_buffer_90, lty = 2)
lines(x = sample_x, y = y_est - y_buffer_90, lty = 2)
The thick black line shows the estimated relationship, the red area shows the 80% confidence interval of the estimated relationship (pointwise 10%–90% quantiles of the sampled curves), and the blue line shows the true relationship.
As can be seen, the fit is far from the truth. Why does this happen?
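One detail worth checking when comparing the curves: in the data-generating code the noise term sits inside `exp()`, so `y` is lognormal around the linear predictor. A quick standalone sketch (not using the model above) of the resulting mean shift:

```r
# A lognormal variable exp(mu + e) with e ~ N(0, 1) has mean
# exp(mu + 1/2), not exp(mu) -- the noise inside exp() shifts the mean.
set.seed(1)
e = rnorm(1e6)
mean(exp(0 + e))  # close to exp(0.5), i.e. about 1.65, not exp(0) = 1
exp(0.5)
```

The GLM with `gaussian(link = "log")` estimates the conditional mean of `y`, so a discrepancy between the fitted curve and `exp(a + b * x)` is to be expected under this generating process.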


`y <- exp(a + b * x) + rnorm(n, 0, 0.001)` and fit the GLM again. – COOLSerdash Jul 13 '23 at 07:11
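The comment's suggestion could be sketched as follows (reusing `n`, `x`, `a`, `b`, and `sample_x` from the question's code; the `start` values are supplied because additive noise can make a few responses non-positive, which breaks the default initialisation of the log link):

```r
# Regenerate y with additive gaussian error on the response scale,
# which matches the assumptions of the gaussian log-link GLM.
y2 = exp(a + b * x) + rnorm(n, 0, 0.001)
m2 = glm(y2 ~ x, family = gaussian(link = "log"),
         start = c(-5, 0.5))  # starting values; default init needs y > 0
plot(x, y2, pch = 20)
lines(sample_x, exp(a + b * sample_x), col = "blue", lwd = 2)           # truth
lines(sample_x, exp(coef(m2)[1] + coef(m2)[2] * sample_x), lwd = 2)     # fit
```

With the error outside `exp()`, the fitted and true curves should now lie close together.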