I think I have a little problem. I was working with a dataset where the dependent variable was not normally distributed, so I decided to take the log of that variable. To see the difference, I fit one model with the log transformation and one without it. For the log-transformed model I ran the linear regression, predicted the values, and then back-transformed with exp. While the predictions look better visually, the mean squared error got much worse. Am I doing something wrong along the way?
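Roughly, this is the workflow (a minimal sketch with scikit-learn and synthetic, skewed data; all names and data here are illustrative, not from my actual dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
# Skewed (log-normal) target, so log(y) is closer to normal than y itself
y = np.exp(1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=500))

# Model without transformation
plain = LinearRegression().fit(X, y)
mse_plain = mean_squared_error(y, plain.predict(X))

# Model with log transformation, back-transformed with exp
logged = LinearRegression().fit(X, np.log(y))
pred_back = np.exp(logged.predict(X))
mse_logged = mean_squared_error(y, pred_back)

print(mse_plain, mse_logged)  # MSE compared on the original scale
```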
Welcome to Cross Validated! I have some good news: you don’t need the $y$ variable to have a normal distribution (the common normality assumption is about the error term), so it might be that you didn’t have to take the log at all! – Dave Oct 20 '22 at 21:40
1 Answer
Without more details it's hard to say for sure, but:

a) Exponentiating magnifies small errors on the log scale into large errors on the original scale.

b) Your problem could be the bias induced by exponentiating; see https://stats.stackexchange.com/a/361240. You need $E[y \mid x]$, but that is not $\exp(E[\ln(y) \mid x])$. A simple approximate fix is to multiply your prediction by $\exp(0.5\,\mathrm{mse})$, where mse is the mean squared error of the log-scale model, as detailed in the linked answer.
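As a rough illustration of point (b), here is a minimal sketch assuming synthetic log-normal data and roughly normal residuals on the log scale (variable names are illustrative, not from your code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 1))
y = np.exp(1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=500))

# Fit on the log scale
model = LinearRegression().fit(X, np.log(y))
log_pred = model.predict(X)

# Residual variance of the log-scale model (≈ its mse)
sigma2 = np.var(np.log(y) - log_pred, ddof=X.shape[1] + 1)

naive = np.exp(log_pred)                              # biased low for E[y|x]
corrected = np.exp(log_pred) * np.exp(0.5 * sigma2)   # approximate bias correction

print(np.mean((y - naive) ** 2), np.mean((y - corrected) ** 2))
```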
seanv507