I'm working on a rental price prediction project and I want to make sure I'm evaluating things correctly. After fitting some models and computing R-squared on the training and test sets, the gap between the two seemed too large: roughly 0.76 on training but only 0.68 on testing. This looks like overfitting, and I tried a number of techniques to reduce it, but nothing improved things much. I later re-inspected the data and found that my target y (prices) is somewhat skewed, so I applied a log transformation to it, and I got better scores by doing so: the R-squared on training is now around 0.78 and on testing 0.75.
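For context, here is a minimal sketch of what I'm doing (the feature matrix `X`, the prices `y`, and the Ridge model are just placeholders for my actual setup):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# X, y = feature matrix and rental prices (placeholders for my data)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Original target: train R^2 ~0.76, test R^2 ~0.68 in my case
model = Ridge().fit(X_train, y_train)
print(r2_score(y_train, model.predict(X_train)))
print(r2_score(y_test, model.predict(X_test)))

# Log-transformed target: train R^2 ~0.78, test R^2 ~0.75
model_log = Ridge().fit(X_train, np.log1p(y_train))
print(r2_score(np.log1p(y_train), model_log.predict(X_train)))
print(r2_score(np.log1p(y_test), model_log.predict(X_test)))
```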
I know from this question: "How to compute the R-squared for a transformed response variable?" that I can't directly compare R-squared between two models with different dependent variables, but my point is that I seem to have reduced overfitting a little by doing this.
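If it helps, I assume I could at least put both models on the same scale by back-transforming the log-model's predictions and scoring them against the raw prices (a sketch reusing `model_log`, `X_test`, and `y_test` from above):

```python
# Back-transform predictions to the original price scale so both
# models are scored against the same y_test. Note: exponentiating
# predictions made on the log scale tends to underestimate the
# conditional mean of the price (retransformation bias).
pred_prices = np.expm1(model_log.predict(X_test))
print(r2_score(y_test, pred_prices))
```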
I just want to make sure I'm on the right track; any suggestions are appreciated.
Edit: Someone pointed out that I shouldn't decide to transform the data just by looking at the marginal distribution of y, but then why do people on Kaggle do exactly that, e.g. https://www.kaggle.com/code/apapiu/regularized-linear-models/notebook?