I have built two regression models to predict sales of different products from a number of explanatory variables, with an offset term for the number of days each product was on sale. One is a Gaussian model and the other is a Poisson model, both using a log link function and both trained on the same dataset of some 30k observations.
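For context, the setup looks roughly like this (a minimal sketch using statsmodels; the dataframe `df` and the column names are placeholders, not my actual variables):

```python
import numpy as np
import statsmodels.api as sm

# Placeholder dataframe and column names -- the real predictors differ.
X = sm.add_constant(df[["price", "promo", "season"]])
y = df["sales"]
# Offset: log of the exposure (number of days each product was on sale)
offset = np.log(df["days_on_sale"])

# Poisson regression with its canonical log link and the log-exposure offset
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson(), offset=offset).fit()

# Gaussian regression with the same log link, offset, and predictors
gaussian_fit = sm.GLM(
    y, X,
    family=sm.families.Gaussian(link=sm.families.links.Log()),
    offset=offset,
).fit()
```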
My initial confusion: on the training data, the Poisson regression achieved a much better fit as measured by pseudo-R², but had a higher RMSE (and mean absolute error). Having read this, I now understand why.
My question now: when comparing goodness of fit on the test data, what is the appropriate measure to use? RMSE will necessarily favor OLS on the training data, so it does not seem like a fair comparison on the test data either. A logarithmic score is valid for the Poisson model, because it gives a probability for each integer outcome, but it is not applicable to OLS.
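To make the comparison concrete, here is a sketch of the measures I have in mind (assuming the fits above and placeholder `test_X`, `test_offset`, `test_y` for the hold-out data): RMSE and MAE compare point predictions, while the log score evaluates the Poisson model's full predictive distribution.

```python
import numpy as np
from scipy.stats import poisson

# Predicted means on the test set (test_X / test_offset / test_y are placeholders)
mu_pois = poisson_fit.predict(test_X, offset=test_offset)
mu_gaus = gaussian_fit.predict(test_X, offset=test_offset)

def rmse(y, mu):
    return np.sqrt(np.mean((y - mu) ** 2))

def mae(y, mu):
    return np.mean(np.abs(y - mu))

# Point-prediction errors, comparable across both models
print(rmse(test_y, mu_pois), rmse(test_y, mu_gaus))
print(mae(test_y, mu_pois), mae(test_y, mu_gaus))

# Logarithmic score for the Poisson model: mean negative log-probability of the
# observed counts under the predicted Poisson distributions (lower is better)
log_score_pois = -np.mean(poisson.logpmf(test_y, mu_pois))
```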
Some extra information from comments:
Number of observations: about 35k in the training set and another 10k for test. What was done? My approach had been to look at RMSE and MAE (and to inspect a lot of plots), but I was wrong-footed when my Poisson model, despite its much better fit to the training data, nevertheless ended up with higher RMSE and MAE than the Gaussian model.