
I am working on an assignment where the objective is to predict housing prices. My initial approach is to fit an Ordinary Least Squares (OLS) model; after that, I plan to build a Random Forest model for comparison.

It is generally understood that 'black-box' machine learning algorithms such as Random Forests offer better predictive performance but lack the interpretability of OLS models.

I want to quantify the differences in their predictive power. However, I am uncertain about the most appropriate evaluation criteria to use.

Question 1: What are the most suitable metrics for comparing the predictive power of two OLS models?

Question 2: What are the most appropriate metrics for comparing the predictive power of an OLS model with that of a Random Forest model?

It would be great if relevant literature could be recommended, but any help is highly appreciated!

Tim

1 Answer


It depends on what you want to predict.

If you want to predict the conditional expectation, use the MSE. If you want to predict the conditional median, use the MAE. This may be more useful for housing prices, which are typically skewed. Just be aware that the conditional median and the conditional mean may be quite different. If you want a quantile prediction, use a pinball loss. If you want the conditional (-1)-median, use the MAPE. (If you don't know what the (-1)-median is, then please read that thread before using the MAPE.)
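
To make the mapping concrete, here is a minimal sketch of these losses computed on a held-out set. The numbers in y_true and y_pred are made up, and the pinball_loss helper is only an illustration, not a function from any particular library.

```python
import numpy as np

# Illustrative only: y_true and y_pred are made-up held-out prices and predictions.
y_true = np.array([210_000, 340_000, 155_000, 480_000, 265_000], dtype=float)
y_pred = np.array([225_000, 310_000, 160_000, 455_000, 290_000], dtype=float)

mse = np.mean((y_true - y_pred) ** 2)             # rewards predicting the conditional mean
mae = np.mean(np.abs(y_true - y_pred))            # rewards predicting the conditional median
mape = np.mean(np.abs(y_true - y_pred) / y_true)  # rewards predicting the (-1)-median

def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for a predicted tau-quantile q_pred."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# e.g. score a predicted 90th percentile (here just the point forecast, for illustration)
pl_90 = pinball_loss(y_true, y_pred, tau=0.9)

print(f"MSE: {mse:,.0f}  MAE: {mae:,.0f}  MAPE: {mape:.3f}  pinball(0.9): {pl_90:,.0f}")
```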

You can always consider monotonic transforms, or scaling or weighting. Just be careful you know which end of the dog is doing the wagging if you scale or weight.
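
As a small illustration of why the transform matters, the following sketch uses simulated log-normal "prices" and two deliberately simple constant forecasts: squared error on the original scale favours a mean-like forecast, while squared error on the log scale favours a median-like one. The data and forecasts are simulated for illustration only, not taken from any real data set.

```python
import numpy as np

# Skewed, log-normal-ish "prices" and two constant forecasts.
rng = np.random.default_rng(0)
y_true = rng.lognormal(mean=12.5, sigma=0.5, size=1000)
pred_mean = np.full_like(y_true, np.mean(y_true))      # targets the mean
pred_median = np.full_like(y_true, np.median(y_true))  # targets the median

for name, pred in [("mean forecast", pred_mean), ("median forecast", pred_median)]:
    mse_raw = np.mean((y_true - pred) ** 2)                    # evaluated on prices
    mse_log = np.mean((np.log(y_true) - np.log(pred)) ** 2)    # evaluated on log-prices
    print(f"{name}: MSE on prices = {mse_raw:.3e}, MSE on log-prices = {mse_log:.4f}")
```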

Kolassa (2020) may be helpful.

Note that I am not discussing which metrics are appropriate for which method. That is because you want to assess the predictions, not the methods - or, you want to assess the methods through the predictions.

Stephan Kolassa
    Along with MSE and MAE (and also the mean absolute difference, MAD), a high-resolution calibration curve will expose overfitting/overprediction. Random forests are notorious for being poorly calibrated due to extreme overfitting. And be sure to evaluate calibration in an unbiased way, using strong internal validation with the bootstrap or cross-validation, or using a sufficiently large separate sample. – Frank Harrell Jan 31 '24 at 14:29
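
A rough sketch of the cross-validated check suggested in this comment might look as follows. Everything here is an assumption for illustration: it uses scikit-learn's cross_val_predict for out-of-fold predictions, a crude decile table rather than a smooth high-resolution calibration curve, and the California housing data as a stand-in for the assignment's data.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

X, y = fetch_california_housing(return_X_y=True)

for name, model in [("OLS", LinearRegression()),
                    ("Random forest", RandomForestRegressor(n_estimators=100, random_state=0))]:
    # Out-of-fold predictions: each observation is predicted by a model that never saw it.
    pred = cross_val_predict(model, X, y, cv=5)
    print(f"{name}: out-of-fold MAE = {np.mean(np.abs(y - pred)):.3f}, "
          f"MSE = {np.mean((y - pred) ** 2):.3f}")

    # Crude binned calibration summary: within each decile of the predictions,
    # compare the mean prediction to the mean observed value.
    edges = np.quantile(pred, np.linspace(0, 1, 11))
    bin_idx = np.clip(np.digitize(pred, edges[1:-1]), 0, 9)
    for b in range(10):
        in_bin = bin_idx == b
        print(f"  decile {b}: mean predicted {pred[in_bin].mean():.2f}, "
              f"mean observed {y[in_bin].mean():.2f}")
```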