
I would like to understand what difference it makes if I use, for example, either Mean Squared Error or Poisson Deviance as the error metric/loss function for a regression on count data. Are there any a priori or theoretical reasons to prefer one metric over the other?

Background

I have a large dataset of count data (i.e., numbers of events and exposures) depending on various covariates. I would like to fit various models (classical GLMs as well as machine learning models) to this dataset and assess their quality. My main goal is to make point predictions for the conditional means of the distribution, depending on the covariates. The corresponding rates are typically small (between 0 and 10%, say), and in many cases there are zero observed events. I do not have any domain-specific reason, such as a cost function, to choose my error metric, and I am not (yet) particularly interested in inference. At the moment, I just want a "best" prediction, whatever that means.

As I understand it, I should choose a "proper scoring function" for the conditional mean. I further understand that both Mean Squared Error and Poisson Deviance fulfil this requirement.
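To be explicit (since definitions seem to vary), by Poisson Deviance I mean the mean unit Poisson deviance over the $n$ observations, with the usual convention that the $y_i \log(y_i/\hat{\mu}_i)$ term is zero whenever $y_i = 0$:

$$D(y, \hat{\mu}) = \frac{2}{n} \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat{\mu}_i} - \left( y_i - \hat{\mu}_i \right) \right]$$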

My questions

  • Beyond being proper scoring functions for the mean, are there any other requirements I should be aware of?
  • Are there other relevant metrics which could (or should) be used for assessing the quality of predictions?
  • What are possible reasons to prefer one error metric, such as Poisson Deviance or Mean Squared Error, over another?
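For reference, here is a minimal sketch (in Python; the data and model names are just placeholders) of how I would compute both candidate metrics on hold-out predictions, using the deviance formula above:

```python
import numpy as np
from scipy.special import xlogy  # xlogy(a, b) = a * log(b), defined as 0 where a == 0


def mean_squared_error(y, mu):
    """Mean squared error between observed counts y and predicted means mu."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    return np.mean((y - mu) ** 2)


def mean_poisson_deviance(y, mu):
    """Mean unit Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    return np.mean(2.0 * (xlogy(y, y / mu) - (y - mu)))


# Toy hold-out data: observed counts and the conditional means predicted by two models
y_holdout = np.array([0, 1, 0, 2, 0, 3])
mu_glm = np.array([0.2, 0.9, 0.4, 1.8, 0.1, 2.5])  # e.g. a classical Poisson GLM
mu_ml = np.array([0.3, 1.1, 0.2, 2.2, 0.2, 2.8])   # e.g. a machine learning model

for name, mu in [("GLM", mu_glm), ("ML model", mu_ml)]:
    print(name, mean_squared_error(y_holdout, mu), mean_poisson_deviance(y_holdout, mu))
```

(As far as I can tell, `sklearn.metrics.mean_poisson_deviance` implements the same formula, so either route should give identical numbers.)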
  • I don't know enough about the Poisson Deviance (and there seems to be conflicting information out there on it; maybe you want to include your formulas?), but you may find this short paper of mine helpful; feel free to ping me on ResearchGate for it. Overall, I would say that you should first figure out which functional you want to elicit, then use a corresponding accuracy metric. Which you did: you want the conditional mean, so the MSE is certainly a good choice. If deviance is also minimized in expectation by the mean, that is also good. – Stephan Kolassa Jan 29 '24 at 10:22
  • ... There might be differences in the behavior of the two error measures near the true value you aim for (with the complication that this is unknown). Also, the MSE is minimized by the mean for all conditional distributions; the deviance may break down if your actual data is not Poisson (again, I don't know enough to really say anything here). In which case you have the issue of whether your Poisson assumption is "good enough". – Stephan Kolassa Jan 29 '24 at 10:24
  • @StephanKolassa: Thank you for the comment. I had a look at your paper before posting the question and was hoping that you could clarify the count case as well ;-) The paper is very helpful, but, of course, it is focused on general regression and does not address the specific topic of count regression. The issue you raise about misspecification of the error distribution is a good example of my first bullet point: some metrics are "good" for all distributions, while others are only good for specific ones. – g g Jan 29 '24 at 10:56
  • If you are specifically interested in count processes, this earlier paper might also be interesting. There, I explicitly looked at proper scoring rules. (Poisson didn't work too well.) I would say that counts vs. continuous data don't really make that much of a difference in evaluation. Point forecast accuracy measures work for both, and most proper scoring rules do so, too, possibly with simple modifications, e.g., the CRPS. – Stephan Kolassa Jan 29 '24 at 12:00
  • What reason do you have to believe the conditional distributions to be Poisson? There are other count distributions, even unbounded above (e.g., negative binomial). – Dave Jan 30 '24 at 11:53
  • @Dave I did not say, and do not (yet) know, anything about the underlying distribution except that the observations are counts and exposures. This question is about possible error metrics and their pros and cons. If you know specific reasons why and how an error metric for count data should be related to the assumed/estimated conditional distribution, please feel free to post it as an answer. – g g Jan 30 '24 at 12:27
  • My comments here could be worth a read. $//$ If you know the conditional distribution to be Poisson, then minimizing the Poisson deviance is maximum likelihood estimation and has the many nice MLE properties. – Dave Jan 30 '24 at 12:39
  • @Dave Yes, MLE is definitely nice, but I am not really interested in inference, and some of the models have to be estimated with other loss functions (typically MSE) anyway. So estimation is not the primary focus either. I think this evens out many of the advantages of MLE. I am mainly interested in error metrics to compare final fitted models, typically on hold-out data. – g g Jan 30 '24 at 12:55
