
A proper scoring rule is a concept used for evaluating density forecasts. What would be an equivalent for evaluating point forecasts? For example, mean squared error seems like a proper metric for evaluating forecasts that target the expected value of the underlying random variable. This is because the forecast that truly minimizes the expected squared error (the population counterpart of the mean squared error) actually is the expected value. Meanwhile, mean absolute error does not seem proper for that goal; on the other hand, it would seem proper when the target is the median of the underlying random variable. So what term is there to describe the propriety of a metric for evaluating point forecasts?
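
For concreteness, here is a minimal simulation sketch of the claim above (using NumPy; the lognormal target, the sample size, and the candidate grid are arbitrary illustrative choices). The empirical minimizer of the mean squared error lands near the sample mean, while the empirical minimizer of the mean absolute error lands near the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # skewed, so mean != median

candidates = np.linspace(0.1, 4.0, 400)             # candidate point forecasts
mse = [np.mean((c - y) ** 2) for c in candidates]   # mean squared error of each candidate
mae = [np.mean(np.abs(c - y)) for c in candidates]  # mean absolute error of each candidate

print("MSE minimizer:", candidates[np.argmin(mse)], "sample mean:  ", y.mean())
print("MAE minimizer:", candidates[np.argmin(mae)], "sample median:", np.median(y))
```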

Richard Hardy
  • Aren’t most predictions point forecasts? – Dave May 24 '23 at 11:47
  • What would that be? As I understand it, the point of scoring rules is that you are fitting your model to the labels, but you try to predict the (unobserved) probabilities, so you want to assess and possibly fix the predicted probabilities. With point predictions, you are directly predicting the values, so to assess your model you can just look at the metrics that are appropriate for the problem. – Tim May 24 '23 at 11:47
  • So scoring-like scenario would be if your predicted values were something like aggregates (e.g. group means) and you wanted to fit your model to them but you cared about predicting the individual-level data (de-aggregated), so you would like to validate and fix the individual predictions. But we have other ways of dealing with problems like this than scoring rules. – Tim May 24 '23 at 11:48
  • @Tim, in my understanding, propriety of scoring rules is about ranking competing forecasts, not assessing individual forecasts or trying to improve them. Now, a point prediction targets some function of the pdf, e.g. the mean, the median, a quantile or such. In a forecasting competition, this target should be specified explicitly. Yet some competitions judge the same forecast by multiple criteria: MSE, MAE, MAPE etc. This does not make much sense. (Why judge a prediction that targets the mean by how close it gets to the median?) And thus the need for the term I am looking for. – Richard Hardy May 24 '23 at 12:13
  • @Dave, sure, they are. – Richard Hardy May 24 '23 at 12:13
  • @RichardHardy if your aim is to minimize squared error, why would you train it or evaluate it using absolute error (or something else) if you can train it using squared error? This is what I mean by saying that with point predictions you can do it directly, whereas you (usually) cannot directly train or evaluate on probabilities. It doesn't seem to need a specific name. – Tim May 24 '23 at 12:29
  • @Tim, at the stage of forecast evaluation, training is irrelevant, so we can leave it out of the discussion. (And we know well that the training loss function does not have to match the evaluation loss function for best results in terms of out-of-sample evaluation loss, though it often would, but that is another topic. See also this.) I am interested in proper ranking in forecasting competitions and related uses. – Richard Hardy May 24 '23 at 12:34
  • If most forecasts are point forecasts, then aren't the usual scoring rules the ones you want? – Dave May 24 '23 at 16:21
  • @Dave, scoring rules (including proper ones) are defined for density forecasts. I am looking for an equivalent for point forecasts. – Richard Hardy May 24 '23 at 17:05
  • You mean Brier score and log loss? – Dave May 24 '23 at 17:07
  • @Dave, I mean in general. I have not seen scoring rules discussed in any other context. – Richard Hardy May 24 '23 at 20:22
  • It seems like Brier score and log loss are scoring rules for point forecasts. – Dave May 24 '23 at 20:23
  • 1
    @Dave, are they, though? Wikipedia's entry suggests Brier score involves the estimated probabilities rather than the predicted class labels. – Richard Hardy May 25 '23 at 09:22
  • Terminology-wise, you may be looking for the term "scoring function"? (outlined briefly in the second paragraph here) This appears to be the point forecast analogue of scoring rules. I'm not sure of the effectiveness/validity, but one could apply proper scoring rules to point forecasts by representing the point forecast as a deterministic distribution ($F(x) = H(x - y)$, $x \in \mathbb{R}$, where $y$ is your point prediction and $H$ is the right-continuous Heaviside step function). – QMath Jul 01 '23 at 04:20
  • Propriety/strict propriety also seems to apply similarly to the point prediction case: if $S$ is our scoring metric (a function from $\mathcal{D}^{2}$ to the reals, where $\mathcal{D}$ is the space our data takes values in), $d \in \mathcal{D}$ is a data point, and $p \in \mathcal{D}$ is our point prediction, then $S$ is proper if $S(d, d) \leq S(d, p)$ for all $d, p \in \mathcal{D}$, and it is additionally strictly proper if equality only occurs when $p = d$ (our point prediction is exactly the value we're trying to predict). Metrics such as the pointwise MSE seem to have the latter property. – QMath Jul 01 '23 at 04:47
  • @Dave, I got an answer; check it out. – Richard Hardy Aug 05 '23 at 14:18
  • @QMath, thank you, this is helpful! – Richard Hardy Aug 05 '23 at 14:19

1 Answer


As mentioned by QMath, the term (strictly) consistent scoring function is used in the statistical literature to describe functions which evaluate point predictions based on the same principles as (strictly) proper scoring rules.

The (slightly simplified) definition uses a set $A$ of possible point forecasts, a set $O$ of observations, a set $\mathcal{F}$ of distributions, and a statistical property/functional $T: \mathcal{F} \to A$. Examples of functionals are the mean or the median, as already mentioned in the question. A scoring function is a function $S : A \times O \to \mathbb{R}$. It is consistent for $T$ if $$ \mathbb{E}_{Y\sim F} \, S(T(F), Y) \le \mathbb{E}_{Y\sim F} \, S(x, Y) $$ for all $x \in A$ and $F \in \mathcal{F}$. It is strictly consistent if equality holds if and only if $x=T(F)$. The fact that $T(F)$ is a minimizer is of course simply convention.
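
As a concrete check, take the squared error $S(x, y) = (x - y)^2$ and the mean functional $T(F) = \mathbb{E}_{Y \sim F}[Y]$ (for $F$ with finite variance): $$ \mathbb{E}_{Y\sim F} \, (x - Y)^2 = \operatorname{Var}(Y) + \big(x - \mathbb{E}_{Y\sim F}[Y]\big)^2, $$ which is uniquely minimized at $x = T(F)$, so the squared error is strictly consistent for the mean. An analogous (slightly longer) argument shows that the absolute error is consistent for the median.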

The idea behind this is (analogous to proper scoring rules) that the point forecast $T(F)$ resulting from the true underlying distribution of $Y$ will receive the lowest score/loss on average. Similarly to their proper scoring rule counterparts, consistent scoring functions enable forecast rankings and can be used for regression or for training ML models that target a desired statistical property. The most popular examples are the squared error (for the mean) and the absolute error (for the median).
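
As a small illustration of such a ranking, here is a sketch using NumPy (the lognormal distribution and the two competing forecasters are arbitrary illustrative choices): one forecaster reports the distribution's mean, the other its median, and the two scoring functions rank them differently, each favoring the forecaster who reports the functional it is consistent for.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # realized observations

forecast_mean = np.exp(0.5)    # true mean of this lognormal distribution
forecast_median = 1.0          # true median of this lognormal distribution

# Squared error is consistent for the mean: the mean forecaster scores lower on average.
print(np.mean((forecast_mean - y) ** 2), np.mean((forecast_median - y) ** 2))

# Absolute error is consistent for the median: here the median forecaster scores lower.
print(np.mean(np.abs(forecast_mean - y)), np.mean(np.abs(forecast_median - y)))
```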

Two additional points relating to the discussion below the question:

  1. Proper scoring rules are not restricted to density forecasts; they assess probabilistic predictions in general. Depending on how you represent your distributions, you can use different scoring rules: for instance, you can define them for the CDF (see the CRPS example after this list) or even for the characteristic function.
  2. Tim stated above that "with point predictions, you are directly predicting the values". The theory behind scoring functions sees this slightly differently: we have some distribution of the target in mind and, for some reason, have to reduce it to a single point, e.g. its mean. So if a value $x$ is reported, this is not a statement that the forecaster believes the observation will be $x$; it is a piece of statistical information. This becomes obvious with discretely distributed data: a forecaster can report $x = 3.5$ because they want to report the mean, but for a forecaster who wants to predict the observation itself, a non-integer forecast makes no sense at all.
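
For instance (regarding point 1), the continuous ranked probability score is defined directly on the CDF, $$ \mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big(F(x) - \mathbf{1}\{x \ge y\}\big)^2 \, dx, $$ and needs no density. Plugging in the point-mass CDF $F(x) = H(x - a)$ from QMath's comment above, the integrand equals one exactly between $a$ and $y$ and zero elsewhere, so the CRPS collapses to $|a - y|$: applied to a point forecast, this proper scoring rule reduces to the absolute error, a scoring function that is consistent for the median.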

For more details, see Gneiting's "Making and Evaluating Point Forecasts", especially Section 3.2. (The definition there is for set-valued forecasts, so it is more general, but less intuitive, I think.)

  • This is very helpful. I agree with point 1; I meant it your way but was sloppy with my phrasing. I used "density" as I do not find the term "probabilistic" completely adequate here, but perhaps that is because I am not a native speaker. Also, could you edit the post to specify that it is specifically Section 3.2 of "Making and Evaluating Point Forecasts" that addresses my question? That may help other readers. Also, did you mean $O$ instead of $0$ in $S: A \times 0 \to \mathbb{R}$? – Richard Hardy Aug 05 '23 at 14:20
  • @RichardHardy Glad to hear. The zero is a typo of course. – picky_porpoise Aug 06 '23 at 12:33