
Diebold "Forecasting in Economics, Business, Finance and Beyond" (v. 1 August 2017) section 10.1 lists absolute standards for point forecasts, with the first one being unbiasedness: Optimal forecasts are unbiased. There is a brief follow-up there, too: If the forecast is unbiased, then the forecast error has a zero mean.

I have a quibble with this. Let $Y$ be the random variable whose realization we are trying to predict, and let $z$ be the forecast. If

  1. we know at the time of making the forecast that it will be judged by a quantile loss function (say, a $p$-level quantile)

and

  2. our goal is to minimize the expected loss,

we should target the relevant quantile of the distribution, $z=F_Y^{-1}(p)$, where $F_Y$ is the cumulative distribution function of $Y$. I would be willing to call $z=z^*:=F_Y^{-1}(p)$ an optimal forecast. However, unless that quantile coincides with the mean, $z^*$ will produce an error $e:=Y-z^*$ with $\mathbb{E}(e)\neq 0$, breaking Diebold's optimality property.
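
To make this concrete, here is a minimal simulation sketch (my own illustration, not from the book; the lognormal target and $p=0.9$ are arbitrary choices). The quantile forecast beats the mean forecast in expected quantile loss, yet its error has a clearly nonzero mean:

```python
# Under a p-level quantile ("pinball") loss, the quantile F_Y^{-1}(p)
# beats the mean E(Y) in expected loss, but its mean error is nonzero.
import numpy as np

rng = np.random.default_rng(42)
p = 0.9
y = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)  # skewed target

def pinball_loss(y, z, p):
    """Quantile loss: p*(y - z) if y >= z, else (1 - p)*(z - y)."""
    e = y - z
    return np.where(e >= 0, p * e, (p - 1) * e)

z_mean = y.mean()               # targets E(Y)
z_quantile = np.quantile(y, p)  # targets F_Y^{-1}(p)

print(f"expected loss, mean forecast:     {pinball_loss(y, z_mean, p).mean():.4f}")
print(f"expected loss, quantile forecast: {pinball_loss(y, z_quantile, p).mean():.4f}")
print(f"mean error of quantile forecast:  {(y - z_quantile).mean():.4f}")  # far from 0
```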

Question: How could we reformulate the optimality property so that it makes more sense? What if we defined unbiasedness w.r.t. the loss-minimizing target instead of the realization of $Y$ (so that in my example, an optimal forecast would have the expected value of $F_Y^{-1}(p)$ instead of $\mathbb{E}(Y)$ as per Diebold)? Or are there circumstances where such a version of unbiasedness is not desirable?


Update: To be fair, Diebold writes the following in section 2.8 (on p. 38):

Quite generally under asymmetric $L(e)$ loss (e.g., linlin), optimal forecasts are biased, whereas the conditional mean forecast is unbiased. Bias is optimal under asymmetric loss because we can gain on average by pushing the forecasts in the direction such that we make relatively few errors of the more costly sign.
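
For reference, here is the standard derivation behind that statement (my sketch, in the notation above, with linlin penalties $a$ per unit of under-forecast and $b$ per unit of over-forecast):

$$\mathbb{E}\,L(e) = a\int_z^{\infty}(y-z)\,dF_Y(y) + b\int_{-\infty}^{z}(z-y)\,dF_Y(y).$$

Setting the derivative with respect to $z$ to zero gives $-a\,(1-F_Y(z)) + b\,F_Y(z) = 0$, i.e. $z^* = F_Y^{-1}\!\left(\frac{a}{a+b}\right)$: exactly the quantile target from my example above, and biased whenever that quantile differs from $\mathbb{E}(Y)$.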

But he neither reiterates nor even hints at that at the beginning of chapter 10 (e.g. on pp. 334-335), where he discusses forecast optimality, even though he does mention the loss function there:

This *unforecastability principle* is valid in great generality; it holds, for example, regardless of whether linear-projection optimality or conditional-mean optimality is of interest, *regardless of whether the relevant loss function is quadratic*, and regardless of whether the series being forecast is stationary.

(The first emphasis is the author's, the second is mine.)

Richard Hardy
  • Maybe he discusses somewhere that the loss function is quadratic? Then the mean would be the optimal forecast, which in turn leads to an unbiased forecast. – Christoph Hanck Feb 27 '24 at 13:15
  • @ChristophHanck, he does that in section 2.8 on p. 38: "Quite generally under asymmetric L(e) loss (e.g., linlin), optimal forecasts are biased, whereas the conditional mean forecast is unbiased. Bias is optimal under asymmetric loss because we can gain on average by pushing the forecasts in the direction such that we make relatively few errors of the more costly sign." But he does not come back to it in chapter 10 (e.g. on p. 334-335) where he discusses forecast optimality – even though he does mention the loss function there once. Looks like an oversight to me. – Richard Hardy Feb 27 '24 at 13:20
  • Yes, it seems like clarification may have been useful there! – Christoph Hanck Feb 27 '24 at 13:23
  • @ChristophHanck, while I have your attention, would you mind taking a look at this? https://stats.stackexchange.com/questions/639814 – Richard Hardy Feb 27 '24 at 13:28
  • I suggest you read https://press.princeton.edu/books/hardcover/9780691140131/economic-forecasting. The treatment of optimal forecasts is really good there. – Cagdas Ozgenc Feb 27 '24 at 13:35
  • @CagdasOzgenc, thank you! I love that book and keep referring to some of the chapters (especially on forecast evaluation and comparison) again and again. – Richard Hardy Feb 27 '24 at 13:55
  • @ChristophHanck: I respectfully submit that I think your comment has matters backward. You start with an error measure and deduce the optimal point forecast under this metric. I would say that it should be the other way around: first decide which functional we want to elicit (e.g., the mean), then deduce which error measure is appropriate (here squared loss). See this paper, and yes, I bring that up regularly... – Stephan Kolassa Feb 27 '24 at 16:22
  • @StephanKolassa, is figuring out the evaluation loss function not the best way to derive the functional the forecasts should be targeting? I would rather ask a client what losses they are facing from different forecast errors than what functional they are targeting. – Richard Hardy Feb 27 '24 at 16:27
  • Interesting. I find it much easier to ask clients whether they are looking for expectations or quantiles, than to discuss their error metric with them. (To be honest, quite often I am presented with a preselected error metric, like the MAPE, and then have to gently point out how this probably rewards forecasts quite different from what they actually want.) Also, I have rarely seen anyone use a quantile loss in the first place, who might profit from discussing that a quantile forecast would be optimal under this loss - it's the other way around. – Stephan Kolassa Feb 27 '24 at 16:31
  • ... we may be using the term "loss" in two different senses. Either as an error metric ("quantile loss"), or as economic loss resulting from acting on a forecast. In the latter case, often enough the economic loss strongly depends on just how a forecast is turned into a business decision. – Stephan Kolassa Feb 27 '24 at 16:32
  • @StephanKolassa Regarding your last sentence, I’ve begun wondering lately how much supervised learning predictions can be rephrased in terms of something like reinforcement learning, where we learn the decisions to make under various conditions in order to have optimal outcomes. – Dave Feb 27 '24 at 16:35
  • @StephanKolassa, thanks for your interesting perspective. "My" view is, e.g., influenced by Bayesian perspectives along the lines of "if you have a quadratic loss function, then your Bayes rule is the posterior mean". Or my more down-to-earth classroom example: I have an asymmetric loss function when cycling to the train station in the morning, in that I'd rather arrive five minutes early than two minutes late. – Christoph Hanck Feb 27 '24 at 16:48
  • @StephanKolassa, by evaluation loss (as opposed to training loss), I mean the economic loss. I expect this would be easier to extract from a client than the relevant functional of the target distribution. (I suppose most of the clients have no idea what functional they should target. Most of them probably do not even understand the concepts of conditional mean/quantile.) But your point about converting forecasts to optimal decisions optimally makes my approach trickier than in the ideal case. A density forecast would be a universal solution, as it disentangles the forecast from the decision. – Richard Hardy Feb 27 '24 at 16:50

1 Answer

Bias and unbiasedness are concepts that apply to estimates, or estimators, of an unknown population parameter. Thus, I would argue it makes no sense to discuss "the (un)biasedness of a point forecast" without specifying, explicitly or implicitly, which functional of the future distribution we want to forecast.

We may want to forecast conditional expectations, and then our point forecast may be biased or unbiased as an expectation forecast. This is Diebold's implicit assumption.

However, as you write, it is quite often the case that we don't want an expectation forecast but a quantile forecast. (For instance, for setting target safety stocks, or for building dams that are high enough not only for average floods but also for long-tail floods.) In this case, I would say it makes perfect sense to discuss whether a particular quantile forecast is biased or unbiased. In this latter case, of course, an unbiased quantile forecast will usually not have an expected error of zero.
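
As a sketch of what that could look like in practice (my illustration, not part of the original answer; the lognormal example and $p=0.9$ are assumptions), an unbiased, i.e. calibrated, $p$-quantile forecast is exceeded by the realization about $(1-p)$ of the time, even while its mean error stays far from zero:

```python
# A calibrated 0.9-quantile forecast has ~90% coverage, P(Y <= z) ~ 0.9,
# even though its mean error E(Y - z) is far from zero.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = 0.9
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
z = np.exp(norm.ppf(p))  # true p-quantile of this lognormal(0, 1)

print(f"coverage P(Y <= z): {(y <= z).mean():.3f} (target {p})")  # ~0.900
print(f"mean error E(Y - z): {(y - z).mean():.3f}")               # nonzero
```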

A related but separate problem is that of detecting biases in our point forecasts. Yes, we can use quantile losses to elicit quantile forecasts... but given a sequence of quantile forecasts and corresponding realizations, each separate forecast, or the entire algorithm for producing them, may or may not be biased. And this is very hard to detect.
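
To illustrate that difficulty (my addition, with assumed numbers echoing the comments below: $n=100$ realizations, a 99% quantile), a binomial test on the exceedance count detects even a breach rate of triple the nominal 1% well under half the time:

```python
# How hard is it to detect a miscalibrated 99%-quantile forecast from
# n = 100 forecast/outcome pairs? Simulate the rejection rate of a
# binomial test on the number of exceedances (realization > forecast).
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(1)
n, p = 100, 0.99
true_breach_rate = 0.03  # actual exceedance rate: triple the nominal 1%

n_sims = 2_000
rejections = 0
for _ in range(n_sims):
    breaches = rng.binomial(n, true_breach_rate)
    # H0: the exceedance probability equals the nominal 1 - p = 0.01
    if binomtest(breaches, n, 1 - p).pvalue < 0.05:
        rejections += 1

print(f"power to detect 3x the nominal breach rate: {rejections / n_sims:.2f}")
# typically ~0.35: far-tail miscalibration is very hard to detect at n = 100
```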

Stephan Kolassa
  • Regarding your last sentence, does it apply to forecasts targeting the conditional mean as well or only to ones targeting quantiles? If the latter, could you explain more how the two cases are different? Or is your point that we cannot / should not treat every forecast error as i.i.d., and then (in the extreme case of entirely unknown dependence) we have one observation per parameter, so we cannot do much at all? – Richard Hardy Feb 27 '24 at 12:00
  • On the one hand, I was referring to the fact that there are just far fewer observations in the tails. Even if your data are IID, if you have only 100 observations, it's "harder" to determine whether your 99% quantile forecast was unbiased than to determine the same for an expectation forecast. (However we want to quantify "harder".) And of course your point about non-IID errors is also very important, and I would argue that this holds just as much in non-time-series situations; it's just easier to see when we are looking at time series forecasts. – Stephan Kolassa Feb 27 '24 at 16:16
  • OK, that I get. Thank you! – Richard Hardy Feb 27 '24 at 16:28