
I am reviewing textbooks for our new undergraduate course in Bayesian Statistical Methods. In chapter 7 of Ben Lambert's book, A Student's Guide to Bayesian Statistics, he states:

Because of the two sources of uncertainty included in our model – the parameter uncertainty and sampling variability – the uncertainty of the Bayesian predictive distribution is typically greater than the Frequentist equivalent.  This is because the Frequentist approach to forecasting typically makes predictions based on a point estimate of a parameter (typically the maximum likelihood value).  By ignoring any uncertainty in the parameter’s value, the Frequentist approach produces predictive intervals that are overly confident.

[emphasis mine]

Then in the chapter summary, it is restated:

By including our epistemic uncertainty in parameter values as part of a forecast, this Bayesian approach provides a better quantification of uncertainty than the equivalent Frequentist methods.

Setting aside the broader issue of how this book is written to suggest that everything Bayesian is good and correct (and thus all things Frequentist are bad and incorrect), I found myself needing to push back on this particular claim.

While I believe I know the answer to the titular question, perhaps my understanding is incorrect...so I wanted to put it here for feedback from this community.

To show why I believe this assertion in the book is incorrect, I will use an example from multiple regression. If you wish to estimate a value from a model, you have the following: $$\hat{y}_o = \hat{\beta}_0 + \hat{\beta}^T x_o$$ (where $\hat{\beta}$ and $x_o$ may be vectors). If you wish to use this estimate as the conditional mean given the values $X = x_o$, then $\hat{y}_o$ is the point estimate, and the confidence interval for the conditional mean is $$\hat{y}_o \pm t_\text{c.v.} \cdot \hat{\sigma} \sqrt{x_o^T (X^T X)^{-1} x_o}$$ (for the appropriate critical value for the context and desired confidence level, and with $x_o$ understood to include a leading 1 for the intercept).

However, if you want the prediction interval for a new observation, you would use the following: $$\hat{y}_o \pm t_\text{c.v.} \cdot \hat{\sigma} \sqrt{1 + x_o^T (X^T X)^{-1} x_o}$$ ...and unless I am wrong, the radicand captures exactly the "epistemic uncertainty" that the author claims is not present. The 1 models the sampling variability of the new observation, and the second term models the variability from the parameter estimates (which are not treated as fixed values when constructing a prediction interval).
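To make this concrete, here is a quick numerical sketch in Python (numpy/scipy with made-up simulated data, so the specific numbers are purely illustrative) computing both intervals at a new point $x_o = 5$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data: y = 1 + 2x + noise
n = 30
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])           # design matrix with intercept
y = 1 + 2 * x + rng.normal(0, 1.5, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = X.shape[1]
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))   # residual standard error

x0 = np.array([1.0, 5.0])                      # new point, with intercept term
y0_hat = x0 @ beta_hat
h = x0 @ np.linalg.inv(X.T @ X) @ x0           # the x_o'(X'X)^{-1} x_o term
t_cv = stats.t.ppf(0.975, df=n - p)

ci = y0_hat + np.array([-1, 1]) * t_cv * sigma_hat * np.sqrt(h)      # conditional mean
pi = y0_hat + np.array([-1, 1]) * t_cv * sigma_hat * np.sqrt(1 + h)  # new observation

print("95% CI for E[y | x_o]:", ci)  # narrower: parameter uncertainty only
print("95% PI for y_o:       ", pi)  # wider: adds the 1 for sampling variability
```

The prediction interval comes out strictly wider than the confidence interval, and the gap is exactly the extra 1 in the radicand.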

Again, if I am incorrect or there are other contexts where this critique of the Frequentist approach is indeed valid...please share.

Gregg H
  • Would love to see an example showing in what sense the text author believes frequentist prediction intervals are overly confident. – Graham Bornholt Jun 30 '23 at 07:20
  • @Durden I'm sorry, I don't see how your comment applies to this query. Regardless of where the uncertainty in $\beta$ comes from (and by extension, the uncertainty in $\hat{y}_o$), the suggestion is that the Frequentist approach does NOT model this uncertainty. – Gregg H Jun 30 '23 at 12:47
  • @GreggH It models only one source of uncertainty (i.e., sampling variability), exactly as the Lambert quote claims. – Durden Jun 30 '23 at 13:14
  • @Durden The CI might, but the PI models both. – Gregg H Jun 30 '23 at 13:52
  • Many of the "equivalent frequentist methods" for standard statistical procedures have been shown to be mathematically the same as adopting certain Bayes priors. (This is a standard way to prove admissibility.) Usually those priors are quite diffuse, sometimes referred to as "uninformative." The quotations, then, are not just misleading: they are even mathematically incorrect. – whuber Jun 30 '23 at 16:08
  • @GreggH as derived here, the variance of the posterior predictive is the sum of the posterior variance and the sampling variance of $y_{0}$. I presume that if one interprets $\operatorname{Var}(x_{0}^T \hat{\beta}) = \sigma^{2} x_{0}^T (X^T X)^{-1} x_{0}$ as the uncertainty in $\beta$ (which one shouldn't), and chooses a prior that produces this posterior (which one could), then the variance of the posterior predictive and that of the frequentist prediction error coincide. – Durden Jun 30 '23 at 22:19
  • I'm sorry...but I do not see how the link you've provided here supports your argument, and the comment below by @Glen_b seems to contradict your assertion. – Gregg H Jun 30 '23 at 23:25
  • $\sigma^{2} x_{0}^T(X^T X)^{-1} x_0$ is the part of your radicand above that stems from the sampling distribution of $\hat{\beta}$. As I've written, if you had a posterior whose variance equals this expression (i.e., "probability-matching"), both Bayesian and frequentist interval estimates would agree; see the numerical sketch after this thread. But again, this requires the improper conflation of CIs and posterior probability. In the end, I guess, Lambert is wrong in saying that frequentist intervals are "overly confident," because they are wider than Bayesian ones with strong priors. – Durden Jun 30 '23 at 23:46
  • Bayesian methods may well show less uncertainty than frequentist methods in case the data are in line with an informative prior. – Christian Hennig Jul 01 '23 at 11:01
  • By the way, although I'm not against Bayesian methodology or philosophy, the question shows one of many examples of annoying "Bayesian propaganda" that doesn't just properly explain the workings and potential pitfalls of Bayesian analysis but spends much space and energy bashing and ridiculing some caricature of frequentist statistics in order to state that the Bayesian approach is superior and frequentists are idiots. My dear Bayesians, this kind of communication may well backfire and put off people who'd otherwise have an open mind for what you have on offer! – Christian Hennig Jul 01 '23 at 11:04
  • There often seems to be a whiff of desperation in some Bayesian attempts to sell their approach. – Graham Bornholt Jul 01 '23 at 11:52
  • I'm not sure what's so "desperate" in Lambert's description. Regardless, I think it can't be emphasized enough to refrain from the pseudo-Bayesian interpretation of confidence intervals as "capturing the epistemic uncertainty" about $\beta$. If that's what you want your interval estimate to mean, you have to start with a prior and use Bayes' rule at some point. – Durden Jul 01 '23 at 19:43
  • @Durden desperate in the sense of creating a frequentist straw man who interprets a point estimate as if it were an interval. Also, what pseudo-Bayesian interpretation of CIs are you referring to? – Graham Bornholt Jul 01 '23 at 20:21
  • One may describe the notion (expressed in the original question) of confidence intervals "capturing the epistemic uncertainty" about $\beta$ as a misinterpretation of CIs as Bayesian credible intervals. The many older posts linked above explain at length why this notion is misguided. – Durden Jul 01 '23 at 22:44
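As a numerical check of the equivalence discussed in the comments above, here is a small sketch (Python, made-up data; the noise standard deviation is assumed known so that the algebra is exact): under a flat prior on $\beta$, the Bayesian posterior predictive interval reproduces the frequentist prediction interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up data with a KNOWN noise sd, so both intervals use the normal critical value.
n, sigma = 30, 1.5
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(0, sigma, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
x0 = np.array([1.0, 5.0])
h = x0 @ XtX_inv @ x0

# Frequentist 95% prediction interval (known sigma)
z = stats.norm.ppf(0.975)
freq_pi = x0 @ beta_hat + np.array([-1, 1]) * z * sigma * np.sqrt(1 + h)

# Bayesian posterior predictive under a flat prior on beta:
#   beta | y ~ N(beta_hat, sigma^2 (X'X)^{-1}),   y_o | beta ~ N(x_o' beta, sigma^2)
beta_draws = rng.multivariate_normal(beta_hat, sigma**2 * XtX_inv, size=200_000)
y0_draws = beta_draws @ x0 + rng.normal(0, sigma, size=200_000)
bayes_pi = np.quantile(y0_draws, [0.025, 0.975])

print("frequentist PI:               ", freq_pi)
print("posterior predictive interval:", bayes_pi)  # agrees up to Monte Carlo error
```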

2 Answers


Does the Frequentist approach to forecasting ignore uncertainty in the parameter's value?

No! Or rather, it shouldn't (and normally doesn't), but of course an individual or some particular methodology might leave it out in some situation or other, either deliberately (e.g. because it's known to be so small as to not matter, say because the sample size was truly huge, or because bias is a much bigger concern than parameter uncertainty) or out of ignorance (either of the need to include it, or of how to do so).

[Off the top of my head, the only time I've noticed this happen was when someone omitted parameter uncertainty from a GLM forecast, out of one of those forms of ignorance.]

It's always important to be clear about what you're predicting.

A CI for a conditional mean would account for uncertainty in the parameter values, while a PI would account for both that parameter uncertainty and the observation noise. A couple of times I have seen people forget the observation noise (i.e. confuse a CI with a PI), rather than the parameter uncertainty.

You use a regression example, for which the prediction interval (clearly) has two terms in the variance, and it's easy to see that they account for both the observation noise and parameter uncertainty.

I haven't read Lambert's book, but from the quote he appears to misrepresent the normal situation in frequentist forecasting.

Glen_b

I would say that the Bayesian approach forces the practitioner to take the uncertainty in the parameters into account. The posterior predictive distribution automatically embeds both observation noise and parameter uncertainty. For conjugate distributions the posterior predictive can be found in closed form; in other cases it is well defined theoretically but hard to compute. A large number of approaches (MCMC, HMC, etc.), tools and probabilistic programming languages (Stan, PyMC, Pyro, etc.) have been developed specifically to address this sort of question.
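For example, here is a minimal sketch of the simplest conjugate case (normal data with known variance and a normal prior on the mean; the data and hyperparameters are made up for illustration):

```python
import numpy as np

# Conjugate normal model: y_i ~ N(mu, sigma^2) with sigma known,
# and prior mu ~ N(m0, s0^2). Everything is available in closed form.
sigma, m0, s0 = 1.5, 0.0, 10.0           # made-up known sd and prior hyperparameters
y = np.array([4.1, 5.3, 4.8, 5.9, 5.0])  # made-up data

n = len(y)
post_var = 1 / (1 / s0**2 + n / sigma**2)
post_mean = post_var * (m0 / s0**2 + y.sum() / sigma**2)

# Posterior predictive: N(post_mean, sigma^2 + post_var).
# Both sampling noise (sigma^2) and parameter uncertainty (post_var) appear,
# with no extra effort from the practitioner.
pred_sd = np.sqrt(sigma**2 + post_var)
print(f"posterior predictive: N({post_mean:.3f}, {pred_sd:.3f}^2)")
```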

In the frequentist approach there are ways to account for the uncertainty in the parameters, as in the linear regression example in the OP. However, there are two issues. One is that such intervals are only available in closed form for a limited number of cases (like linear regression) and not for most others (even simple deviations from linear regression, such as using a Huber loss or lasso regularisation). The second issue (probably related to the first) is that in practice you'll find that in many cases (I would dare to say in most cases) practitioners actually make the mistake of fitting the parameters to the data (typically by maximum likelihood) and then using the fitted model (with point estimates of the parameters) to make predictions on new data. There are methods to make predictions with an associated interval (e.g. "conformal prediction" techniques), but they are often add-ons rather than part of the model, as in the Bayesian approach. On the other hand, since they are less reliant on assumptions than Bayesian techniques, they may be more robust and accurate in practice.
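To illustrate, here is a minimal split-conformal sketch in pure numpy (made-up data; the small finite-sample correction to the quantile level is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data. OLS is the point predictor here, but split conformal wraps
# ANY fitted model in exactly the same way.
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(0, 1.5, 200)

# Split: fit on one half, calibrate on the other.
fit, cal = slice(0, 100), slice(100, 200)
beta, *_ = np.linalg.lstsq(X[fit], y[fit], rcond=None)

# Absolute residuals on the calibration set give the interval half-width.
scores = np.abs(y[cal] - X[cal] @ beta)
q = np.quantile(scores, 0.95)  # strictly, the ceil((n+1)*0.95)/n empirical quantile

x0 = np.array([1.0, 5.0])      # new point, with intercept term
y0_hat = x0 @ beta
print(f"split-conformal ~95% PI at x=5: [{y0_hat - q:.2f}, {y0_hat + q:.2f}]")
```

Note that the calibration step never looks inside the model, which is what makes the approach nearly assumption-free.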

In other words, my impression is that it is virtually impossible to "forget" to account for parameter uncertainty in Bayesian statistics, while it's possible (and often done "in the field") in frequentist statistics.

Luca Citi
  • Practitioners may also "estimate the prior from the data" when doing Bayesian analyses and in this way underestimate the Bayesian uncertainty. – Christian Hennig Jul 01 '23 at 11:00
  • @ChristianHennig Good point! Btw, I'm not implying the Bayesian approach is error-proof. I was just focusing on the specific issue of whether the parameter uncertainty is accounted for (in a natural way). Actually, if I had to build a device making important decisions and badly needed prediction accuracy estimates, I would not blindly trust a Bayesian approach (heavily reliant on making the correct assumptions) and would probably go for the frequentist method called "conformal prediction" (which is almost assumption-free). – Luca Citi Jul 01 '23 at 11:23