Bootstrapping with quantiles of data instead of SD*z?

Question

I have recently been bootstrapping confidence intervals for a neural network model fitted to data.

I execute the following pseudo-code, which seems similar to previous bootstraps I have done:

Take all observations of data $x_i$, estimate $\hat{y} = \hat{\theta}(x_i)$
Resample $N$ observations from the set of $x_i$ with replacement, refit the model on the resample, and estimate $\tilde{y}_m = \tilde{\theta}_m(x_i)$. Repeat until we have $M$ different replicates $\tilde{y}_1, \dots, \tilde{y}_M$.
Calculate the standard deviation of the replicates $\tilde{y}_1, \dots, \tilde{y}_M$ as $\sigma_{\tilde{y}}$; the confidence interval of $\hat{y}$ is then $\hat{y} \pm z * \sigma_{\tilde{y}}$, where $z$ is the appropriate quantile of a standard normal distribution, usually 1.96 for a 95% CI.
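
As a minimal sketch of the three steps above, the following Python uses a straight-line fit from `np.polyfit` as a stand-in for the neural network and resamples $(x_i, y_i)$ pairs. The function name, the toy data, and the single evaluation point `x_new` are illustrative assumptions, not part of the original procedure.

```python
import numpy as np

def bootstrap_normal_interval(x, y, x_new, n_boot=500, z=1.96, seed=0):
    """Normal-approximation bootstrap interval for the fitted value at x_new."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Step 1: fit on all observations (a line stands in for the network).
    y_hat = np.polyval(np.polyfit(x, y, deg=1), x_new)
    # Step 2: refit on M resamples of size N drawn with replacement.
    y_tilde = np.empty(n_boot)
    for m in range(n_boot):
        idx = rng.integers(0, n, size=n)
        y_tilde[m] = np.polyval(np.polyfit(x[idx], y[idx], deg=1), x_new)
    # Step 3: standard deviation of the replicates, then y_hat +/- z * sd.
    sd = y_tilde.std(ddof=1)
    return y_hat, y_hat - z * sd, y_hat + z * sd

# Toy data (assumption): linear signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

y_hat, lower, upper = bootstrap_normal_interval(x, y, x_new=5.0)
print(f"estimate {y_hat:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

The interval is symmetric around $\hat{y}$ by construction, because it uses the replicates only through their standard deviation.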

See here for more reading on bootstrap intervals.

At the same time, I see other approaches that use the quantiles of the replicates $\tilde{y}_1, \dots, \tilde{y}_M$ to construct intervals of some sort, such as here and here. They do this instead of the $\hat{y} \pm z * \sigma_{\tilde{y}}$ I have come to expect. The intuition for such an approach is very strong and seems valid, but it is unfamiliar to me. What's going on with the use of quantiles? Is something else being calculated (prediction intervals, etc.) that is similar in flavor but not the same?
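
To make the two constructions concrete, here is a hedged sketch that builds both from the same bootstrap replicates. The skewed toy sample and the choice of the sample variance as the statistic are assumptions picked only so that the two intervals visibly differ.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=50)  # skewed toy sample (assumption)

def statistic(sample):
    # Sample variance: its bootstrap distribution is typically right-skewed.
    return np.var(sample, ddof=1)

theta_hat = statistic(data)

# Bootstrap replicates of the statistic.
reps = np.array([
    statistic(rng.choice(data, size=data.size, replace=True))
    for _ in range(2000)
])

# Interval 1: estimate +/- z * SD of the replicates (symmetric).
sd = reps.std(ddof=1)
normal_ci = (theta_hat - 1.96 * sd, theta_hat + 1.96 * sd)

# Interval 2: 2.5% and 97.5% quantiles of the replicates (may be asymmetric).
q_lo, q_hi = np.quantile(reps, [0.025, 0.975])
percentile_ci = (float(q_lo), float(q_hi))

print("normal    :", normal_ci)
print("percentile:", percentile_ci)
```

In this sketch both intervals are computed from the same replicates of the same statistic. The quantile version does not force symmetry around the estimate and cannot extend below the smallest replicate, whereas the $\pm z * SD$ version can.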

Davison and Hinkley's book Bootstrap Methods and their Application presents a number of approaches to bootstrapping (not exhaustive but both parametric and nonparametric cases are discussed) and it does discuss bootstrap prediction intervals and nonparametric regression models; this might be helpful as a reference, for all that the various pieces are not all in the same places, since the relevant concepts should carry across. — Glen_b, May 12 '22 at 01:50

IrishStat · Answer 1 · 2022-05-11T20:19:19.357

7

A possible alternative is to identify and fit an efficient model that generates a set of iid residuals. Generate a forecast using the model, and then use the distribution of the residuals as the basis of a Monte Carlo simulation, creating limits that do not rest on a presumed distributional assumption. Use this empirical distribution to offset the expected value. This method allows the identification and incorporation of possible unusual pulses, and the option to incorporate (allow for) predicted pulses in the forecast distribution. This is the approach suggested by Prof. Sam Savage as an extension to his book "The Flaw of Averages".
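
A rough sketch of the residual-based idea, under strong simplifying assumptions: a straight-line model stands in for the "efficient model", the residuals are treated as iid, and no pulse detection or parameter uncertainty is included. All names and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): linear trend with heavy-tailed noise.
x = np.linspace(0, 10, 200)
y = 1.0 + 0.8 * x + rng.standard_t(df=3, size=x.size)

# Fit the model and collect its residuals.
coef = np.polyfit(x, y, deg=1)
residuals = y - np.polyval(coef, x)

# Point forecast at a new input.
x_new = 11.0
point_forecast = np.polyval(coef, x_new)

# Monte Carlo: offset the forecast by residuals drawn from their
# empirical distribution, then read the limits off the quantiles.
simulated = point_forecast + rng.choice(residuals, size=5000, replace=True)
lower, upper = np.quantile(simulated, [0.025, 0.975])

print(f"forecast {point_forecast:.3f}, limits [{lower:.3f}, {upper:.3f}]")
```

Because the limits come from the empirical residuals, they inherit any skewness or heavy tails in those residuals without assuming normality.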

edited May 11 '22 at 20:19

answered May 11 '22 at 15:12


I'm a little confused - ARMAX modeling? I am asking about when it is appropriate to use $\pm z*SD$ vs quantiles to estimate confidence intervals. – RegressForward May 11 '22 at 15:17

OP: @IrishStat is a time series guru passionate about forecasting in time and is guessing that that is what you are doing. – Nick Cox May 11 '22 at 15:19
I pulled the reference to ARMAX – IrishStat May 11 '22 at 15:19
Past and future pulses remain.... – Nick Cox May 11 '22 at 15:20
Thanks Nick, although in my view cross-sectional data is just a specific case of time series data. Past and future one-time anomalies could remain whether or not the data is time series – IrishStat May 11 '22 at 15:21
