
Given an i.i.d. sample of 36 integers: [6, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

  1. A bootstrap resampling procedure is performed to draw 9,999 bootstrap resamples (with replacement), each of length 36.
  2. Lower and upper quantiles at 0.025 and 0.975 are then computed for each resample using numpy.quantile with method="linear".
  3. The 9,999 lower quantile values are averaged, and likewise the 9,999 upper quantile values, and each mean is corrected for bias if necessary (not necessary in this case per Efron & Tibshirani (1993)) to give a point estimate of each quantile.

Once this procedure is performed, the lower and upper quantile point estimates come out as 6.00 and 6.93, respectively. In this case, 0.0% of the sample values fall below the lower quantile (2.5%) point estimate and 88.9% (32 of 36) fall below the upper quantile (97.5%) point estimate.
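
For concreteness, here is a minimal sketch of the three steps above. The function name `bootstrap_quantile_estimates`, the seed, and the use of `numpy.random.default_rng` are illustrative assumptions, not necessarily the exact code that produced the numbers quoted; the quantile call relies on numpy's default `"linear"` method, and no bias correction is applied.

```python
import numpy as np

# Same values as the sample above (thirty-two 6's and four 7's);
# the order does not matter for resampling with replacement.
sample = np.array([6] * 32 + [7] * 4)


def bootstrap_quantile_estimates(data, n_boot=9_999, probs=(0.025, 0.975), seed=0):
    """Mean of per-resample quantiles (hypothetical helper, no bias correction)."""
    rng = np.random.default_rng(seed)  # arbitrary seed, chosen for repeatability
    # Step 1: draw n_boot resamples with replacement, each as long as the data.
    resamples = rng.choice(data, size=(n_boot, data.size), replace=True)
    # Step 2: quantiles of every resample (numpy's default method is "linear").
    per_resample = np.quantile(resamples, probs, axis=1)  # shape (2, n_boot)
    # Step 3: average the lower values and the upper values separately.
    return per_resample.mean(axis=1)


lower, upper = bootstrap_quantile_estimates(sample)

# Share of the original sample strictly below each point estimate.
share_below_lower = np.mean(sample < lower)
share_below_upper = np.mean(sample < upper)

print(f"lower={lower:.2f}, upper={upper:.2f}")
print(f"below lower: {share_below_lower:.1%}, below upper: {share_below_upper:.1%}")
```

Because the upper point estimate lands strictly between 6 and 7, every 6 in the sample counts as below it, which is where the 32-of-36 share comes from.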

If the bootstrap is not appropriate for discrete distributions (I've seen disagreement on this), then what alternative method can be used?

Reference: Efron, B., & Tibshirani, R. J. (1993). Estimates of bias. In An Introduction to the Bootstrap (pp. 124-130). Springer Science and Business Media Dordrecht. DOI: 10.1007/978-1-4899-4541-9

  • How are you defining the quantiles? R has 9 different definitions, which is a particular issue when the values are discrete rather than continuous. Also, note that it's only when you get to 40 elements in your list that you can reliably reach both the 2.5th and 97.5th percentiles: $0.025 \times 40 = 1$, $0.975 \times 40 = 39$. – EdM Mar 11 '22 at 17:23
  • What property of the distribution are you bootstrapping? Certainly not such extreme quantiles--there's scarcely any hope of estimating the most extreme $1/40$ of the values (2.5%) from a sample of just $36$ numbers! – whuber Mar 11 '22 at 17:29
  • @EdM I am using Python and numpy.quantile() with the default interpolation method of "linear". – Pierre Delecto Mar 11 '22 at 17:29
  • Does this answer your question? "Using bootstrap to obtain sampling distribution of 1st-percentile". The argument in the answer to that question is based on the 1st and 99th percentiles, but it applies similarly to trying to estimate the 2.5th and 97.5th percentiles from such a small data set, particularly as your values only take on one of 2 integer values. – EdM Mar 11 '22 at 18:17
  • @EdM This method meets the standards of my use case when the sample consists of continuous values, despite the "extreme" quantiles of 0.025 and 0.975. I am investigating a different quantile estimation method for low-cardinality distributions, based on this paper: "Sample Quantiles in Statistical Packages". – Pierre Delecto Mar 11 '22 at 18:17
  • The paper you cite provides the 9 definitions used in R. Other quantile estimation methods are unlikely to help with data like yours. With only 1/9 of the values being 7 instead of 6, effectively all estimates of the 2.5th percentile will necessarily be equal to 6: you would need at least 35 of the 36 values in a bootstrap sample to equal 7 to get any other value, regardless of how you interpolate between the lowest and the next-lowest of your 36 bootstrapped values. This is a well-known problem with bootstrapping near extremes of distributions, whether discrete or continuous. – EdM Mar 11 '22 at 18:33
  • BTW, you can compute the bootstrap without doing any resampling, because the bootstrap distribution is Binomial. Thus, you can quickly and easily compute any property of the bootstrap distribution you like. If you don't bootstrap (and bootstrapping really isn't effective here), you need to adopt a model that tells you something about the chances of other values besides just $6$ or $7$ occurring. – whuber Mar 11 '22 at 18:59
  • There are four 7's and thirty-two 6's in the sample. It may make sense to seek quantiles 0.1 and 0.9. Of the various 'types' of quantiles available in R, type=4 uses linear interpolation of the ECDF, and it gives $6$ for quantile 0.1 and $6.4$ for quantile 0.9. Because there are only 6's and 7's in the sample, one could argue by the binomial argument (or otherwise) that there is no point in re-sampling. [By a similar argument, quantiles 0.025 and 0.975 can give only the results $6$ and $7$, respectively, from quantile(x, c(.025, .975), type=4).] – BruceET Mar 12 '22 at 04:56
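
To illustrate the binomial remark in the comments: in a resample of 36 draws, the number of 7's follows a Binomial(36, 4/36) distribution, so the mean of the per-resample quantiles can be computed exactly, with no resampling. The sketch below is only an illustration under that assumption; the helper names are hypothetical, and the interpolation rule is written out by hand to mirror numpy's "linear" method (position $p(n-1)$ in the sorted resample).

```python
from math import comb

n = 36         # resample length
n_sevens = 4   # number of 7's in the original sample
p = n_sevens / n  # chance that a single bootstrap draw is a 7


def binom_pmf(k):
    """P(exactly k sevens in a resample of length n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def linear_quantile_given_k(k, prob):
    """Quantile of a sorted resample holding (n - k) sixes followed by k sevens,
    interpolating linearly at position prob * (n - 1)."""
    def value_at(i):
        return 7.0 if i >= n - k else 6.0

    pos = prob * (n - 1)
    lo = int(pos)              # index of the order statistic just below pos
    hi = min(lo + 1, n - 1)    # next order statistic (clamped at the end)
    frac = pos - lo
    return value_at(lo) + frac * (value_at(hi) - value_at(lo))


def expected_linear_quantile(prob):
    """Exact mean of the quantile over the bootstrap distribution."""
    return sum(binom_pmf(k) * linear_quantile_given_k(k, prob) for k in range(n + 1))


exact_lower = expected_linear_quantile(0.025)
exact_upper = expected_linear_quantile(0.975)
print(f"exact lower = {exact_lower:.4f}, exact upper = {exact_upper:.4f}")
```

Under these assumptions the upper value rounds to 6.93 and the lower value is 6 to any reasonable precision, in line with the simulated point estimates in the question. That agreement suggests the 9,999 resamples add Monte Carlo noise here rather than information.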
