I have to find a 95% C.I. on the median and other percentiles. I don't know how to approach this. I mainly use R as a programming tool.
5 Answers
Here is an illustration on a classical R dataset:
> x = faithful$waiting
> bootmed = apply(matrix(sample(x, rep=TRUE, 10^4*length(x)), nrow=10^4), 1, median)
> quantile(bootmed, c(.025, 0.975))
2.5% 97.5%
73.5 77
which gives a (73.5, 77) confidence interval on the median.
(Note: Corrected version, thanks to John. I used $10^3$ in the nrow earlier, which led to the confusion!)
Another approach is based on quantiles of the binomial distribution.
e.g.:
> x=faithful$waiting
> sort(x)[qbinom(c(.025,.975), length(x), 0.5)]
[1] 73 77
- 17,737
- 2
- 62
- 89
-
4I like the simplicity of this one... Results are close to the bootstrap method. – Dominic Comtois Jan 16 '12 at 20:13
-
1This is obviously much more efficient than bootstrapping for the continuous case, but one disadvantage is that it does not account for tied ranks. Do you happen to know of a workaround for this? – ali_m Jun 29 '15 at 10:50
-
Check out bootstrap resampling. Search R help for the boot function. Depending on your data with resampling you can estimate confidence intervals for just about anything.
- 461
-
Agree. This is the best approach. Underused in the biomedical sciences, in my opinion. – pmgjones Jan 15 '12 at 13:41
-
10Consider looking into the smoothed bootstrap for estimating population quantiles as the conventional boostrap seems to have problems in that case - references can be found in this pdf. If you were just interested in the theoretical Median, the Hodges-Lehman estimator can be used - as provided by, e.g., R's
wilcox.test(..., conf.int=TRUE)function. – caracal Jan 15 '12 at 19:50
And there are other approaches: One is based on Wilcoxon Rank Sum test applied for one sample with continuity correction. In R this can be supplied as:
wilcox.test(x,conf.level=0.95,alternative="two.sided",correct=TRUE)
And there is the David Olive's CI for median discussed here:
- 2,736
- 1,077
The result based on the qbinom approach isn't correct for small samples. Suppose that x has 10 components. Then qbinom(c(.025,.975),10,.5) gives 2 and 8. The resulting interval doesn't treat order statistics at the lower tail symmetrically with those from the upper tail; you should get either 2 and 9, or 3 and 8. The right answer is 2 and 9. You can check against proc univariate in SAS. Catch here is you need no more than .025 probability below and above; the lower quantile doesn't do this, as it gives at least .025 at or below. You get saved on the bottom because the count that should be 1 should get mapped to the second order statistic, counting 0, and so the "off by one" cancels. This fortuitous canceling does not happen on top, and so you get the wrong answer here. The code sort(x)[qbinom(c(.025,.975),length(x),.5)+c(0,1)] almost works, and .5 can be replaced by other quantile values to get confidence intervals for other quantiles, but it won't be right when there exists a such that P[X<=a]=.025. See, for ex, Higgins, Nonparametric Statisitcs.
- 21
library(boot)appears to confirm this:Intervals :
Level Normal Basic
95% (74.42, 78.22 ) (75.00, 78.49 )
Level Percentile BCa
– onestop Jan 15 '12 at 15:1395% (73.51, 77.00 ) (73.00, 77.00 )