Confidence interval for median

Question

I have to find a 95% C.I. on the median and other percentiles. I don't know how to approach this. I mainly use R as a programming tool.

score 32 · Accepted Answer · edited Apr 13 '17 at 12:44

32

Here is an illustration on a classical R dataset:

> x       = faithful$waiting
> bootmed = apply(matrix(sample(x, rep=TRUE, 10^4*length(x)), nrow=10^4), 1, median)
> quantile(bootmed, c(.025, 0.975))
2.5% 97.5% 
 73.5    77

which gives a (73.5, 77) confidence interval on the median.

(Note: Corrected version, thanks to John. I used $10^3$ in the nrow earlier, which led to the confusion!)

edited Apr 13 '17 at 12:44

Community

1

answered Jan 15 '12 at 13:37

Xi'an

105,342

9

Seems suspiciously narrow to me. Using functions from library(boot) appears to confirm this:

boot.ci(boot(x,function(x,i) median(x[i]), R=1000))

Intervals :
Level Normal Basic
95% (74.42, 78.22 ) (75.00, 78.49 )

Level Percentile BCa
95% (73.51, 77.00 ) (73.00, 77.00 )
– onestop Jan 15 '12 at 15:13
2

you're welcome Xi'an... As an aside, I always prefer to set the original N value in the matrix because that's a constant across various bootstrap sizes I might make. So, I would typically have said ncol = length(x). I find there's less chance for error that way. – John Jan 15 '12 at 23:44
7

This is just an inefficient way to compute the binomial quantiles as in onestop's answer. – whuber Feb 10 '15 at 16:11

score 31 · Answer 2 · answered Jan 15 '12 at 15:21

31

Another approach is based on quantiles of the binomial distribution.
e.g.:

> x=faithful$waiting
> sort(x)[qbinom(c(.025,.975), length(x), 0.5)]
[1] 73 77

answered Jan 15 '12 at 15:21

onestop

17,737
2
62
89

4

I like the simplicity of this one... Results are close to the bootstrap method. – Dominic Comtois Jan 16 '12 at 20:13
1

This is obviously much more efficient than bootstrapping for the continuous case, but one disadvantage is that it does not account for tied ranks. Do you happen to know of a workaround for this? – ali_m Jun 29 '15 at 10:50
Can you just use quantile() instead of sort[]? – Simon Woodward Jun 10 '21 at 01:29

score 15 · Answer 3 · answered Jan 15 '12 at 05:44

15

Check out bootstrap resampling. Search R help for the boot function. Depending on your data with resampling you can estimate confidence intervals for just about anything.

answered Jan 15 '12 at 05:44

tharen

461

Agree. This is the best approach. Underused in the biomedical sciences, in my opinion. – pmgjones Jan 15 '12 at 13:41
10

Consider looking into the smoothed bootstrap for estimating population quantiles as the conventional boostrap seems to have problems in that case - references can be found in this pdf. If you were just interested in the theoretical Median, the Hodges-Lehman estimator can be used - as provided by, e.g., R's wilcox.test(..., conf.int=TRUE) function. – caracal Jan 15 '12 at 19:50

score 4 · Answer 4 · edited Aug 26 '15 at 03:16

4

And there are other approaches: One is based on Wilcoxon Rank Sum test applied for one sample with continuity correction. In R this can be supplied as:

wilcox.test(x,conf.level=0.95,alternative="two.sided",correct=TRUE)

And there is the David Olive's CI for median discussed here:

CI for Median

edited Aug 26 '15 at 03:16

James Stanley

2,736

answered Dec 27 '13 at 16:14

Germaniawerks

1,077

score 2 · Answer 5 · answered Aug 22 '17 at 21:56

The result based on the qbinom approach isn't correct for small samples. Suppose that x has 10 components. Then qbinom(c(.025,.975),10,.5) gives 2 and 8. The resulting interval doesn't treat order statistics at the lower tail symmetrically with those from the upper tail; you should get either 2 and 9, or 3 and 8. The right answer is 2 and 9. You can check against proc univariate in SAS. Catch here is you need no more than .025 probability below and above; the lower quantile doesn't do this, as it gives at least .025 at or below. You get saved on the bottom because the count that should be 1 should get mapped to the second order statistic, counting 0, and so the "off by one" cancels. This fortuitous canceling does not happen on top, and so you get the wrong answer here. The code sort(x)[qbinom(c(.025,.975),length(x),.5)+c(0,1)] almost works, and .5 can be replaced by other quantile values to get confidence intervals for other quantiles, but it won't be right when there exists a such that P[X<=a]=.025. See, for ex, Higgins, Nonparametric Statisitcs.

Confidence interval for median

5 Answers5

Linked

Related