Given N sampled values, what does the "p-th quantile of the sampled values" mean?

bit-question

1 Answer

In theory (with $0 \lt p \lt 1$) it means the point at which the cumulative distribution function reaches $p$, i.e. the point a fraction $p$ of the way up the distribution. In practice several definitions are in use, particularly in statistical computing. R, for example, offers nine: the first three treat the sample as discrete, and the remaining six interpolate between order statistics in various ways.
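
One common way to make "a fraction $p$ up the cumulative distribution" precise is the generalized inverse of the CDF. I am assuming this convention here for illustration; other conventions exist, and they can differ where the CDF is flat. With $F$ the CDF,

$$Q(p) = \inf\{x : F(x) \ge p\}, \qquad 0 < p < 1.$$

For a sample $x_1, \dots, x_n$ one can plug in the empirical CDF

$$\hat F_n(x) = \frac{1}{n}\,\#\{i : x_i \le x\},$$

in which case $Q(p)$ is always one of the sampled values.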

Here is an example: if your sample is $\{400, 1, 1000, 40\}$ and you are looking for the $0.6$ quantile ($60$th centile), then the nine calculation methods give:

> # type = 1, ..., 9 selects each of R's nine quantile definitions in turn
> x <- numeric()
> for (t in 1:9) { x[t] <- quantile(c(400, 1, 1000, 40), probs=0.6, type = t ) }
> x
60%                                 
400 400  40 184 364 400 328 376 373 
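
To see where the interpolated figures come from, here is the arithmetic for two of them, as I understand the definitions described in R's `?quantile` help page. The sorted sample is $x_{(1)}, \dots, x_{(4)} = 1, 40, 400, 1000$, with $n = 4$ and $p = 0.6$. Type 7 (R's default) places the quantile at position $h = (n-1)p + 1$ among the order statistics, while type 4 uses $h = np$:

$$\text{type 7: } h = 3 \times 0.6 + 1 = 2.8, \qquad x_{(2)} + 0.8\,(x_{(3)} - x_{(2)}) = 40 + 0.8 \times 360 = 328,$$

$$\text{type 4: } h = 4 \times 0.6 = 2.4, \qquad x_{(2)} + 0.4\,(x_{(3)} - x_{(2)}) = 40 + 0.4 \times 360 = 184.$$

Both match the corresponding entries in the output above. The other continuous types differ only in the position $h$ they assign.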

My personal view is that the correct figure is $400$, since $$\Pr(X<400) = 0.5 < 0.6 \text{ and } \Pr(X>400) = 0.25 < 1-0.6.$$ This comes from treating the sample as the population, in which case the empirical CDF is a step function. The opposing argument is that interpolating, so that the empirical CDF becomes continuous, is likely to give a better or more useful approximation to the population; the result then depends on the interpolation method chosen.
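
The two probabilities in that argument can be checked directly in R. This is only a minimal sketch: the variable names `below`, `above`, `xs` and `q_step` are mine, and the "smallest sample value whose empirical CDF reaches $p$" rule is one way of reading the step-function view, not the only one.

```r
# Sample and target probability from the example above.
x <- c(400, 1, 1000, 40)
p <- 0.6

# Empirical probabilities strictly below and strictly above the candidate 400.
below <- mean(x < 400)   # 2 of the 4 values
above <- mean(x > 400)   # 1 of the 4 values

# Step-function quantile: the smallest sample value whose empirical CDF
# is at least p (illustrative helper names, not part of any package).
Fn <- ecdf(x)
xs <- sort(x)
q_step <- xs[which(Fn(xs) >= p)[1]]

print(c(below = below, above = above, q_step = q_step))
```

Here `below` is 0.5, `above` is 0.25 and `q_step` is 400, which agrees with `quantile(x, 0.6, type = 1)`.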

Henry
    The Wikipedia page at http://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population provides a nice table of the various definitions. – Rob Hyndman May 13 '11 at 00:52