Given N sampled values, what does the "p-th quantile of the sampled values" mean?
1 Answer
In theory (with $0 \lt p \lt 1$) it means the point a fraction $p$ of the way up the cumulative distribution function. In practice several definitions are in use, particularly in statistical computing. For example, R's quantile function offers nine different definitions: the first three use a discrete interpretation, and the other six use a variety of continuous interpolations.
Here is an example: if your sample is $\{400, 1, 1000, 40\}$ and you are looking for the $0.6$ quantile ($60$th centile), then the nine calculation methods give:
> x <- numeric()
> for (t in 1:9) { x[t] <- quantile(c(400, 1, 1000, 40), probs=0.6, type = t ) }
> x
60%
400 400 40 184 364 400 328 376 373
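As a rough check on where the interpolated figures come from, here is the arithmetic for three of the continuous types. It assumes the parametrisation given in R's documentation for quantile (following Hyndman and Fan), where $x_{(j)}$ is the $j$-th smallest value, $j = \lfloor np + m \rfloor$, $g = np + m - j$, and $m$ is a constant that depends on the type ($m = 0$ for type 4, $m = 0.5$ for type 5, $m = 1 - p$ for type 7). The sorted sample is $1, 40, 400, 1000$, so $n = 4$ and $np = 2.4$; in all three cases $j = 2$.

$$\begin{aligned}
Q(p) &= x_{(j)} + g\,\bigl(x_{(j+1)} - x_{(j)}\bigr) \\
\text{type 4: } & g = 2.4 - 2 = 0.4, & Q &= 40 + 0.4\,(400 - 40) = 184 \\
\text{type 5: } & g = 2.9 - 2 = 0.9, & Q &= 40 + 0.9\,(400 - 40) = 364 \\
\text{type 7: } & g = 2.8 - 2 = 0.8, & Q &= 40 + 0.8\,(400 - 40) = 328
\end{aligned}$$

These agree with the fourth, fifth and seventh entries of the output above.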
My personal view is that the correct figure is $400$, since $$\Pr(X<400) = 0.5 < 0.6 \text{ and } \Pr(X>400) = 0.25 < 1-0.6.$$ This comes from treating the sample as the population, in which case the empirical CDF is a step function. There are opposing arguments for interpolating so that the empirical CDF is continuous, on the grounds that this is likely to be a better or more useful approximation to the population; the result then depends on the method of interpolation.
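The step-function argument can be checked directly in R. This is only a minimal sketch, assuming the sample is treated as the whole population; the variable names `Fn` and `q` are illustrative and not part of the original answer.

```r
# Treat the sample as the population and work with the step-function ECDF.
x <- c(400, 1, 1000, 40)
p <- 0.6

Fn <- ecdf(x)        # empirical CDF (right-continuous step function), from stats
Fn(sort(x))          # ECDF at 1, 40, 400, 1000: 0.25 0.50 0.75 1.00

# Smallest sample value at which the ECDF reaches p
q <- min(x[Fn(x) >= p])
q                    # 400

# The two conditions from the argument above
mean(x < q)          # 0.5,  which is below p = 0.6
mean(x > q)          # 0.25, which is below 1 - p = 0.4
```

The value found this way matches what type 1 (the inverse of the empirical CDF) returns in the output above; the interpolating types trade this exact step-function property for a continuous estimate.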
The Wikipedia page at http://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population provides a nice table of the various definitions. – Rob Hyndman May 13 '11 at 00:52