Estimate quantile of value in a vector

Question

I have a set of real numbers and need to estimate the quantile of a new number. Is there a clean way to do this in R? And in general?

I hope this is not ultra-trivial ;-)

Thanks in advance for your responses.

PK

score 56 · Accepted Answer · answered Sep 05 '14 at 22:47

As whuber pointed out, you can use ecdf(), which takes a vector and returns a function that gives the percentile of a value, expressed as the proportion of observations less than or equal to it.

> percentile <- ecdf(1:10)
> percentile(8)
[1] 0.8
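As a minimal sketch of what the function returned by ecdf() computes, the snippet below compares it with a hand calculation. The sample `x` and the query `new_value` are made-up illustrative values, not data from the question.

```r
# Hypothetical sample and new value, for illustration only
x <- c(2.1, 3.5, 3.5, 4.0, 7.2, 9.9)
new_value <- 3.8

# ecdf(x) returns a step function; evaluating it at v gives the
# proportion of observations in x that are <= v
F_hat <- ecdf(x)
p_ecdf <- F_hat(new_value)

# The same quantity computed by hand
p_manual <- mean(x <= new_value)

p_ecdf    # 0.5 (three of the six observations are <= 3.8)
p_manual  # 0.5
```

Because the result is a plain proportion, a new value below every observation gets 0 and one at or above the maximum gets 1.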

cwarden

For expediency, ecdf(1:10)(8) also works. – Jeffrey Girard Jun 08 '20 at 17:44

zerweck · Answer 2 · 2021-06-08T07:52:19.413

17

To expand on what whuber and cwarden stated, sometimes you want an ordinary function that takes the data as its first argument, in the "classical" R way (for example, to use inside a magrittr pipe). You can write one yourself using ecdf():

ecdf_fun <- function(x, perc) ecdf(x)(perc)
ecdf_fun(1:10, 8)
[1] 0.8
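For illustration, here is how such a wrapper might be used in a pipe. This sketch uses the native `|>` pipe, which assumes R 4.1 or later; magrittr's `%>%` would be used the same way. The query values are arbitrary.

```r
# Wrapper in the style of the answer: data first, query value(s) second
ecdf_fun <- function(x, perc) ecdf(x)(perc)

# Native pipe (assumes R >= 4.1); the data become the first argument
p_single <- 1:10 |> ecdf_fun(8)

# The second argument can also be a vector of query values
p_many <- 1:10 |> ecdf_fun(c(2.5, 8, 11))

p_single  # 0.8
p_many    # 0.2 0.8 1.0
```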

edited Jun 08 '21 at 07:52

answered Mar 29 '16 at 14:19

zerweck

Ben · Answer 3 · 2022-08-19T03:40:49.437

The other answers here do a great job of explaining computation of sample quantiles from an empirical CDF. Here I will show an alternative to using the sample quantiles: fit a kernel density estimator (KDE) to the data and then compute the quantile for a given probability value from the estimated distribution.

This can easily be done using the KDE function in the utilities package. This function produces a kernel density estimator, with corresponding probability functions that can be called directly. First we generate the KDE and load its probability functions to the global environment.

#Generate mock data
set.seed(1)
DATA <- rnorm(30)
#Generate KDE
MY_KDE <- utilities::KDE(DATA, to.environment = TRUE)
plot(MY_KDE)
MY_KDE
Kernel Density Estimator (KDE)
Computed from 30 data points in the input 'DATA'
  Estimated bandwidth = 0.389054

  Input degrees-of-freedom = Inf
Probability functions for the KDE are the following:
  Density function:                   dkde * 
  Distribution function:              pkde * 
  Quantile function:                  qkde * 
  Random generation function:         rkde * 



This function is presently loaded in the global environment

Once the KDE has been generated, you can call the quantile function using any input probabilities you want. (This is the function qkde which is part of the produced KDE object; in the code above we have loaded the function to the global environment so it can be called directly.) In the present case we are using a KDE with a normal kernel, so the quantiles at the end-points are negative and positive infinity.

#Compute the quantile for given set of input probabilities
PROBS <- 0:20/20
qkde(PROBS)
 [1]        -Inf -1.91000509 -1.28454267 -0.90720648 -0.66908812
 [6] -0.48063573 -0.31756724 -0.17102207 -0.03661007  0.08855345
[11]  0.20675802  0.32005203  0.43046992  0.54023777  0.65204011
[16]  0.76947215  0.89796032  1.04690298  1.23554909  1.51450168
[21]         Inf
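For readers without the utilities package, the same idea can be sketched in base R. This is not the package's implementation: it assumes a Gaussian kernel and takes the bandwidth from base R's density() defaults, so the numbers need not match the output above. The names `kde_cdf` and `kde_quantile` are hypothetical helpers. The CDF direction is included because it gives the smoothed quantile level of a new value, which is what the question asks for.

```r
# Same mock data as in the answer
set.seed(1)
DATA <- rnorm(30)

# Bandwidth from base R's density() defaults (Gaussian kernel);
# this is an assumption and may differ from the bandwidth reported above
bw <- density(DATA)$bw

# CDF of a Gaussian KDE: the average of normal CDFs centred at each data point
kde_cdf <- function(q) {
  sapply(q, function(v) mean(pnorm(v, mean = DATA, sd = bw)))
}

# Quantile function by numerical inversion of the CDF, for 0 < p < 1
kde_quantile <- function(p) {
  sapply(p, function(pp) {
    uniroot(function(v) kde_cdf(v) - pp,
            interval = range(DATA) + c(-10, 10) * bw,
            tol = 1e-10)$root
  })
}

# Smoothed quantile level of a new value, and the inverse operation
kde_cdf(0.5)
kde_quantile(c(0.25, 0.5, 0.75))
```

Unlike the empirical CDF, this estimate is continuous and strictly increasing, so a new value outside the observed range gets a level strictly between 0 and 1.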

Michael Garber · Answer 4 · 2022-08-19T01:59:27.393

If using dplyr, cume_dist() returns the percentile of each value in a vector, i.e. the proportion of values less than or equal to it.

https://dplyr.tidyverse.org/reference/ranking.html

An advantage of cume_dist() is that it can be used naturally in a pipe (%>%):

library(tidyverse)
df <- 1:10 %>%
  as_tibble() %>% 
  mutate( 
    x = rnorm(n=n(), mean=50, sd=10),
    x_percentile = cume_dist(x))
> df
# A tibble: 10 × 3
   value     x x_percentile
   <int> <dbl>        <dbl>
 1     1  43.7          0.3
 2     2  44.8          0.5
 3     3  41.8          0.2
 4     4  59.6          0.9
 5     5  54.3          0.8
 6     6  44.7          0.4
 7     7  37.4          0.1
 8     8  52.1          0.6
 9     9  61.2          1
10    10  52.3          0.7
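To make the link with the ecdf() answers explicit, the sketch below reproduces the x_percentile column in base R. It uses the rounded x values printed in the table above, and `new_value` is a hypothetical query. The rank-based line reflects the documented definition of cume_dist() (proportion of values less than or equal to each value); it is an illustration, not dplyr's source.

```r
# Rounded x values copied from the printed tibble above
x <- c(43.7, 44.8, 41.8, 59.6, 54.3, 44.7, 37.4, 52.1, 61.2, 52.3)

# Proportion of values <= each value, via ranks
x_percentile <- rank(x, ties.method = "max") / length(x)

# The same result from the empirical CDF evaluated at the data
x_ecdf <- ecdf(x)(x)

x_percentile  # 0.3 0.5 0.2 0.9 0.8 0.4 0.1 0.6 1.0 0.7

# cume_dist() only ranks values already in the vector; for a new value
# (hypothetical here), evaluate the empirical CDF instead
new_value <- 50
p_new <- ecdf(x)(new_value)
p_new  # 0.5
```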