I have a set of real numbers. I need to estimate the quantile (percentile) of a new number, i.e. the proportion of the distribution lying at or below it. Is there a clean way to do this in R, or in general?
I hope this is not ultra-trivial ;-)
Thanks in advance for your responses.
PK
As whuber pointed out, you can use ecdf(), which takes a vector and returns a function that gives the percentile of any value:
> percentile <- ecdf(1:10)
> percentile(8)
[1] 0.8
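The function returned by ecdf() gives the fraction of the sample that is less than or equal to its argument, so the same number can be computed by hand. A minimal sketch in base R; the data and the names sample_data and new_values are purely illustrative:

```r
# Hypothetical sample and hypothetical new numbers to score against it
sample_data <- c(2.3, 5.1, 3.3, 7.8, 4.4, 6.0, 1.9, 5.5)
new_values  <- c(1.0, 4.4, 6.5, 9.0)

# Empirical CDF: proportion of the sample <= each new value (vectorised)
Fn <- ecdf(sample_data)
pct_ecdf <- Fn(new_values)

# The same computation written out explicitly
pct_manual <- sapply(new_values, function(v) mean(sample_data <= v))

print(rbind(new_values, pct_ecdf, pct_manual))
```

Note that a new value below the sample minimum gets 0 and one above the maximum gets 1; if that step-function behaviour is a problem, a smoothed estimate such as the kernel density approach in another answer may suit better.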
To expand on what whuber and cwarden said: sometimes you want to call a function in the "classical" R way (for example, inside a magrittr pipe). You can write such a wrapper yourself using ecdf():
> ecdf_fun <- function(x, perc) ecdf(x)(perc)
> ecdf_fun(1:10, 8)
[1] 0.8
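Because the data vector is the first argument, the wrapper can sit on the right-hand side of a pipe. A small sketch assuming R 4.1.0 or later for the native pipe |> (magrittr's %>% behaves the same here); the second call shows that perc can be a vector of new values:

```r
# Wrapper as defined above: data first, so it works as a pipe target
ecdf_fun <- function(x, perc) ecdf(x)(perc)

# Native pipe (R >= 4.1.0): equivalent to ecdf_fun(1:10, 8)
res_single <- 1:10 |> ecdf_fun(8)

# Percentiles of several new values against the same sample at once
res_multi <- c(4, 2, 9, 7, 1) |> ecdf_fun(c(0, 3, 10))

print(res_single)
print(res_multi)
```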
The other answers here do a great job of explaining how to compute sample percentiles from the empirical CDF. Here I show an alternative to using the sample quantiles: estimate the distribution with a kernel density estimator (KDE), then compute the quantile corresponding to a given probability from that KDE.
This is easy to do with the KDE function in the utilities package. It produces a kernel density estimator together with its probability functions, which can be called directly. First we generate the KDE and load its probability functions into the global environment.
#Generate mock data
set.seed(1)
DATA <- rnorm(30)
#Generate KDE
MY_KDE <- utilities::KDE(DATA, to.environment = TRUE)
plot(MY_KDE)
MY_KDE
Kernel Density Estimator (KDE)
Computed from 30 data points in the input 'DATA'
Estimated bandwidth = 0.389054
Input degrees-of-freedom = Inf
Probability functions for the KDE are the following:
Density function: dkde *
Distribution function: pkde *
Quantile function: qkde *
Random generation function: rkde *
- This function is presently loaded in the global environment
Once the KDE has been generated, you can call the quantile function with any input probabilities you want. (This is the function qkde, which is part of the KDE object; in the code above we loaded it into the global environment so it can be called directly.) Here we are using a KDE with a normal kernel, which has unbounded support, so the quantiles at the endpoints (probabilities 0 and 1) are negative and positive infinity.
#Compute the quantiles for a given set of input probabilities
PROBS <- 0:20/20
qkde(PROBS)
[1] -Inf -1.91000509 -1.28454267 -0.90720648 -0.66908812
[6] -0.48063573 -0.31756724 -0.17102207 -0.03661007 0.08855345
[11] 0.20675802 0.32005203 0.43046992 0.54023777 0.65204011
[16] 0.76947215 0.89796032 1.04690298 1.23554909 1.51450168
[21] Inf
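Since the question asks where a new number falls, the KDE's distribution function (pkde above) answers it directly, and the quantile function is its inverse. If you would rather not depend on the utilities package, a Gaussian-kernel KDE has a closed-form CDF: the average of pnorm((q - x_i)/h) over the data points x_i. The sketch below assumes a normal kernel with bandwidth from bw.nrd0() (the default used by density()); utilities::KDE chooses its own bandwidth (0.389 above), so the numbers will not match its output exactly. The helper names kde_cdf_factory and kde_quantile_factory are hypothetical, and the quantile is found by inverting the CDF numerically with uniroot().

```r
# Hypothetical helpers: CDF of a Gaussian-kernel KDE and its numerical inverse.
# Assumption: bandwidth from bw.nrd0(), not the bandwidth utilities::KDE picks.
kde_cdf_factory <- function(data, bw = bw.nrd0(data)) {
  force(bw)
  # KDE CDF at v: average of normal CDFs centred at each data point
  function(q) vapply(q, function(v) mean(pnorm((v - data) / bw)), numeric(1))
}

kde_quantile_factory <- function(data, bw = bw.nrd0(data)) {
  cdf <- kde_cdf_factory(data, bw)
  # Starting search interval; extendInt = "upX" widens it if needed
  lo <- min(data) - 10 * bw
  hi <- max(data) + 10 * bw
  function(p) vapply(p, function(pp) {
    if (pp <= 0) return(-Inf)  # normal kernel has unbounded support
    if (pp >= 1) return(Inf)
    uniroot(function(v) cdf(v) - pp, c(lo, hi),
            tol = 1e-10, extendInt = "upX")$root
  }, numeric(1))
}

# Same mock data as above
set.seed(1)
DATA <- rnorm(30)

pkde_sketch <- kde_cdf_factory(DATA)
qkde_sketch <- kde_quantile_factory(DATA)

new_value <- 0.5
print(pkde_sketch(new_value))               # estimated percentile of a new number
print(qkde_sketch(c(0, 0.25, 0.5, 0.75, 1)))
```

Unlike the empirical CDF, this estimate is smooth and stays strictly between 0 and 1 for any finite new value.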
If you are using dplyr, cume_dist() returns the percentile of each value in a vector, i.e. the proportion of values less than or equal to it:
https://dplyr.tidyverse.org/reference/ranking.html
One advantage of cume_dist() is that it can be used naturally inside a pipe (%>%):
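library(tidyverse)
# One-column tibble whose column is named `value`
df <- tibble(value = 1:10) %>%
  mutate(
    # No seed is set, so your numbers will differ from the output shown
    x = rnorm(n = n(), mean = 50, sd = 10),
    x_percentile = cume_dist(x))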
> df
# A tibble: 10 × 3
   value     x x_percentile
   <int> <dbl>        <dbl>
 1     1  43.7          0.3
 2     2  44.8          0.5
 3     3  41.8          0.2
 4     4  59.6          0.9
 5     5  54.3          0.8
 6     6  44.7          0.4
 7     7  37.4          0.1
 8     8  52.1          0.6
 9     9  61.2          1
10    10  52.3          0.7
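Note that cume_dist() scores values that are already in the column; per the dplyr documentation it is the proportion of values less than or equal to each one, which is the same as ecdf(x)(x). The base-R sketch below, on hypothetical data with a tie, checks that equivalence with a rank-based formula (ties counted with ties.method = "max") and then uses ecdf() to score new numbers against the column, which cume_dist() alone does not do:

```r
# Hypothetical column containing a tie (44.8 appears twice)
x <- c(43.7, 44.8, 41.8, 59.6, 54.3, 44.8, 37.4)

# cume_dist()-style percentile: share of values <= each value
cume_dist_base <- rank(x, ties.method = "max") / length(x)

# ecdf evaluated at the column's own values gives the same numbers
cume_dist_ecdf <- ecdf(x)(x)

# Percentiles of new numbers relative to the column
new_values <- c(40, 50, 65)
new_pct <- ecdf(x)(new_values)

print(cume_dist_base)
print(new_pct)
```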
ecdf(1:10)(8) also works. – Jeffrey Girard Jun 08 '20 at 17:44