How can I add a new variable to a data frame that holds the percentile rank of one of its variables? I can do this easily in Excel, but I really want to do it in R.
Thanks
Given a vector of raw data values, a simple function might look like
perc.rank <- function(x, x0) length(x[x <= x0])/length(x)*100
where x0 is the value for which we want the percentile rank, given the vector x, as suggested on R-bloggers.
However, it might easily be vectorized as
perc.rank <- function(x) trunc(rank(x))/length(x)
which has the advantage of not having to pass each value. So, here is an example of use:
my.df <- data.frame(x=rnorm(200))
my.df <- within(my.df, xr <- perc.rank(x))
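To make the difference between the two versions concrete, here is a small sketch on a hypothetical five-value vector with one tie. The name perc.rank.one is only used here so that both functions can coexist, and the data are made up. Note that the scalar version is on a 0-100 scale and counts tied values as "less than or equal", while the vectorized version is on a 0-1 scale and truncates the averaged rank of ties.

```r
# Scalar version from above, renamed (hypothetical name) to avoid a clash
perc.rank.one <- function(x, x0) length(x[x <= x0]) / length(x) * 100
# Vectorized version from above (0-1 scale)
perc.rank <- function(x) trunc(rank(x)) / length(x)

# Made-up data with one tie
my.df <- data.frame(x = c(10, 20, 20, 30, 40))

# Vectorized: one call for the whole column
my.df$xr <- perc.rank(my.df$x)

# Scalar: has to be applied to each value in turn
my.df$xr.pct <- sapply(my.df$x, function(v) perc.rank.one(my.df$x, v))

print(my.df)
# xr     is 0.2 0.4 0.4 0.8 1.0  (ties share the truncated average rank)
# xr.pct is 20 60 60 80 100      (ties count everything <= the value)
```

Which treatment of ties is preferable depends on the definition of percentile rank you need, so it is worth checking on a small example like this before using either function on real data.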
If your original data.frame is called dfr and the variable of interest is called myvar, you can use dfr$myrank <- rank(dfr$myvar) for normal ranks, or dfr$myrank <- rank(dfr$myvar)/length(dfr$myvar) for percentile ranks.
Oh well. If you really want it the Excel way (may not be the simplest solution, but I had some fun using new (to me) functions and avoiding loops):
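percentilerank<-function(x){
  # run-length encode the sorted values: one entry per distinct value
  rx<-rle(sort(x))
  # number of observations strictly smaller than each distinct value
  smaller<-cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  # number of observations strictly larger than each distinct value
  larger<-rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr<-smaller/(smaller+larger)
  # map the per-value ranks back onto the original order of x
  rxpr[match(x, rx$values)]
}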
so now you can use dfr$myrank<-percentilerank(dfr$myvar)
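As a quick sanity check, the sketch below runs the function on a hypothetical four-row data frame with one tied value. The data are made up; the function body is copied from above so the snippet runs on its own.

```r
# Function copied from the answer above
percentilerank <- function(x) {
  rx <- rle(sort(x))
  smaller <- cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]
}

# Hypothetical data frame with one tied value
dfr <- data.frame(myvar = c(3, 1, 2, 2))
dfr$myrank <- percentilerank(dfr$myvar)
print(dfr)
# myrank is 1.0 0.0 0.5 0.5: the smallest value gets 0, the largest gets 1,
# and each tied 2 has one value below and one above, hence 1 / (1 + 1)
```

One edge case to be aware of: if every value in the vector is identical, smaller + larger is 0 and the function returns NaN.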
HTH.
length < length(dfr$myvar)".
– gung - Reinstate Monica
Aug 26 '13 at 16:58
A problem with the presented answer is that it does not work properly when the data contain NAs.
In this case, another possibility (inspired by the function from chl♦) is:
perc.rank <- function(x) trunc(rank(x,na.last = NA))/sum(!is.na(x))
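quant <- function (x, p.ile) {
  # drop NAs first so that the ranks line up with the values
  x <- x[!is.na(x)]
  # value whose percentile rank is closest to the requested one
  x[which.min(abs(perc.rank(x) - (p.ile/100)))]
}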
Here, x is the vector of values, and p.ile is the percentile by rank. For example, the 2.5th percentile by rank of the third column of an (arbitrary) matrix coef.mat may be calculated by:
quant(coef.mat[,3], 2.5)
[1] 0.00025
or as a single function:
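quant <- function (x, p.ile) {
  # drop NAs first so that the ranks line up with the values
  x <- x[!is.na(x)]
  perc.rank <- trunc(rank(x))/length(x)
  # value whose percentile rank is closest to the requested one
  x[which.min(abs(perc.rank - (p.ile/100)))]
}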
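Coming back to the original question (adding the percentile rank as a data frame column), note that rank(x, na.last = NA) drops the NAs, so the result is shorter than x and cannot be assigned as a column directly. One possible workaround, sketched below, is na.last = "keep", which leaves NA in place. The name perc.rank.keep and the data are hypothetical, and this is a variant rather than the code from the answer above.

```r
# NA-preserving variant (an assumption, not the answer's original code):
# NAs stay NA, and the denominator counts only the non-missing values
perc.rank.keep <- function(x) {
  trunc(rank(x, na.last = "keep")) / sum(!is.na(x))
}

# Made-up data with one missing value and one tie
my.df <- data.frame(x = c(5, NA, 1, 3, 3))
my.df$xr <- perc.rank.keep(my.df$x)
print(my.df)
# xr is 1.00 NA 0.25 0.50 0.50: same length as x, NA kept in row 2
```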
percentrank-function, which is good (+1) since the latter gives "strange" results (see my comparison). 2. I wouldn't name the data frame df, because df is an R function (the density of the F distribution, see ?df).
CTT package a while ago. I didn't check against Excel because I don't have/use it. About (2) I seem to always forget about this! Let's go with my.* (Perl way) :-) – chl Jun 15 '11 at 11:21
trunc required? It seems rank will always return an integer anyway. – Tyler Rinker May 10 '18 at 18:38
rank() defaults to taking the average of the tied values (cf. ties.method = c("average",...)). – chl May 11 '18 at 13:15
x = x[!is.na(x)] – Antoine Mar 21 '21 at 16:55