Computing percentile rank in R

Question

How can I add a new variable to a data frame that holds the percentile rank of one of its variables? I can do this easily in Excel, but I would really like to do it in R.

Thanks

chl · Accepted Answer · 2011-06-15T11:13:21.743

36

Given a vector of raw data values, a simple function might look like

perc.rank <- function(x, xo)  length(x[x <= xo])/length(x)*100

where xo is the value for which we want the percentile rank, given the vector x, as suggested on R-bloggers.
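As a quick check of this scalar version, here is a minimal sketch on a small made-up vector (scores and all.ranks are illustrative names, not part of the original answer). It also shows why a vectorized form is convenient: ranking every element requires looping over the values, here with sapply().

```r
# Scalar percentile rank: share of values in x that are <= xo, in percent
perc.rank <- function(x, xo) length(x[x <= xo]) / length(x) * 100

scores <- c(10, 20, 20, 30, 40)   # made-up data for illustration

perc.rank(scores, 20)             # 3 of 5 values are <= 20, so 60

# Ranking every element means calling the function once per value
all.ranks <- sapply(scores, function(v) perc.rank(scores, v))
all.ranks                         # 20 60 60 80 100
```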

However, it might easily be vectorized as

perc.rank <- function(x) trunc(rank(x))/length(x)

which has the advantage of not having to pass each value. So, here is an example of use:

my.df <- data.frame(x=rnorm(200))
my.df <- within(my.df, xr <- perc.rank(x))
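Because rnorm() gives different numbers on every run, a small deterministic sketch may make the behaviour easier to see. The vector scores below is made up for illustration. Two things are worth noticing: the vectorized function returns proportions between 0 and 1 rather than percentages, and tied values get the truncated average rank, so the result is not identical to counting values <= xo. Using ties.method = "max" appears to reproduce that count-based definition.

```r
# Vectorized percentile rank (as a proportion), copied from the answer
perc.rank <- function(x) trunc(rank(x)) / length(x)

scores <- c(10, 20, 20, 30, 40)   # made-up data with one tie

rank(scores)                      # 1.0 2.5 2.5 4.0 5.0 (ties are averaged)

my.df <- data.frame(x = scores)
my.df <- within(my.df, xr <- perc.rank(x))
my.df$xr                          # 0.2 0.4 0.4 0.8 1.0

# Assumption: to match "share of values <= xo, in percent", rank ties by "max"
pct.leq <- rank(scores, ties.method = "max") / length(scores) * 100
pct.leq                           # 20 60 60 80 100
```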

edited Jun 15 '11 at 11:13

answered Jun 15 '11 at 10:30

3
1. Your function does not mimic Excel's percentrank function, which is good (+1), since the latter gives "strange" results (see my comparison). 2. I wouldn't name the data frame df, because df is an R function (the density of the F distribution, see ?df).

Bernd Weiss

Jun 15 '11 at 11:04

1

@Bernd Thanks. (1) There are some built-in functions for computing PR in various psychometrics packages. I think I grabbed this one from the CTT package a while ago. I didn't check against Excel because I don't have/use it. About (2) I seem to always forget about this! Let's go with my.* (Perl way) :-) – chl Jun 15 '11 at 11:21

@chl why is the trunc required? It seems rank will always return an integer anyway. – Tyler Rinker May 10 '18 at 18:38

1

@Tyler Nope. In case there are ties, rank() defaults to taking the average of the ranks of the tied values (cf. ties.method = c("average",...)), which need not be an integer. – chl May 11 '18 at 13:15

1

Beware that NA values should be removed! This can be done by adding x = x[!is.na(x)] – Antoine Mar 21 '21 at 16:55

Nick Sabbe · Answer 2 · 2011-06-15T11:50:44.800

9

If your original data.frame is called dfr and the variable of interest is called myvar, you can use dfr$myrank<-rank(dfr$myvar) for normal ranks, or dfr$myrank<-rank(dfr$myvar)/length(dfr$myvar) for percentile ranks.

Oh well. If you really want it the Excel way (may not be the simplest solution, but I had some fun using new (to me) functions and avoiding loops):

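percentilerank<-function(x){
  # run-length encode the sorted values: one entry per distinct value
  rx<-rle(sort(x))
  # number of observations strictly smaller than each distinct value
  smaller<-cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  # number of observations strictly larger than each distinct value
  larger<-rev(cumsum(c(0, rev(rx$lengths))))[-1]
  # percentile rank of each distinct value
  rxpr<-smaller/(smaller+larger)
  # map the result back to the original order (and length) of x
  rxpr[match(x, rx$values)]
}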

so now you can use dfr$myrank<-percentilerank(dfr$myvar)
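To see what the function computes, here is a small sketch on made-up data (scores and shuffled are illustrative names). Each value gets the number of strictly smaller observations divided by the number of strictly smaller plus strictly larger observations; the final match() step maps the per-distinct-value results back onto the input, so the output has the same length and order as x.

```r
# Function copied from the answer above
percentilerank <- function(x) {
  rx <- rle(sort(x))
  smaller <- cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]
}

scores <- c(10, 20, 20, 30, 40)        # made-up data with one tie
pr <- percentilerank(scores)
pr                                     # 0, 1/3, 1/3, 0.75, 1

# The result follows the input order, not the sorted order
shuffled <- c(40, 20, 10, 20, 30)
pr.shuffled <- percentilerank(shuffled)
pr.shuffled                            # 1, 1/3, 0, 1/3, 0.75
```

For the value 20, one observation is smaller and two are larger, hence 1 / (1 + 2).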

HTH.

edited Jun 15 '11 at 11:50

answered Jun 15 '11 at 10:06


1 - (rank/size) gives you same as excel percentilerank – user333 Jun 15 '11 at 11:24
I got this from office.microsoft.com – Nick Sabbe Jun 15 '11 at 11:51
An anonymous (attempted) editor tried to add the following comment: "Nice function but sometimes, unfortunately, the RLE may return vector of length < length(dfr$myvar)". – gung - Reinstate Monica Aug 26 '13 at 16:58
Can you explain or link to the theory of this method? – mavavilj May 18 '21 at 18:47

Farshad · Answer 3 · 2016-01-13T01:09:37.740

A problem with the presented answer is that it will not work properly when the data contain NAs.

In this case, another possibility (inspired by the function from chl) is:

perc.rank <- function(x) trunc(rank(x, na.last = NA))/sum(!is.na(x))
quant <- function (x, p.ile) {
      x <- x[!is.na(x)]  # drop NAs so that values and ranks stay aligned
      x[which.min(abs(perc.rank(x) - p.ile/100))]
}

Here, x is the vector of values, and p.ile is the target percentile rank on a 0-100 scale. For example, the value whose percentile rank is closest to 2.5 in the third column of an (arbitrary) matrix coef.mat may be obtained with:

quant(coef.mat[,3], 2.5)  
[1] 0.00025

or as a single function:

quant <- function (x, p.ile) {
   x <- x[!is.na(x)]  # drop NAs first
   perc.rank <- trunc(rank(x))/length(x)
   x[which.min(abs(perc.rank - (p.ile/100)))]
}
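If the percentile ranks need to stay aligned with the rows of a data frame, dropping the NAs changes the length of the result. One possible alternative, sketched here under that assumption, is rank()'s na.last = "keep" option, which leaves NA in place. The names perc.rank.na and value.at are hypothetical helpers introduced only for this illustration.

```r
# NA-aware percentile rank: NAs stay NA, the denominator counts non-missing values
perc.rank.na <- function(x) trunc(rank(x, na.last = "keep")) / sum(!is.na(x))

x <- c(30, NA, 10, 20, 20)             # made-up data with a missing value
pr <- perc.rank.na(x)
pr                                     # 1.00 NA 0.25 0.50 0.50

# Value whose percentile rank is closest to p.ile (0-100 scale);
# which.min() skips NA entries, so the index still refers to x
value.at <- function(x, p.ile) {
  x[which.min(abs(perc.rank.na(x) - p.ile / 100))]
}
value.at(x, 50)                        # 20
```

Since the output has the same length as the input, it can be assigned directly to a new data frame column.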
