23

How can I add a new variable to a data frame that holds the percentile rank of one of its variables? I can do this easily in Excel, but I would really like to do it in R.

Thanks

ttnphns
user333

3 Answers

36

Given a vector of raw data values, a simple function might look like

perc.rank <- function(x, x0)  length(x[x <= x0])/length(x)*100

where x0 is the value for which we want the percentile rank, given the vector x, as suggested on R-bloggers.
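As a quick illustration of the scalar version, here is a minimal sketch. The name perc.rank.value and the scores vector are made up for this example and do not come from the question.

```r
# Scalar version from the text: share of values in x that are <= x0, in percent.
perc.rank.value <- function(x, x0) length(x[x <= x0]) / length(x) * 100

scores <- c(10, 20, 20, 30, 40)   # hypothetical data, not from the question

# Percentile rank of a single value: 3 of the 5 scores are <= 20.
pr.20 <- perc.rank.value(scores, 20)

# To rank every element, the function has to be called once per value.
pr.all <- sapply(scores, function(v) perc.rank.value(scores, v))

print(pr.20)
print(pr.all)
```

The sapply() call is the "pass each value" step that a vectorized version avoids.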

However, it might easily be vectorized as

perc.rank <- function(x) trunc(rank(x))/length(x)

which has the advantage of not requiring each value to be passed separately. Here is an example of use:

my.df <- data.frame(x=rnorm(200))
my.df <- within(my.df, xr <- perc.rank(x))
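To see what trunc() does here, a small sketch with a hypothetical vector containing a tie may help. Note that this vectorized version returns proportions in (0, 1] rather than percentages, so multiply by 100 to get the same scale as the first function.

```r
# Vectorized version from the text.
perc.rank <- function(x) trunc(rank(x)) / length(x)

x <- c(10, 20, 20, 30)            # hypothetical data with one tie

avg.ranks   <- rank(x)            # tied values share the average rank: 1, 2.5, 2.5, 4
trunc.ranks <- trunc(avg.ranks)   # truncation drops the fractional part: 1, 2, 2, 4
pr          <- perc.rank(x)       # proportions: 0.25, 0.5, 0.5, 1
pr.percent  <- pr * 100           # the same values on a 0-100 scale

# Assumption for illustration: if tied values should count as "<= x0",
# as in the scalar function, ties.method = "max" gives that count directly.
pr.max <- rank(x, ties.method = "max") / length(x) * 100

print(pr.percent)
print(pr.max)
```

With ties, the two definitions disagree: the tied value 20 gets 50 from the truncated average rank but 75 from the "<=" count, so it is worth deciding which convention you want.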
chl
  • 1. Your function does not mimic Excel's percentrank function, which is good (+1), since the latter gives "strange" results (see my comparison). 2. I wouldn't name the data frame df, because df is an R function (the density of the F distribution, see ?df). – Bernd Weiss Jun 15 '11 at 11:04
  • @Bernd Thanks. (1) There are some built-in functions for computing PR in various psychometrics packages. I think I grabbed this one from the CTT package a while ago. I didn't check against Excel because I don't have/use it. About (2) I seem to always forget about this! Let's go with my.* (Perl way) :-) – chl Jun 15 '11 at 11:21
  • @chl why is the trunc required? It seems rank will always return an integer anyway. – Tyler Rinker May 10 '18 at 18:38
  • @Tyler Nope. In case there are ties, rank() defaults to giving each tied value the average of their ranks, which need not be an integer (cf. ties.method = c("average",...)). – chl May 11 '18 at 13:15
  • Beware that NA values should be removed! This can be done by adding x = x[!is.na(x)] – Antoine Mar 21 '21 at 16:55
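One caveat on dropping NA values inside the function: the result is then shorter than the input, so it can no longer be assigned back as a data frame column. A possible alternative, sketched below under that assumption, keeps NA positions in place with rank()'s na.last = "keep" and divides by the number of non-missing values. The name perc.rank.na is made up for this example.

```r
# NA-aware variant (hypothetical helper, not from the answer above).
perc.rank.na <- function(x) {
  n <- sum(!is.na(x))                      # count only the observed values
  trunc(rank(x, na.last = "keep")) / n     # NA inputs stay NA in the result
}

my.df <- data.frame(x = c(3, NA, 1, 2))    # hypothetical data with a missing value
my.df <- within(my.df, xr <- perc.rank.na(x))
print(my.df)
```

The output has one value per row, with NA in the row where x is missing.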