Is there a weighted.median() function?

Question

I'm looking for something similar in form to weighted.mean(). I've found some solutions via search that write out the entire function but would appreciate something a bit more user friendly.

score 51 · Accepted Answer · edited Jan 23 '14 at 10:59

51

The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats, 'PSCBS', and 'bigvis' (on github).

To find them I used the invaluable findFn() in the 'sos' package which is an extension for R's inbuilt help.

findFn('weighted median')

Or,

???'weighted median'

as ??? is a shortcut in the same way ?some.function is for help(some.function)

edited Jan 23 '14 at 10:59

Richie Cotton

113,548
43
231
352

answered May 01 '10 at 04:26

wkmor1

6,916
3
30
23

2

I didn't know about findFn! That's awesome! – Bob Albright May 01 '10 at 05:39
Agreed on the findFn. Very useful. And rather than install a new package I just got some sleep. I'm just trying to calculate the median of weighted data and did this: median(rep(x, each=w)). – Michael Williams May 01 '10 at 14:09
Yep, median(rep(x, each=w)), would work too. But only if all your weights are integers. – wkmor1 May 01 '10 at 23:35
9

Hmisc also has wtd.quantile :) – Anthony Damico Mar 03 '13 at 09:55
That's going to be `median(rep(x,times=w))` if you want to give a vector with weights. The argument `each` only takes a single value. --- edit: Just saw the answer of @Jaitropmange. As (s)he said. – Joris Meys Oct 15 '13 at 21:27

score 30 · Answer 2 · edited May 09 '16 at 18:38

30

Some experience using the answers from @wkmor1 and @Jaitropmange.

I've checked 3 functions from 3 packages, isotone, laeken, and matrixStats. Only matrixStats works properly. Other two (just as the median(rep(x, times=w) solution) give integer output. As long as I calculated median age of populations, decimal places matter.

Reproducible example. Calculation of the median age of a population

df <- data.frame(age = 0:100,
                 pop = spline(c(4,7,9,8,7,6,4,3,2,1),n = 101)$y)

library(isotone)
library(laeken)
library(matrixStats)

isotone::weighted.median(df$age,df$pop)
# [1] 36
laeken::weightedMedian(df$age,df$pop)
# [1] 36
matrixStats::weightedMedian(df$age,df$pop)
# [1] 36.164
median(rep(df$age, times=df$pop))
# [1] 35

Summary

matrixStats::weightedMedian() is the reliable solution

edited May 09 '16 at 18:38

MichaelChirico

32,615
13
106
186

answered Sep 28 '15 at 22:35

ikashnitsky

2,761
23
39

3

Note that the rep(x, times=w) method requires integer weights, so it does not apply for your case. You could approximate using: median(rep(df$age, times=1000*df$pop)), which gives 36. Whether you want a decimal output depends on your definition of median. – Jaitropmange Oct 04 '15 at 15:28
with one little detail... Answer 36 is correct according to definition of [weighted median on wikipedia](https://en.wikipedia.org/wiki/Weighted_median). – Kamil S Jaron Dec 07 '16 at 09:58
3

yeah... we have no better reference than wikipedia – ikashnitsky Dec 07 '16 at 13:55
3

I do not understand the sarcasm here. Is it not quite devastating to have many functions for the same thing, but with different results? How do you know that `matrixStats::weightedMedian()` gives the _reliable solution_? [The code](https://github.com/HenrikBengtsson/matrixStats/blob/master/src/weightedMedian_lowlevel_template.h) seems to suggest it should produce the [weighted percentile method result](https://en.wikipedia.org/wiki/Percentile#Weighted_percentile), but the values are different from `spatstat::weighted.median()` which uses this method and yields `35.66291` to the problem above. – 0range Oct 22 '18 at 23:49
2

Of course, instead of choosing one way to compute the weighted median, we can also use the weighted median over all of them... – 0range Oct 23 '18 at 00:14

score 27 · Answer 3 · edited Nov 27 '16 at 00:32

27

To calculate the weighted median of a vector x using a same length vector of (integer) weights w:

median(rep(x, times=w))

edited Nov 27 '16 at 00:32

Joe

6,965
1
43
53

answered Mar 02 '13 at 18:30

Jaitropmange

372
4
13

3

This only works with integer weights. Weights in survey data are typically decimals. – Westcroft_to_Apse Oct 24 '17 at 17:31
1

Very bad idea. What happens if `w` has very large values? Filling memory just to compute a median is unwise. – Jean-Claude Arbaut Feb 18 '22 at 18:15

score 6 · Answer 4 · answered Feb 13 '18 at 07:34

6

Really old post but I just came across it and did some testing of the different methods. spatstat::weighted.median() seemed to be about 14 times faster than median(rep(x, times=w)) and its actually noticeable if you want to run the function more than a couple times. Testing was with a relatively large survey, about 15,000 people.

answered Feb 13 '18 at 07:34

Haututu

83
2
4

This function is now in the `{spatstat.geom}` package : `spatstat.geom::weighted.median()` – mdag02 Aug 30 '21 at 06:55

score 5 · Answer 5 · answered May 19 '20 at 15:24

5

This is just a simple solution, ready to use almost anywhere.

weighted.median <- function(x, w) {
  w <- w[order(x)]
  x <- x[order(x)]

  prob <- cumsum(w)/sum(w)
  ps <- which(abs(prob - .5) == min(abs(prob - .5)))
  return(x[ps])
}

answered May 19 '20 at 15:24

R.Andres Castaneda

428
3
9

score 4 · Answer 6 · answered Oct 23 '18 at 06:30

One can also use stats::density to create a weighted PDF, then convert this to a CDF, as elaborated here:

my_wtd_q = function(x, w, prob, n = 4096) 
  with(density(x, weights = w/sum(w), n = n), 
       x[which.max(cumsum(y*(x[2L] - x[1L])) >= prob)])

Then my_wtd_q(x, w, .5) will be the weighted median.

One could also be more careful to ensure that the total area under the density is one by re-normalizing.

score 2 · Answer 7 · answered Apr 15 '18 at 01:12

2

If you're working with the survey package, assuming you've defined your survey design and x is your variable of interest:

svyquantile(~x, mydesign, c(0.5))

answered Apr 15 '18 at 01:12

bdetweiler

1,474
1
17
26

GKi · Answer 8 · 2019-05-29T09:30:41.893

Using the source from Deleet and the data from ikashnitsky, a weighted median could be calculated in base with:

df <- data.frame(age = 0:100,
                 pop = spline(c(4,7,9,8,7,6,4,3,2,1),n = 101)$y)

medianWeighted <- function(x, w) {
  x <- aggregate(w[w>0] ~ x[w>0], FUN=sum)
  approxfun(filter(c(0,cumsum(x$w)/sum(x$w)), c(.5,.5), sides=1)[-1], x$x)(.5)
}
medianWeighted(df$age,df$pop) #Interpolates between observed Numbers
#[1] 36.164

medianWeightedI <- function(x, w) { 
  w <- w[order(x)]
  x <- x[order(x)]
  x[which.min(abs(filter(c(0,cumsum(w)/sum(w)), c(.5,.5), sides=1)[-1] - 0.5))]
}
medianWeightedI(df$age,df$pop) #Takes only numbers which have been observed
#[1] 36

In case you also wanted to calculate weighted quantiles.

quantileWeighted <- function(x, w, probs = seq(0, 1, 0.25)) {
  x <- aggregate(w[w>0] ~ x[w>0], FUN=sum)
  approxfun(filter(c(0,cumsum(x$w)/sum(x$w)), c(.5,.5), sides=1)[-1], x$x, rule=2)(probs)
}
quantileWeighted(df$age, df$pop)
#[1]   0.00000  20.21336  36.16400  55.98371 100.00000

quantileWeightedI <- function(x, w, probs = seq(0, 1, 0.25)) {
  x <- aggregate(w[w>0] ~ x[w>0], FUN=sum)
  stepfun(cumsum(x$w[-nrow(x)])/sum(x$w[-nrow(x)]), x$x)(probs)
}
quantileWeightedI(df$age, df$pop)
#[1]   0  20  36  56 100

score 0 · Answer 9 · answered Jan 22 '22 at 21:56

I got here looking for weighted quantiles, so I thought I might as well leave for future readers what I ended up with. Naturally, using probs = 0.5 will return the weighted median.

I started with MichaelChirico's answer, which unfortunately was off at the edges. Then I decided to switch from density() to approx(). Finally, I believe I nailed the correction factor to ensure consistency with the default algorithm of the unweighted quantile().

weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25),
                              na.rm = FALSE, names = TRUE) {

  if (any(probs > 1) | any(probs < 0)) stop("'probs' outside [0,1]")

  if (length(w) == 1) w <- rep(w, length(x))
  if (length(w) != length(x)) stop("w must have length 1 or be as long as x")

  if (isTRUE(na.rm)) {
    w <- x[!is.na(x)]
    x <- x[!is.na(x)]
  }

  w <- w[order(x)] / sum(w)
  x <- x[order(x)]

  cum_w <- cumsum(w) - w * (1 - (seq_along(w) - 1) / (length(w) - 1))
  res <- approx(x = cum_w, y = x, xout = probs)$y

  if (isTRUE(names)) {
    res <- setNames(res, paste0(format(100 * probs, digits = 7), "%"))
  }
  res
}

When weights are uniform, the weighted quantiles are identical to regular unweighted quantiles:

x <- rnorm(100)
stopifnot(stopifnot(identical(weighted.quantile(x, w = 1), quantile(x)))

Example using the same data as in the weighted.mean() man page.

x <- c(3.7, 3.3, 3.5, 2.8)
w <- c(5,   5,   4,   1)/15
stopifnot(isTRUE(all.equal(
  weighted.quantile(x, w, 0:4/4, names = FALSE),
  c(2.8, 3.33611111111111, 3.46111111111111, 3.58157894736842,
    3.7)
)))

And this is for whoever solely wants the weighted median value:

weighted.median <- function(x, w, ...) {
  weighted.quantile(x, w, probs = 0.5, names = FALSE, ...)
}

Is there a weighted.median() function?

9 Answers9

Reproducible example. Calculation of the median age of a population

Summary

Linked

Related