3

The median $\tilde{\mu}$ of a sample is in many ways analogous to the sample mean $\mu$. Both are estimates of the corresponding population quantity (the median or the mean, respectively), and both approach a Gaussian distribution for large samples under certain conditions. It is known that the sample median is asymptotically Gaussian with variance $\sigma^2_{\tilde{\mu}}$ if the density $p$ is nonzero and continuously differentiable around the median (Rider 1960): \begin{align} \sigma^2_{\tilde{\mu}} = \frac{1}{4 N \left(p\left(\tilde{\mu}\right)\right)^2} \end{align} If the samples $x_i$ have the same mean but different variances $\sigma_i^2$, it can be shown that the inverse-variance weighted sample mean ${_w\mu}$ is the estimate for the population mean with the lowest variance $\sigma^2_{_w\mu}$: \begin{align} {_w\mu} &= \frac{\sum_{i=1}^N w_i x_i }{\sum_{i=1}^N w_i}\\ w_i &= \sigma_i^{-2}\\ \sigma_{_w\mu}^2 &= \dfrac{1}{\sum_{i=1}^N w_i} \end{align} I am looking for an equivalent for the median. The weighted sample median ${_w\tilde{\mu}}$ is any value $m$ that partitions the samples so that the sum of the weights of the values less than or equal to $m$ and the sum of the weights of the values greater than or equal to $m$ differ the least: \begin{align} {_w\tilde{\mu}} = \operatorname*{arg\,min}_{m} \left| \left( \sum_{ \left\{ i | x_i \le m \right\} } w_i \right) - \left( \sum_{ \left\{ i | x_i \ge m \right\} } w_i \right) \right| \end{align} Now the question arises: what is the variance of the weighted sample median, and how should the weights be set optimally? I thought results like these must have been proven a long time ago, but I was not able to find anything. I'd be thankful if you can help me find out more. This is how far I got on my own:

If samples have different variances they must have come from different distributions, so let's assume each sample is drawn from a different probability distribution $p_i$. Numerical experiments seem to indicate that, in order to minimize the variance of the weighted median, the weights should be set proportional to the density at the median of the distribution the sample was drawn from, $p_i({_w\tilde{\mu}})$. This also makes a nice connection to the inverse-variance weights that are optimal for the weighted average, because in the weighted median each sample asymptotically contributes a variance inversely proportional to the square of this density.

Fig. 1: Relative weighting between samples following a Gaussian or uniform distribution with identical variance. The ratio of the Gaussian density to the uniform density at the median is $\sqrt{\frac{6}{\pi}} \approx 1.38$; this ratio is reached at around $0.58$ on the x-axis, coinciding with the minimum variance of the weighted sample median.

Fig. 2: The median of absolute deviations of the sample median of samples following either a Gaussian, a Laplacian or a uniform distribution, with variances following an exponential distribution. The weights are set to a power of the associated sample variances and, as can be seen, the optimal power is around $0.5$.

When the weights are set equal to $p_i(\tilde{\mu})$, the variance of the weighted median seems to approach: \begin{align} \sigma^2_{\tilde{\mu}} = \frac{1}{4 \left(\sum_{i=1}^N \left(p_i\left(\tilde{\mu}\right)\right)^2\right)} \end{align}
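A minimal sketch of how this conjecture could be checked numerically, using only the Python standard library. All function names are hypothetical, and the test case (zero-median Gaussians with a range of standard deviations) is an assumption chosen for illustration, not the setup behind the figures above. The weighted median follows the definition in the question, restricted to the sample values and assuming they are distinct.

```python
import math
import random


def weighted_median(values, weights):
    """Sample value that best balances the weights on either side of it.

    Assumes distinct values (ties have probability zero for continuous data).
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    best_value, best_gap = None, math.inf
    below = 0.0  # total weight of samples strictly below the candidate
    for value, weight in pairs:
        at_or_below = below + weight
        at_or_above = total - below
        gap = abs(at_or_below - at_or_above)
        if gap < best_gap:
            best_value, best_gap = value, gap
        below = at_or_below
    return best_value


def gaussian_density_at_median(sigma):
    """Density of a zero-mean Gaussian at its median (x = 0)."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


def predicted_variance(densities):
    """Conjectured variance 1 / (4 * sum(p_i^2)) for weights w_i = p_i."""
    return 1.0 / (4.0 * sum(p * p for p in densities))


def simulated_variance(sigmas, weights, trials=5000, seed=0):
    """Monte Carlo variance of the weighted median over repeated draws."""
    rng = random.Random(seed)
    estimates = [
        weighted_median([rng.gauss(0.0, s) for s in sigmas], weights)
        for _ in range(trials)
    ]
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / trials


if __name__ == "__main__":
    sigmas = [1.0 + 0.1 * i for i in range(50)]  # one sample per distribution
    densities = [gaussian_density_at_median(s) for s in sigmas]
    print("predicted:", predicted_variance(densities))
    print("simulated:", simulated_variance(sigmas, densities))
```

Since the formula is asymptotic, the two printed numbers should only be expected to agree approximately, and the agreement should improve as the number of samples grows.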

Rider 1960: https://www.tandfonline.com/doi/abs/10.1080/01621459.1960.10482056

  • 4
    "Assuming each sample is drawn from a different probability distribution pi," what exactly is your weighted median supposed to be estimating? Unless we know that, it seems we cannot even understand what you might be optimizing. – whuber May 03 '22 at 20:28
  • @whuber I should have clarified, with the same median. This median is what I want to find. – Wolfgang Brehm May 03 '22 at 20:31
  • Then how can you possibly determine the weights with only that information? Are the weights given to you by some oracle that knows the distributions from which the $p_i$ are drawn? – whuber May 03 '22 at 20:33
  • @whuber just like you only need to know the variance to determine the ideal weights for the weighted average to minimize the variance, it turns out you only need to know the density at the median to determine the weights. I just can't prove it. – Wolfgang Brehm May 03 '22 at 20:42
  • What circumstance leads to all of these distributions having the same median? I can see that happening with symmetric distributions or with lognormal distributions, but in those cases it would be easier to analyze the mean or log-mean. – Matt F. May 04 '22 at 13:18
  • 2
    The density at the median is a single number. I am unable to see any way to use that to develop different weights for different observations. I can imagine that with enough data one might attempt, say, a nonparametric estimation of the mixture density in a neighborhood of the median and then exploit that for weight estimation, but whether that's how you're conceiving of this problem is not apparent. BTW, there's much relevant information at https://stats.stackexchange.com/questions/45124. – whuber May 04 '22 at 13:52
  • @MattF. as I said, just like for the weighted average. If all samples had the same distribution the weighted average would make no sense. If the samples have different variance they must have a different distribution. So you have samples with different precision, but there are many outliers too, you would like to use the weighted average, but need something more robust - the weighted median. – Wolfgang Brehm May 04 '22 at 13:52
  • @MattF. In my specific case we measure intensities following approximately a normal distribution with variance that can be estimated and some outliers. But the intensity values need to be scaled because each experiment is different, the crystal diffracts more or less, the beam is stronger or weaker. Traditionally we would then use the weighted average to produce a result, and treat the outliers with a $3 \sigma$ rejection criterion, but I would like to compare this with the weighted median because the median of (weighted) means had worked well before. – Wolfgang Brehm May 04 '22 at 14:05
  • @whuber If the density at the median is a single number weighing the samples differently does not make sense, that is true. But suppose the samples come from different distributions with the same median. Then the density at the median of these distributions that each sample came from is potentially a different number. – Wolfgang Brehm May 04 '22 at 14:13
  • 1
    Fine: but how do you know or estimate the density at the median of each of these distributions? There's some disconnect here, suggesting that you might not have fully explained your situation. What exactly do you know about each $p_i$ and what exactly do you assume about their underlying distributions? – whuber May 04 '22 at 14:16
  • @whuber in my case I start out with samples coming from the same distribution with an estimate for the minimum standard deviation, but they are multiplied with a scaling constant to get them on the same scale. Whatever distribution you start out with, if you multiply by a constant, the density at the median is divided by the same constant. This is not the only case where you would want a weighted median, but it is the case that started my interest. – Wolfgang Brehm May 04 '22 at 14:22
  • 1
    Given your comments, it might be clearer to ask the question as: "We measure an intensity in several ways, each getting a sample $S_i$ with its own error patterns. We take a weighted average of those samples as follows, and reject outliers as follows, and reapply the process without the rejected values as follows, thus estimating the intensity as follows, which is optimal as follows. What would be a similar procedure based on medians rather than means, and with what definitions of outliers and optimality?" But there's a lot of "as follows", and probably not all of it is in the current post. – Matt F. May 04 '22 at 14:23
  • @MattF. I don't want a specific answer for my specific problem, because there are even better solutions when we start to model the outliers and distributions involved in greater detail. I want a general answer to how the weighted median behaves and how to derive the weights. – Wolfgang Brehm May 04 '22 at 14:27
  • 2
    If you want a general answer about weighted medians, then it would help to point to a general statement about weighted means and their optimality in the literature, and ask for an analog with medians. But the current post doesn't provide a general statement to use in the comparison, certainly not with a general proof. – Matt F. May 04 '22 at 14:32
  • @MattF. I'll add the analogy with the weighted mean then. – Wolfgang Brehm May 04 '22 at 14:37

2 Answers

0

The comments ask for a general approach to weighted medians. In the approach that makes sense to me, the weights end up the same as for weighted means.

The following result on means is a straightforward constrained optimization (e.g. here):

Suppose we have $n$ different methods of measuring the same quantity, and the sample mean $M_i$ from method $i$ has mean $\mu$ and variance $V_i$. Then the minimum-variance weighted average of those sample means is $$\frac{\sum M_i\, / \, V_i}{\sum 1 \, / \,V_i}$$
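
A brief sketch of why this holds, under the added assumption (not stated above) that the $M_i$ are independent. The normalized weights $a_i$ and the Lagrange multiplier $\lambda$ are notation introduced here for illustration only: \begin{align} \operatorname{Var}\left(\sum_i a_i M_i\right) &= \sum_i a_i^2 V_i \quad \text{subject to} \quad \sum_i a_i = 1 \\ \frac{\partial}{\partial a_i}\left(\sum_j a_j^2 V_j - \lambda \left(\sum_j a_j - 1\right)\right) &= 2 a_i V_i - \lambda = 0 \quad \Rightarrow \quad a_i = \frac{1 / V_i}{\sum_j 1 / V_j} \end{align} Substituting these weights back gives a minimum variance of $1 \, / \, \sum_i (1 / V_i)$, which has the same form as the expression for $\sigma^2_{_w\mu}$ in the question.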

The sample medians, as asked about and pointed out in the question, have normal distributions. So a weighted average of sample medians will also be distributed normally. The variances will be proportional to the squares of the interquartile ranges, and minimizing the overall interquartile range will have the same result as minimizing the overall variance. This leads to the following result, parallel to the statement on means:

Suppose we have $n$ different methods of measuring the same quantity, and the sample median $M_i$ from method $i$ has median $\mu$ and interquartile range $r_i$. Then the minimum-interquartile-range weighted average of those sample medians is $$\frac{\sum M_i\, / \, r_i^2}{\sum 1\, / \, r_i^2}$$

If the goal is instead to take a weighted median of the samples (or of their sample median distributions), then we are looking for the mixture $R$ of the distributions $X_i$ (or of their sample median distributions) so that the sample median of $R$ has minimal variance.

Since the variance of the sample median of $R$ is inversely proportional to the square of the pdf of $R$ at the median, we are looking for the mixture $R$ with the highest possible pdf at that median. Assuming that all the $X_i$ have the same median (as in the post, and as in the case where they are all normal or all symmetric about the origin), the optimal mixture of $X_i$’s will be exclusively composed of the one $X_i$ which has the highest pdf at its median.

Matt F.
  • 4,726
  • Thank you for the answer, but I'm not looking for the (weighted) average of medians, I'm just looking for the weighted median, no averaging involved. That is for example, take the set of samples with associated weights: {(1,3), (2,4), (3,5), (4,6)} . The weighted median is 3 because $| (5+6) - (3+4+5) | = 1$ is the most equal partitioning according to the given weights you are going to find. – Wolfgang Brehm May 05 '22 at 16:18
  • Also, the optimal weights for the weighted median, sadly, are not the same as for the weighted mean, this is easy to show. Tell me your favorite programming language and I'll write you a small numerical demonstration. – Wolfgang Brehm May 05 '22 at 16:20
  • I’ve added a comment on the optimal weighted median, which should put the entire weight on one of the samples. – Matt F. May 05 '22 at 16:57
  • Putting all the weight on the sample with the highest density at the median is the same as taking this sample exclusively. So take for example two uniform distributions, one from -1 to 1 and one from -2 to 2. Let's say we have one sample from the first and as many as we like from the second one. Taking the first sample exclusively we have a variance of 1/3 . Even disregarding the best sample and computing the median of just $11$ samples of the other distribution will lead to a variance of about $0.31$, which is lower than 1/3 . Proper weighting gives us $0.22$ . – Wolfgang Brehm May 05 '22 at 17:25
  • It wasn’t clear that the number of samples was fixed. – Matt F. May 05 '22 at 17:30
  • I think this warrants a new question: “Suppose distributions $X$ and $Y$ have the same median, and $m$ and $n$ are fixed sample sizes. What weights $v$ and $w$ should be chosen to minimize the variance of the weighted median of $m$ samples of $X$, all weighted by $v$, and $n$ samples of $Y$, all weighted by $w$?” You might also specify whether to assume that $X$ and $Y$ have unknown distributions, unknown symmetric distributions, distributions known up to a translation, or normal distributions with 0, 1, or 2 known parameters. – Matt F. May 05 '22 at 17:39
  • 1
    Re "The variances will be proportional to the squares of the interquartile ranges:" Not so. The variances will also be inversely proportional to the squared densities at the medians. – whuber May 07 '22 at 21:45
  • @whuber, I claim that if $X,X’$ are normal variables, and $M,M’$ are the medians of $n$ samples of $X,X’$ respectively, then $Var(M)/Var(M’)$, $Var(X)/Var(X’)$, $IQR(M)^2/IQR(M’)^2$ and $IQR(X)^2/IQR(X’)^2$ are all equal. Do you agree, and do you see that as justifying the statement quoted in the above comment? – Matt F. May 08 '22 at 01:34
  • 1
    For Normal variables that's the case (because all those quantities in your ratios depend only on the scale parameter), but I have understood this thread to be about random variables generally (at least continuous ones). Indeed, I still don't see where you have stipulated your post applies only to Normal variables. – whuber May 08 '22 at 12:49
  • @whuber, that comment was about summarizing via a weighted average of sample medians, which I was indeed assuming approximately normal, following the OP. – Matt F. May 09 '22 at 01:33
  • 2
    @Matt The OP only states the medians are asymptotically Normally distributed, not that the underlying distributions are Normal. What remains to be seen--and is generally false--is that a weighted average (or, indeed, any average) of medians is a reasonable estimate of the overall median. – whuber May 09 '22 at 12:55
0

Assume each sample $x_i$ may follow a different probability distribution $p_i\left(x\right)$; otherwise unequal weights would have only very limited application and the unweighted median would almost always be the better choice. Consider a sorted list of weighted samples and the running sum of weights, normalized by the sum of all weights. This running sum is the empirical cumulative distribution of the weighted mixture distribution. Let the weighted cumulative distribution be ${_wP}\left(x\right)$: \begin{align*} {_wP}\left(x\right) = \dfrac{\left( \sum_{i=1}^N w_i \int_{-\infty}^x p_i\left(y\right) dy \right)}{\left( \sum_{i=1}^N w_i \right)} \end{align*} Now consider the sum of weights associated with samples less than or equal to the population median ${_p\tilde{\mu}}$, normalized by the sum of all weights, and call it $c$: \begin{align*} c = \dfrac{\left( \sum_{ \left\{ i | x_i \le {_p\tilde{\mu}} \right\} } w_i \right)}{\left( \sum_{i=1}^N w_i \right)} \end{align*} Because each sample lies at or below the common population median with probability $^1/_2$, each weight has probability $^1/_2$ of being part of this sum or its counterpart. The variance introduced to the numerator by each weight is therefore $^1/_4$ of the weight squared, and for independent samples the variance of $c$ is approximated by the sum of the individual variances divided by the normalization factor squared: \begin{align*} \sigma^2_c = \dfrac{\left(\sum_{i=1}^N w_i^2\right)}{4 \left( \sum_{i=1}^N w_i \right)^2} \end{align*} The weighted sample median ${_w\tilde{\mu}}$ is the sample at which the empirical cumulative distribution reaches $^1/_2$. The expected value of $c$ is $^1/_2$, and its variance is the mean squared deviation between the empirical partitioning, which determines the weighted sample median, and the partitioning that would lead to the value closest to the weighted population median. Therefore the variance of $c$, mapped through the inverse weighted cumulative distribution ${_wP^{-1}}$, is the variance of the weighted median.
Error propagation requires the derivative of ${_wP^{-1}}$ at $^1/_2$, which is the reciprocal of the derivative of ${_wP}$ at the weighted population median, and this derivative enters squared: \begin{align*} \sigma^2_{_w\tilde{\mu}} &= \sigma^2_c \left(\dfrac{d{_wP^{-1}}}{dc} \left(\tfrac{1}{2}\right)\right)^{2}\\ \sigma^2_{_w\tilde{\mu}} &= \sigma^2_c \left(\dfrac{d{_wP}}{dx} \left({_p\tilde{\mu}}\right)\right)^{-2}\\ \sigma^2_{_w\tilde{\mu}} &= \dfrac{\left(\sum_{i=1}^N w_i^2\right)}{4 \left( \sum_{i=1}^N w_i \right)^2} \dfrac{\left( \sum_{i=1}^N w_i \right)^2}{\left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^2} \\ \sigma^2_{_w\tilde{\mu}} &= \dfrac{\left(\sum_{i=1}^N w_i^2\right)}{4 \left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^2} \end{align*}

This is the variance we seek to minimize, an optimization problem very similar to that of the optimal weights for the weighted average. The weights that minimize the variance of the weighted median are proportional to the probability density of the sample distribution at the median, as can be shown by taking the first and second derivatives with respect to the individual weights. The first derivatives vanish simultaneously only for $w_i \propto p_i\left(\tilde{\mu}\right)$, where the second derivative is positive: \begin{align*} \dfrac{\partial \sigma^2_{_w\tilde{\mu}}}{\partial w_j} &= \dfrac{ w_j \left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right) - p_j\left(\tilde{\mu}\right) \left(\sum_{i=1}^N w_i^2\right)}{2 \left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^3} \\ \dfrac{\partial^2 \sigma^2_{_w\tilde{\mu}}}{\partial w_j^2} &= \dfrac{1}{2 \left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^2} - \dfrac{2 w_j p_j\left(\tilde{\mu}\right)}{\left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^3} + \dfrac{3 \left( p_j\left(\tilde{\mu}\right)\right)^2 \left(\sum_{i=1}^N w_i^2\right)}{2 \left( \sum_{i=1}^N w_i p_i\left(\tilde{\mu}\right)\right)^4} \end{align*} \begin{align} \operatorname*{arg\,min}_{w_i} \sigma^2_{_w\tilde{\mu}} \propto p_i\left({_p\tilde{\mu}}\right) \label{optimal_median_weights} \end{align} When the weights are set proportional to $p_i(\tilde{\mu})$, the variance of the weighted median equals one quarter of the inverse of the sum of the squared probability densities at the median: \begin{align} \sigma^2_{_w\tilde{\mu}} &= \dfrac{1}{4 \left(\sum_{i=1}^N \left(p_i\left({_p\tilde{\mu}}\right)\right)^2\right)} \label{variance_of_weighted_median} \end{align} One unexpected consequence of weights proportional to the probability density at the median is that, when the distributions of the samples all have the same shape and differ only in scale, the optimal weights are proportional to the inverse standard deviation, not to the inverse variance as for the weighted average. This is because scaling by a factor decreases the density at the median linearly, by that factor, but increases the variance by the same factor squared.
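
The optimum can also be checked directly on the variance formula, without any simulation. The sketch below evaluates $\sum w_i^2 / \left(4 \left(\sum w_i p_i\right)^2\right)$ for a few candidate weightings. The function name and the three density values are hypothetical and chosen only for illustration. By the Cauchy-Schwarz inequality, $\left(\sum w_i p_i\right)^2 \le \sum w_i^2 \sum p_i^2$, so the formula can never fall below $1 / \left(4 \sum p_i^2\right)$, and that bound is reached exactly when $w_i \propto p_i$.

```python
def formula_variance(weights, densities):
    """Asymptotic variance  sum(w^2) / (4 * (sum(w * p))^2)  from the derivation."""
    numerator = sum(w * w for w in weights)
    denominator = 4.0 * sum(w * p for w, p in zip(weights, densities)) ** 2
    return numerator / denominator


# Hypothetical densities p_i at the common median, e.g. one distribution
# shape scaled by the factors 1, 2 and 4.
densities = [1.0, 0.5, 0.25]

candidates = {
    "equal weights": [1.0, 1.0, 1.0],
    "w proportional to p (inverse scale)": list(densities),
    "w proportional to p^2 (inverse variance)": [p * p for p in densities],
}

for name, weights in candidates.items():
    print(f"{name}: {formula_variance(weights, densities):.6f}")
```

Of the three candidates, the weights proportional to $p_i$ give the smallest value, and multiplying all weights by a common constant leaves the result unchanged.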

Edit: Code and whitepaper: https://github.com/1ykos/weighed_median/blob/master/the_optimally_weighted_median_and_its_variance.pdf

  • As @whuber asked about the question: “how do you know or estimate the density at the median of each of these distributions?” If the distributions (the functions $p_i$) are inputs to the problem, then the minimum-variance weighting should place all the weight on one distribution, as in my answer; if the distributions are not inputs to the problem, then this answer is incomplete, because it doesn’t explain how to calculate the weights proposed at the end. – Matt F. May 08 '22 at 06:50
  • 1
    Are you assuming that all the samples come from scaled copies of the same distribution? In that case the weights you propose would be inversely proportional to the square roots of the variances in each sample, which we can estimate well enough. That would make sense of a lot of what you say, but if that’s the assumption it’s worth highlighting and explaining. – Matt F. May 08 '22 at 07:15
  • @MattF. Estimating the density at the median of each sample is just as fundamental to the optimal weights of the weighted median as the variance is for the weights in the weighted mean. There is no way around it, you need to have some estimate or proxy or you are better off using the unweighted median or mean. The method of estimation can be different in different applications. – Wolfgang Brehm May 08 '22 at 10:36
  • I am not assuming that all samples come from the same distribution: "Assume all samples $x_i$ can follow a different probability distribution" I would have been happy to grant this assumption although my specific problem would not have been solved that way, but it turns out it is not needed anyways. You are correct though with the observation that the optimal weights would be inverse to the scale factor and inverse to the square root of the variance in that case. – Wolfgang Brehm May 08 '22 at 10:48
  • @MattF. "If the distributions (the function $p_i$) are inputs to the problem, then the minimum-variance weighting should place all the weight on one distribution" If we know the distribution of each sample, we know the median of each sample distribution. Why not take the median directly then? – Wolfgang Brehm May 08 '22 at 11:52
  • I have no idea what the parameters of this question are. – Matt F. May 08 '22 at 12:22