
I'm not super experienced in statistics, so sorry if some of the terminology is off.

I'm trying to find the mean of some distribution, call it $P$. The problem is, the samples aren't directly visible. For each sample $x_i \sim P$, I only know whether $x_i > y_i$, where $y_i$ is another random variable drawn from a different distribution $P'$. All variables are independent of one another, if that makes a difference. To be explicit, each $y_i$ is completely known and redrawn for each $x_i$. Thanks!

1 Answer


Here are two cases with clear answers.

  1. The $x$'s are known to be distributed normally, and there are enough observations with $y$'s near $q_1$ and $q_2$ that we can estimate $P[x < q_1] = p_1$ and $P[x < q_2] = p_2$. This might happen if there are only two possible values of $y$.

To analyze this, let $Q$ be $\Phi^{-1}$, the standard normal quantile function. Then we have $$\mu + Q(p_1)\sigma = q_1$$ $$\mu + Q(p_2)\sigma = q_2$$ We get the mean of the normal distribution by solving these: $$\mu = \frac{q_1 Q(p_2) - q_2 Q(p_1)}{Q(p_2) - Q(p_1)}$$
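Here is a minimal sketch of that closed form in Python (not part of the original answer), assuming SciPy is available; the thresholds $q_1, q_2$ and probabilities $p_1, p_2$ below are made-up values purely for illustration.

```python
# Sketch: recover the normal mean from two threshold probabilities.
# The numbers here are hypothetical, just to exercise the formula.
from scipy.stats import norm

q1, q2 = 0.0, 1.0      # the two possible threshold values (assumed)
p1, p2 = 0.31, 0.69    # estimated P[x < q1] and P[x < q2] (assumed)

z1, z2 = norm.ppf(p1), norm.ppf(p2)   # Q(p1), Q(p2): standard normal quantiles

mu = (q1 * z2 - q2 * z1) / (z2 - z1)  # mean, from solving the two equations
sigma = (q2 - q1) / (z2 - z1)         # the scale follows from the same system
print(mu, sigma)                      # here: 0.5 and about 1.01
```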

  2. The $x$'s are known to be distributed exponentially. Then the mean estimated by MLE can be approximated as a nice linear function of the $y$'s.

Let the $y_i$'s with $x_i<y_i$ be $a_1, \ldots, a_m$. Let the $y_j$'s with $x_j>y_j$ be $b_1, \ldots, b_n$.

Let the distribution of the $x$'s have mean $1/\lambda$. Then the probability of the observed outcome is $$\left(\prod_i \left(1-e^{-\lambda a_i}\right)\right) \left(\prod_j e^{-\lambda b_j}\right)$$ We can maximize this by maximizing its log: $$\sum_i \ln\left(1-e^{-\lambda a_i}\right) - \lambda \sum_j b_j$$ The maximum occurs where the derivative with respect to $\lambda$ is 0, i.e. where $$\sum_i \frac{a_ie^{-\lambda a_i}}{1-e^{-\lambda a_i}}= \sum_j b_j$$ This can be solved numerically. Alternatively, for small $\lambda a_i$, each term on the left equals $\frac{a_i}{e^{\lambda a_i}-1} \approx \frac{1}{\lambda} - \frac{a_i}{2}$ by Taylor expansion, so the left-hand side is approximately $m/\lambda - \sum_i a_i/2$. Setting that equal to $\sum_j b_j = n\bar{b}$ gives the maximum likelihood estimate of the mean as approximately $$\frac{1}{\lambda}\approx \frac{1}{2}\bar{a} + \frac{n}{m}\bar{b}$$
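For this case, here is a small Python sketch (again not from the original answer) that solves the score equation numerically with `scipy.optimize.brentq` and compares it with the linear approximation; the threshold distribution, sample size, and true rate are assumptions chosen for illustration.

```python
# Sketch: exact MLE vs. the linear approximation for exponential x's.
# The data-generating choices below are arbitrary assumptions.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
lam_true = 0.5
x = rng.exponential(1 / lam_true, size=2000)  # latent samples (never observed)
y = rng.uniform(0.5, 1.5, size=2000)          # thresholds, fully known

a = y[x < y]                  # thresholds where x_i < y_i
b = y[x > y]                  # thresholds where x_j > y_j
m, n = len(a), len(b)

def score(lam):
    # derivative of the log-likelihood with respect to lambda
    return np.sum(a * np.exp(-lam * a) / (1 - np.exp(-lam * a))) - np.sum(b)

lam_hat = brentq(score, 1e-8, 1e3)               # exact MLE for lambda
mean_exact = 1 / lam_hat
mean_approx = a.mean() / 2 + (n / m) * b.mean()  # the linear approximation
print(mean_exact, mean_approx, 1 / lam_true)
```

With $\lambda a_i$ around $0.25$ to $0.75$, as here, the two estimates should land close to each other and to the true mean of $2$.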

I like this because the final result is both simpler and less obvious than might be expected. For instance, it means that if $x_i<y_i$ and $x_i>y_i$ occur about equally often (so $m \approx n$), then the observations with $x_i>y_i$ carry about twice the weight of the others in estimating the mean.

Matt F.