1

Define $e_i$ to be iid random variables drawn from an exponential distribution with parameter $\lambda=1$.

$a_i$ are numerical constants. I am interested in the probability that

$a_1 + e_1 < a_2 + e_2$

ie, that $a_1 +e_1$ is the min. The solution is $\exp[- (a_1 - a_2 )]/2$ if $(a_1 - a_2 )>0$, and $1-\exp(a_1 -a_2)/2$ otherwise. For example, if $a=(1,2)$, then probabilities are $(82\%, 18\%)$. [Note: I am not interested in the order statistics.] PS: Formulas and numbers corrected---thanks angryavian below.

What about three (or more) variables? Again, I want to know the probability that i is the min, given $(a_1 , a_2 , a_3 )$. For example, if $a=(1,2,3)$, then the probabilities I am interested in are $(76.50\%, 17.56\%, 5.94\%)$.

I am guessing there is no analytical generalization, but if there is, I would greatly appreciate pointers.

ivo Welch
  • 121

1 Answers1

2

Without loss of generality suppose $a_1 < \cdots < a_n$.

The probability you seek is $$P \left(\bigcap_{j\ne i} B_j\right) = E[\mathbf{1}_{\bigcap_{j\ne i} B_j}] = E\left[\prod_{j \ne i} \mathbf{1}_{B_j}\right]$$ where $B_j := \{e_i + a_i \le e_j + a_j\}$.

If we condition on $e_i$, then the $B_j$ are independent, and we have \begin{align} P \left(\bigcap_{j\ne i} B_j\right) &= E\left[\prod_{j \ne i} \mathbf{1}_{B_j}\right] \\ &= E\left[E\left[\prod_{j \ne i} \mathbf{1}_{B_j} \mid e_i\right]\right] \\ &= E\left[\prod_{j \ne i} E[\mathbf{1}_{B_j} \mid e_i]\right] \\ &= E\left[\prod_{j \ne i} \min\{1, e^{-(e_i + a_i - a_j)}\}\right]. \end{align}

Not quite sure how to compute this analytically in general.


In the case of $n=2$ and $i=1$, some calculation yields \begin{align} E[\min\{1, e^{-(e_1 + a_1 - a_2)}\}] &= (1 - e^{-(a_2 - a_1)}) + e^{a_2 - a_1}\int_{a_j - a_i}^\infty e^{-2x} \mathop{dx} \\ &= (1-e^{-(a_2 - a_1)}) + e^{a_2 - a_1}\frac{1}{2} e^{-2(a_2 - a_1)} \\ &= 1 - \frac{1}{2} e^{-(a_2 - a_1)}. \end{align} This is a bit different from your reported probability $88.4\%$ and formula $1 - \exp(-(a_2 - a_1))$, but I verified the above formula with simulations:

n <- 2
a <- c(1, 2)
ct <- rep(0, n)
formula <- 0
trials <- 1e5
for (i in 1:trials) {
  e <- rexp(2, 1)
  ct[which.min(a+e)] <- ct[which.min(a+e)]+1
  formula <- formula + min(1, exp(-e[1]-a[1]+a[2]))
}
ct/trials
formula/trials

> ct/trials
[1] 0.81789 0.18211

> formula/trials
[1] 0.8170634
angryavian
  • 2,328
  • 1
    +1. FWIW, the following code will be around 20 times faster. That makes little difference here, but in more complex situations the increased precision it affords can be helpful: x <- matrix(rexp(2*trials), nrow=2) + a; print(c(Simulation=mean(x[1,] < x[2,]), Formula=1 - exp(-diff(a))/2)) – whuber Nov 26 '17 at 15:36
  • 1
    @whuber Thanks very much for the pointer; I should have avoided the for loop! May I ask where the increased precision comes from? – angryavian Nov 26 '17 at 19:23
  • 1
    For a given amount of time (assuming you have sufficient RAM), you can generate 20 times as much data, thereby increasing the precision in the estimate by a factor of $\sqrt{20}$. – whuber Nov 26 '17 at 20:11
  • 1
    @whuber Thanks for clarifying; I mistook your first comment to mean increased precision with the same number of trials, rather than increased precision with the same running time. – angryavian Nov 26 '17 at 22:51
  • 1
    ...and thanks for correcting my transcription error forgetting to write down /2. – ivo Welch Nov 27 '17 at 01:11