18

What is the easiest way to see that the following statement is true?

Suppose $Y_1, \dots, Y_n \overset{\text{iid}}{\sim} \text{Exp}(1)$. Show $\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)$.

Note that $Y_{(1)} = \min\limits_{1 \leq i \leq n}Y_i$.

Here $X \sim \text{Exp}(\beta)$ means that $f_{X}(x) = \dfrac{1}{\beta}e^{-x/\beta} \cdot \mathbf{1}_{\{x > 0\}}$, i.e., $\beta$ is the scale (mean) parameter.

It is easy to see that $Y_{(1)} \sim \text{Exponential}(1/n)$. Furthermore, we also have that $\sum_{i=1}^{n}Y_i \sim \text{Gamma}(\alpha = n, \beta = 1)$ under the parametrization $$f_{Y}(y) =\dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}y^{\alpha-1}e^{-y/\beta}\mathbf{1}_{\{y > 0\}},\qquad \alpha, \beta> 0\text{.}$$
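Before attempting a proof, a quick Monte Carlo sanity check suggests the claim is plausible. A minimal sketch in Python (NumPy/SciPy); the sample size `n`, replication count, and seed are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000  # arbitrary sample size and replication count

# reps samples of size n from Exp(1) (scale beta = 1)
y = rng.exponential(scale=1.0, size=(reps, n))

# sum_i (Y_i - Y_(1)) for each replication
s = y.sum(axis=1) - n * y.min(axis=1)

# Kolmogorov-Smirnov test against Gamma(n - 1, 1); should not reject at usual levels
print(stats.kstest(s, "gamma", args=(n - 1,)))
```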

Solution, given Xi'an's answer below. Using the notation in the original question: $$\begin{align} \sum_{i=1}^{n}[Y_i - Y_{(1)}] &= \sum_{i=1}^{n}[Y_{(i)}-Y_{(1)}] \\ &= \sum_{i=1}^{n}Y_{(i)}-nY_{(1)}\\ &= \sum_{i=1}^{n}\{Y_{(i)}-Y_{(i-1)}+Y_{(i-1)}-\cdots-Y_{(1)}+Y_{(1)}\}-nY_{(1)}\\ &= \sum_{i=1}^n\sum_{j=1}^{i}\{Y_{(j)}-Y_{(j-1)}\}-nY_{(1)}\text{, where } Y_{(0)} = 0, \\ &= \sum_{j=1}^n\sum_{i=j}^{n}\{Y_{(j)}-Y_{(j-1)}\}-nY_{(1)}\\ &= \sum_{j=1}^{n}(n-j+1)[Y_{(j)}-Y_{(j-1)}]-nY_{(1)}\\ &= \sum_{i=1}^{n}(n-i+1)[Y_{(i)}-Y_{(i-1)}]-nY_{(1)}\\ &= \sum_{i=2}^{n}(n-i+1)[Y_{(i)}-Y_{(i-1)}]+nY_{(1)}-nY_{(1)} \\ &= \sum_{i=2}^{n}(n-i+1)[Y_{(i)}-Y_{(i-1)}]\text{.} \end{align}$$ By the theorem in Xi'an's answer, the $n-1$ normalized spacings $(n-i+1)[Y_{(i)}-Y_{(i-1)}]$, $i = 2, \dots, n$, are i.i.d. $\text{Exp}(1)$, and the sum of $n-1$ i.i.d. $\text{Exp}(1)$ variables is $\text{Gamma}(n-1,1)$; hence $\sum_{i=2}^{n}(n-i+1)[Y_{(i)}-Y_{(i-1)}] \sim \text{Gamma}(n-1, 1)$.
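The chain of equalities above is a pathwise identity, not merely a distributional one, so it can be checked on a single simulated sample. A minimal sketch in Python with NumPy (the seed and `n` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7
y = np.sort(rng.exponential(size=n))  # Y_(1) <= ... <= Y_(n)

lhs = y.sum() - n * y[0]               # sum_i (Y_i - Y_(1))
spacings = np.diff(y, prepend=0.0)     # Y_(i) - Y_(i-1), with Y_(0) = 0
weights = n - np.arange(1, n + 1) + 1  # n - i + 1
rhs = (weights * spacings)[1:].sum()   # keep only the terms i = 2, ..., n

assert np.isclose(lhs, rhs)
```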

Xi'an
  • 105,342
Clarinetist
  • 4,977
  • Are you asking or telling? If you are asking you should add the self-study tag. – Michael R. Chernick Apr 07 '17 at 01:59
  • @MichaelChernick Asking and showing my efforts. – Clarinetist Apr 07 '17 at 01:59
  • That is fine, just add the tag. What is left? – Michael R. Chernick Apr 07 '17 at 02:03
  • 1
    @MichaelChernick 1) I'm not sure if my proof of independence is correct, and 2) I'm not sure if the result I guessed above pertaining to the difference of Gamma distributions is even correct. This seems to contradict what is given here, but perhaps this situation is different since this difference includes one of the order statistics? I'm not sure. – Clarinetist Apr 07 '17 at 02:06
  • @Clarinetist, the difference of gamma random variables does not have a gamma distribution. It has a non-zero probability of being negative. – gammer Apr 07 '17 at 02:26
  • @gammer I figured as much. Is this result that I'm trying to show, then, incorrect? – Clarinetist Apr 07 '17 at 02:27
  • @Clarinetist, I think the result is correct. – gammer Apr 07 '17 at 02:30
  • @gammer Could you please provide some insight as to how to approach this, as my method above is incorrect? What particularly bothers me is the difference $Y_i - Y_{(1)}$, as $Y_{(1)}$ is clearly dependent on the $Y_i$. – Clarinetist Apr 07 '17 at 02:31
  • 1
    @Clarinetist, I'm not certain. Maybe try to work with $\sum_{i = 2}^n (Y_{(i)} - Y_{(1)})$, which clearly equals the sum you're working with. The answer here might be helpful: http://math.stackexchange.com/questions/80475/order-statistics-of-i-i-d-exponentially-distributed-sample – gammer Apr 07 '17 at 02:39
  • @gammer Huh, that looks promising, considering that there are $n-1$ terms. I'm thinking about a Jacobian approach, which I'll try tomorrow. Thank you! – Clarinetist Apr 07 '17 at 02:44
  • 3
    Have you tried to prove that each $(Y_i - Y_{(1)}) \sim \text{Expon}(1)$ -- except for one $i$, for which $Y_i - Y_{(1)} = 0$ and, then, using the fact that the sum of $(n-1)$ iid Exponential variates will be Gamma distributed? – Marcelo Ventura Apr 07 '17 at 04:22
  • @MarceloVentura How would you go about proving that $(Y_i - Y_{(1)}) \sim \text{Exp}(1)$ for all $i$ such that $Y_i \neq Y_{(1)}$? This doesn't seem trivial to me. – Clarinetist Apr 07 '17 at 11:46
  • @MarceloVentura I'm not sure if that statement is correct. Please see my work above. – Clarinetist Apr 07 '17 at 12:22
  • @gammer I tried finding the distribution of $Y_{(i)} - Y_{(1)}$; please see my work above. – Clarinetist Apr 07 '17 at 12:23
  • @MarceloVentura's note is the way to go (+1). Maybe you should expand it into an answer, since the OP is having trouble with it?
    For the OP: to prove it, though, try writing out the conditional distribution of $Z_i = Y_i - A$ for $Y_i \geq A$. See whether it depends on $A$, and if you recognize its form. This will get you a good part of the way to the desired proof.
    – jbowman Apr 07 '17 at 15:02
  • 1
    @jbowman We have $$f_{Z_i}(z_i) = f_{Y_i}(z_i + a) = e^{-(z_i + a)}$$ and conditioned on $Y_i \geq a$, we divide this by $e^{-a}$, giving $e^{-z_i}$, hence we have $(Z_i \mid Y_i \geq A) \sim \text{Exp}(1)$. Now here's what's bugged me about this proof: I regarded $A$ as a constant. But $Y_{(1)}$ isn't a constant. Why would this work? – Clarinetist Apr 07 '17 at 15:33
  • 1
    The point is that it doesn't matter what $a$ is! The distribution is always $\text{Exp}(1)$! Remarkable, isn't it? And from this you can conclude that the distribution of $Y_i - Y_{(1)}$ is always $\text{Exp}(1)$ for $i > 1$, regardless of the actual value of $Y_{(1)}$. – jbowman Apr 07 '17 at 16:39
  • 1
    @Clarinetist The ideas presented in the answer here might be of some use to you. – Glen_b Apr 08 '17 at 11:26

2 Answers

17

The proof is given in the Mother of All Random Generation Books, Devroye's Non-uniform Random Variate Generation, on p.211 (and it is a very elegant one!):

Theorem 2.3 (Sukhatme, 1937) If we define $E_{(0)}=0$, then the normalised exponential spacings $$(n-i+1)(E_{(i)}-E_{(i-1)})$$ derived from the order statistics $E_{(1)}\le\ldots\le E_{(n)}$ of an i.i.d. exponential sample of size $n$ are themselves i.i.d. exponential variables.

Proof. Since \begin{align*} \sum_{i=1}^n e_i &= \sum_{i=1}^n e_{(i)} =\sum_{i=1}^n \sum_{j=1}^i(e_{(j)}-e_{(j-1)})\\ &=\sum_{j=1}^n \sum_{i=j}^n(e_{(j)}-e_{(j-1)}) =\sum_{j=1}^n (n-j+1)(e_{(j)}-e_{(j-1)}) \end{align*} the joint density of the order statistics $(E_{(1)},\ldots,E_{(n)})$ can be written as $$f(\mathbf{e})=n!\,\exp\left\{-\sum_{i=1}^ne_{(i)}\right\}=n!\,\exp\left\{-\sum_{i=1}^n (n-i+1)(e_{(i)}-e_{(i-1)})\right\}$$ Setting $Y_i=(n-i+1)(E_{(i)}-E_{(i-1)})$, the change of variables from $(E_{(1)},\ldots,E_{(n)})$ to $(Y_1,\ldots,Y_n)$ has a constant Jacobian [incidentally equal to $1/n!$, but this does not need to be computed] and hence the density of $(Y_1,\ldots,Y_n)$ is proportional to $$\exp\left\{-\sum_{i=1}^n y_i \right\}$$ which establishes the result. Q.E.D.
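As a numerical illustration of the theorem (not part of Devroye's proof), one can simulate the normalized spacings and test each coordinate against $\text{Exp}(1)$. A minimal sketch in Python; `n`, the replication count, and the per-coordinate KS test are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 6, 50_000

# order statistics of reps iid Exp(1) samples of size n
e = np.sort(rng.exponential(size=(reps, n)), axis=1)

# spacings E_(i) - E_(i-1) with E_(0) = 0, then normalize by (n - i + 1)
spacings = np.diff(e, axis=1, prepend=0.0)
normalized = (n - np.arange(n)) * spacings

# each coordinate should look Exp(1)
for i in range(n):
    print(i + 1, stats.kstest(normalized[:, i], "expon"))
```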

An alternative suggested to me by Gérard Letac is to check that $$(E_{(1)},\ldots,E_{(n)})$$has the same distribution as$$\left(\frac{E_1}{n},\frac{E_1}{n}+\frac{E_2}{n-1},\ldots,\frac{E_1}{n}+\frac{E_2}{n-1}+\ldots+\frac{E_n}{1}\right)$$ (by virtue of the memoryless property), which makes the derivation of $$\sum_{k=1}^n(E_k-E_{(1)})\sim \sum_{k=1}^{n-1}E_k$$ straightforward.
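This distributional identity can also be checked by simulation. A minimal sketch in Python comparing, say, the largest order statistic under both constructions; `n`, the replication count, and the choice of coordinate are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 6, 50_000

# direct construction: order statistics of n iid Exp(1)
direct = np.sort(rng.exponential(size=(reps, n)), axis=1)

# memoryless construction: cumulative sums of E_k / (n - k + 1)
e = rng.exponential(size=(reps, n))
renyi = np.cumsum(e / (n - np.arange(n)), axis=1)

# two-sample KS test on E_(n); any coordinate could be compared
print(stats.ks_2samp(direct[:, -1], renyi[:, -1]))
```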

Xi'an
  • 105,342
  • 1
    Thank you for this answer! I'd like to fill in some details for anyone who is reading this in the future: $e_i$ are observed values of the $E_i$, and the easiest way to see that $\sum_{i=1}^{n}e_{(i)} = \sum_{i=1}^{n}(n-i+1)(e_{(i)} - e_{(i-1)})$ is to write out $\sum_{i=1}^{n}(n-i+1)(e_{(i)} - e_{(i-1)})$ term-by-term. Because the density of $(Y_1, \dots, Y_n)$ is proportional to $\exp\left(-\sum_{i=1}^{n}y_i \right)$, separate the $y_i$ to see the density is proportional to $\prod_{i=1}^{n}e^{-y_i}$, hence $Y_1, \dots, Y_n \overset{\text{iid}}{\sim} \text{Exp}(1)$. – Clarinetist Apr 07 '17 at 16:54
5

I lay out here what was suggested in the comments by @jbowman.

Let $a\geq 0$ be a constant, let $Y_i$ follow an $\text{Exp}(1)$ distribution, and consider $Z_i = Y_i-a$. Then

$$\begin{align} \Pr(Z_i\leq z_i \mid Y_i \geq a) &= \Pr(Y_i-a\leq z_i \mid Y_i \geq a) = \Pr(Y_i\leq z_i+a \mid Y_i \geq a) \\ &= \frac {\Pr(Y_i\leq z_i+a,\, Y_i \geq a)}{1-\Pr(Y_i\leq a)} = \frac {\Pr(a\leq Y_i\leq z_i+a)}{1-\Pr(Y_i\leq a)} \\ &= \frac {\left(1-e^{-z_i-a}\right)-\left(1-e^{-a}\right)}{e^{-a}}=1-e^{-z_i}, \end{align}$$

which is the distribution function of $\text{Exp}(1)$.

Let's describe this in words: the probability that an $\text{Exp}(1)$ r.v. falls in a specific interval (the numerator in the last line), given that it exceeds the interval's lower bound (the denominator), depends only on the length of the interval and not on where the interval is placed on the real line. This is an incarnation of the "memorylessness" property of the Exponential distribution, here in a more general setting, free of time interpretations (and it holds for the Exponential distribution in general).
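The same calculation can be illustrated empirically: conditioning on $\{Y \geq a\}$ and subtracting $a$ returns an $\text{Exp}(1)$ sample no matter which cutoff $a$ is used. A minimal sketch in Python; the cutoffs and sample size are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.exponential(size=1_000_000)

# for several cutoffs a, the excess Y - a given Y >= a stays Exp(1)
for a in (0.0, 0.5, 2.0):
    excess = y[y >= a] - a
    print(a, stats.kstest(excess, "expon"))
```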

Now, by conditioning on $\{Y_i \geq a\}$ we force $Z_i$ to be non-negative and, crucially, the result obtained holds for all $a\in \mathbb R^+$. So we can state the following:

If $Y_i\sim \text{Exp}(1)$, then for any $Q\geq 0$ such that $Z_i = Y_i-Q \geq 0$, we have $Z_i\sim \text{Exp}(1)$.

Can we find a $Q\geq 0$ that is free to take all non-negative real values and for which the required inequality always holds (almost surely)? If we can, then we can dispense with the conditioning argument.

And indeed we can: it is the minimum order statistic, $Q=Y_{(1)}$, since $\Pr(Y_i \geq Y_{(1)})=1$. So we have obtained

$$Y_i\sim \text{Exp(1)} \implies Y_i-Y_{(1)} \sim \text{Exp(1)}$$

This means that

$$\Pr(Y_i-Y_{(1)} \leq y_i-y_{(1)}) = \Pr(Y_i \leq y_i)$$

So the probabilistic structure of $Y_i$ remains unchanged if we subtract the minimum order statistic, and it follows that the random variables $Z_i=Y_i-Y_{(1)}$ and $Z_j=Y_j-Y_{(1)}$, where $Y_i, Y_j$ are independent, are also independent, since the possible link between them, $Y_{(1)}$, does not have an effect on their probabilistic structure.

Then the sum $\sum_{i=1}^{n}(Y_i - Y_{(1)})$ contains $n-1$ i.i.d. $\text{Exp}(1)$ random variables (and one zero), and so

$$\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)$$
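As a closing check, both the per-term claim and the final Gamma result agree with simulation. A minimal sketch in Python; conditioning a fixed coordinate on not being the minimum is one of several ways to isolate a single nonzero difference, and the seed, `n`, and replication count are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 5, 100_000

y = rng.exponential(size=(reps, n))
ymin = y.min(axis=1)

# a single difference, conditioned on not being the zero term:
# Y_1 - Y_(1) given Y_1 > Y_(1) should be Exp(1)
z1 = y[:, 0] - ymin
print(stats.kstest(z1[z1 > 0], "expon"))

# and the full sum should be Gamma(n - 1, 1)
s = y.sum(axis=1) - n * ymin
print(stats.kstest(s, "gamma", args=(n - 1,)))
```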