
Suppose that the random variable $Y$ follows a mixture of two exponential distributions, that is

\begin{equation} f_Y(y) = \sum_{i=1}^{2}\pi_i f(y| \lambda_i) \end{equation} where the $\pi_i$ are mixing weights with the property that $\sum_{i=1}^{2}\pi_i = 1$, and $f(y|\lambda_i) = \lambda_i e^{-\lambda_i y}$ for $i=1, 2$. Let $\{Y_1, Y_2, \cdots, Y_n\}$ be a random sample from the above-mentioned mixture distribution. Now consider the new random variable $Z$, which is defined to be
\begin{equation} Z= \sum_{j=1}^{n}Y_j \end{equation} Is it possible to find the density function of $Z$?


1 Answer


This derivation is directly feasible by considering the latent variable representation of a mixture random variable. Each $Y_i$ is associated with a latent Bernoulli variable $\xi_i\sim\mathfrak B(\pi_2)$ in the sense that $$Y_i|\xi_i=k\sim\mathfrak Exp(\lambda_{k+1})\qquad k=0,1$$

Therefore, $$Z=\sum_{i=1}^n Y_i$$ can be conditioned upon the vector of latent variables $$\boldsymbol\xi=(\xi_1,\ldots,\xi_n)$$ and written as $$ Z|\boldsymbol\xi \sim\sum_{i=1}^n Y_i\big|\boldsymbol\xi \sim\Big(\underbrace{\sum_{i;\,\xi_i=0} Y_i}_{\substack{\text{sum of iid}\\ \mathfrak Exp(\lambda_1)}}+\underbrace{\sum_{i;\,\xi_i=1} Y_i}_{\substack{\text{sum of iid}\\ \mathfrak Exp(\lambda_2)}}\Big)\Big|\boldsymbol\xi $$

This means that, conditional on $$\zeta=\sum_{i=1}^n\xi_i\sim\mathfrak B(n,\pi_2),$$ $Z$ is distributed as the sum of a Gamma $\mathfrak G(n-\zeta,\lambda_1)$ variate and a Gamma $\mathfrak G(\zeta,\lambda_2)$ variate, i.e., $$ Z|\zeta\sim Z_1+Z_2\qquad Z_1\sim \mathfrak G(n-\zeta,\lambda_1),\ \ Z_2\sim\mathfrak G(\zeta,\lambda_2)\tag{1} $$

The distribution of the sum (1) is itself a signed mixture of Gamma distributions with at most $n$ terms and rates equal to either $\lambda_1$ or $\lambda_2$, as shown in the earlier X validated post of @whuber.¹ Integrating out $\zeta$ (i.e., marginalising) leads to a mixture of $n+1$ such terms, where the weight of the $k$-th term is the Binomial probability $${n\choose k}\pi_2^k\pi_1^{n-k}$$ In conclusion, the distribution of $Z$ can be represented as a signed mixture of Gamma distributions with $O(n^2)$ terms.
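The latent variable representation also gives a direct way to simulate $Z$: draw each indicator, then draw $Y_i$ from the corresponding exponential. A minimal Python sketch (the values of $n$, $\pi_1$, $\lambda_1$, $\lambda_2$ below are arbitrary illustrative choices, not from the question) checks the empirical mean against $\mathbb E[Z]=n(\pi_1/\lambda_1+\pi_2/\lambda_2)$:

```python
import random

random.seed(42)
n, pi1, lam1, lam2 = 5, 0.3, 2.0, 1.0  # illustrative values only

def draw_Z():
    # Z is the sum of n mixture-exponential draws: for each term, a latent
    # Bernoulli indicator selects component 1 (rate lam1) with prob. pi1,
    # else component 2 (rate lam2).
    total = 0.0
    for _ in range(n):
        rate = lam1 if random.random() < pi1 else lam2
        total += random.expovariate(rate)
    return total

samples = [draw_Z() for _ in range(200_000)]
emp_mean = sum(samples) / len(samples)
# theory: E[Z] = n * (pi1/lam1 + (1 - pi1)/lam2) = 5 * (0.15 + 0.70) = 4.25
```

With 200,000 replications the empirical mean lands within a few hundredths of the theoretical value 4.25, which is consistent with the representation above.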

A more direct approach is to consider the $n$-fold convolution representation of the density of $Z$:

$$f_Z(z) = \int_{\mathbb R^{n-1}} \prod_{i=1}^{n-1} f_Y(y_i) f_Y(z-y_1-\cdots-y_{n-1})\,\text dy_1\cdots\,\text dy_{n-1}$$

and to expand the product of the $n$ sums $f_Y(y_i)=\pi_1 \mathfrak e(y_i|\lambda_1)+\pi_2 \mathfrak e(y_i|\lambda_2)$ into $2^n$ terms, which, after regrouping identical convolution integrals, again results in a sum of $n+1$ terms,

$$f_Z(z) =\sum_{k=0}^n {n\choose k}\pi_1^k\pi_2^{n-k}\int_{\mathbb R^{n-1}} \underbrace{\prod_{i=1}^k \mathfrak e(y_i|\lambda_1)}_{\substack{\text{leading to}\\ \mathfrak G(k,\lambda_1)}}\,\underbrace{\prod_{i=k+1}^n \mathfrak e(y_i|\lambda_2)}_{\substack{\text{leading to}\\ \mathfrak G(n-k,\lambda_2)}}\,\text dy_1\cdots\,\text dy_{n-1}$$

where $y_n=z-y_1-\cdots-y_{n-1}$.

The most compact representation for the density is thus $$\sum_{k=0}^{n}\binom{n}{k}\pi_2^k \pi_1^{n-k}\dfrac{\lambda_1^{n-k}\lambda_2^{k}}{\Gamma(n)}e^{-\lambda_1z} z^{n-1}\; _1F_1(k, n, (\lambda_1-\lambda_2)z)$$


¹Or equivalently a distribution with a more complex density involving a confluent hypergeometric function $_1F_1$ as shown in the earlier CV post of @Carl.
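The compact representation above is straightforward to evaluate numerically, since for integer $k\le n$ the confluent hypergeometric series for ${}_1F_1$ converges quickly. A minimal Python sketch (function names and parameter values are illustrative assumptions; $\lambda_1>\lambda_2$ is chosen so the series argument stays nonnegative):

```python
import math

def hyp1f1(a, b, x, tol=1e-12, max_terms=500):
    # Kummer confluent hypergeometric 1F1(a; b; x) via its power series
    # sum_j (a)_j / (b)_j * x^j / j!, accumulated term by term.
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * x / (j + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def mixture_sum_density(z, n, pi1, lam1, lam2):
    # f_Z(z) = sum_k C(n,k) pi2^k pi1^(n-k) * lam1^(n-k) lam2^k / Gamma(n)
    #          * exp(-lam1 z) z^(n-1) * 1F1(k; n; (lam1 - lam2) z)
    pi2 = 1.0 - pi1
    base = math.exp(-lam1 * z) * z ** (n - 1) / math.gamma(n)
    total = 0.0
    for k in range(n + 1):
        w = math.comb(n, k) * pi2 ** k * pi1 ** (n - k)
        total += w * lam1 ** (n - k) * lam2 ** k * hyp1f1(k, n, (lam1 - lam2) * z)
    return total * base
```

As a sanity check, a midpoint-rule integral of `mixture_sum_density` over a grid covering essentially all of the mass (e.g. $(0, 40)$ for $n=5$, $\pi_1=0.3$, $\lambda_1=2$, $\lambda_2=1$) returns a value very close to 1, and the corresponding numerical mean matches $n(\pi_1/\lambda_1+\pi_2/\lambda_2)$.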

Xi'an
  • 105,342
  • Thank you very much for your reply. I should go through your answer, as it is not easy for me to fully understand why, for example, $Z$ conditional on $\zeta$ is distributed as the sum of two Gamma variates. BTW, given this conditional distribution, is it still possible to find the marginal distribution of $Z$? I think you have derived the conditional distribution of $Z$ given $\zeta$. – Statistics Apr 01 '22 at 08:36
  • 1
    Thank you. Can I ask why you first consider $Z|\xi$ and then work with $Z|\zeta$? – Statistics Apr 01 '22 at 08:56
  • can we proceed further with the last formula you wrote in order to derive a closed-form expression? – Statistics Apr 01 '22 at 09:46
  • Ok, I see. So, the final result is represented by a signed mixture of Gamma distributions; can I ask what this kind of distribution looks like? You said that I can refer to the X validated post of @whuber. It seems that a solution based on the confluent hypergeometric function exists. Is this called a signed mixture of Gamma distributions? – Statistics Apr 01 '22 at 10:41
  • But I think this kind of representation is applicable when we have gamma distributions with the same $\lambda$. BTW, I want to write down the distribution function of $Z$ to compute a quantity that is based on it. – Statistics Apr 01 '22 at 11:03
  • 1
    So, this is what we will have, according to your explanation: \begin{equation} f_Z(z) = \sum_{k=0}^{n} f(z, \zeta=k) = \sum_{k=0}^{n}f(z|\zeta=k)\, f(\zeta =k) = \sum_{k=0}^{n}\binom{n}{k}\pi_2^k \pi_1^{n-k}\frac{\lambda_1^{n-k}\lambda_2^{k}}{\Gamma(n)}e^{-\lambda_1z} z^{n-1}\, {}_1F_1(k, n, (\lambda_1-\lambda_2)z) \end{equation} – Statistics Apr 01 '22 at 12:33
  • sorry, we can say that the above representation is a signed mixture of gamma, right? – Statistics Apr 01 '22 at 13:07
  • Computationally, is it doable to calculate $F_Z(y)=\mathbb{P}(Z\leq y)$ according to the above representation? – Statistics Apr 01 '22 at 14:36
  • I greatly appreciate your time and help. – Statistics Apr 01 '22 at 15:01
  • I have a question. At the beginning of your proof, you considered the latent variable $\xi_i$ to be a Bernoulli variable with parameter $\pi_2$, which is interpreted as the probability of success. Since we usually write $Y\sim \pi \textbf{EXP}(\lambda_1) + (1-\pi)\textbf{EXP}(\lambda_2)$, maybe it would be better to write $\xi_i \sim \mathcal{B}(\pi_1)$. – Statistics Apr 06 '22 at 11:12