2

If X ~ Exp(3), Y ~ Exp(1) and h = X / (X + Y) then h ~ beta(1/3, 1) and E(h) = 1/4.

But when I draw random deviates using the following R code, I find mean(h) ≈ 0.324 and the histogram doesn't resemble the beta distribution.

Am I making some dumb mistake?

fn <- function() {
  X <- rexp(n = 1, rate = 3)
  Y <- rexp(n = 1, rate = 1)
  h <- X / (X + Y)
  return(h)
}

h_vec <- replicate(1e6, fn()) mean(h_vec) # ≈ 0.324

rob
  • 21
  • Welcome to CV. You have confused the shape with the rate parameters and should be using Gamma distributions, as in h <- matrix(rgamma(1e6, c(1/3, 1)), 2); hist(h[1, ] / colSums(h), freq = FALSE, breaks = seq(0, 1, length.out = 100)); curve(dbeta(x, 1/3, 1), add = TRUE, n = 1201, lwd = 2) See https://en.wikipedia.org/wiki/Beta_distribution#Related_distributions – whuber Apr 05 '23 at 16:07
  • Yes! That's my dumb mistake. Thank you. Is there a simple way to find E(X / (X + Y)) when X and Y are exponentially distributed? – rob Apr 05 '23 at 16:33
  • Yes. You can compute the integral directly using elementary techniques. I obtain $$ab \iint_{[0,\infty)^2} \frac{x}{x+y}\exp(-a x - b y),\mathrm dx\mathrm dy = \frac{b \left(a \log \left(\frac{a}{b}\right)+b-a\right)}{(a-b)^2}.$$ With $a=3$ and $b=1$ the value is $(3\log(3)-2)/4\approx 0.323959.$ – whuber Apr 05 '23 at 17:30
  • Well this is embarassing. I should have just done that myself. Thank you for being so helpful! At least this post might save someone else from confusion. – rob Apr 06 '23 at 19:38

1 Answers1

1

There are two ways to interpret this question: (1) how to generate Beta variates as ratios $Z = X/(X+Y)$ of independent random variables $X$ and $Y$ and (2) what the distribution of such a ratio is when $X$ and $Y$ have Exponential distributions. Because (1) is well-known (use Gamma distributions), and because the second question is explicitly asked in the comment thread, I will address the second one here.

In the following I will exploit rules of differentiation (Chain, Product, Sum, and Quotient results) along with the fact that

$$\int_0^\infty e^{-\theta x}\,\mathrm dx = \frac{1}{\theta}\tag{*}$$

for any positive constant $\theta.$ This is almost all you will need to know of Calculus.


Recall that the Exponential distribution family describes positive random variables with densities expressible as

$$f(x,\alpha) = e^{-\alpha x}$$

for positive arguments $x.$ The parameter $\alpha \gt 0$ is usually called the rate. Its reciprocal $1/\alpha$ is a scale parameter. The survival function of this distribution is obtained by recalling the exponential function equals its derivative, whence

$$G(x,\alpha) = \Pr(X \gt x) = \alpha e^{-\alpha x}.$$

Given two rates $\alpha,\beta$ for $X$ and $Y$, the difficulties we face are (i) the denominator $X+Y$ has a hypoexponential distribution, not an Exponential one (unless $\alpha=\beta$); and (ii) the numerator $X$ and denominator $X+Y$ are not independent. But we can readily find the distribution of $Z$ directly by noting $0\le Z \le 1$ and choosing an arbitrary $\theta$ to compute the distribution function of $Z$ using $(*)$ at the last step:

$$\begin{aligned} F_Z(z;\alpha,\beta) = \Pr(Z\le z) &= \Pr\left(Y \ge \frac{1-z}{z}X\right) = E\left[G\left(\frac{1-z}{z}X, \beta\right)\right]\\ &= \int_0^\infty G\left(\frac{1-z}{z}X, \beta\right)\,f(x,\alpha)\,\mathrm dx\\ & = \int_0^\infty \beta e^{-\beta ((1-z)/z)x}\,\alpha e^{-\alpha x} \,\mathrm d x\\ &=\frac{\alpha\beta z}{\alpha z + (1-z)\beta}. \end{aligned}$$

This fully answers all questions about $Z.$ In particular,

  • The density of $Z$ on the interval $(0,1)$ is the derivative $$f_Z(z;\alpha,\beta) = \frac{\mathrm d}{\mathrm d z} F_Z(z;\alpha,\beta) = a \left(\frac{b}{a z + b(1-z)}\right)^2.$$

  • The expectation is the integral of $1-F_Z$ from $0$ to $1,$ equal to $$E[Z;\alpha,\beta] = \frac{\beta \left(\alpha \log \left(\frac{\alpha}{\beta}\right)+ \beta - \alpha \right)}{(\alpha -\beta )^2}.$$

(This calculation requires a bit more expertise in integral Calculus, but is straightforward.)

For the example $\alpha=3,$ $\beta=1$ I used R to generate a million iid realizations of independent $(X,Y),$ computed $Z$ for each of them, and plotted the histogram (gray) and the density $f_Z(z;3,1)$ on top of that:

n <- 1e6; a <- 3; b <- 1
x <- rexp(n, a); y <- rexp(n, b)
hist(x / (x + y), freq = FALSE); curve(a * (b / (b * (1-x) + a * x))^2, add = TRUE)

enter image description here

The agreement supports the correctness of this analysis.

BTW, it looks like the same approach will produce a closed-form solution when $X$ and $Y$ have any Gamma distributions, even with different shape parameters. The CDF will be a rational function of $z/(1-z).$

whuber
  • 322,774
  • A very nice explanation! Thanks again. Interpretation 2 was what interested me. – rob Apr 06 '23 at 19:41