4

Suppose I have pairs of random variables where $X_i \sim U[0,1]$ and $Y_i \sim U[0,1]$, and I want to estimate $$\theta=\int_{0.5}^{1}\int_0^{0.5}e^{xy}xy\,dx\,dy$$

but the estimator of $\theta$ needs to have variance less than $0.001$. From what I understand from a book of examples, I would need a variance of the form $\frac{c}{n}$ with $c<0.001$. I do not know if that is possible, but the exercise does not specify how large $n$ can be.

  • What does "$\phi$" mean? – whuber May 07 '15 at 16:21
  • @whuber It's just a representation that I adopted for $xy$ –  May 07 '15 at 16:28
  • I'm afraid that literally makes no sense, because upon substituting your "representation" you are asserting the MC estimator is "$\frac{1}{n}\sum xy$". – whuber May 07 '15 at 17:01
  • @whuber Please take a look again. –  May 07 '15 at 19:48
  • The implementation is simple: `N <- 10^6; x <- runif(N); y <- runif(N); f <- function(x,y) exp(x*y)*x*y * (x < 0.5) * (y > 0.5); mean(f(x,y)); sqrt(var(f(x,y))/N)` – Zen May 07 '15 at 19:52
  • But what exactly don't you understand? – Zen May 07 '15 at 19:52
  • @Zen You get variance less than $0.001$ using a sample size of $10^6$, but is it possible to achieve a variance smaller than $0.001$ without depending on the sample size? –  May 07 '15 at 20:14
  • Doesn't make sense. Could you show the verbatim statement of your problem? – Zen May 07 '15 at 20:21
  • @Zen This is homework; the problem is that the exercise does not specify the size of $n$. It just asks to estimate $\theta$ with variance less than $0.001$, and asks to write the variance of the estimator as a function of $n$. –  May 07 '15 at 20:28
  • Let $f(x,y)=e^{xy}xy\,I_{[0,0.5]\times[0.5,1]}(x,y)$. Let $\{X_i\}_{i=1}^n$ and $\{Y_i\}_{i=1}^n$ be IID $\mathrm{U}[0,1]$. Your estimator is $\hat{I}_n=\frac{1}{n}\sum_{i=1}^n f(X_i,Y_i)$. Can you find an expression for $\mathrm{Var}[\hat{I}_n]$ using this information? If not, your difficulty is not related to the Monte Carlo method. You have to learn first how to compute the variance of a sum of independent random variables. – Zen May 07 '15 at 22:39
  • @Zen I do not know if there is any difference in the computational point of view, but what I learned is: If $X_1,...,X_n$ are iid then $Var(\frac{1}{n}\sum X_i)=\frac{1}{n^2}\sum Var(X_i)=\frac{1}{n} Var(X_1)$ where the last equality comes from the fact that they are iid –  May 10 '15 at 13:01

2 Answers

7

From the way the question is presently phrased, it appears the objective is to estimate

$$\theta = \int_{1/2}^1 \int_0^{1/2} e^{xy} xy\,dx\,dy$$

using nothing more than the ability to generate independent uniform variates and some arithmetic. If we let $X$ have a Uniform$(1/2,1)$ distribution and $Y$ independently have a Uniform$(0,1/2)$ distribution, then this integral is seen to equal

$$\theta = \frac{1}{4}\mathbb{E}(XYe^{XY}).$$

Inspired by your question at Control Variates, Monte Carlo integration, we might look for a simplification of the function $u(X,Y) = XY e^{XY}$ whose expectation is easy to compute exactly. Eliminating the exponential for the time being, consider

$$\tau = \frac{1}{4}\mathbb{E}(XY) = \frac{1}{4}\mathbb{E}(X)\mathbb{E}(Y).$$

That's dead easy to compute, since (by symmetry) $\mathbb{E}(X) = 3/4$ and $\mathbb{E}(Y)=1/4$, whence

$$\tau = \frac{1}{4}\left(\frac{3}{4}\right)\left(\frac{1}{4}\right) = \frac{3}{64}.$$

Consider, then, $n$ independent realizations of $(X,Y)$ indexed by $i=1,2,\ldots,n$. For each of them we may (easily) compute both $T_i = X_i Y_i$ and $U_i = u(X_i,Y_i)$. Look at the statistics

$$\bar U = \frac{1}{n}\sum_i U_i,\ \bar T = \frac{1}{n}\sum_i T_i.$$

We have already determined that $\mathbb{E}(\bar T) = 4\tau = 3/16$. Then, according to the argument in the Wikipedia article on Control Variates, for any real constant $c$

$$\theta = \frac{1}{4}\mathbb{E}(u(X,Y)) = \frac{1}{4}\mathbb{E}( \bar U) = \frac{1}{4}\mathbb{E}\left(\bar U - c\left(\bar T - \frac{3}{16}\right)\right).$$

The value of $c$ that minimizes the variance of the right hand side is

$$c^{*} = \frac{\text{Cov}(\bar U, \bar T)}{\text{Var}(\bar T)} = \frac{\text{Cov}(U, T)}{\text{Var}( T)}.$$

The numerator and denominator can both be estimated from the sample to approximate $c^{*}$. Although this value might not be optimal, for all but extremely small $n$ it should be close to optimal and thereby allow a (substantial) variance reduction.
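As a concrete illustration, here is a minimal sketch of this control-variate estimator in Python (rather than the R of the comments). The sampling scheme, the control variate $T = XY$, and its known mean $\mathbb{E}(T) = 3/16$ come from the discussion above; the function name, seed, and sample size are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def cv_estimate(n):
    """One control-variate estimate of theta using n samples.

    X ~ U(1/2, 1), Y ~ U(0, 1/2); theta = (1/4) E[XY e^{XY}].
    Control variate T = XY with known mean E[T] = 3/16.
    """
    x = rng.uniform(0.5, 1.0, n)
    y = rng.uniform(0.0, 0.5, n)
    t = x * y                     # control variate, E[T] = 3/16
    u = t * np.exp(t)             # u(X, Y) = XY e^{XY}
    # Estimate the near-optimal coefficient c* = Cov(U, T) / Var(T) from the sample.
    c = np.cov(u, t)[0, 1] / np.var(t, ddof=1)
    return 0.25 * np.mean(u - c * (t - 3.0 / 16.0))

print(cv_estimate(1000))
```

Because $U$ and $T$ are very strongly correlated here, the residual variance after subtracting $c^{*}(T - 3/16)$ is a small fraction of $\mathrm{Var}(U)$, which is what makes the tiny sample sizes below possible.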

Finally, how large must $n$ be to achieve a standard error (square root of the estimation variance) that is equal to or better than some target value? We expect the variance of the estimator to be proportional to $1/n$ in the worst case. Thus, we could perform estimates for small values of $n$, fit a linear function of these values against $1/n$, and predict the value of $n$ that will result in a standard error equal to the target. We then use that value of $n$ and see whether it works. If not, we improve our guess by including this new datum in the fit and keep trying until we succeed.
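The fitting step just described can be sketched as follows, here for the plain (no control variate) estimator. The sample sizes, replication count, seed, and the simple moment estimate of the slope are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_variance(n, reps=500):
    """Empirical variance of the plain MC estimate of theta at sample size n."""
    ests = []
    for _ in range(reps):
        x = rng.uniform(0.5, 1.0, n)   # X ~ U(1/2, 1)
        y = rng.uniform(0.0, 0.5, n)   # Y ~ U(0, 1/2)
        ests.append(0.25 * np.mean(x * y * np.exp(x * y)))
    return np.var(ests, ddof=1)

# Fit Var ~ c/n through the origin using a few small sample sizes,
# then predict the n needed for a target variance of 0.001.
ns = np.array([2, 4, 8, 16])
variances = np.array([mc_variance(int(n)) for n in ns])
c = np.mean(variances * ns)            # simple moment estimate of the slope c
n_needed = int(np.ceil(c / 0.001))
print(c, n_needed)
```

Running this suggests a per-sample variance well under $0.01$, so even the plain estimator reaches the target with a handful of samples, consistent with the example below.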

The example in the question requires only two samples to achieve a standard error of $\sqrt{0.001}$. Here, for instance, is a histogram of $10,000$ independent estimates using just two $(X,Y)$ samples each. Its variance is $0.00002$, far better than the target of $0.001$. (It actually is biased a little low because $n=2$ is so small: the true value of the integral is shown by the vertical red line.)

[Figure: histogram of the $10{,}000$ estimates with $n=2$; the vertical red line marks the true value of the integral.]

whuber
  • You applied a variance reduction technique, right? I imagine that's what you have to do; my only doubt was whether it was possible to achieve such a variance with simple Monte Carlo. –  May 07 '15 at 22:04
  • Of course it is possible, because in the limit of large $n$ the MC estimate will be exact. Regardless, this answer outlines how you can approach the problem regardless of the method used to estimate the integral. And why not use a variance reduction technique you have already learned? – whuber May 07 '15 at 22:10
  • I'm sorry, that expression makes no sense to me. – whuber May 07 '15 at 22:21
  • The variance of an estimator is computed as $var(\hat{\theta})\approx \frac{\sigma}{\sqrt{n}}$ right? I cannot do this $var(\hat{\theta})\approx \frac{\sigma^2}{n}$? –  May 07 '15 at 22:25
1

$\int_{0.5}^1\int_0^{0.5} xy\,dx\,dy\approx \frac{1}{n}\sum_{i=1}^n x_i y_iI(x_i<0.5)\cdot I(y_i>0.5)$ where $I$ is the indicator function. That is, $\phi(x, y) = x yI(x<0.5)\cdot I(y>0.5)$. In the denominator there is an $n$ since you have $n$ observations of the variable $\phi(X, Y)$.

If the variables are pairwise independent then take $Z_i = X_i/2$ and $W_i = Y_i/2 + 0.5$. Then

$\mathbb{E}[Z_iW_i] = \mathbb{E}[Z_i]\mathbb{E}[W_i] = 4\int_0^{0.5} z\,dz\int_{0.5}^1 w\,dw = 4\int_{0.5}^1\int_0^{0.5} zw\,dz\,dw$ so $\int_{0.5}^1\int_0^{0.5} xy\,dx\,dy = \frac{1}{4}\mathbb{E}[Z_i]\mathbb{E}[W_i]$.
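Both versions are easy to check numerically. A Python sketch (sample size and seed are mine; note that, as in this answer, the target is the integral without the factor $e^{xy}$, whose exact value is $3/64 = 0.046875$ by the computation above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.uniform(size=n)   # X ~ U[0, 1]
y = rng.uniform(size=n)   # Y ~ U[0, 1]

# Indicator estimator: phi(x, y) = xy * I(x < 0.5) * I(y > 0.5)
est_phi = np.mean(x * y * (x < 0.5) * (y > 0.5))

# Rescaled variables: Z = X/2 ~ U[0, 0.5], W = Y/2 + 1/2 ~ U[0.5, 1]
z, w = x / 2, y / 2 + 0.5
est_zw = 0.25 * np.mean(z) * np.mean(w)

print(est_phi, est_zw)   # both should be close to 3/64 = 0.046875
```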

Edit: Since my post the question has changed!

Hunaphu
  • I don't understand this $Z_i = X_i/2$ and $W_i = Y_i/2 + 0.5$ –  May 07 '15 at 14:44
  • Then $Z$ is uniform on $[0, 0.5]$ and $W$ on $[0.5, 1]$. Then the expectation of $Z$ is $2\int_0^{0.5} z\,dz$. But the first part (with $\phi$) is sufficient and in most cases better! – Hunaphu May 08 '15 at 15:09