
This question describes a method to calculate the number of Monte Carlo simulation runs required. Another method checks the convergence of the mean of a particular output variable. Both of these methods focus on the output variables without regard to the number or variance of the input variables.

In general, will the number of Monte Carlo simulation runs need to increase if the number of varying input variables/parameters is also increased?

Or does the number of required runs relate only to the variance of the input variables, increasing or decreasing with it?

Patrick
  • Try to make the Q much more specific. What do you want to simulate, for what purpose? Tell us, then maybe we can help. – kjetil b halvorsen Sep 23 '16 at 17:23
  • It could be how much fuel is left after running an obstacle course, where the variables to be perturbed are weight, traction, ability to decelerate, etc. I'm asking: if I add more variables (drag, weather, power capacity), does this increase the number of runs needed to converge on a final answer? Or does the number of runs required only increase if the variability of the inputs increases, such as considering a larger weight range or a larger range in power capacity? I'm not sure how what I'm simulating relates to the question; it could just as easily be a financial model, a robotic car, or plant growth. – Patrick Sep 28 '16 at 17:03
  • I cannot find an answer to this question either; everything I read says no, but intuitively it seems that, at least in the case of discrete variables, the size of the sample space is a function of the number of variables, and so the number of simulations required to achieve the same accuracy should be as well. – Anton Belov Sep 23 '20 at 13:46

1 Answer


The Monte Carlo estimate of $\mathbb{E} X$ from a sample of size $N$ has variance $\tfrac{\text{Var}[X]}{N}$, and thus the "typical error" of this estimate is of order $\tfrac{\text{std}(X)}{\sqrt{N}}$. If you want to estimate $\mathbb{E} X$ with a high degree of certainty, you should make this variance as small as possible, which means $N$ has to scale with the variance $\text{Var}[X]$. Note that it is the variance of the output that matters, not the individual variances of the random variables involved (they do influence the output, but exactly how is determined by your model).
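To make the $\tfrac{\text{std}(X)}{\sqrt{N}}$ scaling concrete, here is a minimal sketch (the Gaussian distribution and the sample sizes are arbitrary illustrative choices, not part of the argument):

```python
# A minimal sketch: the spread of the Monte Carlo estimate of E[X]
# shrinks like std(X) / sqrt(N). The distribution of X is arbitrary here.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0  # std(X); here X ~ Normal(0, sigma^2)

for N in [100, 1_000, 10_000]:
    # Repeat the whole estimation 1000 times to observe the estimator's spread.
    estimates = rng.normal(0.0, sigma, size=(1_000, N)).mean(axis=1)
    print(f"N={N:>6}: empirical std of estimate = {estimates.std():.4f}, "
          f"theory sigma/sqrt(N) = {sigma / np.sqrt(N):.4f}")
```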

So what happens if we consider some $f(X, Y)$ that has the same expectation $\mathbb{E} X = \mathbb{E} f(X, Y)$ but employs auxiliary variables $Y$? What would happen to the variance $\text{Var}[f(X,Y)]$?

First, a major result here is the Rao–Blackwell theorem, which essentially states that $$\text{Var}[\mathbb{E}_{Y|X} f(X, Y)] \le \text{Var}[f(X, Y)]$$ That is, if you can average the auxiliary variable $Y$ out (conditioning on $X$), doing so can only improve, i.e. decrease, the variance.
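As a quick numeric check of this inequality (the particular $f$, $X$, and $Y$ below are illustrative assumptions, not dictated by the theorem), take $f(X, Y) = XY$ with $Y$ independent of $X$ and $\mathbb{E} Y = 1$, so that $\mathbb{E}_{Y|X} f(X, Y) = X$:

```python
# A hedged numeric check of the Rao-Blackwell inequality with an illustrative
# choice f(X, Y) = X * Y, Y independent of X, E[Y] = 1, so E_{Y|X} f(X, Y) = X.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(5.0, 2.0, size=1_000_000)
Y = rng.uniform(0.5, 1.5, size=X.shape)  # independent of X, E[Y] = 1

f = X * Y
print("E[f(X, Y)]   =", f.mean())  # ~ E[X] = 5: same mean
print("Var[f(X, Y)] =", f.var())   # ~ 6.4: inflated by the auxiliary noise
print("Var[X]       =", X.var())   # ~ 4.0: the Y-averaged-out version wins
```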

However, the RB theorem only applies to cases where $X = \mathbb{E}_{Y|X} f(X,Y)$, for example, $f(X,Y) = X + Y$ with $Y|X \sim \mathcal{N}(0, 1)$. Then the expectation is indeed the same, but adding more noise won't decrease the variance. Yet Monte Carlo itself is a demonstration of employing more random variables to reduce the variance: take $f(X, Y) = \frac{X+Y}{2}$ with $X$ and $Y$ independent and identically distributed. How did we escape the RB theorem? It's because $X$ is not equal to $\mathbb{E}_{Y|X} f(X, Y)$, which is actually $\frac{X}{2} + \frac{\mathbb{E} X}{2}$. The latter would indeed be a better estimate of the mean, but we can't compute $\mathbb{E}_{Y|X} f(X, Y)$ in this example (it involves the very $\mathbb{E} X$ we are after).
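A small sketch of this escape route (the Gaussian distribution and sample size are assumptions for illustration): averaging two i.i.d. copies halves the variance, while the uncomputable Rao-Blackwellized version $\frac{X}{2} + \frac{\mathbb{E}X}{2}$ would do even better:

```python
# A small sketch: averaging two i.i.d. copies halves the variance; the
# (usually uncomputable) Rao-Blackwellized X/2 + E[X]/2 would do even better.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.0, 3.0
X = rng.normal(mu, sigma, size=1_000_000)
Y = rng.normal(mu, sigma, size=X.shape)  # an i.i.d. copy of X

print("Var[X]            ~", X.var())                 # sigma^2     = 9
print("Var[(X + Y) / 2]  ~", ((X + Y) / 2).var())     # sigma^2 / 2 = 4.5
print("Var[X/2 + E[X]/2] ~", (X / 2 + mu / 2).var())  # sigma^2 / 4 = 2.25
```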

Therefore, we can see that constructing unbiased estimators with more random variables and smaller variance is possible; the RB theorem just points out some cases where such attempts would fail.

Okay, how do you actually construct such estimators? Besides the Monte Carlo method itself (which uses more computation to reduce the variance), one method that comes to mind is antithetic variates, which uses negative correlation so that the randomness in $X$ and $Y$ "cancels" out (essentially, the method exploits problem structure, but such structure is not always possible or easy to find and utilize).
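For instance, here is a minimal antithetic-variates sketch (the integrand $f(u) = e^u$ and the sample size are illustrative assumptions) for estimating $\mathbb{E} f(U)$ with $U \sim \text{Uniform}(0, 1)$:

```python
# A minimal antithetic-variates sketch: f is monotone, so f(U) and f(1 - U)
# are negatively correlated and their average has a much smaller variance.
import numpy as np

rng = np.random.default_rng(3)
f = np.exp                     # target: E[e^U] = e - 1 for U ~ Uniform(0, 1)
U = rng.uniform(size=100_000)

pairs = (f(U) + f(1 - U)) / 2  # one antithetic pair costs two evaluations of f
print("true value          :", np.e - 1)
print("antithetic estimate :", pairs.mean())
# Fair comparison at equal cost: one pair (2 evals) vs. two independent draws.
print("Var per pair        :", pairs.var())     # ~ 0.004
print("Var of 2-draw mean  :", f(U).var() / 2)  # ~ 0.12
```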

In general, the idea of variance reduction in Monte Carlo simulations is a very important one, and there are other methods beyond increasing the sample size or introducing extra random variables, e.g. importance sampling, control variates, and stratified sampling.
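As one more hedged sketch (the integrand and the coefficient estimation below are illustrative choices), control variates subtract a correlated quantity whose mean is known exactly:

```python
# A hedged control-variates sketch: U has a known mean (1/2), is strongly
# correlated with f(U) = e^U, and can therefore soak up part of the variance.
import numpy as np

rng = np.random.default_rng(4)
U = rng.uniform(size=100_000)
fU = np.exp(U)  # target: E[e^U] = e - 1

# Near-optimal coefficient beta = Cov(f(U), U) / Var(U), estimated from data.
beta = np.cov(fU, U)[0, 1] / U.var()
cv = fU - beta * (U - 0.5)  # same expectation, reduced variance

print("plain MC        : mean", fU.mean(), " var", fU.var())
print("control variate : mean", cv.mean(), " var", cv.var())
```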

  • Am I correct in thinking that the argument breaks down for discrete random variables (rvs)? E.g. if the domain of the rvs $X$ and $Y$ is of size $N$, surely $N$ samples are enough to know a function of $X$ precisely, but not a function of $X$ and $Y$. – Anton Belov Sep 30 '20 at 13:43
  • @AntonBelov, surprisingly, no – the argument holds for discrete random variables as well. Indeed, even after taking $N$ samples of $X$, the Monte Carlo estimate $\frac{1}{N} \sum_{n=1}^N X_n$ is still a random variable with non-zero variance (think of a coin toss: you flip it twice and average the outcomes, with heads = 1 and tails = 0; can you now know the probability of heads precisely?). In that case, of course, one should abandon MC estimates and just do direct enumeration. – Artem Sobolev Sep 30 '20 at 15:54
  • Excellent, I see now where I got confused. Thank you! – Anton Belov Sep 30 '20 at 16:18