I have a simulation that returns "yes" or "no" on each iteration, and I use the fraction of "yes" results ("hits") over many iterations to estimate the probability of "yes" occurring. I'd like to be able to say how accurate/trustworthy that estimate is, though.

The page "Monte Carlo integration" explains a similar basic Monte Carlo simulation:

Imagine that we want to measure the area of a pond with arbitrary shape. Suppose that this pond is in the middle of a field with known area $A$. If we throw $N$ stones randomly, such that they land within the boundaries of the field, and we count the number of stones that fall in the pond $N_{in}$, the area of the pond will be approximately proportional to the fraction of stones that make a splash, multiplied by $A$:

$$A_{pond}=\frac{N_{in}}{N}A.$$
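
For concreteness, here is a minimal Python sketch of the pond estimate quoted above. The square field, the circular pond, and the name `estimate_pond_area` are illustrative assumptions and not part of the quoted page; any shape works as long as there is a yes/no test for "did the stone land in the pond?".

```python
import math
import random


def estimate_pond_area(n_stones, field_side=1.0, pond_radius=0.3, seed=0):
    """Hit-or-miss estimate of a pond's area inside a square field.

    Hypothetical setup: a square field of side `field_side` with a
    circular pond of radius `pond_radius` centred in it.
    Returns (estimated pond area, number of hits).
    """
    rng = random.Random(seed)
    field_area = field_side ** 2      # A in the text
    centre = field_side / 2
    n_in = 0                          # N_in in the text
    for _ in range(n_stones):
        # Throw a stone uniformly at random inside the field.
        x = rng.uniform(0, field_side)
        y = rng.uniform(0, field_side)
        # Yes/no test: did the stone land in the pond?
        if (x - centre) ** 2 + (y - centre) ** 2 <= pond_radius ** 2:
            n_in += 1
    return n_in / n_stones * field_area, n_in


area, n_in = estimate_pond_area(100_000)
print("estimated area:", area)
print("true area of the assumed circle:", math.pi * 0.3 ** 2)
```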

That page then talks about a similar scenario, where you do integration of a function by checking whether random points are under it or not:

imagine a rectangle of height $H$ in the integration interval $[a,b]$, such that the function $f(x)$ is within its boundaries. … The fraction of points that fall within the area contained below $f(x)$ … is an estimate of the ratio of the integral of $f(x)$ and the area of the rectangle.

But then it switches to a different method of integration, where instead of checking yes/no for whether each test point falls under the curve, you evaluate the curve itself:

Another Monte Carlo procedure is based on the definition: $$\langle f \rangle=\frac{1}{(b-a)} \int_a^b{f(x)dx}.$$

In order to determine this average, we sample the value of $f(x)$:

$$\langle f \rangle \simeq \frac{1}{N}\sum_{i=1}^{N}f(x_i),$$

It then explains how to calculate the variance for that type of method:

A possible measure of the error is the "variance" $\sigma^2$ defined by: $$\sigma ^2=\langle f^2 \rangle - \langle f \rangle ^2, $$
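
As a rough sketch of the sample-mean method and the variance quoted above, the snippet below estimates $\int_0^\pi \sin x\,dx$ (exact value 2). The function name `mc_integral_with_error` and the choice of integrand are assumptions made for illustration. The error bar uses the usual standard error of a mean of $N$ independent samples, $\sigma/\sqrt{N}$, scaled by $(b-a)$.

```python
import math
import random


def mc_integral_with_error(f, a, b, n, seed=0):
    """Sample-mean Monte Carlo estimate of the integral of f over [a, b].

    Returns (integral estimate, sigma^2 of f, standard error of the estimate).
    """
    rng = random.Random(seed)
    sum_f = 0.0
    sum_f2 = 0.0
    for _ in range(n):
        fx = f(rng.uniform(a, b))   # evaluate f at a uniform random x_i
        sum_f += fx
        sum_f2 += fx * fx
    mean_f = sum_f / n              # <f>
    mean_f2 = sum_f2 / n            # <f^2>
    variance = mean_f2 - mean_f ** 2            # sigma^2 = <f^2> - <f>^2
    integral = (b - a) * mean_f
    std_error = (b - a) * math.sqrt(variance / n)
    return integral, variance, std_error


integral, variance, std_error = mc_integral_with_error(
    math.sin, 0.0, math.pi, 100_000
)
print("integral estimate:", integral, "+/-", std_error)
```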

But I don't understand how to take this variance calculation and transfer it back to the original "number of hits" type of calculation. I'm not sure what function $f$ I'm evaluating or how to evaluate it.

endolith
  • I am unable to match your description to the page you link to. Regardless, this is a Bernoulli experiment and the errors are exactly described by Binomial distributions. You can read a huge amount about this situation on our site. With this search I found some details at https://stats.stackexchange.com/questions/151163: would this answer your question? – whuber May 24 '22 at 20:17
  • @whuber No, I don't understand that question or how it applies to mine. :/ – endolith May 24 '22 at 21:42
  • @whuber This result from your search seems applicable though https://stats.stackexchange.com/a/71228/11633 – endolith May 24 '22 at 21:48
  • That's exactly the same answer, but with a more direct focus on the estimation variance. – whuber May 24 '22 at 22:05
  • @whuber Well I've read both of those links carefully and I still don't know the answer to my question. Both seem to be about "How many trials do I need to do to reach a specific variance?", while I would like to know "How can I calculate the variance of my estimate after I've done N trials?" If I've done N trials and M of them are "yes", then I can say that the probability of "yes" is M/N, but how accurate is that estimate? – endolith May 25 '22 at 00:19
  • Both answers contain explicit calculations of that variance. – whuber May 25 '22 at 12:36
  • @whuber Do you mean $$\widehat{\text{Var}}(\hat\theta) = \frac{\hat\theta(A-\hat\theta)}{n}$$? So for the pond example, the estimated variance would be $$\frac{N_{in}(A-N_{in})}{N}$$? – endolith May 25 '22 at 21:01
  • That's incorrect: $A$ is an area while the $N_{in}$ and $N$ are counts: you can hardly subtract a count from an area! The estimator of the area is $\hat\theta = AN_{in}/N,$ which does make sense as a fraction of an area. – whuber May 25 '22 at 21:08
  • @whuber Ah, I misinterpreted the text. Good point about incompatible units. OK... Let me try once more: The estimated variance of the estimated area $\widehat{A_\mathrm{pond}}$ is given by

    $$\widehat{\operatorname{Var}}(\widehat{A_\mathrm{pond}}) = \frac{\frac{N_\mathrm{in}}{N}A(A - \frac{N_\mathrm{in}}{N}A)}{N}=\frac{N_\mathrm{in}A^2(1-\frac{N_\mathrm{in}}{N})}{N^2}$$

    Do I finally get it?

    – endolith May 25 '22 at 21:37
  • That looks correct. In fact, a suggestive way to write your formula is the form $$A^2\left(\frac{N_{in}}{N}\right)\left(1 - \frac{N_{in}}{N}\right)/N$$ because the factors have clear interpretations: $A^2$ arises by using the base area $A$ as the unit of measurement; $1/N$ arises in the usual way from an average of $N$ independent observations; and the rest has the familiar form $(\hat p)(1-\hat p)$ for the variance of a Bernoulli$(\hat p)$ variable. In short, this is the usual formula for the (squared) standard error of an estimate of $\theta$ for Binomial sampling. – whuber May 26 '22 at 13:32
  • Turns out I already asked this https://stats.stackexchange.com/q/430409/11633 Too bad I can't post the correct answer here now that I finally understand it. :/ https://gist.github.com/endolith/9e73ea30511f0befb9b336526f34d0a6 – endolith Jul 10 '22 at 19:16
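
The formula confirmed in the comments above can be sketched in Python as follows. The function names and the example counts (300 hits out of 1000 trials, field area 2.0) are made up for illustration. The second function treats each trial as $f = 1$ for "yes" and $f = 0$ for "no"; since $f^2 = f$ in that case, $\langle f^2 \rangle - \langle f \rangle^2$ reduces to $\hat p(1-\hat p)$, which is how the variance formula from the question maps onto the "number of hits" setting.

```python
import math


def hit_fraction_with_error(n_in, n, area=1.0):
    """Estimate area * p from n_in hits in n trials, with its standard error.

    With area=1.0 this is just the estimated probability of "yes".
    """
    p_hat = n_in / n                      # estimated probability of a hit
    var_p = p_hat * (1 - p_hat) / n       # estimated variance of p_hat
    estimate = area * p_hat               # A * N_in / N
    std_error = area * math.sqrt(var_p)   # sqrt(A^2 * p_hat * (1 - p_hat) / N)
    return estimate, std_error


def variance_from_indicator(outcomes):
    """<f^2> - <f>^2 for a list of 0/1 outcomes (1 = "yes", 0 = "no")."""
    n = len(outcomes)
    mean_f = sum(outcomes) / n
    mean_f2 = sum(x * x for x in outcomes) / n
    return mean_f2 - mean_f ** 2


# Hypothetical numbers: 300 hits in 1000 trials, field area 2.0.
est, se = hit_fraction_with_error(n_in=300, n=1000, area=2.0)
print("estimate:", est, "standard error:", se)

# The indicator-function variance matches p_hat * (1 - p_hat).
outcomes = [1] * 300 + [0] * 700
print(variance_from_indicator(outcomes), 0.3 * (1 - 0.3))
```

When $N$ is large and $\hat p$ is not close to 0 or 1, the estimate plus or minus about two standard errors gives an approximate 95% interval.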

0 Answers