In a geostatistical context, it is common practice that simulations of a variable of interest (e.g., the grade or concentration of a metal in rock samples) use at least 30 realizations. I would ask what the criterion is for choosing the right number of trials, given that each trial is computationally expensive. Is that criterion generic enough to be applied to all simulations?
-
This question is a little too broad and vague to be answerable. It would help to elaborate on the nature of your simulations, because (to date, after more than seven years) none of the answers recognizes what is special about geostatistical simulations. – whuber Oct 07 '18 at 12:19
4 Answers
If you're using the simulations to try to estimate something, then you'd seek a sufficient number of replicates to achieve the desired precision. It's hard to see how one could make any general statement about the minimal number. It depends on the variation among replicates. There are cases where 5 might be sufficient, and others where 100,000 are necessary.
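To make this concrete, one common approach is to run a small pilot batch, estimate the between-replicate standard deviation, and then size the full run so that the standard error of the estimate meets a target precision. Here is a minimal sketch of that idea; the toy simulator `one_replicate`, the pilot size, and the target half-width are illustrative assumptions, not anything prescribed by this answer:

```python
import numpy as np

rng = np.random.default_rng(42)

def one_replicate():
    """Placeholder for a single simulation replicate returning the statistic of interest."""
    return rng.normal(loc=10.0, scale=3.0)

# Pilot batch to gauge between-replicate variability.
pilot = np.array([one_replicate() for _ in range(50)])
s = pilot.std(ddof=1)

# Size the full run so the 95% CI half-width of the estimated mean meets a target.
target_half_width = 0.1   # desired precision -- an assumption for illustration
z = 1.96                  # 95% normal quantile
n_required = int(np.ceil((z * s / target_half_width) ** 2))
print(f"pilot sd = {s:.3f}, replicates needed ≈ {n_required}")
```

The required number scales with the variance of the replicates and with the square of the desired precision, which is why no single minimum can be right for every simulation.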
-
My guess would be that the number 30 comes from the old rule of thumb that you need at least 30 data points in order to do X. (Where I've seen X replaced with: "valid regression analysis", "to get a good estimate of variability", etc., or the point where you can switch from the t to the normal distribution.) – Wayne Sep 27 '11 at 14:57
-
-
If I'm not mixing things up, I remember from statistics courses that larger than 30 means large enough for, e.g., the sample mean to be representative of the population mean. I understand Karl's point is about the stability/convergence that happens after n trials, which could be a criterion for stopping the simulation. – Developer Sep 27 '11 at 17:18
-
@Developer - But it does depend on both the variability of the results and on how precise you want the estimate. – Karl Sep 27 '11 at 17:21
You can use Wald's sequential probability ratio test (SPRT).
Suppose that your simulation is testing a null hypothesis---the primary null hypothesis. In testing the primary null, we use a level $p_0=0.05$ test.
The interesting part is recognizing that the p-value given by your simulations is itself random. So we want to form a test that this p-value is really less than the level of our test. This leads to the ancillary null hypothesis that the p-value of the primary test is greater than the desired level of that test---that is, we evaluate the null hypothesis that the true p-value of your simulations is greater than the level of your test based upon the sample of simulations that you have done. (The primary and ancillary language is, as far as I know, my own; I don't know that there are standard names for these things.)
The level for this ancillary test is chosen to be $\alpha=0.001$. That is, we want to incorrectly conclude that the p-value of the simulations is small 1 time in 1,000. We allow a Type II error probability of $\beta=0.01$ against a p-value of $p_1=0.045$ for the main hypothesis. That is, when the true p-value is 0.045, we want to incorrectly accept the null only 1 time in 100. (Of course, all these probabilities are suggestions for concreteness; you need to think about what values would be appropriate for your context.)
In the sequential procedure, we obtain a single observation and determine whether it is a "success" or a "failure." It is a success if it is more extreme than our critical value for hypothesis testing---it is a success if it is evidence against the null hypothesis. We count the number of successes after $m$ observations, thereby defining $T_m = \sum_{i=1}^m{X_i}$. Using this count, we calculate the probability ratio $$\begin{equation*} \frac{p_{1m}}{p_{0m}} \equiv \frac{p_1^{T_m}(1-p_1)^{m-T_m}}{p_0^{T_m}(1-p_0)^{m-T_m}}. \end{equation*}$$ This is the likelihood ratio of the p-value that we want power against to the level of the test of our main hypothesis.
Wald provides a stopping rule based upon this probability ratio. We conclude that the true p-value is below the level of our test ($p < p_0$) and reject our main null hypothesis if $$\begin{equation} \frac{p_{1m}}{p_{0m}} \geq \frac{1-\beta}{\alpha}. \end{equation}$$ We fail to reject our primary null hypothesis and conclude that $p > p_0$ if $$\begin{equation} \frac{p_{1m}}{p_{0m}} \leq \frac{\beta}{1-\alpha}. \end{equation}$$ Otherwise, we collect an additional observation and recalculate these ratios.
Hence, this test gives a stopping rule when performing simulations that depends upon the level of your test ($p_0$), the p-value that you want power against ($p_1$), the amount of power you want against that alternative ($\beta$), and a measure of how certain you want to be in drawing conclusions from your simulations ($\alpha$).
The number of simulations required depends upon the true p-value of your simulations. The further this is from the level of your test, the fewer simulations that you'll need to perform. In simulations that I've done of this procedure, it took about 16,000 simulations to reach a conclusion when the true p-value was 0.05, 1,700 when the p-value was 0.01, and 800 when the true p-value was 0.10 using the values of the parameters that I gave above. If this is more than you can handle, you can change values of the parameters that I give ($\alpha$, for example).
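To make the stopping rule concrete, here is a minimal sketch in Python. The `simulate_once` function and the toy usage at the bottom are hypothetical stand-ins: it should return True whenever a simulated statistic is more extreme than the critical value. The log-likelihood ratio is updated after each draw and compared against Wald's bounds.

```python
import math
import random

def sprt_stop(simulate_once, p0=0.05, p1=0.045, alpha=0.001, beta=0.01, max_m=200_000):
    """Wald's SPRT: decide whether the true rejection probability is below p0.

    simulate_once() must return True for a "success", i.e. a simulated statistic
    more extreme than the critical value (evidence against the primary null).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: conclude p < p0, reject primary null
    lower = math.log(beta / (1 - alpha))   # cross below: conclude p > p0, fail to reject
    log_lr, m = 0.0, 0
    while m < max_m:
        x = 1 if simulate_once() else 0
        m += 1
        # log-likelihood-ratio increment for one Bernoulli observation
        log_lr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if log_lr >= upper:
            return "reject primary null (p < p0)", m
        if log_lr <= lower:
            return "fail to reject primary null (p > p0)", m
    return "no decision within max_m draws", m

# Toy usage: a "simulation" whose true success probability is 0.01.
decision, n_sims = sprt_stop(lambda: random.random() < 0.01)
print(decision, "after", n_sims, "simulations")
```

Working on the log scale is just a numerical convenience; the comparisons are equivalent to the ratio bounds given above.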
Lastly, I'll just note that this is a non-parametric approach to finding the stopping point.
-
Well, I need to read this more deeply! As a first impression, it looks promising. – Developer Sep 29 '11 at 05:05
It very much depends on what you're trying to simulate. We'd need more details in terms of your simulation. My answer, when it comes down to it, is "as many as your computer can handle in the time you have".
That admittedly isn't a great criterion. If you're trying to simulate a distribution or obtain an empirical confidence interval, my instinct is at least 10,000. For other questions, it's very different.
In terms of diagnostics for "have I done enough", I generally wait until two things have occurred:
- The distribution of the random variable being simulated begins to resemble the distribution it's being drawn from. Until that happens, you haven't really had a chance to fully explore the potential of the simulation.
- As you mention in your comment, until the change between each new realization drops to 0. However, I don't do this by "If I add another, what happens", but instead grossly overshoot my gut feeling, then trim backwards if it's clear it was unneeded in future simulations. I don't want to say "adding 1 didn't make a difference" in case it was drawn from near the mean anyway (a very likely scenario). I'm far more comfortable saying "Adding 1,000 didn't make a difference" (a sketch of this check follows the list).
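Here is a minimal sketch of that second diagnostic: keep adding realizations in blocks of 1,000 and stop once a whole additional block no longer moves the estimate. The toy simulator and the tolerance are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(size):
    """Hypothetical simulator standing in for one batch of realizations."""
    return rng.lognormal(mean=1.0, sigma=0.5, size=size)

block = 1_000       # "adding 1,000 didn't make a difference"
tolerance = 0.005   # relative change treated as negligible -- an assumption
draws = simulate(block)

while True:
    old_mean = draws.mean()
    draws = np.concatenate([draws, simulate(block)])
    rel_change = abs(draws.mean() - old_mean) / abs(old_mean)
    if rel_change < tolerance:
        break

print(f"stopped after {draws.size} realizations, mean ≈ {draws.mean():.4f}")
```

Checking whole blocks rather than single realizations guards against stopping early just because one extra draw happened to land near the mean.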
-
Suppose I did the simulation 30 times and made an E-Type evaluation, e.g., the mean of all realizations (E-Type1). If I do one more realization and compute the E-Type model again (E-Type2), and there is no noticeable difference between E-Type1 and E-Type2, I would think that's the point to stop the simulation. – Developer Sep 27 '11 at 17:27
-
Not necessarily - what if your simulation changes dramatically if the result is, say, 1.5 standard deviations from the mean? A likely outcome, but not one that might have shown up in 30 realizations. Also, what do you mean by "noticeable"? I've put some more detail in my answer. – Fomite Sep 28 '11 at 05:26
-
There are interesting points in your answer. One is to avoid stopping just because one (or a few) additional trials did not add noticeable changes to the output (the variable of interest). I think this is a good point. – Developer Sep 28 '11 at 12:49
People finding this question may find this article useful. It goes into some detail on how to calculate the number of simulations, as well as other facets of setting up a simulation study.
It mentions that you can calculate the size $B$ with: $$B = \left(\frac{Z_{1-\alpha/2}\,\sigma}{\delta}\right)^2$$ where $Z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution, $\sigma^2$ is the variance of the parameter of interest, and $\delta$ is the specified level of accuracy that one wants to achieve.
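As a quick worked sketch of the formula (the values of $\sigma$, $\delta$, and $\alpha$ below are purely illustrative, not from the article):

```python
from math import ceil
from statistics import NormalDist

def n_simulations(sigma, delta, alpha=0.05):
    """B = ((z_{1-alpha/2} * sigma) / delta)^2, rounded up to a whole number of runs."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / delta) ** 2)

# Illustrative values: sd of the parameter estimate 0.5, desired accuracy 0.05.
print(n_simulations(sigma=0.5, delta=0.05))   # -> 385
```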
As a whole, the article is very useful for reviewing the quality of your design.
Burton, A. , Altman, D. G., Royston, P. and Holder, R. L. (2006), The design of simulation studies in medical statistics. Statist. Med., 25: 4279-4292. doi:10.1002/sim.2673
-
Thank you. Although the idea is generically correct (at least as an approximation), your formula for $B$ obviously is not, perhaps because of typographical errors (and omission of a description of $\delta$ and $\sigma$)? Could you fix those problems? – whuber Oct 07 '18 at 12:17
-
Thank you for noting the incorrectness of my reply. I checked my formula against the one given in the source and corrected it accordingly. The descriptions of the terms in the formula are as stated in the source. – SK4ndal Oct 07 '18 at 14:12