
I have a dataset of 12627 records (each a value and its sd) and want to get the sum and the overall uncertainty of the sum. I ran a Monte Carlo analysis with 10000 simulations and got the statistical result below. As you can see, the 95% CI bounds are very close to the mean.

> upper     mean      lower 
> 8092.260  7850.384  7608.509

I plotted the 10000 simulated sums in a histogram, which looks like this. As you can see, the tails stretch to -20,000 and 40,000. So I have two questions:

  1. Does this histogram have any relationship with the confidence interval? If so, why do the two ranges differ so greatly?
  2. Is it normal that my 95% CI is so close to the mean?

[Image: histogram of the 10000 simulated sums]

The raw data look like this; you can download them from this Google Drive link.

catch_mean  catch_std
0.0003  0.0018
0.0156  0.0356
0.0230  0.0694
0.0906  0.0999
0.1121  0.2553
0.6705  0.7395
0.0222  0.0518
0.0891  0.6350
0.0003  0.0007
0.0127  0.0437
0.0560  0.0615
0.0180  0.0411
0.0515  0.0565
0.0110  0.0380
...
(a total of 12627 records)

The R code for the Monte Carlo simulation is below:

library(gmodels)
library(Rmisc)

df <- read.csv('Monte carlo data.csv')
dfsim <- data.frame(sim = double())

for (j in 1:10000) {
  # vector calculation: use rnorm to randomly draw 12627 numbers (one per record) and sum them up
  dfsim[nrow(dfsim) + 1, ] <- sum(rnorm(length(df$catch_mean), df$catch_mean, df$catch_std))
}

CI(dfsim$sim)    # calculate the 95% CI of the mean of the simulated sums
hist(dfsim$sim)  # plot the histogram of the simulated sums
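For comparison, here is a minimal base-R sketch of the same simulation that reports both quantities side by side: the 95% CI for the mean of the simulated sums, and the central 95% range of the sums themselves. It is not the code from the question. The data frame is a hypothetical synthetic stand-in for the CSV (1000 made-up records rather than 12627), and the names `n_rec`, `n_sim`, `sims`, `ci_mean`, `range_sum`, `mu_sum` and `sd_sum` are assumptions introduced for illustration.

```r
set.seed(1)

# Hypothetical stand-in for 'Monte carlo data.csv'; the real file is not reproduced here.
n_rec <- 1000
df <- data.frame(
  catch_mean = rexp(n_rec, rate = 2),
  catch_std  = rexp(n_rec, rate = 1)
)

n_sim <- 10000

# One simulated total per iteration: draw every record once, then sum.
sims <- replicate(n_sim, sum(rnorm(n_rec, df$catch_mean, df$catch_std)))

# (a) 95% CI for the MEAN of the simulated sums (standard error shrinks with sqrt(n_sim)).
se      <- sd(sims) / sqrt(n_sim)
ci_mean <- mean(sims) + c(-1, 1) * qt(0.975, df = n_sim - 1) * se

# (b) Central 95% range of the simulated sums themselves (the spread a histogram shows).
range_sum <- quantile(sims, c(0.025, 0.975))

# Analytic check: a sum of independent normals is normal with these parameters.
mu_sum <- sum(df$catch_mean)
sd_sum <- sqrt(sum(df$catch_std^2))

print(ci_mean)
print(range_sum)
print(c(mu_sum = mu_sum, sd_sum = sd_sum))
```

In this sketch the range in (b) is about `sqrt(n_sim)` times wider than the interval in (a), because (a) describes how well the mean is pinned down while (b) describes how much a single simulated sum varies.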

Elizabeth
  • In the first few words of your question, what are these records and how does each have an SD? – Peter Flom Jan 06 '24 at 18:58
  • It would help if you added your code and an explanation of it. – Peter Flom Jan 06 '24 at 19:01
  • $7850 \pm (8092 - 7850)\sqrt{12000} = [-18700, +34400]$ matches the histogram well. In other words, your simulation appears to give the sample distribution of the sum while the CI is an interval covering the mean. If any of this seems obscure, please read our posts about the definition and meaning of confidence intervals. – whuber Jan 06 '24 at 20:28
  • @PeterFlom I have added the raw data, R code, and explanation as you requested. – Elizabeth Jan 07 '24 at 06:17
  • @whuber thank you so much for the answer and the link provided. I am very new to statistics. I understand that a CI is simply a range that covers the mean and that the range shrinks with increasing sample size. What I do not understand is the calculation: what is [−18700, +34400], and does this interval have a name? – Elizabeth Jan 07 '24 at 08:56
  • @whuber, after reading some materials, I realised that I had wrongly mixed up the plot of a confidence interval with the histogram of the raw data. I thought they were the same thing, but they are not. I still haven't understood the conversion you did, though. Could you please explain more? – Elizabeth Jan 07 '24 at 16:10
  • The standard error of the mean is $\sigma/\sqrt{n}$. It looks like whuber just multiplied the endpoints of the CI for the mean by $\sqrt{n}$ to get a corresponding interval containing a similar proportion of the original distribution (which works because the data are from a normal population). – Glen_b Jan 07 '24 at 18:51
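
The conversion discussed in the comments can be written out step by step. This is a sketch that assumes the simulated sums are approximately normal. The symbols are introduced here for illustration: $\bar{x}$ is the mean of the simulated sums, $s$ their standard deviation, $h$ the half-width of the reported CI, $n$ the number of values the CI was computed from, and $t \approx 1.96$ the 95% multiplier.

$$
h = t\,\frac{s}{\sqrt{n}} = 8092.260 - 7850.384 \approx 241.9
\quad\Longrightarrow\quad
t\,s = h\sqrt{n}
$$

$$
\bar{x} \pm t\,s \;=\; 7850.4 \pm 241.9\,\sqrt{n}
$$

With $\sqrt{12000}$, as in the comment, this gives roughly $[-18600, +34300]$, which agrees with the comment's rounded figures. On one reading of the code, `CI(dfsim$sim)` receives the 10000 simulated sums, so taking $n = 10000$ gives $7850.4 \pm 24188 \approx [-16340, +32040]$. Either way the interval describes where about 95% of individual simulated sums fall, which is why it matches the width of the histogram rather than the width of the CI for the mean.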

0 Answers