
I am very puzzled by two different 95% CI results computed from the same normally distributed dataset, generated with the Monte Carlo method using 10,000 simulations. See the figure and summary statistics of the Monte Carlo result below; the simulated results are downloadable here.

sd        mean      n
12325.37  7993.051  10000

[figure: distribution of the simulated Monte Carlo results]

Approach 1: 95% confidence interval = mean ± 1.96 * SD / sqrt(n). The result is

   upper     mean    lower 
8234.653 7993.051 7751.449 
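
A minimal R sketch of this calculation, assuming the 10,000 simulated values are stored in a numeric vector called sims (the name is illustrative):

    # Approach 1: normal-approximation 95% CI for the mean of the simulated values
    m  <- mean(sims)
    se <- sd(sims) / sqrt(length(sims))   # standard error of the mean
    c(upper = m + 1.96 * se, mean = m, lower = m - 1.96 * se)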

Approach 2: the 2.5% and 97.5% percentiles. Many say that if the sample size is large enough, the percentile range is equivalent to the 95% CI range (for instance, this paper says "The 2.5% and 97.5% percentiles of the calculated risk are taken as the overall uncertainty range (i.e., 95% confidence interval)."), but the result below differs greatly from that of Approach 1.

2.5%       50%     97.5% 
-16195.35   7932.73  32429.37 
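
The corresponding percentile calculation, again assuming the simulated values are in sims:

    # Approach 2: 2.5th, 50th and 97.5th percentiles of the simulated values themselves
    quantile(sims, probs = c(0.025, 0.5, 0.975))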

So I have two questions:

  1. Why do these two results differ so greatly, and which one is the real CI?
  2. Why is it true that, when the sample size is large enough, the 2.5%-97.5% quantile range is equivalent to the 95% CI range? I am confused because, on the one hand, the quantiles do not take the sample size into account. On the other hand, when the sample size is large enough, wouldn't the equation reduce to "95% confidence interval = mean ± a very small margin of error"? That very small margin of error does not stretch out to the two tails at 2.5% and 97.5%.

Any help would be much appreciated!

This is a follow-up to this question and this question.

Elizabeth
  • The issue is nearly the same one as at your earlier question -- https://stats.stackexchange.com/questions/636313/i-am-confused-about-the-histogram-distribution-and-confidence-interval (i.e. the difference between an interval that aims to encompass some fraction of the parent distribution and an interval for its population mean). No doubt you can find several further ways to generate such intervals in relation to a parent distribution, but they'll all have the same sort of disparity from an interval for the mean. – Glen_b Jan 08 '24 at 11:39
  • @Glen_b Thank you for your comment. Yes, the two questions are alike. I was almost convinced that the two approaches are different, but later I read many posts claiming these two approaches are interchangeable when some criteria are met (stats.stackexchange.com/questions/246435/…), and this paper used the 2.5% and 97.5% percentiles to represent the 95% CI: https://www.nature.com/articles/s41467-021-23391-7, so I asked again and deleted all the irrelevant information. I'd like to know what the correct way to calculate a 95% CI is, and what the differences between the two approaches are. – Elizabeth Jan 08 '24 at 13:53
  • "what is the correct way to calculate 95% CI" ... A 95% interval for what, exactly? If you calculate intervals for very different things, naturally they are not alike. – Glen_b Jan 08 '24 at 15:44

1 Answer


The two approaches are not the same at all! The first one computes a confidence interval for the mean (here mean ± 1.96 standard errors of the mean, i.e. a 95% CI for the mean), while your second approach estimates the 2.5th and 97.5th percentiles of your data, so it is not surprising at all that these two quantities differ. Confidence intervals usually refer to some statistic, usually the mean, so your second approach is likely irrelevant.
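
For illustration, a minimal R sketch of this distinction, using arbitrary normal data as a stand-in for the Monte Carlo output (the numbers are made up): the interval for the mean shrinks as n grows, while the percentiles of the data do not.

    set.seed(1)
    x <- rnorm(10000, mean = 8000, sd = 12000)  # stand-in for the simulated results

    # Interval for the mean: its width shrinks like 1/sqrt(n)
    mean(x) + c(-1.96, 1.96) * sd(x) / sqrt(length(x))

    # Percentiles of the data: stay near mean +/- 1.96*SD no matter how large n is
    quantile(x, c(0.025, 0.975))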

user2974951
  • Thank you for your answer, but how come some posts indicate that a CI can also be calculated from the 2.5% and 97.5% percentiles? https://stats.stackexchange.com/questions/246435/confidence-interval-from-bootstrap And there are many papers published using the 2.5% and 97.5% percentiles as the 95% confidence interval (e.g., https://www.nature.com/articles/s41467-021-23391-7, see the second-to-last sentence in the Uncertainty Analysis section). – Elizabeth Jan 08 '24 at 13:42
  • @Elizabeth That's using bootstrapping; you are not using that in your post. – user2974951 Jan 08 '24 at 13:55
  • The percentiles you are talking about are derived from the bootstrap distribution. The bootstrap is a resampling method that allows you to derive many statistical "objects", among which are confidence intervals. – lulufofo Jan 08 '24 at 13:55
  • @lulufofo Sorry, the "raw data" here are actually the output from the Monte Carlo simulation. Not sure if this information helps? I think at least my method of "Monte Carlo 10,000 simulations + approach 2" is the same method used in the paper (nature.com/articles/s41467-021-23391-7) I just mentioned, which uses the 2.5% and 97.5% percentiles of the Monte Carlo results as the 95% confidence interval. – Elizabeth Jan 08 '24 at 14:13
  • @Elizabeth, sorry, I do not really understand what you did with the Monte Carlo simulation, can you explain please :) ? The thing is that the 95% CI you mentioned, constructed with the SD, is, as user2974951 said, a confidence interval on your estimate of the mean, while the percentiles you gave are percentiles of the "empirical" distribution of your data. The percentile interval from the bootstrap (cited before) gives you a CI on the estimate of the mean (see the sketch after these comments). – lulufofo Jan 08 '24 at 14:24
  • @lulufofo Thank you for your comment. I might not have expressed my case clearly, but I found a similar case here: https://stats.stackexchange.com/a/148570/143272. This answer covers what I did and my confusion. I ran a Monte Carlo simulation of some equations (not important) and got 10,000 simulated results; I want the 95% CI of the simulated results, and there are two ways, as that author indicated: 1) 1.96 * SE intervals and 2) the 2.5% and 97.5% percentiles. My confusion is how 1) and 2) can be comparable, when with a large sample size 1) will be very small but 2) is not affected at all. – Elizabeth Jan 08 '24 at 17:40
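
To illustrate the bootstrap point raised in these comments: the percentile interval used for a bootstrap confidence interval is taken over resampled means, not over the raw values. A minimal R sketch under that reading (names and parameters are illustrative):

    set.seed(1)
    x <- rnorm(10000, mean = 8000, sd = 12000)   # stand-in for the simulated values

    # Percentile bootstrap CI for the mean: quantiles of the resampled means
    boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
    quantile(boot_means, c(0.025, 0.975))  # close to mean(x) +/- 1.96 * sd(x) / sqrt(length(x))

    # Not the same as the quantiles of the data themselves
    quantile(x, c(0.025, 0.975))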