Given a (moderate) number of values that can be assumed to be normally distributed, but with unknown mean and standard deviation, I want to compute a confidence interval (say 99% confidence).
My naive way to formalize this (I'm not a stats person) is, I think, the following question:
Given the data: into which interval will a new value sampled from the unknown distribution that generated the data fall with 99% probability?
I think the accepted answer to this related question actually explains it well: Is it meaningful to calculate standard deviation of two numbers?
But the answer is not very concrete about how to compute it, and doing some further research on my own confused me. It is easy to find formulas for t-distribution-based confidence intervals computed from sampled data:
Given samples $$X_1, X_2, \dotsc, X_n$$
The sample mean is $$\overline{X}=\frac 1n\sum_{i=1}^nX_i$$ and the sample variance is $$S^2=\frac 1{n-1}\sum_{i=1}^n(X_i-\overline{X})^2$$
You get this confidence interval for the mean: $$\overline{X}-t \cdot S/\sqrt{n} \leq \mu \leq \overline{X}+t \cdot S/\sqrt{n}$$
Here $t$ comes from $$F_{n-1}(t)=0.995,$$ where $F_{n-1}$ is the CDF of the t-distribution with $n-1$ degrees of freedom.
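For concreteness, this is how I compute that interval, a small sketch in Python with numpy/scipy (the data values are just made-up example numbers):

```python
import numpy as np
from scipy import stats

# Made-up example data: a moderate number of values assumed to be
# normally distributed with unknown mean and standard deviation.
x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2, 4.9, 4.4])
n = len(x)

mean = x.mean()       # X-bar, the sample mean
s = x.std(ddof=1)     # S, the sample standard deviation (divides by n-1)

# t such that F_{n-1}(t) = 0.995, i.e. the 99.5% quantile of the
# t-distribution with n-1 degrees of freedom (two-sided 99% interval).
t = stats.t.ppf(0.995, df=n - 1)

# 99% confidence interval for the true mean mu.
lower = mean - t * s / np.sqrt(n)
upper = mean + t * s / np.sqrt(n)
print(f"99% CI for the mean: [{lower:.3f}, {upper:.3f}]")
```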
But the width of this confidence interval goes to zero for large $n$: it is a confidence interval for how well the empirical mean approximates the true mean.
But that is not what I want. I think I want something like an empirical standard deviation (except that I think it should be based on the t-distribution, because my sample size is small).
Does that make sense? My intuition is to just drop the division by $\sqrt{n}$, but I have no idea if that is correct.
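In code, the modification I have in mind would look roughly like this (continuing from the snippet above; this is just my guess, and I am not at all sure it is the right formula):

```python
# My guess at an interval for a *new* value drawn from the same
# distribution, obtained by simply dropping the 1/sqrt(n) factor.
# I don't know whether this is statistically correct.
lower_new = mean - t * s
upper_new = mean + t * s
print(f"My guessed 99% interval for a new value: [{lower_new:.3f}, {upper_new:.3f}]")
```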
Any help appreciated.