Given a (moderate) number of values that can be assumed to be normally distributed, but with unknown mean and standard deviation, I want to compute a confidence interval (say 99% confidence).
My naive way to formalize this (I'm not a stats person) is, I think, the following question:
Given the data: into which interval will a new value sampled from the unknown distribution that generated the data fall with 99% probability?
I think the accepted answer to this related question actually explains it well: Is it meaningful to calculate standard deviation of two numbers?
But the answer is not very concrete about how to compute it, and doing some further research on my own confused me. It is easy to find formulas for t-distribution-based confidence intervals computed from sampled data:
Given samples $$X_1, X_2, \dotsc, X_n$$
The sample mean is $$\overline{X}=\frac 1n\sum_{i=1}^nX_i$$ and the sample variance is $$S^2=\frac 1{n-1}\sum_{i=1}^n(X_i-\overline{X})^2$$
You get this confidence interval for the mean: $$\overline{X}-t \cdot S/\sqrt{n} \leq \mu \leq \overline{X}+t \cdot S/\sqrt{n}$$
Here $t$ comes from $$F_{n-1}(t)=0.995,$$ where $F_{n-1}$ is the CDF of the t-distribution with $n-1$ degrees of freedom.
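For concreteness, this is how I compute that interval, a small sketch in Python with numpy/scipy (the data values are just made-up example numbers):

```python
import numpy as np
from scipy import stats

# Made-up example data: a moderate number of values assumed to be
# normally distributed with unknown mean and standard deviation.
x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2, 4.9, 4.4])
n = len(x)

mean = x.mean()       # X-bar, the sample mean
s = x.std(ddof=1)     # S, the sample standard deviation (divides by n-1)

# t such that F_{n-1}(t) = 0.995, i.e. the 99.5% quantile of the
# t-distribution with n-1 degrees of freedom (two-sided 99% interval).
t = stats.t.ppf(0.995, df=n - 1)

# 99% confidence interval for the true mean mu.
lower = mean - t * s / np.sqrt(n)
upper = mean + t * s / np.sqrt(n)
print(f"99% CI for the mean: [{lower:.3f}, {upper:.3f}]")
```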
But the width of this confidence interval goes to zero for large $n$: it is a confidence interval for how well the empirical mean approximates the true mean.
But that is not what I want. I think I want something like an empirical standard deviation (except that I think it should be based on the t-distribution, because my sample size is small).
Does that make sense? My intuition is to just drop the division by $\sqrt{n}$, but I have no idea if that is correct.
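In code, the modification I have in mind would look roughly like this (continuing from the snippet above; this is just my guess, and I am not at all sure it is the right formula):

```python
# My guess at an interval for a *new* value drawn from the same
# distribution, obtained by simply dropping the 1/sqrt(n) factor.
# I don't know whether this is statistically correct.
lower_new = mean - t * s
upper_new = mean + t * s
print(f"My guessed 99% interval for a new value: [{lower_new:.3f}, {upper_new:.3f}]")
```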
Any help appreciated.