This is a great question because it explores the possibility of alternative procedures and asks us to think about why and how one procedure might be superior to another.
The short answer is that there are infinitely many ways we might devise a procedure to obtain a lower confidence limit for the mean, but some of these are better and some are worse (in a sense that is meaningful and well-defined). Option 2 is an excellent procedure, because a person using it would need to collect less than half as much data as a person using Option 1 in order to obtain results of comparable quality. Half as much data typically means half the budget and half the time, so we're talking about a substantial and economically important difference. This supplies a concrete demonstration of the value of statistical theory.
Rather than rehash the theory, of which many excellent textbook accounts exist, let's quickly explore three lower confidence limit (LCL) procedures for $n$ independent normal variates of known standard deviation. I chose three natural and promising ones suggested by the question. Each of them is determined by a desired confidence level $1-\alpha$:
- Option 1a, the "min" procedure. The lower confidence limit is set equal to $t_{\min} = \min(X_1, X_2, \ldots, X_n) - k^{\min}_{\alpha, n, \sigma} \sigma$. The value of the number $k^{\min}_{\alpha, n, \sigma}$ is determined so that the chance that $t_{\min}$ will exceed the true mean $\mu$ is just $\alpha$; that is, $\Pr(t_{\min} \gt \mu) = \alpha$.

- Option 1b, the "max" procedure. The lower confidence limit is set equal to $t_{\max} = \max(X_1, X_2, \ldots, X_n) - k^{\max}_{\alpha, n, \sigma} \sigma$. The value of the number $k^{\max}_{\alpha, n, \sigma}$ is determined so that the chance that $t_{\max}$ will exceed the true mean $\mu$ is just $\alpha$; that is, $\Pr(t_{\max} \gt \mu) = \alpha$.

- Option 2, the "mean" procedure. The lower confidence limit is set equal to $t_\text{mean} = \text{mean}(X_1, X_2, \ldots, X_n) - k^\text{mean}_{\alpha, n, \sigma} \sigma$. The value of the number $k^\text{mean}_{\alpha, n, \sigma}$ is determined so that the chance that $t_\text{mean}$ will exceed the true mean $\mu$ is just $\alpha$; that is, $\Pr(t_\text{mean} \gt \mu) = \alpha$.
As is well known, $k^\text{mean}_{\alpha, n, \sigma} = z_\alpha/\sqrt{n}$ where $\Phi(z_\alpha) = 1-\alpha$; $\Phi$ is the cumulative probability function of the standard Normal distribution. This is the formula cited in the question. A mathematical shorthand is
$$k^\text{mean}_{\alpha, n, \sigma} = \Phi^{-1}(1-\alpha)/\sqrt{n}.$$
The formulas for the min and max procedures are less well known but easy to determine:
$$k^\text{min}_{\alpha,n,\sigma} = \Phi^{-1}(1-\alpha^{1/n}), \qquad k^\text{max}_{\alpha, n, \sigma} = \Phi^{-1}\bigl((1-\alpha)^{1/n}\bigr).$$
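To see where these come from, take $\mu=0$ and $\sigma=1$ (which suffices, as explained below) and write $k^{\min}$ for $k^\text{min}_{\alpha,n,\sigma}$. For the min procedure,

$$\Pr(t_{\min} \gt 0) = \Pr\bigl(\min(X_1, \ldots, X_n) \gt k^{\min}\bigr) = \prod_{i=1}^n \Pr(X_i \gt k^{\min}) = \bigl(1 - \Phi(k^{\min})\bigr)^n,$$

and setting this equal to $\alpha$ gives $k^{\min} = \Phi^{-1}(1-\alpha^{1/n})$. The max formula follows in the same way from $\Pr\bigl(\max(X_1, \ldots, X_n) \gt k^{\max}\bigr) = 1 - \Phi(k^{\max})^n = \alpha$.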
By means of a simulation, we can see that all three formulas work. The following R code conducts the experiment n.trials separate times and reports all three LCLs for each trial:
simulate <- function(n.trials=100, alpha=.05, n=5) {
  # Critical values for the three procedures (computed for mu = 0, sigma = 1)
  z.min <- qnorm(1 - alpha^(1/n))
  z.mean <- qnorm(1 - alpha) / sqrt(n)
  z.max <- qnorm((1 - alpha)^(1/n))
  # One trial: draw a sample of size n and compute all three LCLs
  f <- function() {
    x <- rnorm(n)
    c(max=max(x) - z.max, min=min(x) - z.min, mean=mean(x) - z.mean)
  }
  replicate(n.trials, f())  # one row per procedure, one column per trial
}
(The code does not bother to work with general normal distributions: because we are free to choose the units of measurement and the zero of the measurement scale, it suffices to study the case $\mu=0$, $\sigma=1$. That is why none of the formulas for the various $k^*_{\alpha,n,\sigma}$ actually depend on $\sigma$.)
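On the original measurement scale, each procedure is applied by multiplying $k^*_{\alpha,n,\sigma}$ by $\sigma$ and subtracting it from the relevant statistic. As a minimal sketch for the mean procedure (lcl.mean and the example data are invented here for illustration, not part of the simulation):

lcl.mean <- function(x, sigma, alpha=.05) {
  # Lower confidence limit via the "mean" procedure for data x with known sigma
  mean(x) - qnorm(1 - alpha) * sigma / sqrt(length(x))
}
lcl.mean(rnorm(5, mean=10, sd=2), sigma=2)  # example: n = 5 draws from Normal(10, 2)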
10,000 trials will provide sufficient accuracy. Let's run the simulation and calculate the frequency with which each procedure fails to produce a confidence limit less than the true mean:
set.seed(17)
sim <- simulate(10000, alpha=.05, n=5)
apply(sim > 0, 1, mean)  # frequency with which each LCL exceeds the true mean of 0
The output is
   max    min   mean 
0.0515 0.0527 0.0520
These frequencies are close enough to the stipulated value of $\alpha=.05$ that we can be satisfied all three procedures work as advertised: each of them produces a 95% lower confidence limit for the mean.
(If you're concerned that these frequencies differ slightly from $.05$, you can run more trials. With a million trials, they come even closer to $.05$: $(0.050547, 0.049877, 0.050274)$.)
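That larger check is just a bigger call to the same simulate function; here is a sketch (sim.big is only an illustrative name, and the exact frequencies will vary with the random seed):

# Re-run the experiment with a million trials; this takes longer but
# pins down the coverage frequencies more precisely
sim.big <- simulate(1e6, alpha=.05, n=5)
apply(sim.big > 0, 1, mean)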
However, one thing we would like about any LCL procedure is that not only should it be correct the intended proportion of time, but it should tend to be close to correct. For instance, imagine a (hypothetical) statistician who, by virtue of a deep religious sensibility, can consult the Delphic oracle (of Apollo) instead of collecting the data $X_1, X_2, \ldots, X_n$ and doing an LCL computation. When she asks the god for a 95% LCL, the god will just divine the true mean and tell that to her--after all, he's perfect. But, because the god does not wish to share his abilities fully with mankind (which must remain fallible), 5% of the time he will give an LCL that is $100\sigma$ too high. This Delphic procedure is also a 95% LCL--but it would be a scary one to use in practice due to the risk of it producing a truly horrible bound.
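To make that point concrete, here is a quick sketch of the imaginary Delphic procedure in the same standardized setting ($\mu=0$, $\sigma=1$); delphic is just an illustrative name:

# The hypothetical Delphic "LCL": the true mean (0) in 95% of consultations,
# 100*sigma too high in the remaining 5%
delphic <- ifelse(runif(10000) < 0.95, 0, 100)
mean(delphic > 0)  # fails to underestimate the mean about 5% of the time
sd(delphic)        # yet its spread is enormous, around 22

It has the advertised 95% coverage, yet its spread dwarfs that of any procedure based on the data.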
We can assess how accurate our three LCL procedures tend to be. A good way is to look at their sampling distributions; histograms of many simulated values serve just as well. Here they are. First, though, the code to produce them:
dx <- -min(sim)/12
breaks <- seq(from=min(sim), to=max(sim)+dx, by=dx)  # common bins for all three panels
par(mfcol=c(1,3))  # three panels side by side
tmp <- sapply(c("min", "max", "mean"), function(s) {
  # Full sampling distribution of the LCL for this procedure...
  hist(sim[s,], breaks=breaks, col="#70C0E0",
       main=paste("Histogram of", s, "procedure"),
       yaxt="n", ylab="", xlab="LCL")
  # ...with the failures (LCL > true mean of 0) overplotted in red
  hist(sim[s, sim[s,] > 0], breaks=breaks, col="Red", add=TRUE)
})

They are shown on identical x axes (but slightly different vertical axes). What we are interested in are:

- The red portions to the right of $0$--whose areas represent the frequency with which the procedures fail to underestimate the mean--are all about equal to the desired amount, $\alpha=.05$. (We had already confirmed that numerically.)

- The spreads of the simulation results. Evidently, the rightmost histogram is narrower than the other two: it describes a procedure that indeed underestimates the mean (equal to $0$) fully $95$% of the time, and when it does, that underestimate is almost always within $2\sigma$ of the true mean. The other two histograms have a propensity to underestimate the true mean by a little more, out to about $3\sigma$ too low. Also, when they overestimate the true mean, they tend to overestimate it by more than the rightmost procedure does. These qualities make the min and max procedures inferior.
The rightmost histogram describes Option 2, the conventional LCL procedure.
One measure of these spreads is the standard deviation of the simulation results:
> apply(sim, 1, sd)
     max      min     mean 
0.673834 0.677219 0.453829
These numbers tell us that the max and min procedures have equal spreads (of about $0.68$) and the usual, mean, procedure has only about two-thirds their spread (of about $0.45$). This confirms the evidence of our eyes.
The squares of the standard deviations are the variances, equal to $0.45$, $0.46$, and $0.21$, respectively. The variances can be related to the amount of data: if one analyst recommends the max (or min) procedure, then in order to achieve the narrow spread exhibited by the usual procedure, their client would have to obtain $0.45/0.21$ times as much data--over twice as much. In other words, by using Option 1 you would be paying more than twice as much for your information as you would by using Option 2.
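That data requirement can be read directly off the simulation output; here is a short calculation of the variance ratios, reusing the sim object from above (sds is just a convenience name):

# Ratio of variances: how much more data the min and max procedures
# need in order to match the spread of the mean procedure
sds <- apply(sim, 1, sd)
(sds / sds["mean"])^2

Both ratios come out a little above $2$, which is the "more than twice as much data" figure quoted above. (As a check, the spread of the mean procedure itself agrees with the theoretical value $\sigma/\sqrt{n} = 1/\sqrt{5} \approx 0.447$.)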