I have trouble understanding the concept/interpretation of a confidence interval. It is commonly said that if I construct a hundred 95% confidence intervals, then about 95 of them are expected to contain the true parameter. However, I am not sure why this should be true.
Question 1:
First, define a probability space and a random variable $(\Omega, \mathcal F, \mathbb P) \xrightarrow{X} (\mathbb R, \Sigma)$, and let the $X_i$ be i.i.d. copies of $X$, where $X \sim N(\mu, \sigma^2)$. Fix a sample size $n$ and draw 100 samples, $\{X^1_1, \cdots, X^1_n\}, \cdots, \{X^{100}_1, \cdots, X^{100}_n\}$. For each sample I can form the sample mean and variance: $\bar X^k = \frac{1}{n}\sum_i X^k_i$ and $(S^k)^2 = \frac{1}{n-1}\sum_i (X^k_i - \bar X^k)^2$. For convenience I shall use the rule of thumb $\mathbb P\big(\bar X^k \in (\mu - 2\sigma/\sqrt n,\ \mu + 2\sigma/\sqrt n)\big) \approx 0.95$, which holds because $\bar X^k \sim N(\mu, \sigma^2/n)$. I can then construct 100 confidence intervals $I^1, \cdots, I^{100}$, where $I^k = \big(\bar X^k - 2S^k/\sqrt n,\ \bar X^k + 2S^k/\sqrt n\big)$. Taking expectations of the endpoints yields, approximately (since $\mathbb E[S^k] \approx \sigma$, though not exactly): $$\mathbb E\Big[\bar X^k - \tfrac{2S^k}{\sqrt n}\Big] \approx \mu - \tfrac{2\sigma}{\sqrt n}, \quad \quad \mathbb E\Big[\bar X^k + \tfrac{2S^k}{\sqrt n}\Big] \approx \mu + \tfrac{2\sigma}{\sqrt n}.$$
However, I have difficulty turning this idea into the conclusion that "it is expected that 95 of the 100 confidence intervals contain $\mu$." Or this idea could be completely irrelevant! Either way, I would appreciate a clarification of why "it is expected that 95 of the 100 confidence intervals contain $\mu$."
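For concreteness, here is a quick simulation of the setup I have in mind. This is only a sketch: the standard interval $\bar X \pm 2S/\sqrt n$ is used, and the values $\mu = 10$, $\sigma = 2$, $n = 50$ are arbitrary choices for illustration.

```python
import random
import statistics

# Arbitrary illustrative parameters
mu, sigma, n = 10.0, 2.0, 50
random.seed(0)

covered = 0
for _ in range(100):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample std. dev. (n-1 denominator)
    half = 2 * s / n ** 0.5           # half-width of the ~95% interval
    if xbar - half < mu < xbar + half:
        covered += 1

print(covered)  # typically close to 95
```

Running this repeatedly with different seeds, the count of intervals containing $\mu$ fluctuates around 95, which is the empirical phenomenon my question is about.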
Question 2:
- If we drop the normality assumption on $X$, would the construction of a confidence interval still work? I feel that I am somehow abusing the central limit theorem, which only says that the normalized sample mean eventually converges (in distribution) to a normal distribution.
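To make this second question concrete, here is a small empirical check of the CLT-based interval without normality. This is a sketch under arbitrary assumptions: a skewed Exponential(1) distribution (true mean $\mu = 1$), sample size $n = 200$, and 1000 repetitions.

```python
import random
import statistics

# Illustrative non-normal case: X ~ Exponential(rate 1), so the true mean is 1
random.seed(1)
mu, n, trials = 1.0, 200, 1000

covered = 0
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    half = 1.96 * s / n ** 0.5        # CLT-based ~95% interval for the mean
    if xbar - half < mu < xbar + half:
        covered += 1

print(covered / trials)  # near 0.95, though slightly off for skewed X at finite n
```

The observed coverage lands near 0.95 for moderate $n$, which is what I would like to understand: is this the CLT doing the work, and how badly can it fail for small $n$ or heavy-tailed $X$?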