
Suppose that I have $X_{i} \overset{i.i.d.}{\sim} P$ with $E[X_{i}]=\mu$ and $V[X_{i}] = \sigma^{2}<\infty$.

Then by the central limit theorem I know that:
$$\sqrt{n}\, (\bar{X}_{n} - \mu) \overset{d}{\to} N(0,\sigma^{2})$$
where $\bar{X}_{n}$ is the sample average. Suppose for some silly reason I know the value of $\sigma^{2}$. Then this asymptotic approximation allows me to justify confidence sets for $\mu$ of the form:
$$\bar{X}_{n} \pm q_{\alpha/2} \sqrt{\frac{\sigma^{2}}{n}}$$
where $q_{\alpha/2}$ is the $\alpha/2$ quantile of the standard normal (and hence negative). In particular:
\begin{align} \lim_{n \to \infty} P \left(q_{\alpha/2} \leq \sqrt{n}\, \frac{\bar{X}_{n} - \mu}{\sigma} \leq -q_{\alpha/2} \right) &= 1-\alpha \\ \implies \lim_{n \to \infty} P \left(\bar{X}_{n} + q_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}_{n} - q_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) &= 1-\alpha \end{align}
For simplicity, let:
$$CI_{1} = \left[\bar{X}_{n} + q_{\alpha/2}\frac{\sigma}{\sqrt{n}} ,\; \bar{X}_{n} - q_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right].$$
Now suppose that I am a strange statistician, and rather than the confidence interval constructed above, I prefer a confidence interval (for whatever reason) of my own making:
$$CI_{2} = \left[\bar{X}_{n} + q_{\alpha/2}\frac{\sigma}{\sqrt{n}} + b_{n} ,\; \bar{X}_{n} - q_{\alpha/2}\frac{\sigma}{\sqrt{n}} - b_{n}\right]$$
where $b_{n} = o(n^{-1/2})$ is some vanishing deterministic sequence. Note that $CI_{2}$ also provides $1-\alpha$ coverage probability asymptotically.
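For concreteness, here is a minimal simulation sketch comparing the empirical coverage of the two intervals; the specific choices (standard normal data, $\alpha = 0.05$, and $b_n = 1/n$, which is $o(n^{-1/2})$) are illustrative assumptions only:

```python
import numpy as np

# Monte Carlo check of finite-sample coverage for CI_1 and CI_2.
# Illustrative assumptions: X_i ~ N(0, 1), so mu = 0 and sigma = 1 is known;
# alpha = 0.05; and b_n = 1/n, which satisfies b_n = o(n^{-1/2}).
rng = np.random.default_rng(0)
mu, sigma, alpha = 0.0, 1.0, 0.05
q = -1.96  # q_{alpha/2}: the alpha/2 = 0.025 quantile of N(0, 1), hence negative

def empirical_coverage(n, b_n, reps=50_000):
    # sample means of `reps` independent samples of size n
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    half = -q * sigma / np.sqrt(n)  # half-width of CI_1 (positive)
    # CI_1 = [xbar - half, xbar + half]
    cov1 = np.mean((xbar - half <= mu) & (mu <= xbar + half))
    # CI_2 as defined above: both endpoints pulled inward by b_n (> 0 here)
    cov2 = np.mean((xbar - half + b_n <= mu) & (mu <= xbar + half - b_n))
    return cov1, cov2

for n in (10, 50, 500, 5000):
    c1, c2 = empirical_coverage(n, b_n=1.0 / n)
    print(f"n={n:5d}   CI_1 coverage: {c1:.3f}   CI_2 coverage: {c2:.3f}")
```

With these choices $CI_2$ is slightly narrower than $CI_1$, so it undercovers at small $n$, and the two coverages agree as $n$ grows; a differently constructed $b_n$ could make the finite-sample gap as large as one likes.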

My question: is there any reason to prefer $CI_{1}$ to $CI_{2}$? Asymptotically they are the same, so I suspect any reason would need to appeal to finite-sample arguments. For example, I can always construct the sequence $b_{n}$ such that $CI_{1}$ and $CI_{2}$ are VERY different in finite samples. So what statistical justification would lead someone to use $CI_{1}$ rather than $CI_{2}$? Is there a name for the desirable property $CI_{1}$ possesses that $CI_{2}$ does not?

Thanks so much!

möbius
  • $CI_1$ is shorter, therefore (in some sense) more informative. If you have no reason to assume $CI_1$ is significantly biased towards being too short, why use a longer interval that conveys less information about where the actual value may be? – jbowman Apr 04 '19 at 17:43
  • @jbowman why is $CI_1$ shorter? If the sequence $b_n$ is positive that will not be the case. – möbius Apr 04 '19 at 18:14
  • Well if that's the case, then $CI_2$ doesn't have the stated coverage of $1-\alpha$, so cannot be considered an alternative $\alpha$-level confidence interval. At that point, having abandoned the criterion that the coverage should be at least $1-\alpha$, you have no reason left to select any confidence interval over any other confidence interval, e.g., you could select one that is $\pm 1$ regardless of $\sigma$ or anything else - but you won't be able to make any statements about coverage probabilities. – jbowman Apr 04 '19 at 18:25
  • @jbowman why does it not have the stated coverage probability? The asymptotic probability that $\mu$ is in $CI_2$ is $1-\alpha$ by construction of the vanishing sequence $b_n$. The point of this question is that both confidence intervals have the same coverage probability asymptotically, as was stated in the question. – möbius Apr 04 '19 at 20:33
  • "Asymptotically" is the key word there. Lots of confidence interval calculations are exact or very close to exact in finite samples; think of the sample mean with a sample size of 20, for example, from a Uniform distribution. Unless your sequence $b_n$ has gone basically to zero by $n = 12$ or so, you're either a) too wide or b) below the stated coverage probability. – jbowman Apr 04 '19 at 21:14
  • @jbowman I think you're missing the point of the question. I interpreted it as "both $CI_1$ and $CI_2$ are asymptotically level $\alpha$ confidence intervals, so why should $CI_1$ be used over $CI_2$?" I think this is a good question (+1), since the utility of $CI_1$ is usually justified by its asymptotic coverage. Meanwhile there are few coverage guarantees for $CI_1$ for finite samples. Maybe something like the Berry-Esseen theorem can be used to give bounds for the coverage probability of $CI_1$ (a sketch follows this comment thread), but nevertheless $CI_1$ and $CI_2$ do not give exactly $1-\alpha$ coverage for finite samples – Artem Mavrin Apr 04 '19 at 21:30
  • If the finite-sample coverage probabilities aren't expected to be reasonably close (whatever that means) to the asymptotic ones, many statisticians will use alternative methods, such as the bootstrap, to get approximately correct CIs. If I'm claiming a 95% CI while knowing that the true coverage is likely nowhere near 95%, maybe I should have constructed my CI differently. Another way of putting it is, if an asymptotic $\alpha$ is largely meaningless in finite samples, I shouldn't be using it if I'm working with real data.... – jbowman Apr 04 '19 at 21:37
  • Still, I do see your point; but it does look to me like it comes down to "I don't care about finite sample properties of my CIs, so why shouldn't I use any CI I want as long as it has the same asymptotic properties?" I could construct an estimate of the mean of a Normal distribution as $\sum x_i/(1000000000+n)$, too, as asymptotically it's the same as $\bar{x}$, but few people care so little about the finite sample properties of their estimators as to do that. – jbowman Apr 04 '19 at 21:39
  • @jbowman I'm not trying to suggest that both are equally valid in all cases for finite samples, and I agree that a good statistician will use their discretion to pick an appropriate confidence interval when working with real data. I just wonder if there's a decision-theoretic criterion that will choose $CI_1$ over $CI_2$, at least in some interesting cases – Artem Mavrin Apr 04 '19 at 21:56
  • @ArtemMavrin I agree, I have a hunch a decision theoretic argument will be relevant. – möbius Apr 04 '19 at 23:50
  • @jbowman To put it another way, it is not obvious to me that $CI_1$ has better finite sample coverage guarantees uniformly over all $P$ (although if this were true this would certainly be a good answer to the question). Given this, and given we do not know the process generating the data, why is it that $CI_1$ should be preferred to $CI_2$? – möbius Apr 05 '19 at 00:03
  • @Mobius - Hmmm... that is probably worth a dissertation! I suspect you'd have to figure out some conditions limiting yourself to a subset of $P$ (think what happens if you construct a CI on the mean of something that, unbeknownst to you, is a Cauchy variate, for example - the optimal $b_n$ is probably infinite). – jbowman Apr 05 '19 at 00:31
  • @möbius I believe that the "nice property" you are looking for is "actually also being a credible interval". See this question: https://stats.stackexchange.com/questions/12567/examples-of-when-confidence-interval-and-credible-interval-coincide – Flounderer Apr 09 '19 at 15:26
  • Is there any reason why you write the lower confidence interval bound to the right and the higher to its left? – Alecos Papadopoulos Apr 20 '19 at 09:45
  • @AlecosPapadopoulos why do you think the lower confidence interval bound is on the right and the upper is on the left? – möbius Apr 21 '19 at 16:03
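A sketch of the Berry-Esseen route mentioned in the comments, under the additional assumption that $\rho = E|X_i - \mu|^3 < \infty$: the Berry-Esseen theorem gives, for an absolute constant $C$ (known numerical bounds are below $0.5$),
$$\sup_{x} \left| P\!\left( \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{C\rho}{\sigma^{3}\sqrt{n}},$$
so the finite-sample coverage of $CI_1$ differs from $1-\alpha$ by at most $2C\rho/(\sigma^{3}\sqrt{n})$. For $CI_2$ the same bound applies with the quantiles shifted by $\sqrt{n}\,b_n/\sigma$, so its coverage error picks up an extra term of order $\sqrt{n}\,b_n$, which vanishes by assumption but can be arbitrarily large at any fixed $n$.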

2 Answers


Non sunt multiplicanda entia sine necessitate.

Entities are not to be multiplied without necessity.

Occam's razor, in other words.

...which in our situation implies that it is not $CI_1$ that has to defend itself; rather, it is $CI_2$ that has to convince us of the necessity of including the $b_n$ sequence.

The "for whatever reason" offered by the OP as the reason to include the $\{b_n\}$ sequence does not cut it, not by a long shot.

This may appear to be a very abstract philosophical desideratum rather than a tangible and desirable statistical property, but that is not the case, as the following example indicates:

Consider the sequence

$$b_n = \begin{cases} \text{Graham's number}, & n = 1, \dots, 10000 \\ o(n^{-1/2}), & n > 10000 \end{cases}$$

Assume that our sample size is, hmm, $n=7358$. Then we would add Graham's number to our confidence interval, rendering it completely trivial.

This is not a straw-man argument. It tells us that if there is even a single example where we can confidently reject a proposed $\{b_n\}$ sequence, then any $\{b_n\}$ sequence that we actually propose will be faced with the eternal question "$Why\,?$"... and in science, the answer "$Why \;not\,?$" is a great way to start but a terrible way to end research and inference.

And justifying the use of a $\{b_n\}$ sequence is bound to be case-specific, sample-specific, research-purpose-specific, etc.


I would say that a "good" statistician can prefer $CI_1$ or $CI_2$ depending on the context. For example, if the underlying distribution is symmetric (or, even better, Gaussian), a symmetric confidence interval would be preferred. For the same reason, if we know the underlying distribution is skewed (e.g., a Gamma distribution), we can prefer a suitably skewed confidence set.

If we are seriously concerned about finite-sample performance, neither $CI_1$ nor $CI_2$ is good. We can always use finite-sample confidence sets based on a suitable tail condition on the distribution (for example, Chernoff bounds under a sub-Gaussian condition).
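To illustrate, here is a minimal sketch of such a finite-sample interval (not from the answer; the function name and the known sub-Gaussian parameter are assumptions for the example):

```python
import numpy as np

def subgaussian_mean_ci(x, sigma, alpha=0.05):
    """Finite-sample CI for the mean, valid at every n (no asymptotics),
    assuming each observation is sigma-sub-Gaussian with known sigma.
    From the Chernoff bound P(|xbar - mu| >= t) <= 2 exp(-n t^2 / (2 sigma^2)),
    setting the right-hand side to alpha gives the half-width below."""
    x = np.asarray(x, dtype=float)
    half = sigma * np.sqrt(2.0 * np.log(2.0 / alpha) / x.size)
    return x.mean() - half, x.mean() + half

# Example: 95% finite-sample interval for data assumed to be 1-sub-Gaussian
rng = np.random.default_rng(1)
print(subgaussian_mean_ci(rng.normal(0.0, 1.0, size=100), sigma=1.0))
```

Such an interval is wider than the CLT-based one (here roughly $\pm 0.27$ versus $\pm 0.20$ at $n = 100$), which is the price paid for a guarantee that holds at every sample size.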

As a remark, if we want to use $CI_2$, it is important to provide a reasonable justification for the choice of $b_n$; if we cannot, there is no reason to use it. To choose $b_n$ properly, we may need a decent amount of information about the underlying distribution. If we have no such information, and if we do not like $CI_1$, there are many reasonable alternatives (e.g., bootstrap confidence intervals).
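For instance, a minimal percentile-bootstrap sketch (illustrative only, with made-up names, not part of the answer):

```python
import numpy as np

def percentile_bootstrap_ci(x, alpha=0.05, n_boot=10_000, seed=0):
    """Percentile bootstrap CI for the mean: resample the data with
    replacement, recompute the mean each time, and take the alpha/2 and
    1 - alpha/2 empirical quantiles of the bootstrapped means."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    boot_means = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    return tuple(np.quantile(boot_means, [alpha / 2, 1 - alpha / 2]))

# Example usage on skewed data (Gamma), where a symmetric CI may be less natural
rng = np.random.default_rng(2)
print(percentile_bootstrap_ci(rng.gamma(shape=2.0, scale=1.0, size=200)))
```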

  • I do not think your first point addresses the question. For example, both $CI_{1}$ and $CI_2$ are centered on the sample mean, so now what? How can I prefer one to the other? I also think you must implicitly have a finite sample property in mind in your first point, since confidence sets are constructed with respect to the sampling distribution of the sample mean, which is asymptotically normal (thus symmetric) under weak conditions. – möbius Apr 14 '19 at 12:45
  • I am sympathetic to your second point, but again your third point does not address the question. "If we cannot provide a justification there is no reason to use it." By similar logic, I might reasonably say "if we cannot provide a justification not to use it, there is no reason not to use it." – möbius Apr 14 '19 at 12:46