
There are numerous threads on CV about the definition of and intuition about a confidence interval. However, I was surprised that none contained my intuitive reasoning about a confidence interval, so perhaps it is wrong.

Here it goes:

The confidence interval defines a range of hypothetical sample effects which — under the assumption that the observed sample effect is identical to the true effect — would not be surprising (where the criterion surprising/not surprising is a function of the confidence level and is identical to the interval bounds).

Is there anything wrong about this statement?

(NB: if it is a correct statement, I'd still understand why it is not the common explanation, because it's hard to bring the meaning of the exact confidence level (e.g. 95%) into this.)

(ps: since many answers in other threads are hard for me to comprehend, it is possible that my reasoning was expressed elsewhere.)

monade
  • I feel like this is exactly how people usually describe confidence interval intuition and interpretation. Do you mean to ask why this isn’t a mathematical definition of a confidence interval? (In that case, couldn’t you basically say the same about a credible interval?) – Dave Sep 13 '23 at 13:12
  • Thanks, interesting. Do you have any source in which a confidence interval is explained exactly along these lines? (As mentioned in the ps:, I simply might have missed it because of a lack of understanding of some mathematical terms) – monade Sep 13 '23 at 13:23
  • Nearly all explanations I've seen frame it as the long-run proportion of CIs that contain the true value of the parameter (see the simulation sketch after these comments). – monade Sep 13 '23 at 13:25
  • A mathematical definition is not the same as an intuition about interpretation. You seem to have linked four discussions of the former and now ask about the latter. – Dave Sep 13 '23 at 13:26
  • Fair enough! Indeed I'm looking for a source (or SE post) that contains the intuition as outlined in my quote. – monade Sep 13 '23 at 13:30
  • For reference, Wikipedia provides three interpretations, but none are along the lines of mine above. This is in part why I started to wonder whether I'm mistaken. – monade Sep 13 '23 at 13:45
  • That third one on Wikipedia seems to be aligned with what you’ve written. – Dave Sep 13 '23 at 14:13
  • Many might find this characterization of CIs problematic because it relies on undefined and unusual terms like "hypothetical sample effect" as well as on vague qualitative characterizations like "surprising." It isn't clear enough even to be wrong. – whuber Sep 13 '23 at 14:33
  • "Surprising" is exactly defined via this explanation: any hypothetical sample effect outside the interval is classified as surprising. And to my mind the term "hypothetical sample effect" is also implied in the explanations based on "long-run proportion of CIs containing the parameter". Perhaps it could be expanded to "range of effects that hypothetical samples from the population could exhibit". Hence I would disagree with "isn't clear enough even to be wrong". – monade Sep 13 '23 at 14:44
  • I don't think your assumption "the assumption that the unknown true effect corresponds to our observed sample effect" is clear. Please update the question to make this assumption understandable. (What does "correspond" mean?) – Harvey Motulsky Sep 13 '23 at 18:37
  • Thanks @HarveyMotulsky, I rephrased slightly to make this part more precise. Is the assumption clear now? – monade Sep 13 '23 at 21:47
  • I suspect the phrase "hypothetical sample effect" would be understood correctly only by somebody who already knows what a CI is and is capable of making the translation. The second part of your phrase, "the assumption that the observed sample effect is identical to the true effect," is not part of any CI definition or concept. – whuber Sep 13 '23 at 21:53
  • Sander Greenland has written much about switching terminology from confidence interval (as confidence is a bullshit term) to compatibility interval, which very much goes along with the OP's thinking. – Frank Harrell Sep 14 '23 at 11:13
  • @FrankHarrell thanks for the reference! Here's one such article by Sander Greenland (& Andrew Gelman). It contains the following explanation of a CI: "given any value in the interval and the background assumptions, the data should not seem very surprising". I wonder how to parse this: does "value" refer to the true parameter value, to the parameter of my current sample, or to the parameter of hypothetical future samples? And does "data" refer to the data of my current sample or of a hypothetical future sample? – monade Sep 14 '23 at 12:04
  • Value refers to the unknown parameter value. Data refers to current data but 'surprising' imagines an endless stream of other datasets and computes the fraction of them that are more extreme than your data. Contrast with the extreme straightforwardness of Bayesian uncertainty intervals that even apply to one-time samples that no one could attempt to replicate. – Frank Harrell Sep 14 '23 at 12:15
  • So can I say that for any (true) parameter value $\theta$ in the interval, it holds that the fraction of future hypothetical datasets which have a more extreme parameter than my current sample is $\alpha$, where "more extreme" means having a parameter that is further away from the true $\theta$ than the parameter of my sample. – monade Sep 14 '23 at 12:28
  • @monade Your question includes the statement: "under the assumption that the observed sample effect is identical to the true effect". If the observed (sample) effect were identical to the true (population, or distribution) effect, there would be no need for a confidence interval! So the question doesn't make any sense to me. – Harvey Motulsky Sep 15 '23 at 02:08
  • @HarveyMotulsky Yes, I also converge towards the realization that my explanation doesn't make sense. I think one can still construct an interval based on this reasoning, but this would be an interval for future hypothetical samples and not for the unknown population parameter (hence it might be a prediction interval and not a confidence interval, as also pointed out by Sextus Empiricus below). – monade Sep 15 '23 at 06:15
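
The long-run reading quoted in the comments above ("the long-run proportion of CIs that contain the true value of the parameter") can be checked with a short simulation. This is a minimal sketch, not part of the thread; the normal model, the sample size of 30, and the t-based interval are arbitrary illustrative assumptions:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    true_mean, n, n_reps, conf = 10.0, 30, 10_000, 0.95

    covered = 0
    for _ in range(n_reps):
        sample = rng.normal(loc=true_mean, scale=2.0, size=n)
        # t-based CI for the mean: estimate +/- t-quantile * standard error
        half_width = stats.t.ppf((1 + conf) / 2, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
        covered += sample.mean() - half_width <= true_mean <= sample.mean() + half_width

    print(f"fraction of CIs containing the true mean: {covered / n_reps:.3f}")
    # prints roughly 0.95: the confidence level describes the procedure's
    # long-run coverage, not a probability statement about any single interval.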

2 Answers


Firstly, good for you for attempting to come up with a new intuitive description of a statistical object. I'm going to give a critique of what you've come up with, but please don't let this put you off your attempt to refine this description, or to try to come up with your own descriptions of other statistical concepts. With that out of the way, here are a number of problems with your explanation:

  • Firstly, you refer to the CI as defining a range of "hypothetical sample effects", which does not generally match what you are making an inference about in a CI. In a typical CI you have some observable data and an unknown "parameter" and your CI gives a range of values of the latter. The parameter that is the object of inference could be pretty much any unobservable aspect of the process --- it does not necessarily correspond to a "sample effect". Consequently, describing the object of inference using the term "hypothetical sample effect" is vague at best, and probably just incorrect in a range of problems.

  • Secondly, your reference to the assumption that "the observed sample effect is identical to the true effect" is unclear. Again, this seems vague at best and wrong at worst. What is the "sample effect" here? What is the "true effect" here? The true effect of what? (Is this some veiled reference to an unknown "parameter"?) Does this formulation assume some causal mechanism, and if so, how is it applicable to statistical inference problems that look only at predictive inference?

  • Finally, your description of the substantive requirement for the confidence interval procedure does not really give any clear description of (or even a reasonable allusion to) the actual mathematical criterion at issue. You merely say that the derived range of values "would not be surprising" and that this notion of surprise "is a function of the confidence level and is identical to the interval bounds". (I have no idea what you mean by asserting that the concept of surprise "is identical to the interval bounds" --- that is clearly wrong, so set that part aside.) At best, this lets me know that the procedure gives a range that is not "surprising", where the latter is determined by the confidence level. Effectively, this tells me that the confidence level affects the interval somehow, but how is not specified even vaguely. That is minimal information --- you have not told me what the confidence level actually measures or anything at all about how it affects the interval.

The outcome of these problems is this: you refer to objects and effects that are vague at best, and probably inapplicable at worst, and you tell me nothing about the actual substantive requirement for forming a confidence interval. Personally, I do not find any value in this particular description (though again, I commend you on your attempt).

Ben
  • Hi Ben, thanks a lot for your thoughtful, constructive and encouraging response, much appreciated. I'm not sure which format would be best to respond to this, but I'll try in the comments (please let me know if there's a more appropriate way). – monade Sep 14 '23 at 07:07
  • Re first point: this is probably sticking my neck out a bit too far, but I'd argue that your statement that CIs give a range of values for the unknown parameter is precisely why it is often mistakenly believed that the CI gives the probability for the parameter to be contained in the CI. Since the CI is based on the sampling distribution (and constructed around the mean of the sampling distribution), is it not more correct to say that the CI gives a range of values for hypothetical sample effects and not for the unknown parameter? – monade Sep 14 '23 at 07:07
  • Re second point: as a non-statistician I don't really see the issue here, sorry. By true effect I simply mean the true value of the parameter (or the "true effect") in the population. Perhaps this confusion is caused by my use of "effect", so please replace "true effect" with "true parameter" and "sample effect" with "this parameter computed for a sample" where relevant. – monade Sep 14 '23 at 07:08
  • Re third point: again, excuse my ignorance, but I don't see how the use of the word surprising seems so unclear (not only to you, but also the commentators, so this is clearly on me). On the one hand, whether a hypothetical sample effect is considered surprising in my explanation is precisely defined in terms of whether hypothetical sample effects are inside or outside the bounds. And to me it also seems intuitive to say that a hypothetical sample effect would be considered surprising if it is far away from the actual observed sample effect (so far away that it is outside the CI bounds). – monade Sep 14 '23 at 07:08
  • Now to your final point that my explanation merely tells you "that the confidence level affects the interval somehow, but how is not specified even vaguely": I entirely agree. This is what I referred to when I said "I'd still understand why it is not the common explanation, because it's hard to bring the meaning of the exact confidence level (e.g. 95%) into this.". – monade Sep 14 '23 at 07:09
  • Thanks for your further comments. I think it is indeed best to retire the "effect" language, since it alludes to causality that might not be present in a problem. Re your other proposed changes/clarifications, some of these clarify a bit, but create new problems. For example, if "true effect" means the true parameter value then the assumption that the "observed sample effect" equals this will not hold. Re "surprising", it's not that it's unclear per se, it's just that your explanation doesn't specify the meaning of the confidence level in defining this concept. – Ben Sep 14 '23 at 22:11

Confidence intervals can be interpreted in two ways, depending on whether you use a double negative or not.

A confidence interval contains those parameter values:

  • For which the observation is not unlikely
  • For which the observation is likely

In your definition you take the first approach.

The second interpretation makes it more like an interval based on a fiducial distribution, which expresses how different values are more or less plausible.


The two approaches are not describing different intervals. They are just different viewpoints of the same thing. They stress different aspects of the interval and relate to the gray area in defining 'likely' and 'unlikely'.

Inference aims to find some optimal value (e.g. most likely) along with a range that expresses the uncertainty about the inference process. That range can be expressed in terms of values that are also likely or values that are not unlikely.

What it is called or considered depends on the setting and the use of language. A value that is not unlikely is not necessarily a value that is likely. For example, imagine a study that uses a cautious 99.999% confidence interval: it may contain a lot of values, many of which are not necessarily likely and are instead just not unlikely according to some very strict 0.001% level for being unlikely.


it's hard to bring the meaning of the exact confidence level (e.g. 95%) into this

A mathematical definition that is a translation of your interpretation could be

$$CI(95\%) = \{\, \theta : H(\theta;\, 100\% - 95\%) = \text{false} \,\}$$

In words: the 95% confidence interval/region is the range of parameter values $\theta$ for which a hypothesis test at a significance level of 5% fails to reject (a failed rejection can be read as: the observation is not unlikely for that given parameter value, so the value cannot be rejected).
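
A minimal sketch of this test-inversion construction (not from the answer; the binomial model and the counts 12 out of 50 are arbitrary illustrative assumptions): scan candidate values of $\theta$ and keep every value that a two-sided test fails to reject.

    import numpy as np
    from scipy import stats

    k, n, alpha = 12, 50, 0.05  # hypothetical data: 12 successes in 50 trials

    # The confidence region is the set of candidate proportions theta that a
    # two-sided exact binomial test fails to reject at significance level alpha.
    grid = np.linspace(0.001, 0.999, 999)
    kept = [th for th in grid if stats.binomtest(k, n, th).pvalue >= alpha]

    print(f"{1 - alpha:.0%} CI for the proportion: [{min(kept):.3f}, {max(kept):.3f}]")
    # Close to the Clopper-Pearson exact interval, which is built by
    # inverting binomial tests in essentially this way.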


Just to clarify: you characterize my explanation of a CI as containing "those parameter values for which the observation is not unlikely". Is this really the same as my statement, which essentially says that CIs contain "those parameter values of hypothetical samples which are not unlikely given the observed parameter value"?

No, this is not the same.

The reason I mischaracterized your definition is that I took a liberal approach while reading it, which relates to the points mentioned in the other answer and comments. For example, 'parameter' would be better than 'hypothetical sample effect' (I do not know what a 'sample effect' means; does a sample have an effect?).

I see now that I have passed over an important difference. You express something as 'not unlikely parameter values given the observation' and I translated it as 'parameter values for which the observation is not unlikely'.

There is a big jump to be made between the two. It relates a bit to the term 'inverse probability' (What exactly does the term "inverse probability" mean?). When we speak about probability and likely values, then we can understand this very well in the direction when the parameters are given and we predict the outcomes. In the other direction, when we know the outcomes, but do not know the parameters, then it is more difficult to speak about 'probability'. Terms like likelihood, confidence and fiduciality are used to replace the probability.

In your definition you use 'likely' applied to the inverse probability. That makes it difficult to interpret.

(I use the term as well, but in the other direction: the probability of the observation given the parameters. Even in my post it remains a problem what exactly I mean by 'likely'; this is a common criticism of p-values, which relate to the probability of the observed event or a more extreme event, while 'extreme' is not well defined.)

If you really want to stick to

those parameter values of hypothetical samples which are not unlikely

then it is a wrong definition of the confidence interval.

This interpretation of the confidence interval might be close to what is actually a likelihood interval, but the likelihood interval and confidence interval are different. The confidence interval does not necessarily contain the values with the highest likelihood. See also The basic logic of constructing a confidence interval. In the figure below you can see how the confidence interval boundaries relate to likelihood values that are not at the same level (see the panel on the right, where the red and green dots, depicting the boundaries, are not at the same likelihood value).

[figure: pdf and likelihood function with the confidence interval boundaries]

Legend: The red line is the upper boundary for the confidence interval and the green line is the lower boundary for the confidence interval. The confidence interval is drawn for $\pm 1 \sigma$ (approximately 68.3%). The thick black lines are the pdf (drawn twice) and the likelihood function, which cross at the points $(\theta,\hat\theta)=(-3,-1)$ and $(\theta,\hat\theta)=(0,-1)$.
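
The unequal boundary likelihoods can also be checked numerically. A minimal sketch (an illustrative construction, not the one in the figure), assuming a single observation $x = 1$ from an exponential distribution with unknown rate $\theta$, with an equal-tailed interval obtained by inverting the CDF:

    import numpy as np

    x, alpha = 1.0, 0.05  # one exponential observation, 95% interval

    # Equal-tailed CI for the rate: solve P(X <= x | theta) = alpha/2 and
    # P(X >= x | theta) = alpha/2 using the exponential CDF 1 - exp(-theta*x).
    theta_lo = -np.log(1 - alpha / 2) / x
    theta_hi = -np.log(alpha / 2) / x

    def likelihood(theta):
        return theta * np.exp(-theta * x)  # L(theta; x) for an Exp(rate) model

    print(f"CI for theta: [{theta_lo:.4f}, {theta_hi:.4f}]")
    print(f"likelihood at bounds: {likelihood(theta_lo):.4f} vs {likelihood(theta_hi):.4f}")
    # The two endpoints sit at clearly different likelihood values, so this
    # confidence interval is not a likelihood (highest-likelihood) interval.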

  • I see your point, thanks! In my original explanation I used the term "un/surprising" which is of course as imprecise and unclear as "un/likely". However, it is meant to be an intuitive explanation and not a precise definition, and for that "surprising" does the job in my view: you'd be surprised to find a parameter in a hypothetical future sample that is more extreme than your CI bounds. – monade Sep 14 '23 at 09:16
  • Your mathematical definition is very much in line with this third interpretation of CIs on Wikipedia, as also pointed out by @Dave. Still, personally I like about my explanation that it avoids the concept of significance testing and only assumes an understanding of the sampling distribution. The idea is the same though, so I agree. – monade Sep 14 '23 at 09:18
  • Re "sample effect": in the German literature a standard term in this context is "Stichprobeneffekt" which literally would translate to "sample effect". It just means that we compute the parameter of interest for a sample, such as a mean difference or a correlation. I was not aware that the translation "sample effect" is apparently very uncommon! – monade Sep 14 '23 at 09:20
  • The issue mentioned in the comments here might be the same as in this question https://stats.stackexchange.com/a/369909/ . Your interpretation of the confidence interval might be close to what is actually a likelihood interval. Those two are different. The confidence interval does not necessarily contain the values with the highest likelihood. – Sextus Empiricus Sep 14 '23 at 09:22
  • I'm a bit confused and I think my confusion could be cleared if I understood what you mean by "parameter values" in your sentence "the 95% confidence interval/region is the range of parameter values θ for which a hypothesis test at a significance level of 5% fails". If "parameter values" means "parameter values of hypothetical samples", then I completely agree. And in this case I'd argue that my explanation is effectively identical to your mathematical definition, which means my explanation would have to be an explanation for a confidence interval (and not another type of interval). – monade Sep 14 '23 at 09:36
  • @monade I mean parameter values of hypothetical populations (from which a sample/observation can be taken). The estimated parameter values are estimations of the population. – Sextus Empiricus Sep 14 '23 at 09:55
  • While I still have to fully parse and understand your response, I think that it addresses exactly where my explanation might be wrong, so hence I accept it. As I understand, my basic reasoning error is that I construct a distribution around my sample parameter and use this as a reference to quantify or verbalize my surprise about hypothetical future sample parameters. However, the sample parameter is itself uncertain and so can't serve as such a reference. Instead, the reference points are different assumed values for the true parameter, ... – monade Sep 14 '23 at 12:44
  • ... which then allows making a statement about the (non-)significance of my sample parameter in reference to that. (I hope this is not wrong all again..) In any case, a big THANK YOU! – monade Sep 14 '23 at 12:46
  • @monade An interval that relates to potential values/statistics of a new sample is a prediction interval. Confidence intervals relate instead to the value of a parameter that describes the population. – Sextus Empiricus Sep 14 '23 at 12:59
  • @monade You write “around my sample parameter”, but confidence intervals do not necessarily need to be distributed around the 'sample parameter' (which I interpret as the estimate based on the sample, or the mean or the median of the sample). See for instance the confidence interval for a uniform distribution https://math.stackexchange.com/questions/190436/ for which the estimate based on the sample (the maximum of the sample) is at the edge of the confidence interval. See also: Why are confidence intervals of hazard ratios not symmetric? – Sextus Empiricus Sep 14 '23 at 13:02