How does one update a confidence interval using Bayes rule?

Say, for example, an experiment shows that the mean lies in [A, B] with 95% confidence. Later, a colleague says they ran a similar experiment and found that the mean lies in [C, D] with 95% confidence (or any other CI).

How does one "merge" the two data under Bayes?

Please help out with any misconceptions, I'm not a trained statistician.

  • Bayesians don't naturally use confidence intervals; rather, they use credible intervals. I am guessing that what you have is some data in the form of confidence intervals and you want to update your estimate of existing knowledge? There are rules for how to combine confidence intervals, but for answerers to make sure they address your real need, perhaps it would help for you to simply explain what data you have and what question you need answered. – ReneBt Apr 24 '19 at 08:20
  • The theoretical object that Bayesians update is the posterior distribution (which can produce intervals). So the way it would work is in order to generate your first interval, you would have had a posterior. You take that posterior and update it, and then you query that updated one for an interval to get your "updated" interval. With interval information only, the Bayesian cannot update. Niels' answer has made an assumption about which posterior generated your intervals for you (which is fine but we shouldn't take it as the way to update intervals as a Bayesian, which again there is none). – John Madden Sep 21 '22 at 13:26
  • Important to note that a confidence interval does not mean that $\mu$ "lies in [A, B] with 95% confidence." That's misinterpreting a confidence interval as a Bayesian credible interval. Sometimes they are numerically identical, but generally they are not. Also, (Bayesian) updating of frequentist confidence intervals with information from a new sample is not possible. – Durden May 22 '23 at 19:00
  • @Durden Contrary to your first sentence, it is legitimate to say “confidence is 95% that the mean lies in [A, B]” or, alternatively, “we are 95% confident that the mean lies in [A, B].” Such confidence in the observed interval is derived from the 95% coverage property of the random confidence interval. We are 95% confident in this observed interval only because it is an outcome of an assumed process of interval construction that produces intervals containing the mean approximately 95% of the time in the long run. – Graham Bornholt May 23 '23 at 00:13
  • ..... Of course, such confidence claims can be undermined to some extent if the confidence interval procedure in question has poor conditional properties. Note that although confidence levels are derived from probabilities, they do not have all the properties of probabilities. – Graham Bornholt May 23 '23 at 00:14
  • @GrahamBornholt phrasing the interpretation of a confidence interval as a probabilistic statement about $\mu$ is misleading. The "95% probability" refers to the interval itself. So the correct (but hard to grasp) statement would be "in the long run, 95% of confidence intervals will contain $\mu$," but (at least from a more general understanding of probability than pure frequentism) that is a different statement than "with 95% probability $\mu$ lies within the single interval I have constructed with the one sample that I have." – Durden May 23 '23 at 00:32
  • @Durden You need to differentiate between the observed (single) interval and the random interval. Our confidence in the observed interval containing the mean comes from the 95% probability that the random interval would contain the mean. Frequentists accept that the observed interval either contains the mean or it does not, so they never wish to imply there is a 95% probability that it does contain the mean. That is why the word "confidence" is used instead of probability. Confidence has a technical meaning within classical statistics, thanks to Neyman. – Graham Bornholt May 23 '23 at 01:02
  • The 95% confidence in the observed interval is based on the coverage probability of the confidence interval procedure of which it is one outcome. – Graham Bornholt May 23 '23 at 01:04
  • @GrahamBornholt thank you for confirming my point of confidence intervals requiring ungainly mental acrobatics to be interpreted correctly. ;-) The "95% confidence" indeed refers to the random interval that is a product of resampling, not the single observed interval. All-in-all it's a subject matter best to avoid for anyone without a PhD in statistics. – Durden May 23 '23 at 18:01
  • @Durden Thanks, but the distinctions between observed and random, and between 95% confidence and 95% probability, are both useful and straightforward. See Ben's answer for the standard expression which you still seem to be disputing. https://stats.stackexchange.com/questions/510727/how-to-correctly-word-a-frequentist-confidence-interval – Graham Bornholt May 23 '23 at 20:07

1 Answer

So first of all, this is dependent on the distribution of your data. This question can be very tricky for certain distributions. However, generally when working with confidence intervals, a normal distribution is assumed, simplifying things.

To calculate the merged confidence interval, you need to know the number of observations in both experiments. Under the normal assumption, the 95% confidence interval is defined as: mean ± 1.96 * standard_deviation / sqrt(observations).

If you know this, you can recover the means and standard deviations from the interval endpoints:
$$mean_1 = (A+B)/2$$
$$mean_2 = (C+D)/2$$
$$std_1 = \frac{B-A}{2 \cdot 1.96}\sqrt{n_1}$$
$$std_2 = \frac{D-C}{2 \cdot 1.96}\sqrt{n_2}$$
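As a minimal sketch of this recovery step in Python, assuming a symmetric normal-theory interval and a known sample size (the function name `summary_from_ci` and the example numbers are hypothetical, chosen only for illustration):

```python
import math

Z_95 = 1.96  # normal quantile used for a two-sided 95% interval


def summary_from_ci(lower, upper, n, z=Z_95):
    """Recover (mean, std) from a symmetric normal-theory interval.

    Assumes the interval was built as mean +/- z * std / sqrt(n).
    """
    mean = (lower + upper) / 2
    std_error = (upper - lower) / (2 * z)  # half-width divided by z
    std = std_error * math.sqrt(n)         # undo the division by sqrt(n)
    return mean, std


# Hypothetical example: a 95% interval of [4, 6] from 25 observations.
mean_1, std_1 = summary_from_ci(4.0, 6.0, n=25)
print(mean_1, std_1)
```

If the interval was built with a t quantile rather than 1.96, `z` would need to be replaced accordingly.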

Now the new mean is weighted by the number of observations:
$$mean = \frac{n_1 mean_1 + n_2 mean_2}{n_1 + n_2}$$
The combined standard deviation can then be calculated from the sample sizes, the two standard deviations, and the means:
$$std = \sqrt{\frac{n_1 std_1^2 + n_2 std_2^2 + n_1(mean_1-mean)^2 + n_2(mean_2-mean)^2}{n_1+n_2}}$$
For a quick derivation, see for example: "Is it possible to find the combined standard deviation?" The merged interval is then:
$$A_{new} = mean - 1.96 \cdot std / \sqrt{n_1+n_2}$$
$$B_{new} = mean + 1.96 \cdot std / \sqrt{n_1+n_2}$$
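The pooling steps above can be sketched in Python as follows. This is only an illustration under the answer's assumptions (normal data, symmetric 95% intervals, known sample sizes); the function name `merge_normal_cis` and the example inputs are hypothetical.

```python
import math


def merge_normal_cis(ci_1, n_1, ci_2, n_2, z=1.96):
    """Pool two normal-theory intervals into one, given both sample sizes."""
    (a, b), (c, d) = ci_1, ci_2

    # Recover each experiment's mean and standard deviation.
    mean_1, mean_2 = (a + b) / 2, (c + d) / 2
    std_1 = (b - a) / (2 * z) * math.sqrt(n_1)
    std_2 = (d - c) / (2 * z) * math.sqrt(n_2)

    # Observation-weighted mean of the combined sample.
    n = n_1 + n_2
    mean = (n_1 * mean_1 + n_2 * mean_2) / n

    # Combined variance: within-experiment spread plus between-mean spread.
    var = (
        n_1 * std_1**2 + n_2 * std_2**2
        + n_1 * (mean_1 - mean) ** 2 + n_2 * (mean_2 - mean) ** 2
    ) / n

    half_width = z * math.sqrt(var) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical example: two experiments, each reporting [4, 6] with n = 25.
merged_lower, merged_upper = merge_normal_cis((4.0, 6.0), 25, (4.0, 6.0), 25)
print(merged_lower, merged_upper)
```

With two identical inputs the merged interval keeps the same center and shrinks by a factor of sqrt(2), as expected when the sample size doubles.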

If you don't know the number of samples, you can still assume a certain underlying distribution. Most often, experiments can be approximated by a normal distribution. Having made this assumption, you can calculate the variance and mean of both from the probability density function. Adding the two density functions and normalizing so that the integral equals one gives a new density function, from which you can calculate the 95% confidence interval.
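One literal reading of this last paragraph can be sketched in Python as below. It treats each interval as a normal density centered on the midpoint, averages the two densities with equal weights (the equal weighting is an assumption, not something stated above), and reads the central 95% interval off the resulting mixture by bisection. The name `mixture_interval` is hypothetical.

```python
from statistics import NormalDist


def mixture_interval(ci_1, ci_2, level=0.95):
    """Central interval of an equal-weight mixture of two normal densities."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95

    # One normal per interval: midpoint as mean, half-width / z as spread.
    comps = [
        NormalDist(mu=(lo + hi) / 2, sigma=(hi - lo) / (2 * z))
        for lo, hi in (ci_1, ci_2)
    ]

    def cdf(x):
        # Adding the densities and renormalizing averages the CDFs.
        return sum(c.cdf(x) for c in comps) / len(comps)

    def quantile(p):
        # Bisection on the monotone mixture CDF.
        lo = min(c.mean - 10 * c.stdev for c in comps)
        hi = max(c.mean + 10 * c.stdev for c in comps)
        for _ in range(200):
            mid = (lo + hi) / 2
            if cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    tail = (1 - level) / 2
    return quantile(tail), quantile(1 - tail)


# Hypothetical example: two identical intervals give the same interval back.
mix_lower, mix_upper = mixture_interval((4.0, 6.0), (4.0, 6.0))
print(mix_lower, mix_upper)
```

Note that under this reading, two identical intervals return the same interval rather than a narrower one, which differs from the pooled calculation with known sample sizes.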

  • You are implicitly making an assumption in here that we are talking about normally distributed variables. OP didn’t state that’s the case. – Tim Aug 24 '21 at 05:10
  • Fair enough Tim. This can obviously also be calculated for different distributions. – Niels Uitterdijk Aug 24 '21 at 06:58
  • Can, but not so obviously. For example, the confidence interval can be asymmetric, in such a case you cannot easily estimate the mean from it. Also, in many cases, there won't be a straightforward closed-form solution, so you would need to fit a distribution to the intervals and treat them as data, etc. You should make it explicit that this is relevant only to normally distributed variables. – Tim Aug 24 '21 at 07:13
  • Interval doesn't have to be described with a mean. Even for a Pareto distribution you can calculate the confidence interval. Anyhow, I edited my answer with a disclaimer of my assumption. – Niels Uitterdijk Aug 24 '21 at 07:35