
In the Bayesian setting, we update the prior distribution $\pi(\theta)$ to the posterior distribution $\pi(\theta | x)$ given data $x$. So the data $x$ induces an operator $T_x$ on the set of distributions on the parameter space:

$$T_x: \mathrm{Dist}(\Theta) \to \mathrm{Dist}(\Theta)$$

In many cases, people argue about which prior distribution to take. One popular option is the Jeffreys prior (proportional to the square root of the determinant of the Fisher information) because it is invariant under reparametrization.

As what actually matters is the posterior distribution, I wonder why we don't take a fixed point of $T_x$, i.e. a $\pi_x(\theta)$ such that $$\pi_x(\theta | x) = \pi_x(\theta),$$ as the posterior once $x$ is observed. A heuristic way to construct such a $\pi_x$ is to take $\lim_{n \to \infty} T_x^n(\pi)$ for some $\pi$, if the limit exists.
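
To make $T_x$ concrete, here is a minimal numerical sketch (a made-up binomial model on a discrete parameter grid; all names and numbers are mine, purely for illustration). Iterating the update with the same observation piles the mass onto the maximiser of the likelihood:

```python
import numpy as np
from scipy.stats import binom

# Discrete grid of parameter values for a Bernoulli success probability.
theta = np.linspace(0.01, 0.99, 99)
x, n = 7, 10                                  # fixed observation: 7 successes in 10 trials
likelihood = binom.pmf(x, n, theta)           # f(x | theta) evaluated on the grid

def T_x(prior):
    """One Bayes update with the fixed data x: prior -> posterior."""
    unnormalised = prior * likelihood
    return unnormalised / unnormalised.sum()

pi = np.full_like(theta, 1.0 / len(theta))    # start from a uniform prior on the grid
for _ in range(200):                          # heuristic limit lim_n T_x^n(pi)
    pi = T_x(pi)

print(theta[np.argmax(pi)], pi.max())         # mass concentrates near the MLE x/n = 0.7
```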

Questions

  1. Do people take such fixed points as the posterior distribution and make inferences from them?
  2. Does such a limit always exist? If not, is there always some $\pi$ for which the limit exists?
Student

2 Answers


The condition $\pi(\theta|x)=\pi(\theta)$ means that $$\pi(\theta)=\frac{\pi(\theta)f(x|\theta)}{\int_\Theta \pi(\vartheta)f(x|\vartheta)\,\text d\vartheta}$$ i.e., $$f(x|\theta) = \int_\Theta \pi(\vartheta)f(x|\vartheta)\,\text d\vartheta$$ for every $\theta$ with $\pi(\theta)>0$, which can only hold when the density of $X$ at this realisation $x$ does not depend on $\theta$ over the support of $\pi$.
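
As a small check of this (a hypothetical two-point parameter space, not part of the original argument), solving the fixed-point equation symbolically when the two likelihood values differ leaves only the degenerate priors:

```python
import sympy as sp

# Two-point parameter space with unequal likelihood values f(x|theta1) != f(x|theta2).
p = sp.symbols("p", nonnegative=True)          # prior weight on theta1
L1, L2 = sp.Rational(3, 4), sp.Rational(1, 4)  # illustrative likelihood values
posterior = p * L1 / (p * L1 + (1 - p) * L2)   # posterior weight on theta1
print(sp.solve(sp.Eq(posterior, p), p))        # [0, 1]: only point-mass priors are fixed
```

So apart from priors degenerate at a single point, a non-trivial fixed point requires the likelihood to be constant over the support of the prior.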

Looking at limits of posteriors to define "objective" or "non-informative" priors appears in the theory of reference priors; see, e.g., Berger, Bernardo, and Sun (2009).

Using the posterior as the prior and iterating the action leads to the notion of prior feedback for deriving maximum likelihood estimators. I developed this method, later revamped as data cloning (Lele, 2007) and MCMC maximum likelihood (Jacquier et al., 2007).
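
As a minimal sketch of that behaviour (a made-up Beta–Bernoulli example, not taken from the cited papers), feeding the same data back through a conjugate update over and over concentrates the posterior at the maximum likelihood estimate:

```python
from scipy.stats import beta

# Hypothetical Bernoulli data: 7 successes, 3 failures, so the MLE is 0.7.
s, f = 7, 3
a, b = 1.0, 1.0                        # Beta(1, 1) starting prior
for _ in range(50):                    # prior feedback: posterior becomes the next prior
    a, b = a + s, b + f                # conjugate update with the same data each time
print(a / (a + b), beta(a, b).std())   # mean -> 0.7, std -> 0: a Dirac mass at the MLE
```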

Xi'an
  • What are some pros and cons of these approaches, and why is my question unusual? Aren't these common practice? – Student Oct 20 '22 at 12:42
  • It seems that you've assumed $f(x) = \int_\Theta \pi(\theta) f(x|\theta)\,\text d\theta$. Could you explain why? I also don't see why that brings the conclusion that there's "no information" about the parameter. – Student Oct 20 '22 at 18:42
  • The equation $$f(x) = \int_\Theta \pi(\theta) f(x|\theta)\,\text d\theta$$ is the definition of the marginal likelihood, not an assumption. And if $f(x|\theta)$ is the same value for all $\theta$'s, observing $x$ brings no way of discriminating between values of $\theta$: $x$ induces a constant likelihood function. – Xi'an Oct 20 '22 at 18:54
  • I see. But $f(x|\theta)$ comes first, and I think you found a nice condition: if $f(x|\theta)$ is not independent of $\theta$, then you cannot find such a distribution $\pi(\theta)$. – Student Oct 20 '22 at 20:51
  • Do you know why prior feedback isn't common practice? People often pick a prior $\pi$ from experts' knowledge, thinking that the prior is the best they've got before the observation. Once the observation is done, they update the prior $\pi$ to the posterior $\pi'$, which they now believe is better than the prior. However, the observation itself shouldn't alter the state. That means they should pretend that they had the posterior before the observation, and that the same observation shows up. Then they should update it to the next posterior $\pi''$, which is better than $\pi$, and so on. @Xi'an. – Student Oct 24 '22 at 12:40
  • Prior feedback converges to a Dirac mass at the MLE, which is not a feature people running a Bayesian analysis would look for. – Xi'an Oct 24 '22 at 14:56
  • Oh, so in particular this Bayesian procedure "recovers" frequentists' MLE. I wonder if all other frequentists' tools can be formally recovered. – Student Oct 25 '22 at 12:23
  • This is too general a goal, as the set of frequentist tools is boundless. – Xi'an Oct 25 '22 at 12:34
  1. Yes, they do (sometimes); this is what a natural conjugate prior is: one which, when combined with the likelihood function, gives a posterior with the same functional form (but different parameter values, naturally) as the prior; see the sketch after this list. They aren't always appropriate, however, as perhaps your prior information isn't well approximated by the natural conjugate prior! There are also distributions for which no natural conjugate prior exists. If the data distribution is a member of the exponential family, then a natural conjugate prior does exist, but often one does not outside that family; see this question and answer also.

  2. In some sense the limit does exist, because the posterior approaches a point mass, and updating a prior that is concentrated at a single point just gives you the prior back again. I think this isn't quite what you are looking for with the question, however. Generally, if the true parameter values are not on the boundary of the parameter space of the data's distribution, you will end up with a posterior that approaches a Gaussian distribution as the sample size $\to \infty$, but that doesn't imply you should put Gaussian priors on all your parameters just because it's conjugate to the limit of the posteriors.
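
As a minimal sketch of point 1 (a conjugate update for a normal mean with known variance; the model and numbers are mine, for illustration only), the posterior keeps the functional form of the prior, with updated parameters:

```python
import numpy as np

# Conjugate update for a normal mean with known data variance sigma^2.
sigma2 = 1.0                          # known variance of each observation
mu0, tau2 = 0.0, 4.0                  # N(mu0, tau2) prior on the mean
x = np.array([1.2, 0.8, 1.5, 0.9])    # hypothetical observations
n, xbar = len(x), x.mean()

# Posterior is again normal: same family as the prior, new parameters.
tau2_post = 1.0 / (1.0 / tau2 + n / sigma2)
mu_post = tau2_post * (mu0 / tau2 + n * xbar / sigma2)
print(mu_post, tau2_post)
```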

jbowman
  • While your points are correct, I am unsure this addresses the OP's unusual question as they look at a distribution that is not impacted by updating, an almost certain impossibility imho. – Xi'an Oct 20 '22 at 04:43