2

A simplification of my problem that I think conveys the essential parts of my question:

I am trying to calculate the most likely values for the mean and the standard deviation of a Gaussian distribution, given some data that are drawn from the distribution. I don't know if it's relevant, but I am using an MCMC sampler to find the "best" region of parameter space.

Initially I considered this a Bayesian method and was calling it Bayesian in my write up. But when I stopped to think about it, I don't have any priors for my parameters (which I guess is the same as having a flat prior?) - all I am doing is calculating likelihood and when the likelihood is large, the parameters (mean and standard deviation) are favored.

My question: is it wrong to call this a Bayesian technique because I am only calculating the likelihood? If so, what should I call it? I'm not using priors explicitly, but I am getting a posterior distribution.

Xi'an
  • 105,342
user1551817
  • 1,203
  • 4
    How are you obtaining the distribution(s) you're sampling? [I expect you do have a prior in there even if you don't recognize that you do.] – Glen_b Oct 19 '23 at 21:38
  • 2
    Having a flat prior is not the same as having no prior. See https://stats.stackexchange.com/a/20535/7224 This is a Bayesian analysis even though you are only concerned with the posterior mode. Note that the flat prior does not remain flat when you switch to another parametrization. – Xi'an Oct 20 '23 at 08:11

1 Answer

3

I'm not sure how you are using an MCMC sampler without a prior specified, since every implementation I've seen of an MCMC sampler requires the joint distribution, i.e. likelihood × prior.

Anyhow, maybe I can try and clear some things up. Let's use some (somewhat informal) notation for the pieces of your problem / experiment. You have some data $x_1, ..., x_n$ that are realizations of random variables $X_1, ..., X_n \overset{iid}{\sim} N(x|\mu, \sigma^2)$. That is, the data is drawn from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

In and of itself, saying "this data I have was drawn from some distribution" is an assumption (although I assume you're synthetically creating the data so you actually know the true data generating process).
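As an aside, that assumed data-generating process is easy to simulate; here is a minimal sketch, where the "true" parameter values are made up purely for illustration:

```python
import numpy as np

# Simulate the assumed data-generating process: n iid draws from a
# normal distribution. The "true" mu and sigma here are hypothetical.
rng = np.random.default_rng(0)
true_mu, true_sigma = 2.0, 1.5
x = rng.normal(true_mu, true_sigma, size=500)

print(x.shape)  # (500,)
```

With synthetic data like this you can later check how close your estimates land to the values you actually used.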

This assumption determines what some call an "observation model" and others call a likelihood function. Using this function, we can measure how likely our observed data is to have been generated by our model given particular values of the parameters.

Maximum likelihood basically looks at all the possible values (states) of the parameters ($\mu$ and $\sigma^2$ in this example) and finds the ones that best explain our data, i.e. the ones that give the highest value when plugged into the likelihood function with our fixed dataset.
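In the Gaussian case this maximization has a well-known closed form (sample mean, and the $1/n$ sample standard deviation); a minimal sketch with a hypothetical synthetic dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=500)  # hypothetical synthetic dataset

# Closed-form maximum-likelihood estimates for a Gaussian:
# sample mean, and the (biased, 1/n) sample standard deviation.
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))

def log_likelihood(mu, sigma, data):
    """Gaussian log-likelihood of the data at parameters (mu, sigma)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu) ** 2 / (2 * sigma**2))

print(mu_hat, sigma_hat)
```

Perturbing `mu_hat` (for fixed `sigma_hat`) can only lower `log_likelihood`, which is what it means for these to be the maximizers.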

In Bayesian methods we don't just want particular values of the parameters that best explain the data, but a distribution over the parameters, where the probability of a particular setting of the parameters is weighted by how well it explains the dataset we have.

How do we create a distribution over parameters? We use Bayes' theorem:

$$ p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)} $$

where $p(x|\theta)$ is the likelihood function we spoke of before and $p(\theta)$ is a prior of the parameters. Thus, to get a distribution over parameters we need to specify a prior so that we can then compute the posterior over parameters using Bayes' theorem.
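To make the "flat prior is still a prior" point concrete, here is a rough sketch of a random-walk Metropolis sampler for $(\mu, \sigma)$ where the (improper) flat prior enters the log posterior as an explicit constant term. All names, step sizes, and chain lengths here are illustrative, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=200)  # hypothetical dataset

def log_likelihood(mu, sigma):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

def log_prior(mu, sigma):
    # Flat prior: a constant. Improper here, but the posterior is
    # still proper for n >= 2 observations.
    return 0.0

def log_posterior(mu, sigma):
    # This is the target the sampler explores: likelihood x prior.
    return log_likelihood(mu, sigma) + log_prior(mu, sigma)

mu, sigma = 0.0, 1.0  # arbitrary starting point
samples = []
for _ in range(5000):
    mu_p = mu + 0.2 * rng.standard_normal()
    sigma_p = abs(sigma + 0.2 * rng.standard_normal())  # reflect to keep sigma > 0
    # Metropolis accept/reject on the log scale.
    if np.log(rng.random()) < log_posterior(mu_p, sigma_p) - log_posterior(mu, sigma):
        mu, sigma = mu_p, sigma_p
    samples.append((mu, sigma))

post = np.array(samples[1000:])  # discard burn-in
print(post.mean(axis=0))  # posterior means of (mu, sigma)
```

Replacing `log_prior` with a non-constant function is the only change needed to move away from the flat prior; the sampler itself is untouched, which is exactly why the prior is easy to overlook when it is flat.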

See this other post for a more elaborate discussion of maximum likelihood, maximum a posteriori, and Bayesian inference in the setting of conditional models like regression.

paul
  • 405
  • 2
  • 8
  • 2
    And an analysis that coincides with the frequentist one in many ways is still uniquely Bayesian in the sense that it can try to uncover the data-generating process that generated a unique, never-repeatable dataset (it doesn't need a repeated-sampling framework), and Bayes provides exact inference. Outside of Gaussian data models and very few others we don't have exact inference in the frequentist world. Compare for example the exact inference from Bayesian random effects models with the approximations from frequentist models requiring high-dimensional numerical integration. – Frank Harrell Oct 20 '23 at 12:57
  • Yes, this is also important! @FrankHarrell – paul Oct 20 '23 at 13:30