
In Bayesian linear regression, for example, we may specify a model as: $$y_i \sim N(\beta_0 + \beta_1 x_i, \epsilon^2) \\\\ \beta_0 \sim N(0, \tau_0^2) \\\\ \beta_1 \sim N(0, \tau_1^2) \\\\ \epsilon \sim N(0, \sigma^2) $$

The posterior can be constructed as $$ P(\beta_0, \beta_1, \epsilon|y) = \frac{P(y|\beta_0, \beta_1, \epsilon)\cdot P(\beta_0)\cdot P(\beta_1)\cdot P(\epsilon)}{P(y)} $$


My question is: is it possible to partially specify the priors, e.g. specify a prior distribution for $\beta_1$ only, but not for $\beta_0$?

$$y_i \sim N(\beta_0 + \beta_1 x_i, \epsilon^2) \\\\ \beta_1 \sim N(0, \tau_1^2) \\\\ \epsilon \sim N(0, \sigma^2) $$

I guess the posterior will be something like: $$ P(\beta_1, \epsilon|y) = \frac{P(y|\beta_1, \epsilon)\cdot P(\beta_1)\cdot P(\epsilon)}{P(y)} $$

How do I interpret this posterior? Does this setting make $\beta_0$ behave like a frequentist parameter (fixed but unknown)? How does this model relate to ridge regression, where we penalize the slope but not the intercept?
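The ridge connection can be made concrete numerically. Under the model above, with a flat (improper) prior on $\beta_0$ and a $N(0, \tau_1^2)$ prior on $\beta_1$, the MAP estimate solves a least-squares problem with penalty $\lambda = \sigma^2/\tau_1^2$ on the slope only. A minimal sketch (the data and the values of $\sigma^2$ and $\tau_1^2$ are invented for illustration):

```python
import numpy as np

# Hypothetical data: true intercept 2, slope 3, noise sd 1.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

sigma2 = 1.0            # noise variance (assumed known here)
tau1_2 = 0.5            # prior variance tau_1^2 for the slope
lam = sigma2 / tau1_2   # equivalent ridge penalty, applied to the slope only

X = np.column_stack([np.ones(n), x])
D = np.diag([0.0, 1.0])  # flat prior on beta_0 -> no penalty on the intercept

# MAP estimate: minimize ||y - X b||^2 + lam * b1^2,
# i.e. solve the normal equations (X'X + lam * D) b = X'y.
beta_map = np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# OLS for comparison: only the slope is shrunk toward zero.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Comparing `beta_map` to `beta_ols`, the slope is shrunk toward zero while the intercept is essentially untouched, which is exactly the behavior of ridge regression with an unpenalized intercept.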

    In this case, $\pi(\beta_0)\propto 1$, which is improper. Whenever an improper prior is used, it is important to ascertain its posterior distribution is proper. – Daeyoung Mar 25 '22 at 04:53
  • Yes, I think it makes sense to interpret this setting as non-informative prior. – Taotao Tan Mar 25 '22 at 05:05
  • It seems that you don't mean that the posterior is conditioned on $\beta_0$, after all. Once you change your question, the improper prior becomes the correct answer, and the posterior is the joint $P(\beta_0,\beta_1\mid\text{data})$. Until you change your question, the improper prior is not the answer. – Peter Leopold Mar 25 '22 at 13:55
  • Do you mean the posterior should be $P(\beta_1, \epsilon| \text{data})$ instead of $P(\beta_0, \beta_1| \text{data})$? – Taotao Tan Mar 25 '22 at 14:56
  • The posterior is $P(\beta_0, \beta_1, \epsilon \mid \text{data})$. (Yes, $\epsilon$ should be there. My omission was an oversight.) But you have to make a choice: 1) either you fix $\beta_0$ and it is not a parameter in your model (or it is, but with a $\delta$ distribution), or 2) you don't, and it is. If it is, then yes, of course, it is part of the posterior. It was in your likelihood, wasn't it? How could it not be in your posterior? And you do mean to infer $\beta_0$ from the data, don't you? – Peter Leopold Mar 25 '22 at 17:02

1 Answer


An alternative to the approach suggested by Peter Leopold is to use an improper, flat prior $p(\beta_0) \propto 1$. This is what the Stan probabilistic programming language uses when no prior is specified for a parameter. It is possible, but not recommended: while such a prior is intuitively "uninformative", no prior is truly uninformative, and improper priors can lead to many subtle problems (including, in some models, an improper posterior).
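One way to see that an improper flat prior can still yield a proper posterior: for a normal likelihood with known $\sigma$ and $p(\mu) \propto 1$, the unnormalized posterior is just the likelihood, which integrates to a finite value; the posterior is $N(\bar y, \sigma^2/n)$. A rough numerical check (the data and $\sigma$ are invented for illustration):

```python
import numpy as np

# Invented data: 50 draws from N(5, 2^2); sigma treated as known.
rng = np.random.default_rng(1)
sigma = 2.0
y = rng.normal(loc=5.0, scale=sigma, size=50)

# Flat improper prior p(mu) ∝ 1: the unnormalized posterior is the likelihood.
mu_grid = np.linspace(0.0, 10.0, 2001)
dx = mu_grid[1] - mu_grid[0]
log_post = np.array([-0.5 * np.sum((y - m) ** 2) / sigma**2 for m in mu_grid])
post = np.exp(log_post - log_post.max())

Z = post.sum() * dx   # finite normalizing constant -> the posterior is proper
post /= Z
post_mean = (mu_grid * post).sum() * dx

# Theory: the posterior is N(ybar, sigma^2 / n), so post_mean should match y.mean()
```

The same check fails for models where the improper prior does not get "tamed" by the likelihood, which is why propriety of the posterior must be verified case by case.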

Tim