16

This may be a weird question. My colleagues and I are working on a medical estimation problem for which relevant prior knowledge about plausible values of some physiological parameters exists. In addition, these parameters can be estimated from time series data, which often comprise some tens of thousands of samples. What often happens is that, due to model imperfections, MAP estimation converges towards implausible solutions (e.g., very small parameter values). The priors are essentially ignored because they become irrelevant given the large number of available (and informative) measurements.

Now I am well aware that one way (probably the preferred one) to solve this problem is to improve the time series model and try to fix its imperfections. This is proving to be really hard, however, for a number of reasons.

Another (partial) remedy might be to somehow "fix" the influence of the prior such that it does not become irrelevant. This could be done, e.g., by choosing the prior's covariance as a function of the sample size or simply assigning the prior and the likelihood fixed weights when determining final parameter estimates. That got me wondering: is this "a thing"? Is there a name for doing something like this, i.e., a "fixed-influence prior"? Surely much more knowledgeable people than me have thought about this problem.
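
For concreteness, here is a minimal sketch of the kind of fixed-weight scheme I have in mind. Everything in it is made up for illustration (a single parameter, a Gaussian prior, a Gaussian likelihood, and hypothetical weights w_prior and w_lik); it is not our actual model.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    # Stand-in for the time series: many informative samples whose maximum-likelihood
    # estimate (the sample mean, ~0.2) sits far from the prior mean.
    rng = np.random.default_rng(0)
    y = rng.normal(loc=0.2, scale=1.0, size=20_000)

    prior_mean, prior_sd = 1.0, 0.1  # illustrative "plausible range" prior

    def negative_weighted_log_posterior(theta, w_prior, w_lik):
        # Standard MAP corresponds to w_prior = w_lik = 1; shrinking w_lik relative
        # to w_prior keeps the prior from being swamped as the sample size grows.
        log_prior = norm.logpdf(theta[0], prior_mean, prior_sd)
        log_lik = norm.logpdf(y, loc=theta[0], scale=1.0).sum()
        return -(w_prior * log_prior + w_lik * log_lik)

    x0 = np.array([prior_mean])
    map_standard = minimize(negative_weighted_log_posterior, x0, args=(1.0, 1.0)).x[0]
    map_reweighted = minimize(negative_weighted_log_posterior, x0, args=(1.0, 100 / len(y))).x[0]
    print(map_standard, map_reweighted)  # ~0.2 (prior swamped) vs. ~0.6 (prior retained)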

Eike P.

  • I agree with Dave that it is a feature. When your MAP nevertheless produces "implausible" solutions, I would advocate considering the likelihood rather than the prior as the culprit. Think local maxima etc. If you still want to tweak the prior, the fictitious-sample interpretation of conjugate priors might be a starting point, see e.g. https://stats.stackexchange.com/questions/155059/justification-for-conjugate-prior/155116#155116 – Christoph Hanck Sep 13 '22 at 12:13
  • Note also that MAP inference is only debatably Bayesian. – John Madden Sep 13 '22 at 12:30
  • But I think this is a great question; for example, sklearn's Lasso automatically multiplies the provided regularization coefficient by the sample size, somewhat like what you describe. – John Madden Sep 13 '22 at 12:32
  • A prior which sets the probabilities or densities of impossible parameter values to zero will lead to a posterior distribution in which those values remain zero. If those parameter values are merely improbable and the evidence points towards them, then you start to move into the realms of Cromwell or Sherlock Holmes. – Henry Sep 13 '22 at 12:38
  • Wouldn't this amount to taking the outputs of the model and basically just pushing them into the region of measurements that are considered plausible? If I know that human body temperature is always between 13 °C and 47 °C, and I have a model that says that the patient's temperature is almost certainly approximately 200 °C, then applying a really strong prior that forces the estimate down to 47 °C isn't necessarily going to give me reasonable results. – Tanner Swett Sep 13 '22 at 21:07
  • A prior that gives a non-negligible probability to impossible values is just wrong and needs to be replaced. – Frank Harrell Sep 14 '22 at 14:15
  • A couple of thoughts, since I don't think your question really has an answer because it's not defined well enough: 1) heavy tails and skews are a classic case where more data can hurt a bad prior (e.g. don't use the mean and a Gaussian distribution; use the median and a Cauchy distribution); 2) you can use variational inference or MCMC to better estimate your distribution. – Andrew Holmgren Sep 14 '22 at 14:21
  • Sounds like you're basically using a meta-model that combines two different models: one based on your initial guesses and one informed by observations. And it sounds like you don't want your meta-model to completely ignore the initial-guess sub-model. – Nat Sep 15 '22 at 06:18
  • Thought experiment: you're measuring a temperature that you think is probably about 50 °C. You're using a thermometer that you suspect has some bias plus errors in its readings. How do you update your estimate of the temperature after each reading? (Observe that, even after infinitely many readings, you probably wouldn't completely trust the thermometer, due to the bias.) – Nat Sep 15 '22 at 06:25
  • Very interesting question and answers. – JosephDoggie Sep 15 '22 at 12:02

4 Answers

18

I dispute your main premise.

A prior distribution is your guess (hopefully a good guess, but still a guess).

Then you observe data and see what really happens.

When you have enough observations that contradict your original guess, it is reasonable to change your mind.

What you’re observing strikes me as a feature, not a bug, of Bayesian inference.

Dave

  • In principle, I agree - the problem only arises in the context of significant model mismatch, I believe? In that case, while not completely useless, an ML estimate may be significantly biased (as it is in our case), and I would like my prior knowledge not to be completely overruled by that bias. – Eike P. Sep 13 '22 at 12:10
  • I do not mean to be stubborn, but I would take the same stance in this case as in my comment above - if the model is well specified, then the bias should be smallish in large samples (which is where the prior is overruled). If the bias is still large in large samples, this, to me, points to misspecification (think omitted-variable bias), so it may be more promising to reconsider the specification of the model rather than to tweak the prior. – Christoph Hanck Sep 13 '22 at 14:31
  • If you think maximum likelihood/MAP/modes of posterior distributions are risky, then apply a suitable loss function - you may end up with the mean of the posterior distribution as a point value (though that has its own issues, as the mean may itself be an impossible value). – Henry Sep 13 '22 at 21:13
  • (+1) I think part of the problem here is that the specified prior does not match the OP's stated beliefs. If the beliefs are so strong that one is prepared to discount the posterior after an analysis involving tens of thousands of data points, then the OP's actual prior beliefs are MUCH stronger than specified in the model prior. – mkt Sep 14 '22 at 10:02
  • The usual complaint about Bayesian models is that the prior may be wrong, but in this case it sounds like the problem may lie with the likelihood (and it is overstating the amount of information in the data)? – Dikran Marsupial Sep 14 '22 at 10:15
  • @ChristophHanck Maybe we're having a clash of terminology here? What I mean by "significant model mismatch" is precisely that the model is not well specified. As I alluded to in my question, I am aware that the better solution would be to fix the misspecification, but that is proving really hard (we have been trying to do that for a long time already). – Eike P. Sep 14 '22 at 13:12
  • So then what's the problem with the data overwhelming your initial assumptions? – Dave Sep 14 '22 at 13:13
  • @DikranMarsupial That is precisely correct. In particular, it is proving very challenging to find a sufficiently good noise model that captures all the relevant ways in which the regression model is misspecified, leading to biased parameter estimates. – Eike P. Sep 14 '22 at 13:15
  • @Eike, yes, that is possible - while, as many have expressed by now, your idea may not be the cleanest one theoretically, it may, in your practical situation and given the complications, have some merit as a kind of "second best" solution. – Christoph Hanck Sep 14 '22 at 14:04

12

The answer to this question centers on its false premise. If I can sum up your question: the posterior ends up really far from your prior, but rather than acknowledging that either your prior is wrong or your likelihood is misspecified, you want to know how to use a stronger prior to enforce that the posterior is not "too far" from the prior... at which point, why even use data? Just start with your prior, flip a coin, roll some dice, move your prior by that amount in that direction, and call it your posterior.

From your question it sounds like, if you had 2x or 10x the data, you would just ask how to make your prior 2x or 10x stronger to cancel out the data and get the posterior you want. So please fix your model (or acknowledge that it is currently not possible to model these data well enough), but please do not just change your prior to get a predetermined outcome.

Go Bluth

7

I do agree with the previous answers, but if you really want to "fix" the influence of the prior, here are some ideas.

  • If your prior is based on historical data, you can use a power prior [1] to control the relative influence of your prior on the posterior obtained with new data.

  • Alternatively, you can consider weighting the likelihood (power scaling) so that the relative influence of your prior is increased. However, if, for example, you have a Gaussian model, this is equivalent to increasing the standard deviation of the Gaussian, so in the end maybe you do need to change your model. A rough sketch of the power-scaling algebra follows below.
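
For illustration, here is roughly what power scaling looks like in the simplest conjugate normal setting. All numbers are invented; alpha = 1 recovers the ordinary posterior, and the power prior of [1] applies the same kind of exponent to a historical-data likelihood instead of the current one.

    import numpy as np

    def power_scaled_normal_posterior(y, sigma, mu0, tau0, alpha):
        # Posterior for the mean of a N(theta, sigma^2) model under a N(mu0, tau0^2)
        # prior, with the likelihood raised to the power alpha (0 < alpha <= 1).
        # alpha < 1 down-weights the data, which is algebraically the same as
        # inflating the noise variance from sigma^2 to sigma^2 / alpha.
        n = len(y)
        prior_prec = 1.0 / tau0**2
        data_prec = alpha * n / sigma**2
        post_var = 1.0 / (prior_prec + data_prec)
        post_mean = post_var * (prior_prec * mu0 + data_prec * np.mean(y))
        return post_mean, np.sqrt(post_var)

    rng = np.random.default_rng(1)
    y = rng.normal(0.2, 1.0, size=20_000)  # data centred far from the prior mean of 1.0
    print(power_scaled_normal_posterior(y, sigma=1.0, mu0=1.0, tau0=0.1, alpha=1.0))           # prior swamped
    print(power_scaled_normal_posterior(y, sigma=1.0, mu0=1.0, tau0=0.1, alpha=100 / len(y)))  # prior retained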

You may also be interested in reading [2], which uses power scaling as a way to diagnose prior sensitivity.

[1] Ibrahim, J. G., Chen, M. H., Gwon, Y., & Chen, F. (2015). The power prior: Theory and applications. Statistics in Medicine, 34(28), 3724–3749. https://doi.org/10.1002/sim.6728

[2] Kallioinen, N., Paananen, T., Bürkner, P.-C., & Vehtari, A. (2021). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. https://arxiv.org/abs/2107.14054v1

Guillem

1

Have you considered that your expectation is simply wrong, perhaps because of publication bias?

Alternatively, if you are so confident in your beliefs that you're willing to discount the posterior after an analysis of tens of thousands of data points, it seems to me that your specified prior does not truly reflect the strength of your belief. You should probably specify a much stronger prior - if it's strong enough, the posterior wouldn't be dominated by the data.
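
To put a rough number on "much stronger", here is a toy conjugate normal calculation with invented values (prior mean 1.0, data mean 0.2, unit noise variance, 20,000 observations): the prior standard deviation needed to keep the posterior mean inside a chosen plausible region is far tighter than what a loosely specified "plausible range" prior would normally encode.

    import numpy as np

    # Invented numbers: prior mean 1.0, observed data mean 0.2, noise sd 1.0.
    mu0, ybar, sigma, n = 1.0, 0.2, 1.0, 20_000
    c = 0.8  # smallest posterior mean we would still call "plausible"

    # Conjugate normal-normal model: the posterior mean is a precision-weighted
    # average of the prior mean and the data mean. Solve for the prior precision
    # that keeps the posterior mean at or above c.
    data_prec = n / sigma**2
    prior_prec_needed = data_prec * (c - ybar) / (mu0 - c)
    tau0_needed = 1.0 / np.sqrt(prior_prec_needed)
    print(tau0_needed)  # ~0.004: a prior sd this small is a very strong statement of belief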

mkt