Advice on sensitivity analysis for priors in Bayesian statistics

Question

I'm not clear on how to perform a sensitivity analysis on priors, and different sites give different answers. One suggests fitting the model under three kinds of prior: non-informative, weakly informative, and informative ("known"). Another suggests running the model with a range of different priors.

Here are my questions:

I want to estimate a parameter A, where A = mean1 + mean2. The priors are formulated (in code) as follows:

mean1 ~ N(u1, tau1)
A ~ N(mean1, tau2)

where

tau1 ~ gamma(alpha1,beta1)
tau2 ~ gamma(alpha2,beta2)

With the above formulation, how do I perform the sensitivity analysis?

Should I define a range of values for u1, tau1, and tau2? If so, what is the systematic procedure? There seem to be many combinations: vary u1 while fixing tau1 and tau2; fix u1 and tau2 while varying tau1; or fix u1 and tau1 while varying tau2. How do we know which priors are the correct ones to use?
How do I incorporate weakly informative priors into the above formulation?
What if the sensitivity test returns different estimates of mean1 and A as the priors vary? How do we draw a conclusion?

Your advice will be appreciated and help me get started. Thank you!

EDIT: I corrected an error. You can calculate the estimated mean2 from estimated parameters A and estimated mean1.

Excellent questions, I asked myself similar questions during a long time. Finally the answer I found was the abandonment of subjective priors. These questions are endless. More seriously, perhaps there are answers more positive than mine but I think that should depend on the professional context. — Stéphane Laurent, Mar 25 '14 at 22:49

Sean Easter · Accepted Answer · 2016-03-23T14:56:40.867

Try as I might, I'm unable to find an open source that fully explains the process, so for a fuller treatment than the one below, I can only direct you to Bayesian Data Analysis. In particular, Ch. 6 should prove very helpful.

In broadest strokes, following the analysis one should ponder whether model inferences make realistic sense. For instance, consider this paper detailing prior considerations in a pharmacological study. In brief, liver size was a variable relevant to the model, about which much is known in medical literature:

For example, parameter 8 represents the mass of the liver as a fraction of lean body mass; from previous medical studies, the liver is known to be about 3.3% of lean body mass for young adult males, with little variation.

A non-informative prior would have suffered a few pitfalls:

If noninformative prior distributions were assigned to all the individual parameters, then the model would fit the data very closely but with scientifically unreasonable parameters – for example, a person with a liver weighing 10 kg. This sort of difficulty is what motivates a researcher to specify a prior distribution using external information.

This is a simple test of realism: If your model suggests a human liver the size of a Thanksgiving turkey, there's a flaw of some sort.

While this example does not map directly onto your problem, it makes clear how much these considerations depend on context. To examine whether the posterior is overly dependent on the prior, one can fit the model under several priors and see whether the posterior changes in a practically meaningful way. Different priors will always yield somewhat different posteriors, so the question is whether the difference matters. For example, say you're polling a constituency about support for a proposed law, and no one has ever gathered data about this population's support for it: your data exist in a knowledge vacuum.

You construct a model in which each randomly polled person is a Bernoulli random variable with parameter $\theta \in [0, 1]$. You select the beta distribution as a convenience prior for $\theta$, because it's conjugate to the Bernoulli. You've polled 98 randomly selected persons, 45 of whom support the measure. But you wonder whether to use the Jeffreys prior, $B(\frac{1}{2}, \frac{1}{2})$, or a uniform density over $[0,1]$, given by $B(1,1)$. Leaving the formal math aside, the respective posteriors are $B(45.5, 53.5)$ and $B(46, 54)$, and it's difficult to see how this difference could be of practical consequence:

Two beta distributions B(45.5, 53.5) and B(46, 54) in green and blue, respectively

(Note that the blue and green lines are nearly indistinguishable.) Now, imagine instead that extensive polling data on this proposal already existed. The new data might then signal a change in opinion, a status quo result, or random noise, and one would be wise to compare these inferences with ones derived from more informed priors.
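The polling comparison can also be checked numerically. Below is a minimal sketch in plain Python (standard library only); the function names are mine, and the probability that support exceeds one half is approximated by a simple midpoint rule rather than an exact incomplete beta function.

```python
import math

def beta_log_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def summarize_posterior(prior_a, prior_b, successes, failures, grid_size=20000):
    """Conjugate beta-Bernoulli update, then summarize the Beta posterior."""
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    # P(theta > 0.5): midpoint-rule integration of the posterior density on (0.5, 1).
    width = 0.5 / grid_size
    prob_majority = sum(
        math.exp(beta_log_pdf(0.5 + (i + 0.5) * width, a, b)) * width
        for i in range(grid_size)
    )
    return {"a": a, "b": b, "mean": mean, "sd": sd, "prob_majority": prob_majority}

supporters, opponents = 45, 98 - 45
jeffreys = summarize_posterior(0.5, 0.5, supporters, opponents)
uniform = summarize_posterior(1.0, 1.0, supporters, opponents)

for name, s in [("Jeffreys", jeffreys), ("uniform", uniform)]:
    print(f"{name:8s} Beta({s['a']}, {s['b']}): mean={s['mean']:.4f} "
          f"sd={s['sd']:.4f} P(theta>0.5)={s['prob_majority']:.4f}")
```

The two posterior means are 45.5/99 and 46/100, which differ by about 0.0004. Whether a gap of that size matters is a judgment about the problem, not about the arithmetic.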

Now, to your specific questions above:

Parameters of interest must always have a specified distribution and prior; otherwise one couldn't make inferences about them at all. As to the correct priors, that again depends on the state of knowledge about the problem, and perhaps on which priors your skeptical audience will find agreeable. If knowledge is scant, you may wish to consider non-informative priors. But any choice of prior, or the choice to fix a parameter, must be justified either from existing knowledge or from uncertainty.
Simply derive posteriors from any priors you would like to consider, then compare them to examine whether the difference is of consequence to the problem at hand.
That's the tough one, as it depends entirely on context. If any prior produces a turkey-liver type inference, it's probably safe to dismiss. But for subtler distinctions, I'm aware of no substitute for subject matter expertise, careful analysis, and more data. Bayes factors are often used in model comparison, but typically when comparing models of distinct forms. (I've honestly never considered Bayes factors to compare priors, but my sense is that a Bayes factor analysis would favor the vaguer prior, i.e. the one that gave more weight to the data.)
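To make "derive posteriors from several priors, then compare" concrete, here is a rough sketch of one way to organize the comparison. It is not your hierarchical model: I assume a normal mean with a known observation standard deviation and a conjugate normal prior, so the posterior is available in closed form. The data values, the grid of prior settings, and the function name are all made up for illustration.

```python
import math
from itertools import product

def normal_posterior(data, obs_sd, prior_mean, prior_sd):
    """Posterior (mean, sd) of a normal mean with known obs_sd and a normal prior."""
    n = len(data)
    data_mean = sum(data) / n
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / obs_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical observations, invented purely for this illustration.
data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6, 5.5, 4.9]
obs_sd = 1.0  # assumed known

prior_means = [0.0, 5.0, 10.0]
prior_sds = [0.5, 2.0, 10.0]  # tight, moderate, vague

results = {}
for m0, s0 in product(prior_means, prior_sds):
    results[(m0, s0)] = normal_posterior(data, obs_sd, m0, s0)
    post_mean, post_sd = results[(m0, s0)]
    print(f"prior N({m0:4.1f}, sd {s0:4.1f}) -> posterior mean {post_mean:.3f}, sd {post_sd:.3f}")

# How far apart are the posterior means when only the prior location changes?
spans = {}
for s0 in prior_sds:
    means = [results[(m0, s0)][0] for m0 in prior_means]
    spans[s0] = max(means) - min(means)
    print(f"prior sd {s0:4.1f}: posterior means span {spans[s0]:.3f}")
```

With the vague prior the posterior mean barely moves as the prior location changes, while with the tight prior it moves a great deal. The second case is the one that needs a substantive justification for the prior.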

Thank you for the detail explanation and the links. I'm still at the learning stage. Few months ago, I tried to understand Bayes factor but no avail in how to put the theory into pymc code(practical). So, the idea of using Bayes factor was gave up. — user3460430, Apr 15 '14 at 12:29

score 2 · Answer 2 · answered Dec 27 '23 at 15:29

A reasonable place to start in this particular case is to recognize that the model is unidentified: tau1 and tau2 cannot be estimated separately.

Notes:

I parametrize the normal distribution in terms of the standard deviation $\sigma$ instead of the precision tau. This is consistent with the parameterization used by Stan, which matters because I show how to use priorsense (a package in the Stan universe) for prior diagnostics and sensitivity.
It's not clear what the observed data are, as the OP refers to both mean1 and A as parameters. I assume that the location u1 is a known scalar and that A is a vector of N observations.

First we show that the OP's model is unidentified; we also come up with a simpler model in which the standard deviation is identified.

$$ \begin{aligned} \mu_1 &\sim \operatorname{N}(u_1,\sigma_1) \\ A &\sim \operatorname{N}(\mu_1,\sigma_2) \end{aligned} $$

implies

$$ \begin{aligned} \mu_1 &= u_1 + \sigma_1z_1 \\ A &= \mu_1 + \sigma_2z_2 \\ &= u_1 + \sigma_1z_1 + \sigma_2z_2 \\ &= u_1 + \sqrt{\sigma_1^2+\sigma_2^2}z \\ &= u_1 + \sigma z \end{aligned} $$

where $z_1, z_2$ and $z$ are vectors of independent standard normal random variables.

So we can estimate a single standard deviation sigma, not two separate standard deviations sigma1 and sigma2, by fitting the following model:

A ~ normal(u1, sigma);
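The identity behind this simplification can be checked by simulation. The sketch below uses plain Python rather than Stan and follows the derivation above in drawing a separate latent mu1 for every observation; the function names and the particular sigma pairs are my own choices.

```python
import math
import random

def simulate_A(u1, sigma1, sigma2, n, seed):
    """Draw n values of A, each with its own latent mu1."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        mu1 = rng.gauss(u1, sigma1)           # mu1 ~ N(u1, sigma1)
        draws.append(rng.gauss(mu1, sigma2))  # A ~ N(mu1, sigma2)
    return draws

def sample_sd(values):
    """Standard deviation of a list of numbers (divisor n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Three (sigma1, sigma2) pairs sharing sigma1^2 + sigma2^2 = 25, i.e. sigma = 5.
pairs = [(3.0, 4.0), (4.0, 3.0), (1.0, math.sqrt(24.0))]
observed_sds = []
for sigma1, sigma2 in pairs:
    draws = simulate_A(u1=0.0, sigma1=sigma1, sigma2=sigma2, n=100_000, seed=1)
    observed_sds.append(sample_sd(draws))
    print(f"sigma1={sigma1:.3f}, sigma2={sigma2:.3f} -> sd(A) = {observed_sds[-1]:.3f}")
```

All three settings should give a sample standard deviation close to 5, which is why the data cannot distinguish between them.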

The OP also asks more generally how to choose a prior for $\sigma$. As @SeanEaster points out, this depends on the context (+1): ideally, we use domain knowledge to specify an appropriate informative prior. Otherwise we can use a weakly informative prior; the Stan documentation recommends a half-Student-$t$ distribution, student_t(df, loc, scale), where the parameters are the degrees of freedom, location and scale, respectively.

If we use Stan for model fitting, we can also use the priorsense package to check for (one type of) prior sensitivity. I demonstrate with the following R code snippet:

library("priorsense")
library("cmdstanr")
set.seed(123)
# We want to estimate sigma based on observed data A.
N <- 10
u1 <- 0
sigma <- 2
A <- rnorm(N, mean = u1, sd = sigma)

code <- "
data {
  // data
  int N;
  vector[N] A;
  real u1;
  // prior
  real df;
  real loc;
  real scale;
}
parameters {
  real<lower=0> sigma;
}
model {
  sigma ~ student_t(df, loc, scale);
  A ~ normal(u1, sigma);
}
generated quantities {
  vector[N] log_lik;
  real lprior;
  // likelihood
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(A[n] | u1, sigma);
  }
  // prior
  lprior = student_t_lpdf(sigma | df, loc, scale);
}
"

model <- cmdstan_model(write_stan_file(code))

# Specify an appropriate Student's t prior on sigma
fit1 <- model$sample(
  data = list(
    N = N, A = A, u1 = u1,
    df = 3, loc = 0, scale = 5
  ),
  seed = 1234
)

# Specify an inappropriate Student's t prior on sigma:
# its location 100 is very far from the true value sigma = 2
fit2 <- model$sample(
  data = list(
    N = N, A = A, u1 = u1,
    df = 30, loc = 100, scale = 1
  ),
  seed = 1234
)

powerscale_sensitivity(fit1)
#> Loading required namespace: testthat
#> Sensitivity based on cjs_dist:
#> # A tibble: 2 × 4
#>   variable  prior likelihood diagnosis
#>   <chr>     <dbl>      <dbl> <chr>    
#> 1 sigma    0.0169      0.185 -        
#> 2 lprior   0.0221      0.246 -
powerscale_sensitivity(fit2)
#> Sensitivity based on cjs_dist:
#> # A tibble: 2 × 4
#>   variable  prior likelihood diagnosis          
#>   <chr>     <dbl>      <dbl> <chr>              
#> 1 sigma    0.0542      0.272 prior-data conflict
#> 2 lprior   0.0547      0.274 prior-data conflict
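For intuition about the idea behind these diagnostics, here is a rough grid-based sketch of power-scaling in plain Python: the prior is raised to a power alpha and we watch how the posterior for sigma responds. This is not how priorsense computes its numbers. As far as I understand, the package works from posterior draws and reports a distance between distributions, whereas this sketch only compares posterior means and standard deviations on a grid. The data come from Python's own random generator, so they are not the values R produces above, and the alpha values are arbitrary.

```python
import math
import random

def grid_posterior(sum_sq, n, df, loc, scale, prior_power=1.0, lik_power=1.0):
    """Return (mean, sd) of sigma on a grid, with power-scaled prior and likelihood."""
    grid = [0.01 * (i + 1) for i in range(20000)]  # sigma from 0.01 to 200
    log_post = []
    for s in grid:
        z = (s - loc) / scale
        log_prior = -0.5 * (df + 1.0) * math.log1p(z * z / df)  # Student-t kernel
        log_lik = -n * math.log(s) - 0.5 * sum_sq / (s * s)     # normal, known mean
        log_post.append(prior_power * log_prior + lik_power * log_lik)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    mean = sum(s * w for s, w in zip(grid, weights)) / total
    var = sum((s - mean) ** 2 * w for s, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

# Simulated data: 10 draws from N(0, 2), mirroring the setup above.
rng = random.Random(123)
u1, n = 0.0, 10
data = [rng.gauss(u1, 2.0) for _ in range(n)]
sum_sq = sum((y - u1) ** 2 for y in data)

priors = {"weak": (3.0, 0.0, 5.0), "strong": (30.0, 100.0, 1.0)}  # (df, loc, scale)
summary = {}
for label, (df, loc, scale) in priors.items():
    for alpha in (0.8, 1.0, 1.25):
        summary[(label, alpha)] = grid_posterior(sum_sq, n, df, loc, scale, prior_power=alpha)
        mean, sd = summary[(label, alpha)]
        print(f"{label:6s} prior, alpha={alpha:4.2f}: E[sigma]={mean:8.3f}, sd={sd:.3f}")
```

Under the weak prior the posterior hardly reacts to alpha. Under the prior centred at 100 the posterior sits near the prior location and its spread follows the prior power, which is the kind of behaviour the prior column in the tables above is meant to flag. Setting lik_power instead of prior_power gives the analogous check for the likelihood.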
