
Suppose we have data that come from a normal distribution with unknown parameters $\mu$ and $\sigma$. The twist is that each value is valid only with a given probability $p_i$, i.e. we observe a vector of pairs: (value, probability). In particular, if $p_i=0$ we should treat the $i$-th observation as missing (removing it from the model altogether) and if $p_i=1$ we should simply add $y_i \sim N(\mu, \sigma)$ to the log-likelihood for that $i$.

In other words, I wonder what to put in the model block of the Stan model below:

data {
  int<lower = 0> N;
  array[N] real y; // the value
  array[N] real<lower = 0, upper = 1> p; // probability the value is valid; zero means the value should be completely discarded as missing
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // y[i] ~ normal(mu, sigma); // baseline
    // target += normal_lpdf(y[i] | mu, sigma) + log(p[i]); // this approach is wrong
  }
}

When I studied Bayesian statistics some 10 years ago, I learned that problems involving probabilities of probabilities were not solvable (unless the probability could itself be treated as a parameter). I hope I am wrong.


One obvious workaround is the Monte Carlo approach: toss a Bernoulli coin with success probability $p_i$ for each observation; if it comes up heads, include the $i$-th data point in the sample vector. Then solve the standard normal problem using the sampled data. Repeat the process across many simulated datasets. At the end, aggregate all the (Bayesian) posteriors together.
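
One way to write down what this procedure computes, assuming each simulated dataset's posterior receives equal weight (itself a modeling choice, not something stated above), is

$$ p(\mu, \sigma \mid y, p) \approx \frac{1}{S} \sum_{s=1}^{S} p\bigl(\mu, \sigma \mid \{\, y_i : z_i^{(s)} = 1 \,\}\bigr), \qquad z_i^{(s)} \sim \operatorname{Bernoulli}(p_i), $$

where $S$ is the number of simulated datasets and $z^{(s)}$ is the inclusion vector drawn on the $s$-th round.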

  • In what sense do you use the term "censored"? Censoring means that we know that an observation falls in a range of values. For example, if a survival time (in days) is censored at 100, we know that survival is at least 100 days; the range is $[100, \infty)$. The lower bound 100 is informative, so we won't throw away this observation. See this. Also, are the probabilities $p_i$ only 0s and 1s? – dipetkov Sep 03 '23 at 12:42
  • You are right, sorry for the misnomer. What I mean is "maybe-missing", not "maybe-censored". I have updated the question. – Adam Ryczkowski Sep 03 '23 at 13:26
  • Thanks for the clarification. Have you seen the Missing Data and Partially Known Parameters section of the Stan user guide? I'm still not clear though: If some observations are missing and you want to ignore them completely, why don't you remove them beforehand from y[N]? – dipetkov Sep 03 '23 at 13:32
  • Taking inspiration from the power prior (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4626399/) folks, one option would be to exponentiate the likelihood component of a given observation by $p_i$, or, in other words, to set y[i] ~ normal(mu, sigma/p[i]). If Stan can't handle infinite variances (and, it being Stan, it probably can't), then let eps = 1e-6 and use sigma/(p[i] + eps) as the stddev instead. You'd want to verify that this doesn't give ludicrous estimates for $\sigma$, but it will definitely discount "bad" data (a minimal sketch of the tempered likelihood appears after these comments). – John Madden Sep 03 '23 at 14:05
  • @dipetkov That's the main issue: they are neither fully missing nor fully present. What we have are data points that carry a validity metric, conveniently measured as the probability that this specific data point is valid. – Adam Ryczkowski Sep 03 '23 at 17:59
  • @JohnMadden That would work qualitatively, but I wonder: why this particular form? Why not e.g. $y_i \sim N(\mu, \frac{\sigma}{p_i^2})$? – Adam Ryczkowski Sep 03 '23 at 18:03
  • @AdamRyczkowski perhaps for no greater reason than the reason we are doing $y_i \sim \operatorname{Normal}(a,b)$ rather than $y_i \sim \operatorname{Laplace}(a,b)$ ;). But yes, we could generally consider any function of $p_i$ that is monotonically increasing and fixes 0 and 1 (i.e. maps 0 to 0 and 1 to 1). If some of these functions lead to a very different analysis than others, that would be worth noting. – John Madden Sep 03 '23 at 18:13
  • I apologize but I still don't get what you want the data point to contribute to the likelihood when it's partially invalid. My understanding is that you'd like to set up a mixture of two components: $p_i \times \text{component}_1 + (1-p_i) \times \text{component}_2$. This is easy to do in Stan with log_mix (a sketch with a placeholder component 2 appears after these comments). However, so far you've specified component 1 only: $p_i \times \text{Normal}(\mu, \sigma)$. For the mixture likelihood idea to work you'd need to come up with a meaningful component 2 as well. – dipetkov Sep 03 '23 at 18:14
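
To make the tempered-likelihood suggestion from the comments concrete, here is a minimal sketch (not part of the original question): each observation's log-density contribution is scaled by $p_i$, which is exactly raising its density to the power $p_i$. The absence of explicit priors (so, flat priors) is carried over from the question's skeleton.

data {
  int<lower = 0> N;
  array[N] real y;
  array[N] real<lower = 0, upper = 1> p;
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // Tempered ("power") likelihood: the density of y[i] is raised to
    // the power p[i], i.e. its log-density is scaled by p[i].
    // p[i] == 0: the observation contributes nothing (treated as missing).
    // p[i] == 1: the full normal log-density is added (the baseline case).
    target += p[i] * normal_lpdf(y[i] | mu, sigma);
  }
}

Note that, up to a normalizing constant, raising the normal density to the power $p_i$ equals a normal density with standard deviation $\sigma / \sqrt{p_i}$, which is close to, but not identical to, the sigma/p[i] form suggested in the comment.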

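The mixture idea from the last comment can be sketched the same way with log_mix. The second component below, a fixed wide normal(0, 100) "junk" distribution, is purely a hypothetical placeholder; as the comment notes, a meaningful component 2 has to come from the application.

data {
  int<lower = 0> N;
  array[N] real y;
  array[N] real<lower = 0, upper = 1> p;
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // With probability p[i] the value is valid and comes from
    // normal(mu, sigma); with probability 1 - p[i] it comes from the
    // second component. The wide normal(0, 100) component used here is
    // a placeholder assumption, not something the question specifies.
    target += log_mix(p[i],
                      normal_lpdf(y[i] | mu, sigma),
                      normal_lpdf(y[i] | 0, 100));
  }
}
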
0 Answers