
Suppose we have data that come from a normal distribution with unknown parameters $\mu$ and $\sigma$. The twist is that each value is valid only with a given probability $p_i$, i.e. we observe a vector of pairs: (value, probability). In particular, if $p_i=0$ we should treat the $i$-th observation as missing (removing it from the model altogether) and if $p_i=1$ we should simply add $y_i \sim N(\mu, \sigma)$ to the log-likelihood for that $i$.

In other words, I wonder what to put in the model block of the Stan model below:

data {
  int<lower = 0> N;
  array[N] real y; // the value
  array[N] real<lower = 0, upper = 1> p; // probability the value is valid; zero means the value should be completely discarded as missing
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // y[i] ~ normal(mu, sigma); // baseline
    // target += normal_lpdf(y[i] | mu, sigma) + log(p[i]); // this approach is wrong
  }
}

When I studied Bayesian statistics some 10 years ago, I learned that problems involving probabilities of probabilities were not solvable (unless the probability could itself be treated as a parameter). I hope I am wrong.


One obvious workaround is the Monte Carlo approach: toss a Bernoulli coin with success probability $p_i$ for each observation; if it comes up heads, include the $i$-th data point in the sample vector. Then solve the standard normal problem using the sampled data. Repeat the process across many simulated datasets. At the end, aggregate all the (Bayesian) posteriors together.
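
One way to write down what this procedure computes, assuming each simulated dataset's posterior receives equal weight (itself a modeling choice, not something stated above), is

$$ p(\mu, \sigma \mid y, p) \approx \frac{1}{S} \sum_{s=1}^{S} p\bigl(\mu, \sigma \mid \{\, y_i : z_i^{(s)} = 1 \,\}\bigr), \qquad z_i^{(s)} \sim \operatorname{Bernoulli}(p_i), $$

where $S$ is the number of simulated datasets and $z^{(s)}$ is the inclusion vector drawn on the $s$-th round.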

  • In what sense do you use the term "censored"? Censoring means that we know that an observation falls in a range of values. For example, if a survival time (in days) is censored at 100, we know that survival is at least 100 days; the range is $[100, \infty)$. The lower bound 100 is informative, so we won't throw away this observation. See this. Also, are the probabilities $p_i$ only 0s and 1s? – dipetkov Sep 03 '23 at 12:42
  • You are right, sorry for the misnomer. What I mean is "maybe-missing", not "maybe-censored". I have updated the question. – Adam Ryczkowski Sep 03 '23 at 13:26
  • Thanks for the clarification. Have you seen the Missing Data and Partially Known Parameters section of the Stan user guide? I'm still not clear though: If some observations are missing and you want to ignore them completely, why don't you remove them beforehand from y[N]? – dipetkov Sep 03 '23 at 13:32
  • Taking inspiration from the power prior (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4626399/) folks, one option would be to exponentiate the likelihood component of a given observation by $p_i$, or, in other words, to set y[i] ~ normal(mu, sigma/p[i]). If Stan can't handle infinite variances (and, it being Stan, it probably can't), then let eps = 1e-6 and use sigma/(p[i] + eps) as the stddev instead. You'd want to verify that this doesn't give ludicrous estimates for $\sigma$, but it will definitely discount "bad" data (a minimal sketch of the tempered likelihood appears after these comments). – John Madden Sep 03 '23 at 14:05
  • @dipetkov That's the main issue: they are neither fully missing nor fully present. What we have are data points that carry a validity metric, conveniently measured as the probability that this specific data point is valid. – Adam Ryczkowski Sep 03 '23 at 17:59
  • @JohnMadden That would work qualitatively, but I wonder: why this particular form? Why not e.g. $y_i \sim N(\mu, \frac{\sigma}{p_i^2})$? – Adam Ryczkowski Sep 03 '23 at 18:03
  • @AdamRyczkowski perhaps for no greater reason than the reason we are doing $y_i \sim \operatorname{Normal}(a,b)$ rather than $y_i \sim \operatorname{Laplace}(a,b)$ ;). But yes, we could generally consider any function of $p_i$ that is monotonically increasing and fixes 0 and 1 (i.e. maps 0 to 0 and 1 to 1). If some of these functions lead to a very different analysis than others, that would be worth noting. – John Madden Sep 03 '23 at 18:13
  • I apologize but I still don't get what you want the data point to contribute to the likelihood when it's partially invalid. My understanding is that you'd like to set up a mixture of two components: $p_i \times \text{component}_1 + (1-p_i) \times \text{component}_2$. This is easy to do in Stan with log_mix (a sketch with a placeholder component 2 appears after these comments). However, so far you've specified component 1 only: $p_i \times \text{Normal}(\mu, \sigma)$. For the mixture likelihood idea to work you'd need to come up with a meaningful component 2 as well. – dipetkov Sep 03 '23 at 18:14
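
To make the tempered-likelihood suggestion from the comments concrete, here is a minimal sketch (not part of the original question): each observation's log-density contribution is scaled by $p_i$, which is exactly raising its density to the power $p_i$. The absence of explicit priors (so, flat priors) is carried over from the question's skeleton.

data {
  int<lower = 0> N;
  array[N] real y;
  array[N] real<lower = 0, upper = 1> p;
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // Tempered ("power") likelihood: the density of y[i] is raised to
    // the power p[i], i.e. its log-density is scaled by p[i].
    // p[i] == 0: the observation contributes nothing (treated as missing).
    // p[i] == 1: the full normal log-density is added (the baseline case).
    target += p[i] * normal_lpdf(y[i] | mu, sigma);
  }
}

Note that, up to a normalizing constant, raising the normal density to the power $p_i$ equals a normal density with standard deviation $\sigma / \sqrt{p_i}$, which is close to, but not identical to, the sigma/p[i] form suggested in the comment.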

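The mixture idea from the last comment can be sketched the same way with log_mix. The second component below, a fixed wide normal(0, 100) "junk" distribution, is purely a hypothetical placeholder; as the comment notes, a meaningful component 2 has to come from the application.

data {
  int<lower = 0> N;
  array[N] real y;
  array[N] real<lower = 0, upper = 1> p;
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  for (i in 1:N) {
    // With probability p[i] the value is valid and comes from
    // normal(mu, sigma); with probability 1 - p[i] it comes from the
    // second component. The wide normal(0, 100) component used here is
    // a placeholder assumption, not something the question specifies.
    target += log_mix(p[i],
                      normal_lpdf(y[i] | mu, sigma),
                      normal_lpdf(y[i] | 0, 100));
  }
}
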
0 Answers