My outcome variable is a series of Bernoulli trials in which some values are missing:
$y \in \{0, 1, \text{NA}\}$
How do I impute the NA values of an outcome variable in rstan in the context of a GLM, assuming they are missing at random?
Here is a reproducible example. It runs with the complete y. How do I modify the code to impute the NA values in y_miss?
library(rstan)
set.seed(123)
#generate data
y <- rbinom(20, 1, 0.5)
N <- length(y)
x <- rnorm(N)
#outcome variable with missing data
y_miss <- c(y[1:17], NA, NA, NA)
#data as a list
data <- list(y=y, x=x, N=N)
#stan code
model_code <- '
data {
  int N; //number of observations (20 here)
  int<lower=0, upper=1> y[N]; //Bernoulli distributed outcome variable
  vector[N] x; //continuous explanatory variable
}
parameters {
  real a; //intercept
  real beta; //slope
}
model {
  vector[N] p; //vector to hold values of linear model
  a ~ normal(0, 5); //prior for intercept
  beta ~ normal(0, 5); //prior for slope
  for (i in 1:N) {
    p[i] = a + beta * x[i]; //linear model
  }
  y ~ bernoulli_logit(p); //likelihood with link function
}'
#run model
m <- stan(model_code=model_code, data=data, iter=4000)
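
One possible way to approach this, offered as a sketch under stated assumptions rather than a definitive answer: Stan has no discrete parameters, so the missing 0/1 outcomes cannot be declared as parameters the way missing continuous values can. If the NAs are ignorable (missing at random), the missing outcomes sum out of the likelihood, so a and beta can be fitted on the observed cases only, and the missing values can then be drawn from the posterior predictive distribution in generated quantities. The names obs_idx, mis_idx, y_obs, y_imp, data_miss and model_code_miss are my own, not from the original code. The Stan code keeps the older array syntax used above. Recent Stan versions expect the array[N_obs] int form instead.

```r
library(rstan)
set.seed(123)

# same simulated data as in the question
y <- rbinom(20, 1, 0.5)
N <- length(y)
x <- rnorm(N)
y_miss <- c(y[1:17], NA, NA, NA)

# Stan data cannot contain NA, so pass the observed values plus index vectors
obs_idx <- which(!is.na(y_miss))
mis_idx <- which(is.na(y_miss))
data_miss <- list(
  N = N,
  N_obs = length(obs_idx),
  N_mis = length(mis_idx),
  obs_idx = as.array(obs_idx), # as.array() keeps length-1 vectors as arrays
  mis_idx = as.array(mis_idx),
  y_obs = as.array(y_miss[obs_idx]),
  x = x
)

model_code_miss <- '
data {
  int<lower=0> N;                        // total number of cases
  int<lower=0> N_obs;                    // cases with observed outcome
  int<lower=0> N_mis;                    // cases with missing outcome
  int<lower=1, upper=N> obs_idx[N_obs];  // positions of observed outcomes
  int<lower=1, upper=N> mis_idx[N_mis];  // positions of missing outcomes
  int<lower=0, upper=1> y_obs[N_obs];    // observed outcomes only
  vector[N] x;                           // predictor, known for all cases
}
parameters {
  real a;    // intercept
  real beta; // slope
}
model {
  a ~ normal(0, 5);
  beta ~ normal(0, 5);
  // only observed cases enter the likelihood
  y_obs ~ bernoulli_logit(a + beta * x[obs_idx]);
}
generated quantities {
  int y_imp[N_mis]; // one imputed outcome per missing case and draw
  for (j in 1:N_mis)
    y_imp[j] = bernoulli_logit_rng(a + beta * x[mis_idx[j]]);
}'

fit <- stan(model_code = model_code_miss, data = data_miss, iter = 4000)

# draws x N_mis matrix of imputed 0/1 values
y_imp_draws <- rstan::extract(fit, pars = "y_imp")$y_imp
# posterior predictive probability that each missing y equals 1
colMeans(y_imp_draws)
```

Each column of y_imp_draws corresponds to one entry of mis_idx (here observations 18 to 20), and each row is one imputed data set. Note that this only works as written because x is fully observed. Missing predictors would need a model of their own.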

beta, and the bias grows with the amount of data. If you have any ideas, I have detailed the issue here: https://stats.stackexchange.com/questions/563129/marginalizing-out-discrete-response-variables-in-stan – user_15 Feb 07 '22 at 09:12
beta, as I detail here: https://stats.stackexchange.com/questions/563129/marginalizing-out-discrete-response-variables-in-stan – user_15 Feb 10 '22 at 11:28