
I have a data set $(n_i, y_i)$, $i = 0, \dots, 10$, which I have modeled with a Bayesian hierarchical beta-binomial model.

$y_i \sim \text{Binomial}(n_i, p_i)$ and $p_i \sim \text{Beta}(\alpha, \beta)$.

I have used MCMC to estimate $\alpha$ and $\beta$ (using the posterior median as the point estimate).

Given a new observation $(n_j, y_j)$, I want to judge whether it comes from the same model as the training set ($i = 0, \dots, 10$); specifically, whether $p_j$ is significantly greater than $\theta$ (the shrunken estimate of $p_i$ pooled across all groups).

How can I use the estimated $\alpha$ and $\beta$ to compute it?


3 Answers


In your notation, if you have a sample of, say, 1000 points from the posterior distribution of $\alpha$ and $\beta$ (which you probably have, since you've MCMCed the model) stored in vectors alpha and beta, you can get a sample of 1000 points from the posterior predictive distribution $y_{11} \mid y_0, \dots, y_{10}$ by running rbinom(1000, n_11, rbeta(1000, alpha, beta)). With this sample you can compute any desired summaries, such as the posterior predictive mean, build histograms, etc. Take a careful look at these notes (especially page 7).
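
For concreteness, here is a minimal runnable sketch of that recipe, assuming alpha and beta are your vectors of 1000 posterior draws and n_11 and y_11 are placeholder names for the new group's sample size and observed count:

# Sketch (assumed objects: alpha, beta = vectors of 1000 MCMC draws;
# n_11, y_11 = the new group's sample size and observed count)
p_draws <- rbeta(1000, alpha, beta)    # one p_11 per posterior draw of (alpha, beta)
y_pred <- rbinom(1000, n_11, p_draws)  # posterior predictive draws of y_11
mean(y_pred)                           # posterior predictive mean
hist(y_pred)                           # posterior predictive histogram
mean(y_pred >= y_11)                   # one-sided tail probability for the observed y_11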

Zen
  • Thanks Zen. Given a new data point (nj, yj), I want to get the posterior probability (or something else) that can illustrate that pj is greater than or equal to p. However, I cannot write the formula of this posterior predictive distribution. All I did was an MCMC for α, β and p. Should I incorporate the posterior predictive distribution into the MCMC process? – user22062 Mar 27 '13 at 01:50
  • The histogram that I've told you about gives the probability of each possible $y_{11}$, given the information in the training sample. If the probability of the observed $y_{11}$ is "high enough", then your model is predicting things well, and this gives a good "score" to your model and prior. – Zen Mar 27 '13 at 03:15
  • How did you compute the posterior of $\alpha,\beta$? Did you use BUGS, or did you code it yourself? Please, show us your code. – Zen Mar 27 '13 at 03:19
  • I used R to code this part. The key is to use a combination of Gibbs and Metropolis-Hastings to draw α, β and p from the posterior distributions. The number of iterations is 1000. The demo R code is from http://www.stat.cmu.edu/~brian/724/week06/lec15-mcmc2.pdf – user22062 Mar 27 '13 at 04:43

In Bayesian statistics, such predictions are usually made using the posterior predictive distribution, not by estimating the parameters directly. See the Wikipedia page on the posterior predictive distribution.

If the prior distribution is beta, and the likelihood is binomial, then the posterior predictive distribution is a beta-binomial distribution. See An introduction to the Beta-Binomial model, which has details and formulas. The previous sentence beginning "If the prior distribution" is a direct quote from page 8 of that document. Also see the Wikipedia page on the Beta-binomial distribution.

You probably want the value of the posterior predictive distribution for your new data point to start with.
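
As an illustrative sketch (not from the documents cited above), here is one way to evaluate that beta-binomial pmf directly in R. The names alpha_hat, beta_hat, n_new, and y_new are placeholders for your posterior medians and the new observation, and dbetabinom is a small hand-rolled helper, not a package function:

# Beta-binomial pmf: P(Y = y | n, a, b) = choose(n, y) * B(y + a, n - y + b) / B(a, b)
dbetabinom <- function(y, n, a, b) {
  exp(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

alpha_hat <- 2.5; beta_hat <- 7.5  # placeholders: your MCMC posterior medians
n_new <- 30; y_new <- 8            # placeholder new data point

dbetabinom(y_new, n_new, alpha_hat, beta_hat)             # predictive probability of y_new
sum(dbetabinom(y_new:n_new, n_new, alpha_hat, beta_hat))  # P(Y >= y_new), a one-sided check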


I would start by running a simulation to get a sample of $y$ values for $n = n_{new}$ from the training distribution. Then subtract $y_{new}$ from the $y$ sample you got, and check whether the difference is credibly non-zero. Here's some code to do that:

modelDescription = "
model {
    for( i in 1 : ny ) {
    y[i] ~ dbin(p[i],n[i])
    p[i] ~ dbeta(alpha,beta)
    }
    alpha ~ dunif(0,1e3)
    beta ~ dunif(0,1e3) #some priors...
}
"


sampleSize = 500
y = rbinom(sampleSize, 10, 0.5)
n = round(y*10) - rbinom(sampleSize, 2, 0.5) + rbinom(sampleSize, 2, 0.5) # makes p roughly 1/10, with some random fluctuation
n <- ifelse(n < 1, 1, n) # ensure all n > 0

dataList = list(
    ny = length(y) ,
    y = y ,
    n = n
)

parameters = c("alpha","beta")

# The original used a personal wrapper around rjags::coda.samples /
# dclone::codaSamples; the equivalent direct rjags calls are:
library(rjags)
jagsModel = jags.model(textConnection(modelDescription), data = dataList)
test1 = coda.samples(jagsModel, variable.names = parameters, n.iter = 60000)

require(coda)
test1p <- window(test1, start = start(test1) + 1000) # discard burn-in
chain<-as.matrix(test1p)
alpha<-chain[,1]
beta<-chain[,2]
chainSize<-nrow(chain) #note that nrow(chain)==length(alpha)==length(beta)

In the code above, we obtained samples of alpha and beta from MCMC. Now we are going to generate a sample of $y$ for $n = 50$.

We model $y$ as $y \sim \text{Binomial}(n, p)$, so to generate a sample of $y$ we run y <- rbinom(chainSize, newN, p).

We know the value of $n_{new}$ but not $p$; however, remembering that $p \sim \text{Beta}(\alpha, \beta)$, we can get a $p$ sample from p <- rbeta(chainSize, alpha, beta).

So, plugging one into the other:

newN = 50 # the value of n for which we'll predict y
p<-rbeta(chainSize,alpha,beta)
y<-rbinom(chainSize,newN,p)

Now we have a sample of $y$ for $n = 50$, and our work would be done if we just wanted to predict it. But instead, you want to check whether a specific value of $y$ is from that distribution, so we do

newY = 5
yDiff <- y-newY

And check whether the difference is credibly different from zero.

# Analysis of the HPD interval
print(HPDinterval(mcmc(yDiff)))
densityplot(mcmc(yDiff))

I got the following result from HPDinterval(mcmc(yDiff)):

     lower upper
var1    -4     4
attr(,"Probability")
[1] 0.95

That is, zero is within the 95% HPD interval, and thus we cannot reject the hypothesis that the proposed $(y_{new}, n_{new})$ "comes from the same model as the training set".

This, in fact, makes sense, since our "true" value $p = 1/10$ is exactly the ratio of the values we tested ($y_{new} = 5$ and $n_{new} = 50$, so $y_{new}/n_{new} = 1/10$).
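
If you want a one-sided summary of whether the new point's rate is larger than the training groups imply (as asked in the comments), a simple sketch reusing the y sample simulated above is:

# One-sided check (sketch, reusing the posterior predictive sample y from above):
# the predictive probability of a count at least as large as newY.
# A very small value would suggest p_new is larger than the model implies.
mean(y >= newY)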

  • Thanks. I am just a beginner in statistics and cannot understand your code. Given a new data point (nj, yj), I want to get the posterior probability (or something else) that can illustrate that pj is greater than or equal to p. However, I cannot write the formula of this posterior predictive distribution. All I did was an MCMC for α, β and p. Should I incorporate the posterior predictive distribution into the MCMC process? Assume y represents some kind of error; given a new data point like (nj=30, yj=8), how can I judge whether the new point's error rate is greater than the training set's or not? – user22062 Mar 27 '13 at 01:53
  • Hi, I'm sorry, I'm not good at explaining stuff. I divided the code into several parts and explained it further; don't worry about the first block of code, it's just used to get a sample of alpha and beta, which you already have. Check if you can understand it now. – random_user Mar 27 '13 at 03:04
  • Just putting the text another way, in case it helps: in the second block of code, I predicted the value of y when n=50. In the third block, I created a sample of the difference between that sample and the y you want to test. This difference should be around zero if y comes from that distribution, so we check whether it is credibly non-zero (in a frequentist approach we would say "significantly greater than zero", as you said). – random_user Mar 27 '13 at 03:05
  • Also, I recommend taking a look at this book; it's easy to understand and has some pretty good explanations of Bayesian prediction and analysis via HDIs: link – random_user Mar 27 '13 at 03:10