200 people were tested and 20 of them were infected. I want a posterior distribution that describes the uncertainty in the probability that a person is infected.

This is how I compute it:

n<-200
s<-20
p<-seq(0,0.3,0.001)

dp<-dbeta(p, s+1, n-s+1)

But then when I plot it, I don't know how to interpret the y axis and summary results:

plot(p, dp, type="l")

> summary(dp)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
 0.000000  0.000011  0.032438  3.322259  3.841204 18.820899 

So there is a 10% chance of.....something being 18.82? Or? What does this summary tell me?

Also, what is the difference between the first plot and the plot below?

plot(density(dp))

cvbzxc
  • Your second plot is meaningless (at most it tells you most of the density is close to $0$). – Henry Nov 18 '23 at 10:10
  • See here. Very briefly, density is a relative number (usually chosen so that the full curve integrates to 1). Your second density of a density is pretty meaningless, other than that it shows most of the probability densities for $p$ are close to zero I guess (but it says nothing about for which values of $p$ this is which is the relevant part). – PBulls Nov 18 '23 at 10:10
  • In your first plot of the posterior density, the $y$ axis is telling you the density, which you need to integrate to get a probability. So for example you might estimate visually that the posterior probability that $p$ is between $0.09$ and $0.11$ (where the density is around $18$) is going to be roughly $(0.11-0.09)\times18 = 0.36$, and then check it with pbeta(0.11,21,181)-pbeta(0.09,21,181) giving 0.3628953 or with sum(dp[p<=0.11&p>0.09])*0.001 giving 0.362944 – Henry Nov 18 '23 at 10:20

1 Answer

But then when I plot it, I don't know how to interpret the y axis

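The first plot is the probability density of a beta distribution, here the Beta(21, 181) posterior for p. The y-axis is the density, not a probability, and you would not normally try to interpret individual heights: they can exceed 1 (the peak here is about 18.8). The curve is scaled so that the total area under it is 1, so probabilities correspond to areas under the curve over an interval of p.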
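As a rough check of the "area is 1" statement, the sketch below reuses the question's n, s, p and dp; the names a, b, area_exact, area_grid and prob_exact are introduced here only for illustration. It integrates the density numerically and also reads off one interval probability as an area.

```r
# Setup copied from the question
n <- 200
s <- 20
p <- seq(0, 0.3, 0.001)

# Posterior shape parameters (illustrative names, not in the original post)
a <- s + 1
b <- n - s + 1
dp <- dbeta(p, a, b)

# Total area under the density over [0, 1]; should be close to 1
area_exact <- integrate(dbeta, lower = 0, upper = 1, shape1 = a, shape2 = b)$value

# Same area approximated on the grid: sum of heights times the grid spacing
area_grid <- sum(dp) * 0.001

# A probability is an area: posterior probability that p lies in (0.09, 0.11)
prob_exact <- pbeta(0.11, a, b) - pbeta(0.09, a, b)

print(c(area_exact = area_exact, area_grid = area_grid, prob_exact = prob_exact))
print(max(dp))  # a density height, which may exceed 1
```

The grid sum only covers p from 0 to 0.3, which works here because almost all of the posterior mass lies in that range.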

and summary results:

> summary(dp)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
 0.000000  0.000011  0.032438  3.322259  3.841204 18.820899 

So there is a 10% chance of.....something being 18.82? Or? What does this summary tell me?

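No, 18.820899 is the maximum value of the density, i.e. the height of the peak in the first plot. summary(dp) describes the density values evaluated on your grid of p, not the posterior distribution of p itself. It tells you

  1. The minimum and maximum of the density values
  2. The first and third quartiles (0.000011, 3.841204), which bound the interquartile range
  3. The mean and median

Notice how much larger the mean is than the median, indicating that these density values are right-skewed: most grid points have a density close to 0, and only the few near the peak have large values.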
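If the goal is a summary of the posterior for p itself rather than of the density heights, one option is to use the closed-form properties of the beta distribution together with qbeta. This is a minimal sketch, assuming the Beta(s+1, n-s+1) posterior from the question; the names a, b, post_mean, post_mode, post_median and ci are illustrative, and the 95% level is an arbitrary choice.

```r
n <- 200
s <- 20

# Posterior shape parameters (illustrative names)
a <- s + 1
b <- n - s + 1

# Closed-form summaries of a Beta(a, b) distribution
post_mean <- a / (a + b)
post_mode <- (a - 1) / (a + b - 2)   # valid because a > 1 and b > 1

# Quantile-based summaries
post_median <- qbeta(0.5, a, b)
ci <- qbeta(c(0.025, 0.975), a, b)   # equal-tailed 95% credible interval

print(c(mean = post_mean, mode = post_mode, median = post_median))
print(ci)
```

These numbers are on the scale of p (a probability between 0 and 1), so they can be read directly as statements about the infection probability.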

Also, what is the difference between the first plot and the plot below?

The 2nd plot is a kernel density estimate of the density values in dp, that is, a density of the density. It does not make a lot of sense and I would not try to interpret it.
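For contrast, density() is meant to be applied to draws from a distribution, not to density heights. A minimal sketch, assuming the same Beta(21, 181) posterior (the seed, the number of draws and the names draws and kde are arbitrary choices made for illustration): the kernel density estimate of simulated draws should roughly reproduce the first plot.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulate draws of p from the posterior
draws <- rbeta(1e5, 21, 181)

# Kernel density estimate of the draws
kde <- density(draws)

# Estimated density (solid) with the exact beta density overlaid (dashed)
plot(kde, main = "Posterior of p: KDE of draws vs exact density")
curve(dbeta(x, 21, 181), add = TRUE, lty = 2)
```

Here the x-axis is p again, which is what makes the plot interpretable.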

Robert Long