Prediction intervals seem to be discussed most often in the context of regression, but I want to reduce the problem to a single random variable to understand the reasoning. Assume you are sampling from a normal distribution $N(\mu, \sigma^2)$.
Wikipedia says the prediction interval for a new observation $X_{n+1}$ will be $\overline{X}_n + s_n\sqrt{1+1/n}\cdot T^{n-1}$, where $T^{n-1}$ denotes a Student's $t$ random variable with $n-1$ degrees of freedom (so the interval endpoints use the appropriate $t$ quantiles).
I am wondering specifically about the $s_n\sqrt{1+1/n}$ part of the expression. If you square it to get the variance, it's $s_n^2\,(1+1/n)$.
Why is the variance $s_n^2\,(1+1/n)$ instead of just $s_n^2$? Isn't $s_n^2$ supposed to be an unbiased estimator of $\sigma^2$ in $N(\mu, \sigma^2)$, from which all the samples (including a hypothetical $X_{n+1}$) are drawn?
So why wouldn't a new data point $X_{n+1}$ also have a variance of $s_n^2$? If I had to guess, it has something to do with the uncertainty around $\overline{X}_n$, hence the extra $s_n^2/n$ term.
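To make that guess concrete, the calculation I have in mind (please correct me if this is the wrong decomposition) is: since $X_{n+1}$ is independent of the first $n$ observations,
$$\operatorname{Var}\!\left(X_{n+1}-\overline{X}_n\right) = \operatorname{Var}(X_{n+1}) + \operatorname{Var}(\overline{X}_n) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1+\frac{1}{n}\right),$$
and substituting $s_n^2$ for $\sigma^2$ would give the $s_n^2(1+1/n)$ factor.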
Intuitively, it doesn't make sense to me that there is more uncertainty around a new data point (i.e. a variance of $s_n^2(1+1/n)$) when you already have some sample data to go off, compared to if you just blindly drew a new data point without any prior sampling (i.e. a variance of $s_n^2$). Would appreciate corrections to my thinking and reasoning about this.
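For what it's worth, here is a quick simulation sketch I put together to check the coverage numerically (my own code, not taken from the Wikipedia article; it assumes NumPy and SciPy are available). It seems to show the interval with the $\sqrt{1+1/n}$ factor hitting the nominal 95%, while the interval that drops the factor falls short:

```python
# Sketch: empirical coverage of the 95% prediction interval for a new draw,
# with and without the sqrt(1 + 1/n) factor. My own code, under the stated
# assumption that the data are i.i.d. N(mu, sigma^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 50_000
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95%, n-1 degrees of freedom

hits_pred, hits_naive = 0, 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)   # observed sample of size n
    x_new = rng.normal(mu, sigma)       # the future observation X_{n+1}
    xbar, s = x.mean(), x.std(ddof=1)
    half_pred = t_crit * s * np.sqrt(1 + 1 / n)  # half-width with the extra factor
    half_naive = t_crit * s                      # half-width ignoring uncertainty in X-bar
    hits_pred += abs(x_new - xbar) <= half_pred
    hits_naive += abs(x_new - xbar) <= half_naive

print("coverage with sqrt(1 + 1/n):   ", hits_pred / reps)   # close to 0.95
print("coverage without the factor:   ", hits_naive / reps)  # below the nominal 0.95
```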