
Suppose that I am trying to use the jackknife to estimate the variance of some estimator $E$. If I have $n$ data points, I begin by computing $n$ estimates (call them $B_1, \dots, B_n$), each obtained by leaving one of the $n$ data points out of my sample. If I understand it correctly, the Wikipedia page then suggests that my estimate of the variance should be

$$\widehat{Var}(E) = \frac{\sum_{i = 1}^n (B_i - \bar{B})^2}{n-1}$$

where $\bar{B}$ is the average of my $n$ estimates. The standard error is then $\sqrt{\widehat{Var}(E)}$.

On the other hand, the notes here seem to suggest

$$\widehat{Var}(E) = \frac{n-1}{n}\sum_{i = 1}^n (B_i - \bar{B})^2$$

What am I missing here? And which formula should I use? Many thanks in advance!

Extra question: if the answer is indeed the second formula, will standard errors indeed get smaller (as one would expect) as $n$ gets large? I ask because $\sum_{i = 1}^n (B_i - \bar{B})^2/n \approx Var(B_i)$ for large $n$; so it appears that the second estimate is roughly $Var(B_i)(n-1)$. It is not clear to me that this is decreasing in $n$ (even if $Var(B_i)$ goes to zero as $n$ gets large).
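To make the comparison concrete, here is a small numerical sketch I put together (not part of the formulas above; the helper function name is just for illustration). It uses the sample mean as the estimator $E$, for which the true variance of the estimate is $\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def jackknife_variances(x, estimator):
    """Return both candidate jackknife variance estimates for `estimator`."""
    n = len(x)
    # Leave-one-out estimates B_1, ..., B_n
    B = np.array([estimator(np.delete(x, i)) for i in range(n)])
    ss = np.sum((B - B.mean()) ** 2)        # sum of squared deviations
    return ss / (n - 1), ss * (n - 1) / n   # first formula, second formula

for n in (20, 200, 2000):
    x = rng.normal(size=n)                  # sigma = 1, so the truth is 1/n
    v1, v2 = jackknife_variances(x, np.mean)
    print(f"n={n:5d}  formula1={v1:.2e}  formula2={v2:.2e}  true={1/n:.2e}")
```

For the sample mean, at least, the second formula tracks the true $1/n$ (so the standard error does shrink with $n$), while the first is far too small; presumably the spread of the $B_i$ is much smaller than $Var(B_i)$ because the leave-one-out samples overlap.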

afreelunch

1 Answer


I'm not a statistician, and this is the economics forum, so forgive any possible mistakes. I got most of this information from these slides.

Consider a statistic of interest $\theta$ with a consistent but biased estimator $\hat \theta$. The jackknife estimate $\hat \theta_{(i)}$ is the same statistic, computed on the sample obtained by removing observation $i$. Doing this for all $i$ and taking the mean over the values $\hat \theta_{(i)}$ gives:
$$
\hat \theta_{(.)} = \frac{1}{n} \sum_{i = 1}^n \hat \theta_{(i)}.
$$
As stated above, the estimator $\hat \theta$ is biased, so let:
$$
\mathbb{E}(\hat \theta) = \theta + \frac{b}{n} + O\left(\frac{1}{n^2}\right),
$$
where $b$ is the first-order bias of the estimator $\hat \theta$. For the jackknife estimates, which are based on only $n-1$ observations, we have a similar expression:
$$
\mathbb{E}(\hat \theta_{(i)}) = \theta + \frac{b}{n-1} + O\left(\frac{1}{n^2}\right).
$$
So, ignoring the higher-order terms, we get:
$$
\mathbb{E}(\hat \theta_{(i)}) - \frac{b}{n-1} = \mathbb{E}(\hat \theta) - \frac{b}{n}
\quad\to\quad
(n-1)\,\mathbb{E}(\hat \theta_{(i)} - \hat \theta) = \frac{b}{n}.
$$
This shows that $(n-1) (\hat \theta_{(i)} - \hat \theta)$ is an 'unbiased' estimator for $\dfrac{b}{n}$ (at least when we ignore all higher-order bias terms). Using this correction, we have the following 'bias-corrected estimate':
$$
pv_{(i)} = \hat \theta + (n-1)(\hat \theta - \hat \theta_{(i)}) = n \hat \theta - (n-1) \hat \theta_{(i)}.
$$
This is called the pseudovalue.
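As a concrete illustration (my own sketch, not from the slides): take the plug-in variance estimator $\hat\theta = \frac{1}{n}\sum_i (x_i - \bar x)^2$, whose bias is exactly $-\sigma^2/n$. The pseudovalue construction then recovers the unbiased sample variance:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=50)
n = len(x)

def plugin_var(a):
    """Plug-in variance: biased, with expectation sigma^2 * (n-1)/n."""
    return np.mean((a - a.mean()) ** 2)

theta_hat = plugin_var(x)                                              # \hat\theta
theta_loo = np.array([plugin_var(np.delete(x, i)) for i in range(n)])  # \hat\theta_{(i)}

pv = n * theta_hat - (n - 1) * theta_loo   # pseudovalues pv_i
theta_jack = pv.mean()                     # bias-corrected jackknife estimate

print(theta_hat)        # plug-in (downward-biased) estimate
print(theta_jack)       # jackknife estimate ...
print(x.var(ddof=1))    # ... equals the unbiased sample variance exactly
```

The exact agreement in the last two lines is a known property of this particular example: jackknifing the plug-in variance reproduces the $1/(n-1)$ estimator.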

Now, of course it would be stupid to only use $pv_{(i)}$ for one particular value of $i$. So a better bias-corrected estimate is the one that averages over all of these, the jackknife estimator:
$$
\hat \theta_{jack} = \frac{1}{n}\sum_{i = 1}^n pv_i = n \hat \theta - (n-1) \hat \theta_{(.)}.
$$
The pseudovalues $pv_{(i)}$ are not necessarily i.i.d., but assume they are anyway. Then their (estimated) variance is given by:
$$
\begin{align*}
\widehat{\rm var}(pv_i) &= \frac{1}{n-1} \sum_{i = 1}^n (pv_i - \hat \theta_{jack})^2,\\
&= \frac{1}{n-1} \sum_{i = 1}^n \left(pv_i - n \hat \theta + (n-1)\hat \theta_{(.)}\right)^2,\\
&= \frac{1}{n-1} \sum_{i = 1}^n \left(n \hat \theta - (n-1) \hat \theta_{(i)} - n \hat \theta + (n-1) \hat \theta_{(.)} \right)^2,\\
&= \frac{(n-1)^2}{n-1} \sum_{i = 1}^n (\hat \theta_{(i)}- \hat \theta_{(.)})^2,\\
&= (n-1) \sum_{i = 1}^n (\hat \theta_{(i)} - \hat \theta_{(.)})^2.
\end{align*}
$$
Assuming a central limit theorem holds for $\hat \theta_{jack}$, we have:
$$
\frac{\hat\theta_{jack} - \theta}{s_n} \to^d N(0,1),
$$
where $s_n^2$ is the variance of the $pv_i$ divided by $n$:
$$
s_n^2 = \frac{n-1}{n} \sum_{i = 1}^n \left(\hat \theta_{(i)} - \hat \theta_{(.)}\right)^2.
$$
This 'variance', $s^2_n$, is the second estimator that you give in your question.
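And a quick numerical check (again my own sketch, with illustrative names) that the variance of the pseudovalues divided by $n$ coincides with the closed form $s_n^2$ above:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(size=40)
n = len(x)

plugin_var = lambda a: np.mean((a - a.mean()) ** 2)
theta_hat = plugin_var(x)
theta_loo = np.array([plugin_var(np.delete(x, i)) for i in range(n)])
pv = n * theta_hat - (n - 1) * theta_loo        # pseudovalues

s2_from_pv = pv.var(ddof=1) / n                 # var(pv_i) / n
s2_closed = (n - 1) / n * np.sum((theta_loo - theta_loo.mean()) ** 2)

print(s2_from_pv, s2_closed)   # agree up to floating-point error
print(np.sqrt(s2_closed))      # jackknife standard error of theta_hat
```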

tdm
  • Thanks for this, though I have 2 questions. 1. When you say things like "biased estimator $\hat{\theta}$", I guess you mean "possibly biased estimator $\hat{\theta}$ "? One can use the jackknife even if one's estimator is unbiased (right?) 2. Shouldn't your pseudo-value be $pv_{(i)} = \hat \theta - (n-1)(\hat \theta - \hat \theta_{(i)})$ (minus not plus)? – afreelunch May 25 '21 at 09:09
  • @afreelunch as far as I know, the jackknife is mainly used as a bias correction (but probably you can also use it if the estimator is unbiased). I think the pseudovalues are correct. You compute $\hat \theta - b/n \approx \hat \theta - (n-1)(\hat \theta_{(i)} - \hat \theta)$, which equals $\hat \theta + (n-1)(\hat \theta - \hat \theta_{(i)})$, which in turn equals $n \hat \theta - (n-1) \hat \theta_{(i)}$, as on slide 7 here. – tdm May 25 '21 at 10:37
  • Apologies I misread your equation (didn't see you had interchanged $\hat {\theta}$ and $\hat {\theta}_{(i)}$)! – afreelunch May 25 '21 at 10:55