The Delta method (Casella, Theorem 5.5.24) says that if $\sqrt{n}(Y_n-\theta)\to \mathrm{n}(0, \sigma^2)$ in distribution as $n\to\infty$ (where the sequence $Y_n$ is used to estimate $\theta$), then we can use $g(Y_n)$ to estimate $g(\theta)$, and $\sqrt{n}(g(Y_n)-g(\theta))\to \mathrm{n}(0, g'(\theta)^2\sigma^2)$.
It seems to me that $\mathrm{Var}[\sqrt{n}(g(Y_n)-g(\theta))]=n\,\mathrm{Var}[g(Y_n)]$, and so asymptotically $\mathrm{Var}[g(Y_n)]\approx\frac{g'(\theta)^2\sigma^2}{n}.$
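To see the claimed $\mathrm{Var}[g(Y_n)]\approx g'(\theta)^2\sigma^2/n$ numerically, here is a minimal Monte Carlo sketch (my own choices, not from the book: $X_i\sim\mathrm{Exponential}(1)$ so that $\theta=\sigma^2=1$, and $g(x)=x^2$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000
theta, sigma2 = 1.0, 1.0              # mean and variance of Exponential(1)

# Y_n = sample mean of n Exponential(1) draws, replicated many times
Yn = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# g(x) = x^2, so the delta method predicts n * Var[g(Y_n)] -> g'(theta)^2 * sigma2 = 4
print(n * (Yn**2).var())              # empirical value, should be near 4
print((2 * theta) ** 2 * sigma2)      # delta-method prediction
```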
However, Casella (10.1.7) says $\mathrm{Var}(h(\hat\theta))\approx \frac{[h'(\theta)]^2}{I_n(\theta)}$, where $I_n(\theta)$ is the Fisher information number $E_\theta\left(\frac{\partial}{\partial \theta} \log L(\theta|\mathbf{X})\right)^2$. Here $\hat\theta$ seems to correspond to $Y_n$, which makes sense, while $I_n(\theta)$ corresponds to $n/\sigma^2$; how is that possible?
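One step that might pin down where the $n$ comes from (my reading, assuming an i.i.d. sample and the regularity conditions of Chapter 7): the score of the sample is a sum of i.i.d. mean-zero terms, so the cross terms vanish and
$$I_n(\theta)=E_\theta\left(\sum_{i=1}^n\frac{\partial}{\partial\theta}\log f(X_i|\theta)\right)^2=n\,E_\theta\left(\frac{\partial}{\partial\theta}\log f(X_1|\theta)\right)^2=n\,I_1(\theta),$$
so $I_n(\theta)=n/\sigma^2$ would require $\sigma^2=1/I_1(\theta)$, i.e. that $Y_n$ asymptotically attains the Cramér-Rao bound (as the MLE does).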
It's also said (Lemma 7.3.11) that $E_\theta\left(\frac{\partial}{\partial \theta} \log L(\theta|\mathbf{X})\right)^2=-E_\theta\left(\frac{\partial^2}{\partial \theta^2} \log L(\theta|\mathbf{X})\right)$, under certain regularity conditions.
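As a quick sanity check of this identity (my own example, a single Bernoulli($p$) observation, not from the book): $\log f(X|p)=X\log p+(1-X)\log(1-p)$, so
$$E_p\left(\frac{X}{p}-\frac{1-X}{1-p}\right)^2=p\cdot\frac{1}{p^2}+(1-p)\cdot\frac{1}{(1-p)^2}=\frac{1}{p(1-p)},$$
$$-E_p\left(-\frac{X}{p^2}-\frac{1-X}{(1-p)^2}\right)=\frac{1}{p}+\frac{1}{1-p}=\frac{1}{p(1-p)},$$
and the two sides agree.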
Overall I feel quite confused by this mixture of Fisher information and the Delta method, and I can't yet find a way to untangle it.
With this we can further explore Example 10.1.14, mentioned in the post Variance of $\frac{\sum X_i}n$: with $X_i$'s i.i.d. Bernoulli random variables, an estimate of $\mathrm{Var}(\hat{p})$ is $\frac1{-\frac{\partial^2}{\partial p^2} \log L(p|\mathbf{X})\big|_{p=\hat p}}\approx \frac{\hat p(1-\hat p)}n,$ which happens to be the same as the estimator of the variance of $\hat p$ in that post.
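Filling in the computation that the $\approx$ hides (evaluation at $p=\hat p=\bar x$, so $\sum x_i=n\hat p$):
$$-\frac{\partial^2}{\partial p^2}\log L(p|\mathbf{x})=\frac{\sum x_i}{p^2}+\frac{n-\sum x_i}{(1-p)^2}\;\bigg|_{p=\hat p}=\frac{n}{\hat p}+\frac{n}{1-\hat p}=\frac{n}{\hat p(1-\hat p)},$$
whose reciprocal is exactly $\frac{\hat p(1-\hat p)}{n}$.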
Then how do we proceed from this to get $\sqrt n (\hat p -p) \to \mathrm{n}[0, p(1-p)]$?
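A numerical sketch of this claim (my own choices of $p=0.3$ and $n=2000$; an illustration, not a derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 2_000, 50_000

# \hat p = (sum of X_i)/n for Bernoulli(p) data, replicated many times
phat = rng.binomial(n, p, size=reps) / n
z = np.sqrt(n) * (phat - p)

print(z.mean(), z.var())   # should be near 0 and p*(1-p) = 0.21
print(p * (1 - p))
```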
We can also further explore Example 10.1.17 mentioned in the post: an estimate of the variance of the estimator of $e^{-\lambda}$ is given by $\frac{-e^{-\lambda}}{\frac{\sum X_i}{\lambda^2}}\approx -\frac{e^{-\lambda}\lambda}n$. This differs from the result $\frac{e^{-2\lambda}\lambda}n$ given in the book. Where did things go wrong?
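For comparison, plugging into (10.1.7) as stated, with $h(\lambda)=e^{-\lambda}$ and the derivative squared:
$$\mathrm{Var}\big(h(\hat\lambda)\big)\approx\frac{[h'(\lambda)]^2}{-E_\lambda\frac{\partial^2}{\partial\lambda^2}\log L(\lambda|\mathbf{X})}=\frac{(-e^{-\lambda})^2}{n/\lambda}=\frac{e^{-2\lambda}\lambda}{n},$$
which does match the book. Is the missing square on $h'(\lambda)$ the whole discrepancy?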
Updated:
An answer (https://stats.stackexchange.com/a/10581/301417) to the post Intuitive explanation of Fisher Information and Cramer-Rao bound (suggested by @SextusEmpiricus) gives an explanation of Fisher information, which is very helpful.
Question: But I don't understand why $a\approx \mathrm{Var}(\cdot)$ there; a link in the reasoning is missing. Would anyone like to explain it further?