Some intuition behind the delta method:
The Delta method can be seen as combining two ideas:
- Continuous, differentiable functions can be approximated locally by an affine transformation.
- An affine transformation of a multivariate normal random variable is multivariate normal.
The first idea is from calculus, the second is from probability. The loose intuition / argument goes:
- The input random variable $\tilde{\boldsymbol{\theta}}_n$ is asymptotically normal (by assumption, or by a central limit theorem in the case where $\tilde{\boldsymbol{\theta}}_n$ is a sample mean).
- In a small enough neighborhood, $\mathbf{g}(\mathbf{x})$ looks like an affine transformation: the smaller the neighborhood, the more the function looks like a hyperplane (or a line in the one-variable case).
- Where that affine approximation applies (and some regularity conditions hold), the multivariate normality of $\tilde{\boldsymbol{\theta}}_n$ is preserved when the function $\mathbf{g}$ is applied to $\tilde{\boldsymbol{\theta}}_n$.
- Note that function $\mathbf{g}$ has to satisfy certain conditions for this to be true. Normality isn't preserved in the neighborhood around $x=0$ for $g(x) = x^2$ because you'll basically get both halves of the bell curve mapped to the same side: both $x=-2$ and $x=2$ get mapped to $y=4$. You need $g$ strictly increasing or decreasing in the neighborhood so that this doesn't happen.
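A quick simulation (a sketch of mine using NumPy, not from the original text) illustrates the folding: for $X \sim \mathcal{N}(0, 1)$, the transformed variable $X^2$ is supported only on $[0, \infty)$ and is heavily right-skewed, so it cannot be normal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)
y = x**2  # both x = -2 and x = 2 map to y = 4

# y cannot be normal: the left half of the bell curve is folded onto the right
print(y.min() >= 0)           # True: no mass below zero
print(np.mean(y), np.var(y))  # near 1 and 2: in fact y ~ chi-squared(1)
```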
Idea 1: Locally, any continuous, differentiable function looks affine
A basic idea from calculus is that if you zoom in far enough on a continuous, differentiable function, it looks like a line (or a hyperplane in the multivariate case). If we have some vector-valued function $\mathbf{g}(\mathbf{x})$, then in a small enough neighborhood around $\mathbf{c}$ we can approximate $\mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon})$ by the following affine function of $\boldsymbol{\epsilon}$:
$$ \mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon}) \approx \mathbf{g}(\mathbf{c}) + \frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon} $$
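We can check this numerically. Below is a sketch with a hypothetical function $\mathbf{g}(x_1, x_2) = (x_1 x_2, e^{x_1})$ of my own choosing: the affine approximation built from the Jacobian tracks the exact value to within an error of order $\|\boldsymbol{\epsilon}\|^2$.

```python
import numpy as np

# A hypothetical vector-valued function g and its Jacobian (illustrative, not from the text)
def g(x):
    return np.array([x[0] * x[1], np.exp(x[0])])

def jacobian(x):
    # Entry (i, j) is d g_i / d x_j
    return np.array([[x[1], x[0]],
                     [np.exp(x[0]), 0.0]])

c = np.array([1.0, 2.0])
eps = np.array([1e-3, -2e-3])

exact = g(c + eps)
affine = g(c) + jacobian(c) @ eps

# The approximation error shrinks like ||eps||^2
print(np.max(np.abs(exact - affine)))
```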
Idea 2: An affine transformation of a multivariate normal random variable is multivariate normal
Let's say we have $\tilde{\boldsymbol{\theta}}$ distributed multivariate normal with mean $\boldsymbol{\mu}$ and covariance matrix $V$. That is:
$$\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$$
Now take a linear transformation $A$ and form the random variable $A\tilde{\boldsymbol{\theta}}$, which is again multivariate normal. It's easy to show:
$$A\tilde{\boldsymbol{\theta}} - A\boldsymbol{\mu} \sim \mathcal{N}\left(\mathbf{0}, AVA'\right)$$
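A simulation sketch (with $\boldsymbol{\mu}$, $V$, and $A$ chosen for illustration, not taken from the text) confirms that the empirical covariance of $A\tilde{\boldsymbol{\theta}} - A\boldsymbol{\mu}$ matches $AVA'$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative mean, covariance, and transformation
mu = np.array([1.0, -1.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [2.0, -1.0]])

theta = rng.multivariate_normal(mu, V, size=200_000)
z = theta @ A.T - A @ mu  # A*theta - A*mu, one row per draw

# Empirical covariance of z should be close to A V A'
print(np.cov(z.T))
print(A @ V @ A.T)
```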
Putting it together:
If we know that $\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$ and that the function $\mathbf{g}(\mathbf{x})$ can be approximated around $\boldsymbol{\mu}$ by $\mathbf{g}(\boldsymbol{\mu}) + \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} = \tilde{\boldsymbol{\theta}} - \boldsymbol{\mu}$, then putting ideas (1) and (2) together gives:
$$ \mathbf{g}\left( \tilde{\boldsymbol{\theta}} \right) - \mathbf{g}(\boldsymbol{\mu}) \sim \mathcal{N} \left( \mathbf{0}, \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} V \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} '\right) $$
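Here's a one-dimensional sanity check of this combined result (my own sketch: $g(x) = e^x$ with a small variance so the affine approximation is good). The delta-method variance $g'(\mu)^2 V$ should be close to the simulated variance of $g(\tilde{\theta})$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative 1-D example: g(x) = exp(x), so g'(mu) = exp(mu)
mu, sigma = 0.5, 0.05  # small sigma so g is nearly affine where theta has mass
theta = rng.normal(mu, sigma, size=200_000)
g_theta = np.exp(theta)

delta_var = np.exp(mu) ** 2 * sigma**2  # g'(mu)^2 * V
print(np.var(g_theta), delta_var)       # the two should nearly agree
```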
What can go wrong?
We have a problem doing this if the Jacobian $\frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'}$ is zero (or, in the multivariate case, not of full rank), e.g. $g(x) = x^2$ at $x=0$: the limiting normal distribution becomes degenerate. In the scalar case, we need $g$ strictly increasing or decreasing in the region where $\tilde{\theta}_n$ has its probability mass.
This is also going to be a bad approximation if $g$ doesn't look like an affine function in the region where $\tilde{\boldsymbol{\theta}}_n$ has probability mass.
It may also be a bad approximation if $\tilde{\boldsymbol{\theta}}_n$ isn't normal.
As an example of the first problem, take:
$$g(x) = x^2 \quad \quad g'(x) = 2 x $$
If $\sqrt{n}\left( \tilde{\theta} - \mu \right) \xrightarrow{d} \mathcal{N}(0, 1)$
Applying the delta method you get

$$ \sqrt{n}\left( \tilde{\theta}^2 - \mu^2 \right) \xrightarrow{d} \mathcal{N}\left(0,\; 4\mu^2\right) $$

At $\mu = 0$ the variance $4\mu^2$ vanishes, so the first-order delta method collapses to a point mass at zero and says nothing about the shape of the limit. (In fact, at $\mu = 0$ the right scaling is $n\tilde{\theta}^2 \xrightarrow{d} \chi^2_1$, which a second-order delta method recovers.)
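Applying the first-order delta method here gives variance $(2\mu)^2$, which vanishes at $\mu = 0$. A simulation sketch of mine (using a sample mean of standard normals, so $\sqrt{n}(\tilde{\theta} - 0) \sim \mathcal{N}(0, 1)$ exactly) shows what actually happens there:

```python
import numpy as np

rng = np.random.default_rng(3)

n, reps = 500, 100_000
# theta is a sample mean of N(0, 1) draws: mu = 0 and sqrt(n)*theta ~ N(0, 1)
theta = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)

# First-order delta method scaling: sqrt(n)*(theta^2 - 0) degenerates to 0
print(np.var(np.sqrt(n) * theta**2))  # shrinks toward 0 as n grows

# The right scaling: n * theta^2 ~ chi-squared(1) (mean 1, variance 2)
z = n * theta**2
print(np.mean(z), np.var(z))
```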