
In most books, the result that the MLE is asymptotically normal is stated and then treated as the definition of asymptotic normality, with no mention of what the actual definition is for a general estimator.

Let $\hat{\theta}_n$ be an estimator for $\theta$. Consider the following definitions.

Defn 1 (Wikipedia): $\hat{\theta}_n$ is asymptotically normal if there exist sequences of constants $a_n$ and $b_n$ such that $\frac{\hat{\theta}_n-a_n}{b_n}\stackrel{d}{\to} N(\mu,\sigma^2)$.

Defn 2: $\hat{\theta}_n$ is asymptotically normal if $\frac{\hat{\theta}_n-\theta}{{se}(\hat{\theta}_n)} \stackrel{d}{\to} N(0,1)$.

Defn 3: $\hat{\theta}_n$ is asymptotically normal if $\frac{\hat{\theta}_n-\theta}{{se}(\hat{\theta}_n)} \stackrel{d}{\to} N(\mu,\sigma^2)$.

Defn 4 (StackExchange/MLE version): $\hat{\theta}_n$ is asymptotically normal if $\sqrt{n}(\hat{\theta}_n -\theta) \stackrel{d}{\to} N(0,\sigma^2)$.
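
Before comparing these, here is a minimal simulation sketch of Defn 4 (my own illustration; the Exponential model and all variable names are assumptions, not part of any of the definitions): for iid Exponential data the sample mean satisfies Defn 4 with $\sigma^2 = \operatorname{Var}(X_i)$, by the CLT.

```python
import numpy as np

# Sketch: Defn 4 for the sample mean of iid Exponential(1) data,
# where theta = E[X_i] = 1 and sigma^2 = Var(X_i) = 1.
rng = np.random.default_rng(0)
theta, n, reps = 1.0, 1_000, 5_000

x = rng.exponential(scale=theta, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - theta)  # sqrt(n) * (theta_hat_n - theta)

# If Defn 4 holds, z should be approximately N(0, sigma^2) = N(0, 1):
print(z.mean(), z.var())   # both close to 0 and 1
print((z <= 1.96).mean())  # close to Phi(1.96) ~= 0.975
```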

In Defn 1, we can assume without loss of generality that $\mu=0$ and $\sigma^2=1$, but we cannot do that for Defn 2 and Defn 3. One of my main questions is whether an estimator can be asymptotically normal with a limiting distribution that is normal with a nonzero mean. Does such an estimator exist?

Suppose $\theta$ is a mean and we estimate it using the bad estimator $\hat{\theta}_n = \bar{X}+1$, the sample mean plus 1. This is asymptotically normal according to Defn 1 (take $a_n = \theta+1$), but not by Defn 2, since $\frac{\hat{\theta}_n-\theta}{{se}(\hat{\theta}_n)} = \frac{\bar{X}-\theta}{{se}(\hat{\theta}_n)} + \frac{1}{{se}(\hat{\theta}_n)}$, where the first term converges to $N(0,1)$ and the second term diverges; it fails Defn 3 for the same reason.
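
A minimal simulation sketch of this example (assuming iid $N(\theta,1)$ data so that ${se}(\hat{\theta}_n)=1/\sqrt{n}$; the setup and names are mine):

```python
import numpy as np

# The "bad" estimator theta_hat = X_bar + 1 for iid N(theta, 1) data,
# so se(theta_hat_n) = 1/sqrt(n).
rng = np.random.default_rng(0)
theta, n, reps = 0.0, 1_000, 5_000
se = 1.0 / np.sqrt(n)

theta_hat = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1) + 1.0

# Defn 1 with a_n = theta + 1 and b_n = se: approximately N(0, 1).
z1 = (theta_hat - (theta + 1.0)) / se
print(z1.mean(), z1.var())  # close to 0 and 1

# Defn 2: (theta_hat - theta)/se = N(0,1) noise + 1/se, which diverges.
z2 = (theta_hat - theta) / se
print(z2.mean())            # roughly 1/se = sqrt(n) ~= 31.6, not 0
```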

Defn 4 seems too limited and too closely tied to the MLE. It does not appear to account for cases where the rate of convergence is not $\sqrt{n}$.
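
A standard example of a non-$\sqrt{n}$ rate (my illustration, not from the post): for $X_i$ iid Uniform$(0,\theta)$ and $\hat{\theta}_n=\max_i X_i$, one has $n(\theta-\hat{\theta}_n)/\theta \stackrel{d}{\to} \text{Exp}(1)$, so the right rate is $n$ and the limit is not even normal, while $\sqrt{n}(\hat{\theta}_n-\theta)\stackrel{p}{\to}0$. A simulation sketch:

```python
import numpy as np

# Non-sqrt(n) rate: theta_hat = max of n iid Uniform(0, theta) draws.
# The max of n iid U(0, theta) has the same distribution as theta * U**(1/n),
# which lets us simulate it without storing n draws per replication.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 1_000_000, 100_000

theta_hat = theta * rng.uniform(size=reps) ** (1.0 / n)

z_n = n * (theta - theta_hat) / theta      # should look like Exp(1)
z_sqrt = np.sqrt(n) * (theta - theta_hat)  # collapses toward 0

print(z_n.mean(), z_n.var())  # both close to 1 (mean and variance of Exp(1))
print(z_sqrt.mean())          # close to 0: sqrt(n) is the wrong scaling
```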

Defn 1 seems to be the most expansive definition here. This raises the question: if Defn 1 holds with $\mu=0$ and $\sigma^2=1$, how is it related to Defn 2? Specifically, is it necessary that $\frac{b_n}{se(\hat{\theta}_n)}\stackrel{p}{\to}1$ and $E(\hat{\theta}_n)-a_n\stackrel{p}{\to}0$?

So what is the correct definition? Can the mean of the limiting distribution be nonzero? Is there a reference?

fe2084
    You might consider the case where $X_i \sim N(\theta,1)$ iid, with $\hat \theta =\bar X$ with probability $1-\frac1n$ and $\hat \theta$ is drawn from a standard Cauchy distribution with probability $\frac1n$, so the distribution of $\sqrt{n}(\hat \theta -\theta) \to \mathcal N(0,1)$ pointwise, but $\hat \theta$ has no standard deviation or indeed mean – Henry Sep 20 '22 at 11:35
  • @Henry I don't think that example makes sense. Firstly, an estimator must be a deterministic function of the $X_i$s (though you can probably approximate that using an indicator function). Secondly, I don't see how that approaches normality, since the CLT would not apply for Cauchy. And lastly, converging pointwise to a distribution is an oxymoron. – fe2084 Sep 20 '22 at 22:00

1 Answer


The overarching issue is whether we can center an estimator (by something, not necessarily its mean) and/or scale it (by another something, not necessarily its standard deviation) so that the centered and scaled version converges in distribution to some distribution (not necessarily the Normal), preferably one with a mean and a variance that we can derive and compute or estimate.

And we want to do that in order to perform inference based on this asymptotic distribution when no exact finite-sample distributional result is available. Note that this asymptotic distribution may have no moments, in which case inference can be based on other properties of the distribution, e.g. quantiles.

So, we are content if we can determine sequences $\{a_n\}$, $\{b_n\}$, and a distribution $D$ such that $$\frac{\hat \theta_n - a_n}{b_n} \to_d D({\rm mean},\,{\rm variance}),$$ and more than content if we can also derive the mean and variance of $D$.
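
For instance (standard illustrations, not from the original answer): for the sample mean of iid data with variance $\sigma^2$ one can take $a_n=\theta$, $b_n=\sigma/\sqrt{n}$, $D=N(0,1)$; for $\hat{\theta}_n=\max_i X_i$ with $X_i$ iid Uniform$(0,\theta)$ one can take $a_n=\theta$, $b_n=\theta/n$, with $D$ the distribution of minus a standard Exponential variable. And if $q_\alpha$ denotes the $\alpha$-quantile of $D$, then $[\hat{\theta}_n - b_n q_{1-\alpha/2},\ \hat{\theta}_n - b_n q_{\alpha/2}]$ is an asymptotic $1-\alpha$ confidence interval for $\theta$, a construction that uses only quantiles and so remains available even when $D$ has no moments.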

As for the definitions:

Defn 1 (Wikipedia) is misunderstood by the OP: if one reads the article, it does not say that such a general centering and scaling will lead to the standard Normal.

Defn 2 silently invokes a) the Central Limit Theorem, b) the consistency of the estimator, and c) the fact that its standard deviation is a decreasing function of the sample size.

Defn 3 can be considered wrong, or at least confusing/misleading, if it implies a non-unit variance: if you scale something by its standard deviation, the resulting distribution (provided all these quantities exist) necessarily has unit standard deviation and variance.

Defn 4 indeed assumes that the rate of convergence is $\sqrt{n}$, which is not universally the case.

  • +1. And your post at https://stats.stackexchange.com/a/105749/919 is a good example of the value and generality of your characterization. – whuber Sep 26 '22 at 19:57