20

I presume this is a rather stupid question, but I hope some of you can find a bit of time to entertain it.

Looking at asymptotic behavior of estimators/test statistics etc means looking at their behavior as sample size approaches infinity. But if our sample size is infinite, doesn't that mean that we already have the whole population?

Case in point: consistency of OLS estimates. Under certain assumptions we are guaranteed convergence of the estimates to the true parameters in the limit, but at infinite sample size we already have the whole population. So why would we even want to estimate anything when every possible data point is already contained in the sample? All we would have to do is perform a lookup.

  • 3
    As the data size grows, convergence usually means the variance starts to diminish, so you can be more and more sure that the estimated parameter is close to the actual one. Without convergence, it can be anywhere. Secondly, if the data is large enough (but not complete - infinite does not necessarily mean the full population), asymptotics allow you to do hypothesis testing on the estimated parameter. – Dayne Jul 28 '22 at 04:44
  • 1
    Infinite does not necessarily mean full population - how so? Due to misspecification? But models rely on the assumption of no misspecification anyway. – No Free Crunch Theorem Jul 28 '22 at 04:49
  • So you can be more and more sure that the estimated parameter is close to the actual one - can I? This would require a guarantee of monotonic convergence, which I don't see in most asymptotic analysis. – No Free Crunch Theorem Jul 28 '22 at 05:08
  • Indeed, asymptotes have no points. Pun intended. – Stian Jul 28 '22 at 14:29
  • 5
    @NoFreeCrunchTheorem You have to be careful of assumptions when you start talking about infinity. If your population is infinite, an infinite sample doesn't necessarily have all the members (e.g. there's an infinite number of primes, but they're an infinitesimal fraction of all integers). If your population is finite, you can't have an infinite sample unless you're sampling with replacement, and even then you aren't guaranteed to get everyone (zero probability doesn't mean impossible: see "almost surely"). – R.M. Jul 28 '22 at 15:00
  • I understand your point about cardinality, but I don't understand why you would presume that the sample and population lie in different spaces and have different cardinalities (e.g. if the population is contained in the reals, why would an infinite sample be contained in the naturals or some other space with lower cardinality?) – No Free Crunch Theorem Jul 28 '22 at 18:56
  • 2
    The point of R.M.'s example is just to give an easy-to-see-and-describe instance of one infinite set being a strict subset of another infinite set (and also not dense). Don't get hung up on details of that example that he didn't highlight. – Daniel R. Collins Jul 28 '22 at 21:59
  • 2
    A simpler example: the natural numbers are an infinite set, but their subset, the even numbers, is still infinite! – kjetil b halvorsen Jul 30 '22 at 22:19
  • @StianYttervik sounds pretty pointless if you ask me ;) – Galen Jul 30 '22 at 22:40

3 Answers

24

The first reason we look at the asymptotics of estimators is that we want to check that our estimator is sensible. One aspect of this investigation is that we expect a sensible estimator to generally get better as we get more data, eventually becoming "perfect" as the amount of data reaches the full population. You are correct that when $n \rightarrow \infty$ we have the whole (super)population, so presumably we should then be able to know any identifiable parameters of interest. If that is the case, it suggests that inconsistent estimators should be ruled out of consideration as failing a basic sensibleness criterion: as you say, if we have the whole population then we should be able to determine parameters of interest perfectly, so an estimator that fails to do this is fundamentally flawed. There are a number of other asymptotic properties that are similarly of interest, but less important than consistency.
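
As a rough simulation sketch of the consistency idea (the linear model and "true" slope below are made up purely for illustration), the OLS estimate settles ever closer to its target as the sample grows:

```python
# A rough illustration of consistency: the OLS slope estimate drifts
# toward the true value as the sample size grows. The model and
# parameter values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0  # "true" slope (assumed)

for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)     # y = beta*x + noise
    beta_hat = (x @ y) / (x @ x)          # OLS slope (no intercept)
    print(f"n = {n:>6}: beta_hat = {beta_hat:.4f}")
```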

Another reason we look at the asymptotics of estimators is that if we have a large sample size, we can often use the asymptotic properties as approximations to the finite-sample behaviour of the estimator. For example, if an estimator is known to be asymptotically normally distributed (which is true in a wide class of cases), then we will often perform statistical analysis that uses the normal distribution as an approximation to the true distribution of the estimator, so long as the sample size is large. Many statistical hypothesis tests (e.g., chi-squared tests) are built on this basis, and so are a lot of confidence intervals.
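
Here is a minimal sketch of that approximation idea, with an arbitrary skewed (exponential) population and arbitrary sample sizes: the standardized sample mean gets closer to the standard normal that large-sample tests and intervals rely on as $n$ grows.

```python
# Sketch: the standardized sample mean of a skewed (exponential) sample
# is compared against the standard normal approximation that large-sample
# tests and confidence intervals rely on. Sample sizes are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n in [5, 30, 200]:
    draws = rng.exponential(scale=1.0, size=(20_000, n))   # true mean = 1
    z = np.sqrt(n) * (draws.mean(axis=1) - 1.0) / draws.std(axis=1, ddof=1)
    ks = stats.kstest(z, "norm").statistic   # distance to N(0, 1); shrinks as n grows
    print(f"n = {n:>3}: KS distance to standard normal = {ks:.3f}")
```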

Ben
  • 124,856
8

It's true, in general, that consistency is not the be-all and end-all of a statistic. But neither is unbiasedness, for the same reason! When you accept biased estimators, you open up a whole class of Bayes estimators that beat OLS in many respects. Case in point: the problem of shrinkage and L2 penalization in the estimation of a mean vector for a multivariate normal sample.
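
A minimal sketch of that shrinkage phenomenon, using the positive-part James-Stein estimator with an arbitrary dimension and true mean vector:

```python
# Sketch of the shrinkage phenomenon mentioned above: the (positive-part)
# James-Stein estimator of a multivariate normal mean has lower total
# squared-error risk than the raw observation (the MLE) when the dimension
# is at least 3. The dimension and true mean below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(4)
p, reps = 10, 20_000
theta = np.full(p, 1.0)                         # hypothetical true mean vector

x = theta + rng.normal(size=(reps, p))          # one N(theta, I) draw per replication
shrink = np.clip(1 - (p - 2) / np.sum(x**2, axis=1, keepdims=True), 0, None)

risk_mle = np.mean(np.sum((x - theta)**2, axis=1))
risk_js  = np.mean(np.sum((shrink * x - theta)**2, axis=1))
print(f"MLE risk ~ {risk_mle:.3f}, James-Stein risk ~ {risk_js:.3f}")
```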

For the purposes of hypothesis testing and error estimation, asymptotics go further than just the limit of the statistic. We're also interested in whether a suitable transformation of the statistic has a known limiting distribution. In that case, we can use the limiting distribution to approximate the sampling distribution of the test statistic at finite sample sizes. The classic example is the central limit theorem, where $\sqrt{n} (\bar{X} - \mu) \rightarrow_d \mathcal{N}(0, \sigma^2)$. Similar methods are used for critical values, p-values, power, and sample size calculations, knowing full well they are only approximations. In some cases, the approximations can be improved (such as with the Student $t$ distribution, the Agresti correction, or Clopper-Pearson intervals) or exchanged entirely for exact (Fisher's exact test) or empirical (bootstrap) methods. It's my opinion that these methods should be taught alongside standard testing and error estimation methods, and the latter methods used more often in practical data analysis.
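
As a hedged illustration of the contrast between a normal-approximation interval and an "exact" one (the counts and confidence level below are invented):

```python
# Sketch comparing a normal-approximation (Wald) interval for a binomial
# proportion with the "exact" Clopper-Pearson interval. Counts are invented.
import numpy as np
from scipy import stats

successes, n, alpha = 4, 20, 0.05
p_hat = successes / n

# Wald interval: relies on the asymptotic normality of p_hat
z = stats.norm.ppf(1 - alpha / 2)
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
print("Wald:            ", (p_hat - half_width, p_hat + half_width))

# Clopper-Pearson: inverts exact binomial tail probabilities via the beta distribution
lower = stats.beta.ppf(alpha / 2, successes, n - successes + 1)
upper = stats.beta.ppf(1 - alpha / 2, successes + 1, n - successes)
print("Clopper-Pearson: ", (lower, upper))
```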

The idea of an "infinite" sample size underpins all of frequentist statistics. Consider a "frequentist" interpretation of probability: what do we mean when we say the heads probability of a coin flip is 0.5? If you actually flipped a coin an infinite number of times, it would wear down to nothing. The same holds with frequentist methods for finite sampling, that is, when you sample a substantial fraction of a finite population. The sampling distributions might change somewhat, but I still conceptualize multiple (i.e. an infinite number of) scenarios in which I could replicate a particular result. The "expectation" of my estimator - and its ultimate limit, as a result of the LLN - is defined by that value. Suppose, for instance, I sample 30% of all surviving Siberian tigers for biometrics - say length. I can produce a CI for my length estimate. That CI is based on sampling 30% of all known Siberian tigers (as of now <400) an infinite number of times.
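
A minimal sketch of that repeated-sampling picture, with an entirely made-up finite "population" of 400 tiger lengths: repeatedly draw 30% without replacement and check how the usual t-based interval behaves.

```python
# Sketch of the repeated-sampling picture behind the tiger example:
# a made-up finite "population" of 400 lengths, from which we repeatedly
# sample 30% without replacement and compute the usual t-based interval.
# (Ignoring the finite-population correction makes the interval conservative.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
population = rng.normal(loc=290, scale=25, size=400)   # hypothetical lengths (cm)
true_mean = population.mean()
n = int(0.3 * len(population))

covered, reps = 0, 10_000
for _ in range(reps):
    sample = rng.choice(population, size=n, replace=False)
    half_width = stats.t.ppf(0.975, n - 1) * sample.std(ddof=1) / np.sqrt(n)
    covered += abs(sample.mean() - true_mean) <= half_width
print(f"Coverage over {reps} replications: {covered / reps:.3f}")
```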

That said, the implementation of these methods can introduce some methodological issues (bootstrap intervals are not always valid, even with BCa or the double bootstrap), and some problems require analytic simplicity to provide reproducible and communicable results. For instance, when calculating the sample size for a time-to-event analysis, I can base my selection of N on an exponential distribution with a known rate parameter in the control and treatment arms, the duration of follow-up, and a test based on the asymptotic Wald statistic. In that case, it's easy for another statistician to verify my results.
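
A rough sketch of that kind of calculation (the rates, follow-up, and error levels below are hypothetical), using Schoenfeld's large-sample approximation for the required number of events and exponential survival to convert events into subjects:

```python
# Sketch of a sample-size calculation for a two-arm time-to-event trial,
# using Schoenfeld's asymptotic approximation for the required number of
# events and exponential survival to convert events into subjects.
# All rates, follow-up durations, and error levels are hypothetical.
import numpy as np
from scipy import stats

rate_control   = 0.30      # events per year, control arm (assumed)
rate_treatment = 0.20      # events per year, treatment arm (assumed)
follow_up      = 3.0       # years of follow-up per subject (assumed)
alpha, power   = 0.05, 0.80

log_hr = np.log(rate_treatment / rate_control)
z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)

# Schoenfeld: required number of events with 1:1 allocation
events = 4 * z**2 / log_hr**2

# Probability of observing an event within follow-up under exponential survival
p_event = 0.5 * ((1 - np.exp(-rate_control * follow_up)) +
                 (1 - np.exp(-rate_treatment * follow_up)))
print(f"required events ~ {np.ceil(events):.0f}, total N ~ {np.ceil(events / p_event):.0f}")
```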

In summary, for a didactic program, I would say learn both methods and understand their limitations. For a practical application, consider your audience and what they need to understand. And when in doubt, be conservative in your approach!

AdamO
  • 62,637
  • Great answer. But the "about" in your profile is possibly the best saying I've ever heard. And I've heard many. – mlofton Jul 28 '22 at 19:37
  • @mlofton I wish I could claim it for my own, but it's the lyrical styling of Sleaford Mods. I highly recommend it. Their rant rock is riddled with brilliant one-liners like the above :). – AdamO Jul 28 '22 at 22:08
  • I'll have to check that out. A "beyond insightful" statement. Thanks. – mlofton Jul 29 '22 at 16:16
  • I think this is a good answer, @AdamO. I quibble only about one small point of detail. Fisher's Exact Test is exact if and only if both margins are known a priori. That is rarely, if ever, the case. I feel FET gets an undeserved air of superiority to other tests because of the word "exact". The inexperienced should, IMHO, beware. – Limey Jul 30 '22 at 10:14
3

Most asymptotic results are closely connected to finite first-order results. We'll review this in the non-probabilistic case and then extend to the probabilistic case.

Non-random case: reduced-order analysis

Recall the Taylor series of the function $\sin$ around the point $x=0$: \begin{equation*} \sin x - 0 = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots, \end{equation*} which most learn about in introductory calculus courses. The series helps us understand the behavior of the $\sin$ function near $x=0$. The plot below (from Wikipedia) shows that these four terms largely reproduce the behavior of the curve near $x=0$:

[Figure: the sine curve and its truncated Taylor series near $x=0$ (from Wikipedia)]

Thus, to study the behavior of $\sin(x)$ near $x=0$, very little is lost in simply studying the behavior of the polynomial $x \mapsto x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}$. Of course, the polynomial is much more amenable to study, so this is convenient.
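
A quick numerical check of this (which also previews the first-order comparison below): near $x=0$, the degree-7 polynomial tracks $\sin x$ almost exactly, and even the bare first-order term $x$ does reasonably well.

```python
# Quick numerical check: near x = 0, sin(x) is tracked closely by its
# degree-7 Taylor polynomial, and reasonably well by the first-order term x.
import math

def taylor7(x):
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5) - x**7 / math.factorial(7)

for x in [0.1, 0.5, 1.0, 2.0]:
    print(f"x = {x:4}: sin = {math.sin(x):+.6f}, "
          f"degree-7 = {taylor7(x):+.6f}, first-order = {x:+.6f}")
```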

Non-random case: first-order analysis

The close relationship between the function $\sin$ and its truncated Taylor series also holds at first order, when we approximate \begin{equation*} \sin x - 0 = x - \dots. \end{equation*} For example, most calculus students have shown that \begin{equation*} \lim_{x \to 0} \frac{\sin x}{x} = 1. \end{equation*} This result can be interpreted as saying that the function $\sin$ and the identity function $x \mapsto x$ are indistinguishable to first order near $x=0$.

We stress that the limit result, although asymptotic, is merely reflecting our knowledge gained from the figure above: the two functions are very close near $x=0$.

Probabilistic case

There is an extension of Taylor series which is designed for estimators rather than functions. We'll summarise the extension and illustrate how it underlies most "asymptotic" results.

Let $P_0$ denote the true (unknown) distribution, let $\theta(P_0)$ denote the parameter of interest, and let $T(X_1, \dots, X_n)$ be the estimator using i.i.d. data $X_i \sim P_0$. Then, the following is a first-order expansion of the estimator around the parameter: \begin{align*} T(X_1, \dots, X_n) - \theta(P_0) = \frac{1}{n} \sum_{i=1}^n \varphi(X_i; P_0) + \dots. \end{align*} The term on the left is the estimator error, i.e. the difference between the estimator and its target (the parameter). The term on the right is the first-order term, analogous to $x$ in the $\sin$ case. The function $\varphi$ is called the "influence function" and, along with the remainder, determines the asymptotic behavior.
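
As a hedged numerical sketch of this expansion in one familiar case (the data-generating distribution is chosen purely for convenience): for the plug-in sample variance, the influence function is $\varphi(x; P_0) = (x - \mu)^2 - \sigma^2$, and the remainder is exactly $-(\bar{X} - \mu)^2$, which is of smaller order than the leading $1/\sqrt{n}$ term.

```python
# Sketch: for the (1/n) sample variance, the influence function is
# phi(x) = (x - mu)^2 - sigma^2, and the estimator error decomposes as
#   T - sigma^2 = (1/n) sum phi(X_i) + remainder,
# with a remainder of smaller order than the leading 1/sqrt(n) term.
# The standard normal data-generating distribution is chosen for convenience.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 0.0, 1.0      # true mean and variance of the standard normal

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(size=n)
    error       = np.mean((x - x.mean())**2) - sigma2   # T - theta(P_0)
    first_order = np.mean((x - mu)**2 - sigma2)         # (1/n) sum phi(X_i)
    print(f"n = {n:>9}: error = {error:+.6f}, "
          f"first-order term = {first_order:+.6f}, remainder = {error - first_order:+.2e}")
```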

(More details later...)

Ben
  • 894