It's true, in general, that consistency is not the be-all and end-all of a statistic. But neither is unbiasedness, for the same reason! Once you accept biased estimators, you open up a whole class of Bayes and shrinkage estimators that beat OLS in many respects. Case in point: shrinkage and L2 penalization in the estimation of the mean vector of a multivariate normal sample.
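That setup is the classic James-Stein setting, which I believe is what's being alluded to here. Below is a minimal simulation sketch, assuming one observation per coordinate with known unit variance and an arbitrary illustrative true mean, showing that a positive-part James-Stein shrinkage of the unbiased estimate achieves lower total squared error than the unbiased MLE for dimension p >= 3.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_reps = 10, 5000          # illustrative dimension and number of simulated datasets
mu = np.full(p, 0.5)          # hypothetical true mean vector (made up for this sketch)

sse_mle, sse_js = 0.0, 0.0
for _ in range(n_reps):
    x = rng.normal(mu, 1.0)                        # one draw per coordinate, known unit variance
    shrink = max(0.0, 1 - (p - 2) / np.sum(x**2))  # positive-part James-Stein factor
    js = shrink * x                                # shrink the unbiased estimate toward 0
    sse_mle += np.sum((x - mu) ** 2)
    sse_js += np.sum((js - mu) ** 2)

print("Unbiased MLE, average total squared error:", sse_mle / n_reps)
print("James-Stein, average total squared error: ", sse_js / n_reps)  # smaller for p >= 3
```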
For the purposes of hypothesis testing and error estimation, asymptotics go farther than just the limit of the statistic. We're also interested in whether a suitable transformation of the statistic has a known limiting distribution; in that case, we can use the limiting distribution to approximate the sampling distribution of the test statistic at finite sample sizes. The classic example is the central limit theorem, where $\sqrt{n} (\bar{X} - \mu)/\sigma \rightarrow_d \mathcal{N}(0, 1)$. Similar approximations are used to obtain critical values, p-values, power, and sample sizes, knowing full well they are only approximations. In some cases the approximations can be improved (the Student's t distribution, the Agresti-Coull adjustment, Clopper-Pearson intervals) or exchanged entirely for exact (Fisher's exact test) or empirical (bootstrap) methods. It's my opinion that these methods should be taught alongside the standard asymptotic tests and error estimates, and used more often in practical data analysis.
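As a concrete illustration of the interval examples named above, here is a short sketch (the counts are made up for illustration) comparing the CLT-based Wald interval for a binomial proportion with the Agresti-Coull adjustment and the exact Clopper-Pearson interval:

```python
import numpy as np
from scipy import stats

k, n, alpha = 4, 25, 0.05            # hypothetical data: 4 successes in 25 trials
z = stats.norm.ppf(1 - alpha / 2)
p_hat = k / n

# Wald interval from the CLT normal approximation
wald = (p_hat - z * np.sqrt(p_hat * (1 - p_hat) / n),
        p_hat + z * np.sqrt(p_hat * (1 - p_hat) / n))

# Agresti-Coull: add roughly z^2 pseudo-observations, then apply the same formula
n_t = n + z**2
p_t = (k + z**2 / 2) / n_t
agresti_coull = (p_t - z * np.sqrt(p_t * (1 - p_t) / n_t),
                 p_t + z * np.sqrt(p_t * (1 - p_t) / n_t))

# Clopper-Pearson exact interval via beta distribution quantiles
clopper_pearson = (stats.beta.ppf(alpha / 2, k, n - k + 1),
                   stats.beta.ppf(1 - alpha / 2, k + 1, n - k))

print("Wald:           ", wald)
print("Agresti-Coull:  ", agresti_coull)
print("Clopper-Pearson:", clopper_pearson)
```

With small n and a proportion near the boundary, the Wald endpoints drift noticeably from the exact ones, which is exactly the situation the improved intervals are meant for.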
The idea of an "infinite" sample size underpins all of frequentist statistics. Consider the "frequentist" interpretation of probability: what do we mean when we say the heads probability of a coin flip is 0.5? If you actually flipped a coin an infinite number of times, it would wear down to nothing; the infinite sequence of flips is an idealization, not an experiment anyone could perform. The same holds for frequentist methods under finite sampling, that is, when you sample a substantial fraction of a finite population. The sampling distributions might change somewhat, but I still conceptualize multiple (i.e. an infinite number of) hypothetical scenarios in which I could replicate a particular result. The "expectation" of my estimator - and its ultimate limit, via the LLN - is defined over those hypothetical replications. Suppose, for instance, I sample 30% of all surviving Siberian tigers for biometrics - say body length. I can produce a CI for my length estimate. That CI is justified by imagining sampling 30% of all known Siberian tigers (as of now <400) an infinite number of times.
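A small sketch of what that can look like in practice: one common approach (my assumption here, not the only valid one) applies a finite population correction to the standard error while keeping the usual repeated-sampling justification. The population size, lengths, and sampling fraction below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N = 400                                   # hypothetical population size (surviving tigers)
population = rng.normal(290, 20, size=N)  # hypothetical body lengths in cm

n = int(0.30 * N)                         # sample 30% without replacement
sample = rng.choice(population, size=n, replace=False)

mean, sd = sample.mean(), sample.std(ddof=1)
fpc = np.sqrt((N - n) / (N - 1))          # finite population correction factor
se = fpc * sd / np.sqrt(n)
t = stats.t.ppf(0.975, df=n - 1)
print(f"95% CI for mean length: {mean - t * se:.1f} to {mean + t * se:.1f} cm")
```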
That said, the implementation of these methods can introduce methodological issues of their own (bootstrap intervals are not always valid, even with BCa or the double bootstrap), and some problems require analytic simplicity to produce reproducible and communicable results. For instance, when calculating the sample size for a time-to-event analysis, I can base my choice of N on an exponential distribution with known rate parameters in the control and treatment arms, the duration of follow-up, and a test based on the asymptotic Wald statistic. In that case, it's easy for another statistician to verify my results.
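For example, here is a sketch of that kind of calculation using a Schoenfeld-style required-events formula under exponential event times and a two-sided asymptotic test; the rates, follow-up, power, and 1:1 allocation are illustrative assumptions, not values from any real trial.

```python
import numpy as np
from scipy import stats

# Illustrative design assumptions (not from any actual study)
lam_c, lam_t = 0.10, 0.06     # exponential event rates, control and treatment (per month)
followup = 24                 # months of follow-up
alpha, power = 0.05, 0.80

hr = lam_t / lam_c
z_a = stats.norm.ppf(1 - alpha / 2)
z_b = stats.norm.ppf(power)

# Schoenfeld-style required number of events for a two-sided test, 1:1 allocation
events = 4 * (z_a + z_b) ** 2 / np.log(hr) ** 2

# Convert events to subjects using the exponential event probability over follow-up
p_event = 0.5 * (1 - np.exp(-lam_c * followup)) + 0.5 * (1 - np.exp(-lam_t * followup))
n_total = int(np.ceil(events / p_event))

print(f"Required events: {np.ceil(events):.0f}, total N: {n_total}")
```

Because every input and every formula is written down, another statistician can reproduce the number exactly, which is the point about analytic simplicity.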
In summary, for a didactic program, I would say learn both the asymptotic and the exact/resampling methods and understand their limitations. For a practical application, consider your audience and what they need to understand. And when in doubt, be conservative in your approach!