
Suppose we are living in a frequentist world and want to compute confidence intervals on some quantity that is a complicated function of the parameters, $q_1 = f(\Theta)$ (i.e., there's no closed-form solution that would let us re-express it in terms of one of the parameters, $\theta_j = g(\Theta_{-j}, q_1)$).

  • The classical way to do this, AFAIK, would be to compute likelihood profile confidence intervals: compute $\max_{\Theta} {\cal L}(\Theta)$ subject to $f(\Theta) = \hat{q}_1$ for a series of values $\hat{q}_1$ around $f(\Theta_{\text{MLE}})$, then find the critical values of $\hat{q}_1$ at which the drop in the log-likelihood equals half the critical value of $\chi^2_1$ (Wilks' theorem, blah blah blah). Problem: we need an efficient, robust algorithm for equality-constrained nonlinear optimization (these do exist, e.g. using Lagrange multipliers, but are much less available/tested than those for unconstrained or box-constrained optimization). (A rough sketch of this approach follows the list.)
  • We could do parametric bootstrapping (sorry, no link). Problem: slow.
  • We could assume that Wald statistics are OK and use the delta method to approximate the variances of the derived quantity (assuming that we can compute the Jacobian of $f(\Theta)$ by finite differences or automatic differentiation or ...). Problem: combines the assumption of "multivariate Normal sampling distribution"/"quadratic log-likelihood surface" with "$f$ has constant curvature".

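For concreteness, here is a minimal sketch of the profile-interval approach, handing the equality constraint to an off-the-shelf constrained optimizer (scipy's SLSQP). The toy model (two exponential rates, with the derived quantity being their ratio) is invented purely for illustration and is not from any of the cited work.

```python
# Sketch of a likelihood-profile CI for a derived quantity q1 = f(theta),
# using an equality-constrained optimizer (scipy's SLSQP). The model below
# (two exponential rates, derived quantity = their ratio) is purely illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 0.8, size=50)   # sample 1, true rate 0.8
y = rng.exponential(scale=1 / 1.5, size=50)   # sample 2, true rate 1.5

def nll(theta):
    """Negative log-likelihood for two independent exponential samples."""
    lam1, lam2 = theta
    return -(len(x) * np.log(lam1) - lam1 * x.sum()
             + len(y) * np.log(lam2) - lam2 * y.sum())

def f(theta):
    """Derived quantity: ratio of the two rates (no closed-form inversion assumed)."""
    return theta[0] / theta[1]

bounds = [(1e-8, None)] * 2
mle = minimize(nll, x0=[1.0, 1.0], method="L-BFGS-B", bounds=bounds)
theta_hat, nll_hat = mle.x, mle.fun
q_hat = f(theta_hat)

def profile_nll(q):
    """Minimize the NLL subject to the equality constraint f(theta) = q."""
    res = minimize(nll, x0=theta_hat, method="SLSQP", bounds=bounds,
                   constraints=[{"type": "eq", "fun": lambda th: f(th) - q}])
    return res.fun

# Wilks: the 95% profile interval is where the profiled NLL stays within
# chi2_1(0.95)/2 of the unconstrained minimum.
cutoff = nll_hat + chi2.ppf(0.95, df=1) / 2
grid = np.linspace(0.5 * q_hat, 2.0 * q_hat, 201)
inside = np.array([profile_nll(q) <= cutoff for q in grid])
print("MLE of q1:", q_hat)
print("approx. 95% profile CI:", grid[inside].min(), grid[inside].max())
```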
A shortcut that I've seen used, and used myself, is to assume (approximate) multivariate Normality of the sampling distribution of the parameters; draw an MVN sample (based on the observed information matrix); compute $f(\Theta)$ for each draw; and find the quantiles of the computed values.
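In code, the shortcut looks something like the sketch below; `theta_hat`, `vcov`, and `f` are placeholders for whatever your fitted model actually provides, not values from any real analysis.

```python
# Sketch of the "population prediction interval" shortcut. theta_hat, vcov,
# and f below are stand-in/illustrative values, not output from any real fit:
# in practice theta_hat is your MLE and vcov the inverse observed information.
import numpy as np

rng = np.random.default_rng(0)

theta_hat = np.array([0.8, 1.5])          # assumed MLE
vcov = np.array([[0.010, 0.002],          # assumed covariance (inverse obs. information)
                 [0.002, 0.030]])

def f(theta):
    """Some complicated derived quantity; a simple ratio stands in for it here."""
    return theta[..., 0] / theta[..., 1]

draws = rng.multivariate_normal(theta_hat, vcov, size=10_000)  # MVN parameter draws
q = f(draws)                                                   # derived quantity per draw
lo, hi = np.quantile(q, [0.025, 0.975])                        # quantile interval
print(f"95% 'population prediction interval': ({lo:.3f}, {hi:.3f})")
```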

Lande et al. (2003) call these "population prediction intervals". Bolker (2008) says that a problem with this approach is that:

It blurs the line between frequentist and Bayesian approaches. Several papers (including some of mine, e.g. Vonesh and Bolker (2005)) have used this approach, but I have yet to see a solidly grounded justification for propagating the sampling distributions of the parameters in this way.

Is anyone aware of such a justification, or is this really just cheesy pseudo-Bayesianism?

(Alternatively, are there elegant ways of computing well-justified confidence intervals on complex (not-easily-invertible) functions of parameters?)


Lande, R., S. Engen, and B.-E. Sæther. 2003. Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press, Oxford, UK.

Ben Bolker

1 Answer


In the econometrics literature this is referred to as the method of Krinsky and Robb (1986, 1990, & 1991).

I think the argument goes as follows:
Assume we have consistently estimated the expectation and the covariance matrix of an asymptotically normal estimator $\hat{\Theta}$ of $\Theta$. Then, by the law of large numbers, we can estimate the expectation, variance, and quantiles of the distribution of $f(\hat{\Theta})$ by drawing a random sample $\left\{\tilde{\Theta}^{(1)},\ldots,\tilde{\Theta}^{(M)}\right\}$ of size $M$ (where $M$ is large) from the asymptotic normal distribution of $\hat{\Theta}$ and using the sample mean/variance/quantiles of $\left\{f\left(\tilde{\Theta}^{(1)}\right),\ldots,f\left(\tilde{\Theta}^{(M)}\right)\right\}$ as consistent estimates for their population counterparts.

From the simulation studies of this method that I've read, I recall that, overall, it was not superior to the delta method.

statmerkur