
I want to compare two alternative approaches for evaluating the uncertainty of the multi-dimensional MLE $\widehat \theta$ based on a log-likelihood function $l$:

  1. Compute a Fisher-information-based confidence interval from the quadratic approximation $$l(\theta_i) \approx l(\widehat \theta) -\frac12 I_n(\widehat \theta)_{ii}(\theta_i-\widehat\theta_i)^2,$$ where $I_n$ is the observed Fisher information and $n$ is the number of i.i.d. observations. This is based on the CLT $$\sqrt{n}(\widehat \theta - \theta)\implies \mathcal N(0,I(\theta)^{-1}),$$ where $I(\theta)$ is the Fisher information of a single observation (so that $I_n(\theta)\approx n\,I(\theta)$ for large $n$).
  2. Compute the profile likelihood $$pl(\theta_i)=\max_{\theta_j,\ j\neq i} l(\theta)$$ for values of $\theta_i$ around the MLE, i.e., re-maximize the likelihood over all components other than the $i$-th, which is held fixed (see this post for a precise definition).

I have a case where the log-likelihood reads $$ l(\theta) =\sum_{i=1}^n \left( -\frac12\log(2\pi) - \log(\sigma) - \frac{(f(\theta)_i- y_i)^2}{2\sigma^2}\right), $$ for a function $f$ that maps the parameter $\theta$ to an $n$-dimensional vector, and a sequence of i.i.d. observations $y_i$. The observed Fisher information is then the negative Hessian of the log-likelihood, $$ I_n(\theta)_{kl}=-\frac{\partial^2 l(\theta)}{\partial\theta_k\partial\theta_l}. $$ Are the two approaches above supposed to give similar results? Under which conditions should the function $pl$ and the quadratic approximation be approximately the same? A minimal numerical sketch of both approaches is given below.
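
Here is a sketch of what I mean, with everything made up for illustration: a toy map $f(\theta)_i=\theta_1 e^{-\theta_2 x_i}$, hypothetical design points $x_i$, and a known $\sigma$ (my actual $f$ is different):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.1, 3.0, 50)          # design points (hypothetical)
theta_true = np.array([2.0, 1.5])      # made-up "true" parameters
sigma = 0.1                            # known noise level (assumption)

def f(theta):
    # toy stand-in for the real map theta -> R^n
    return theta[0] * np.exp(-theta[1] * x)

y = f(theta_true) + sigma * rng.normal(size=x.size)

def negloglik(theta):
    r = f(theta) - y
    return (0.5 * x.size * np.log(2 * np.pi) + x.size * np.log(sigma)
            + 0.5 * np.sum(r ** 2) / sigma ** 2)

fit = minimize(negloglik, x0=np.array([1.0, 1.0]), method="Nelder-Mead")
theta_hat = fit.x

def hessian(fun, t, h=1e-4):
    # observed Fisher information = Hessian of the negative log-likelihood,
    # approximated here by central finite differences
    d = len(t)
    H = np.zeros((d, d))
    for k in range(d):
        for l in range(d):
            tpp = t.copy(); tpp[k] += h; tpp[l] += h
            tpm = t.copy(); tpm[k] += h; tpm[l] -= h
            tmp = t.copy(); tmp[k] -= h; tmp[l] += h
            tmm = t.copy(); tmm[k] -= h; tmm[l] -= h
            H[k, l] = (fun(tpp) - fun(tpm) - fun(tmp) + fun(tmm)) / (4 * h ** 2)
    return H

I_obs = hessian(negloglik, theta_hat)

# Approach 1: quadratic approximation of the slice for component i
i = 0
grid = theta_hat[i] + np.linspace(-0.05, 0.05, 11)
quad = -fit.fun - 0.5 * I_obs[i, i] * (grid - theta_hat[i]) ** 2

# Approach 2: profile log-likelihood, re-maximizing over the other component
def profile(ti):
    res = minimize(lambda t2: negloglik(np.array([ti, t2[0]])),
                   x0=np.array([theta_hat[1]]), method="Nelder-Mead")
    return -res.fun

prof = np.array([profile(ti) for ti in grid])
print(np.column_stack([grid, quad, prof]))
```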

G. Gare
  • You are calculating two different quantities, so it is not surprising that you are getting different results. The first one is quite meaningless in terms of quantifying uncertainty. Think of a two-dimensional picture of the contours of a quadratic function and try to understand what each calculation represents. – J. Delaney Mar 14 '23 at 17:54
  • @J.Delaney thanks for your comment. I modified the question so that it should make more sense. – G. Gare Mar 17 '23 at 08:33

1 Answer


The likelihood function is a function of all components of $\theta$. What you call $l(\theta_i)$ is in fact a slice of the likelihood function taken by fixing all other components at their MLEs, i.e. $l(\theta_i,\,\theta_j=\hat\theta_j \text{ for } j\neq i)$. Through its relation to the asymptotic distribution of the MLE, this slice corresponds to a conditional distribution of $\theta_i$.

The profile likelihood, on the other hand, follows the maximum over the other components (in the Gaussian case, a slice along a principal axis of the contour ellipses), which corresponds to the marginal distribution of $\theta_i$.
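
To make this concrete, write the quadratic (Gaussian) approximation around the MLE as $$ l(\theta) \approx l(\hat\theta) - \frac12 (\theta-\hat\theta)^\top I_n(\hat\theta)(\theta-\hat\theta). $$ Fixing $\theta_j=\hat\theta_j$ for $j\neq i$ gives a slice with curvature $I_n(\hat\theta)_{ii}$, whereas maximizing over the other components gives (via the Schur complement) $$ pl(\theta_i) \approx l(\hat\theta) - \frac{(\theta_i-\hat\theta_i)^2}{2\,\big(I_n(\hat\theta)^{-1}\big)_{ii}}. $$ The two curvatures, $I_n(\hat\theta)_{ii}$ and $1/(I_n(\hat\theta)^{-1})_{ii}$, agree only when the $i$-th row of $I_n(\hat\theta)$ has vanishing off-diagonal entries, i.e. when $\theta_i$ is (asymptotically) uncorrelated with the remaining components.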

The figures in this post can help you visualize it and understand the difference between conditional and marginal distributions.

Note that, since you don't know the true values of $\theta$, the conditional distribution does not represent the uncertainty in $\theta_i$. Also note that the conditional distribution is always narrower than the marginal one (a quick numeric check is given below).
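
A minimal numeric check of that last claim, using a made-up $2\times 2$ observed information matrix (the numbers are hypothetical):

```python
import numpy as np

# hypothetical observed Fisher information (must be positive definite)
I_n = np.array([[4.0, 1.5],
                [1.5, 1.0]])

slice_var = 1.0 / I_n[0, 0]              # conditional: inverse slice curvature
profile_var = np.linalg.inv(I_n)[0, 0]   # marginal: inverse profile curvature

print(slice_var, profile_var)            # 0.25 < 0.5714...: the slice is narrower
```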

J. Delaney