
I want to compare two alternative approaches for evaluating the uncertainty of the multi-dimensional MLE $\widehat \theta$ based on a log-likelihood function $l$:

  1. Compute a Fisher-information-based confidence interval from the quadratic approximation $$l(\theta_i) \approx l(\widehat \theta) -\frac12 I_n(\widehat \theta)_{ii}(\theta_i-\widehat\theta_i)^2,$$ where $I_n$ is the observed Fisher information and $n$ is the number of i.i.d. observations. This is based on the CLT $$\sqrt{n}(\widehat \theta - \theta)\implies \mathcal N(0,I(\theta)^{-1}),$$ where $I(\theta)$ is the Fisher information of a single observation (so that $I_n(\theta)\approx n\,I(\theta)$ for large $n$).
  2. Compute the profile likelihood $$pl(\theta_i)=\max_{\theta_j,\ j\neq i} l(\theta)$$ for values of $\theta_i$ around the MLE, i.e., re-maximize the likelihood over all components other than the $i$-th, which is held fixed (see this post for a precise definition).

I have a case where the log-likelihood reads $$ l(\theta) =\sum_{i=1}^n \left( -\frac12\log(2\pi) - \log(\sigma) - \frac{(f(\theta)_i- y_i)^2}{2\sigma^2}\right), $$ for a function $f$ that maps the parameter $\theta$ to an $n$-dimensional vector, and a sequence of i.i.d. observations $y_i$. The observed Fisher information is then the negative Hessian of the log-likelihood, $$ I_n(\theta)_{kl}=-\frac{\partial^2 l(\theta)}{\partial\theta_k\partial\theta_l}. $$ Are the two approaches above supposed to give similar results? Under which conditions should the function $pl$ and the quadratic approximation be approximately the same? A minimal numerical sketch of both approaches is given below.
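
Here is a sketch of what I mean, with everything made up for illustration: a toy map $f(\theta)_i=\theta_1 e^{-\theta_2 x_i}$, hypothetical design points $x_i$, and a known $\sigma$ (my actual $f$ is different):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.1, 3.0, 50)          # design points (hypothetical)
theta_true = np.array([2.0, 1.5])      # made-up "true" parameters
sigma = 0.1                            # known noise level (assumption)

def f(theta):
    # toy stand-in for the real map theta -> R^n
    return theta[0] * np.exp(-theta[1] * x)

y = f(theta_true) + sigma * rng.normal(size=x.size)

def negloglik(theta):
    r = f(theta) - y
    return (0.5 * x.size * np.log(2 * np.pi) + x.size * np.log(sigma)
            + 0.5 * np.sum(r ** 2) / sigma ** 2)

fit = minimize(negloglik, x0=np.array([1.0, 1.0]), method="Nelder-Mead")
theta_hat = fit.x

def hessian(fun, t, h=1e-4):
    # observed Fisher information = Hessian of the negative log-likelihood,
    # approximated here by central finite differences
    d = len(t)
    H = np.zeros((d, d))
    for k in range(d):
        for l in range(d):
            tpp = t.copy(); tpp[k] += h; tpp[l] += h
            tpm = t.copy(); tpm[k] += h; tpm[l] -= h
            tmp = t.copy(); tmp[k] -= h; tmp[l] += h
            tmm = t.copy(); tmm[k] -= h; tmm[l] -= h
            H[k, l] = (fun(tpp) - fun(tpm) - fun(tmp) + fun(tmm)) / (4 * h ** 2)
    return H

I_obs = hessian(negloglik, theta_hat)

# Approach 1: quadratic approximation of the slice for component i
i = 0
grid = theta_hat[i] + np.linspace(-0.05, 0.05, 11)
quad = -fit.fun - 0.5 * I_obs[i, i] * (grid - theta_hat[i]) ** 2

# Approach 2: profile log-likelihood, re-maximizing over the other component
def profile(ti):
    res = minimize(lambda t2: negloglik(np.array([ti, t2[0]])),
                   x0=np.array([theta_hat[1]]), method="Nelder-Mead")
    return -res.fun

prof = np.array([profile(ti) for ti in grid])
print(np.column_stack([grid, quad, prof]))
```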

G. Gare
  • You are calculating two different quantities, so it is not surprising that you are getting different results. The first one is quite meaningless in terms of quantifying uncertainty. Think of a two-dimensional picture of the contours of a quadratic function and try to understand what each calculation represents. – J. Delaney Mar 14 '23 at 17:54
  • @J.Delaney thanks for your comment. I modified the question so that it should make more sense. – G. Gare Mar 17 '23 at 08:33

1 Answer


The likelihood function is a function of all components of $\theta$. What you call $l(\theta_i)$ is in fact a slice of the likelihood function taken by fixing all other components at their MLEs, i.e. $l(\theta_i,\,\theta_j=\hat\theta_j \text{ for } j\neq i)$. Through its relation to the asymptotic distribution of the MLE, this slice corresponds to a conditional distribution of $\theta_i$.

The profile likelihood, on the other hand, follows the maximum over the other components (in the Gaussian case, a slice along a principal axis of the contour ellipses), which corresponds to the marginal distribution of $\theta_i$.
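
To make this concrete, write the quadratic (Gaussian) approximation around the MLE as $$ l(\theta) \approx l(\hat\theta) - \frac12 (\theta-\hat\theta)^\top I_n(\hat\theta)(\theta-\hat\theta). $$ Fixing $\theta_j=\hat\theta_j$ for $j\neq i$ gives a slice with curvature $I_n(\hat\theta)_{ii}$, whereas maximizing over the other components gives (via the Schur complement) $$ pl(\theta_i) \approx l(\hat\theta) - \frac{(\theta_i-\hat\theta_i)^2}{2\,\big(I_n(\hat\theta)^{-1}\big)_{ii}}. $$ The two curvatures, $I_n(\hat\theta)_{ii}$ and $1/(I_n(\hat\theta)^{-1})_{ii}$, agree only when the $i$-th row of $I_n(\hat\theta)$ has vanishing off-diagonal entries, i.e. when $\theta_i$ is (asymptotically) uncorrelated with the remaining components.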

The figures in this post can help you visualize it and understand the difference between conditional and marginal distributions.

Note that, since you don't know the true values of $\theta$, the conditional distribution does not represent the uncertainty in $\theta_i$. Also note that the conditional distribution is always narrower than the marginal one (a quick numeric check is given below).
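
A minimal numeric check of that last claim, using a made-up $2\times 2$ observed information matrix (the numbers are hypothetical):

```python
import numpy as np

# hypothetical observed Fisher information (must be positive definite)
I_n = np.array([[4.0, 1.5],
                [1.5, 1.0]])

slice_var = 1.0 / I_n[0, 0]              # conditional: inverse slice curvature
profile_var = np.linalg.inv(I_n)[0, 0]   # marginal: inverse profile curvature

print(slice_var, profile_var)            # 0.25 < 0.5714...: the slice is narrower
```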

J. Delaney