5

Suppose $L(\theta;X)$ denotes the likelihood of a model, where $\theta$ is the parameter and $X$ is the data. The MLE of $\theta$ has no closed-form solution, so I use a numerical procedure to obtain an estimate $\hat{\theta}$. How can I then obtain the asymptotics of $\hat{\theta}$?
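For concreteness, here is a minimal sketch of the kind of procedure I mean, with a stand-in gamma model (my actual likelihood is different, but like the gamma it has no closed-form MLE):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Stand-in model: simulate data from a gamma, whose MLE has no closed form.
rng = np.random.default_rng(0)
X = gamma.rvs(a=2.0, scale=1.5, size=500, random_state=rng)

def neg_log_lik(theta, x):
    shape, scale = theta
    if shape <= 0 or scale <= 0:   # keep the search inside the parameter space
        return np.inf
    return -np.sum(gamma.logpdf(x, a=shape, scale=scale))

fit = minimize(neg_log_lik, x0=np.array([1.0, 1.0]),
               args=(X,), method="Nelder-Mead")
theta_hat = fit.x                  # the numerical estimate
```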

I am unfamiliar with this area, so any reference pointers would also be appreciated.

utobi
  • 11,726
Who cares
  • 163
  • 4
    the standard reference for this topic is Chapter 6 of Lehmann and Casella: https://www.dcpehvpm.org/E-Content/Stat/E%20L%20Lehaman.pdf – John Madden Nov 21 '22 at 16:25
  • This is very faded in my memory, but IIRC it is a result of the Mann–Wald continuous mapping theorem: you may not have a closed-form expression for $\hat{\vec{\theta}}$, but you can express $\mathcal{I}(\vec{\theta})$ and use your estimate as a plug-in, $\mathcal{I}(\hat{\vec{\theta}})$, for an estimate of the expected information. – AdamO Nov 21 '22 at 16:45

2 Answers

9

I'm not sure what you mean by "asymptotics for $\hat\theta_n$", but if you are asking about the limiting distribution of the MLE, then the short answer is that a properly standardized version of $\hat\theta_n$ converges in distribution to a standard normal.

More precisely, in the multidimensional parameter case with $\theta\in \Theta\subseteq\mathbb{R}^p,$ and assuming the model is regular (i.e., the support of the distribution does not depend on $\theta$, the log-likelihood is sufficiently smooth, etc.), it can be shown that

$$ \mathcal{I}_n(\theta_0)^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N_p(0_p,I_p),\tag{*}\label{a} $$ where $0_p$ denotes the $p\times 1$ zero vector and $I_p$ is the $p\times p$ identity matrix. Assuming independence across the $n$ samples,

$$ \mathcal{I}_n(\theta_0) = -n\,E_{\theta_0}\!\left(\frac{\partial^2\log L(\theta;Y_1)}{\partial\theta\,\partial\theta^\top}\bigg|_{\theta=\theta_0}\right) $$ is the expected Fisher information matrix for all $n$ observations and $L(\theta;Y_1)$ is the likelihood function for a single observation.
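To fix ideas with a standard worked example: if $Y_1\sim\text{Exp}(\lambda)$ with density $\lambda e^{-\lambda y}$, then $\log L(\lambda;Y_1) = \log\lambda - \lambda Y_1$, so $\partial^2\log L(\lambda;Y_1)/\partial\lambda^2 = -1/\lambda^2$ and hence $\mathcal{I}_n(\lambda) = n/\lambda^2$.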

In practice, $\eqref{a}$ is useless since the true parameter value $\theta_0$ is unknown. However, the MLE is consistent, i.e.

$$ \hat\theta_n\overset{P}\to \theta_0 $$

so, under appropriate technical conditions, also $n^{-1}\mathcal{I}_n(\hat\theta_n)\overset{P}{\to} n^{-1}\mathcal{I}_n(\theta_0)$ (the map $\theta\mapsto n^{-1}\mathcal{I}_n(\theta)$ does not depend on $n$, so this follows from the continuous mapping theorem). Thus $\eqref a$ is asymptotically equivalent to

$$ \mathcal{I}_n(\hat \theta_n)^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N_p(0_p,I_p).\tag{**}\label b $$

$\mathcal{I}_n(\theta)$ is not always easy to compute, because the expectation involved may be intractable, but we may still be able to compute the Hessian matrix of the log-likelihood. That is, we can calculate the observed information

$$ \mathcal{J}_n(\theta) = -\frac{\partial^2\log L(\theta)}{\partial\theta\,\partial\theta^\top}, $$

where $L(\theta)$ denotes the full likelihood.
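When even analytic second derivatives are inconvenient, $\mathcal{J}_n(\hat\theta_n)$ can be approximated numerically. A minimal sketch using central finite differences (the names `neg_log_lik`, `X`, and `theta_hat` are assumptions, standing in for whatever routine produced the numerical MLE):

```python
import numpy as np

def numerical_hessian(f, theta, eps=1e-5):
    """Central finite-difference Hessian of the scalar function f at theta."""
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * eps**2)
    return H

# J_n(theta_hat) is the Hessian of the *negative* log-likelihood at the MLE:
# J_hat = numerical_hessian(lambda t: neg_log_lik(t, X), theta_hat)
```

Quasi-Newton optimizers such as BFGS in `scipy.optimize.minimize` also return an approximate inverse Hessian (`fit.hess_inv`), though finite differences evaluated at the optimum are usually more reliable.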

Now, we could bypass this computational problem if in $\eqref{b}$ we could replace $\mathcal{I}_n(\hat\theta_n)$ by $\mathcal{J}_n(\hat\theta_n).$

It turns out that, under appropriate conditions, we can invoke the Law of Large Numbers to obtain

$$ n^{-1}\mathcal{J}_n(\theta_0)\overset{P}{\to} -E_{\theta_0}\!\left(\frac{\partial^2\log L(\theta;Y_1)}{\partial\theta\,\partial\theta^\top}\bigg|_{\theta=\theta_0}\right) = n^{-1}\mathcal{I}_n(\theta_0). $$

Thus such a replacement is legitimate and it leads to

$$ \mathcal{J}_n(\hat \theta_n)^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N_p(0_p,I_p),\tag{***} $$

which is asymptotically equivalent to $\eqref b.$ This is typically re-written as

$$ \hat\theta_n\, \dot\sim\, N_p(\theta_0, \mathcal{J}_n(\hat\theta_n)^{-1}), $$

where "$\dot\sim$" means "distributed, for a large sample size, as". In practice, we deal with problems of fixed sample sizes so we pretend it to be $\sim$ although this may not necessarily be the case.

If you are only interested in a single component of $\hat\theta_n = (\hat\theta_{n,1},\ldots,\hat\theta_{n,p})$, say $\hat\theta_{n,i}$, then by the properties of the multivariate normal distribution we have

$$ \hat\theta_{n,i}\,\dot\sim\, N(\theta_{0,i}, \mathcal{J}_n(\hat\theta_n)^{ii}), $$

where $\mathcal{J}_n(\hat\theta_n)^{ii}$ denotes the $(i,i)$ entry of $\mathcal{J}_n(\hat\theta_n)^{-1}$.

Using this result, we can get an approximate confidence interval of level $1-\alpha$ for $\theta_{0,i}$ as

$$ \hat\theta_{n,i} \pm z_{1-\alpha/2}\hat{\text{se}}, $$

where $\hat{\text{se}} = \sqrt{\mathcal{J}_n(\hat\theta_n)^{ii}}$ is the estimated standard error of $\hat\theta_{n,i}.$ These are known as Wald-type confidence intervals.
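In code, a minimal sketch (assuming `theta_hat` is the numerical MLE and `J_hat` the observed information matrix evaluated there, as in the numerical-Hessian sketch above):

```python
import numpy as np
from scipy.stats import norm

def wald_intervals(theta_hat, J_hat, alpha=0.05):
    """Wald-type (1 - alpha) confidence intervals, one row per component."""
    cov = np.linalg.inv(J_hat)      # approximate covariance matrix of theta_hat
    se = np.sqrt(np.diag(cov))      # estimated standard errors
    z = norm.ppf(1 - alpha / 2)     # normal quantile z_{1 - alpha/2}
    return np.column_stack((theta_hat - z * se, theta_hat + z * se))
```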

User1865345
  • 8,202
utobi
  • 11,726
3

You can use the fact that the MLE is asymptotically unbiased, efficient (i.e., its variance converges to the inverse of the Fisher information), and Gaussian.

In summary, for a large sample size $n$, $\hat\theta$ is approximately distributed as $\mathcal{N}(\theta,\mathcal{I}^{-1}(\theta))$.

You can then approximate $\mathcal{I}(\theta)$ by $\mathcal{I}(\hat\theta)$ (the information evaluated at the MLE) to construct a confidence interval for $\theta$, etc.
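For instance, under an assumed exponential model with rate $\lambda$, the expected information is available in closed form, $\mathcal{I}(\lambda) = n/\lambda^2$, so the plug-in interval is immediate (a sketch with simulated data; the true rate 2.5 is made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=400)  # simulated data, true rate 2.5

lam_hat = 1 / x.mean()                        # MLE of the rate
se_hat = lam_hat / np.sqrt(len(x))            # sqrt of I(lam_hat)^{-1} = lam_hat / sqrt(n)
z = norm.ppf(0.975)                           # 95% normal quantile
ci_95 = (lam_hat - z * se_hat, lam_hat + z * se_hat)
```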

Doctor Milt
  • 3,056